Introduction
Thank you for clicking through to my article. I've been a DevOps engineer for two years on a dev team of seven engineers.
My name is MINSEOK, LEE, but I go by Unchaptered on the internet. Feel free to call me either "MINSEOK, LEE" or "Unchaptered" if you'd like to ask me anything.
Topic
While attending AWS Summit Seoul 2023, I learned about Amazon EKS.
In this article, I'll cover the following:
The stability of Amazon EKS
The stability of Control Plane
The stability of Data Plane
The stability of Version Update
The stability of Amazon EKS
Basically, Kubernetes is divided into a Control Plane and a Data Plane.
If you run Kubernetes yourself rather than using a managed service, you must manage every element of Kubernetes. Amazon EKS, by contrast, lets you run reliable container applications with minimal resources on AWS.
Amazon EKS manages the Control Plane to reduce the maintenance burden of Kubernetes itself. For the Data Plane, EKS supports self-managed EC2, Managed Node Groups, and Fargate (serverless compute) to reduce application maintenance.
Amazon EKS has five core values.
In this article, we'll talk about Stability.
In general, what does stability mean?
There are two key concepts:
HA (High Availability): e.g. 99.999% uptime
DR (Disaster Recovery): RTO (Recovery Time Objective) / RPO (Recovery Point Objective)
For details, refer to the documents referenced above.
To build and maintain HA, each component of Kubernetes must be stable.
What is the SRM (Shared Responsibility Model)?
AWS is responsible for both the hardware behind EKS and the stability of the Control Plane components.
The customer is responsible for the Data Plane, which includes the application code.
For example, the customer must consider the pods that run the application code, as well as the Deployments and Services that expose them.
A Cross-Account ENI connects the Control Plane and the Data Plane. AWS provisions Cross-Account ENIs in the target AZs so that AWS and the customer's VPC can communicate securely.
Amazon EKS Control Plane
The EKS Control Plane ensures the reliability of each of its components:
API Server : handles the Kubernetes API
Cloud Controller : connects the AWS cloud and Kubernetes
Controller Manager : manages several controllers
Scheduler : decides which node each pod is placed on
etcd : stores Kubernetes data
The stability of Control Plane
The API server instances reinforce high availability in an Active-Active configuration, with a minimum of 2 replicas.
The etcd instances are implemented as a key-value store, with a minimum of 3 replicas.
The API server and etcd instances share the following traits:
Implemented as an Auto Scaling group for scalability.
Distributed across multiple AZs for high availability.
The scale-up of Control Plane
In unexpected situations, Control Plane instances must scale using scaling signals.
Control Plane instances scale up or down according to CPU and memory usage.
They also scale according to disk usage and other metrics.
The customer (engineer) doesn't configure any of this through the API or console.
AWS reinforces the performance of the Control Plane at no additional cost.
The stability of Data Plane
Amazon EKS supports HA in three ways:
Self Recovery
Minimize Affected Area
Fast Scale-out
Stability 1 - Self Recovery
Kubernetes has a built-in self-recovery function based on its state-management mechanism.
When you create resources in Kubernetes, you write manifest files in YAML format. In them, you record the desired state.
Several controllers then continuously try to change the current state into the desired state. This is called the "Kubernetes control loop", and it consists of three steps:
Observe
Diff
Act
There are several ways to create a pod.
The simplest way is to create pods directly, but this is less useful because no controller monitors and recovers them.
Therefore, you will typically create pods using a Deployment.
A Deployment uses an RS (ReplicaSet) to maintain the desired pod count.
Stability 1.a. - Three general health-check methods
General ways to health-check a deployed application:
HTTP Request
Command
TCP Socket
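Of these three methods, the Command and HTTP variants appear in the probe examples below. A TCP-socket probe is never shown, so here is a sketch (the port value is illustrative): the probe succeeds if the kubelet can open a TCP connection to the container.

```yaml
livenessProbe:
  tcpSocket:
    port: 3306          # probe succeeds if a TCP connection can be opened
  initialDelaySeconds: 15
  periodSeconds: 10
```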
Stability 1.b. - Probe, Health Check in Kubernetes
In Kubernetes, the kubelet (on each worker node) uses probes to perform health checks.
readinessProbe
Command → Readiness (5s) : Is the application ready to receive requests?
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
Each worker node has the kubelet installed.
The kubelet uses the readinessProbe for health checks.
If a pod in the node group has trouble, the kubelet removes it from the Service endpoints. When traffic comes in, unhealthy pods don't receive requests.
Only healthy pods receive requests. After the trouble is fixed, the kubelet adds the recovered pods back into the Service endpoints.
livenessProbe
HTTP GET → Liveness (5s) : Is the application still alive?
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
Each worker node has the kubelet installed.
If a pod in the node group has trouble, the kubelet removes it from the Service endpoints, and then restarts the container.
After the container recovers, the kubelet adds the pod back into the Service endpoints.
Stability 2 - Minimize Affected Area
If you operate services in an MSA (microservice architecture) environment, trouble in one service can spread to other services. Such trouble can affect the whole Kubernetes cluster.
Stability 2.a. - Pod Scheduling
Kubernetes Manifest File
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 8
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
In this manifest file, replicas is set to 8.
The Kubernetes scheduler decides which node each pod is placed on, considering:
scheduler : sets the priority of pod placement
affinity : pods' preference for particular nodes
request / limit : the resources a pod requires and may consume
node resources : each node's CPU and memory usage
Sometimes, pods may end up concentrated on a few nodes.
In this case...
If a node holding many pods dies, the remaining pods may not be able to cover even the baseline traffic. Because the health-check period is 5 minutes by default,
in the worst case the cluster only learns of the node's trouble after 5 minutes. During that window, the affected pods stay in an unhealthy state and no replacements are deployed.
Stability 2.b. - podAntiAffinity
The simplest solution is a podAntiAffinity setting.
It encourages pods to spread evenly across all nodes.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - nginx
          topologyKey: "kubernetes.io/hostname"
  containers:
  - name: nginx
    image: nginx
Stability 2.c. - topologySpreadConstraints
The other way is a topologySpreadConstraints setting.
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: nginx
  containers:
  - name: nginx
    image: nginx
Stability 2.d. - podAntiAffinity & topologySpreadConstraints
By setting podAntiAffinity and topologySpreadConstraints, you can distribute pods across all nodes by availability zones.
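As a sketch of how the two settings combine in one pod spec (reusing the nginx labels from the examples above; the weight value is illustrative), pods first prefer distinct nodes, and scheduling is refused if the zones become unbalanced:

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: nginx
          topologyKey: "kubernetes.io/hostname"    # prefer spreading across nodes
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone       # enforce spreading across AZs
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: nginx
  containers:
  - name: nginx
    image: nginx
```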
Stability 2.e. - Maintain High Importance Service
Let's take an example to help our understanding.
Suppose 4 pods are running on the node group,
and they are of 2 types:
The first type is pods for the ordering system.
The second type is pods for the review system.
The pods that matter most to the business model are the ordering-system pods.
When a node runs out of spare CPU or memory, it evicts some pods. In this case, if the order-related pods are evicted, this can cause significant problems for our business.
To prevent this, there is a way to allocate resources to containers.
There are 2 keys that can be assigned:
Request : the minimum amount needed to schedule the pod.
Limit : the maximum amount the pod may consume.
→ Exceeding the CPU limit causes throttling; exceeding the memory limit causes an out-of-memory condition, and the offending process is killed.
Through container resource allocation, you can specify a QoS (Quality of Service) class.
The order of protection is Guaranteed > Burstable > BestEffort.
BestEffort : neither Request nor Limit is set.
→ If node resources are sufficient, the pod keeps running.
→ If node resources run short, the pod is killed first.
Burstable : Request and Limit are set, with Request < Limit.
→ If node resources are sufficient, the pod keeps running.
→ If the pod needs a little more than its Request, it can burst up to its Limit.
→ If node resources run short, the pod is killed second.
Guaranteed : Request and Limit are set, with Request == Limit.
→ Kept alive as long as possible.
→ Won't be terminated until node resources are extremely low.
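For instance, a Guaranteed pod sets its Request equal to its Limit for every container. A minimal sketch (the resource values are illustrative):

```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"        # minimum needed to schedule the pod
        memory: "256Mi"
      limits:
        cpu: "500m"        # equal to requests → Guaranteed QoS class
        memory: "256Mi"
```

Setting Request below Limit instead would make this pod Burstable; omitting both would make it BestEffort.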
Stability 3 - Fast Scale-out
When spike traffic occurs, you can use the following services:
HPA (Horizontal Pod Autoscaler)
CAS (Cluster Auto Scaler)
Karpenter
Stability 3.a. - HPA(Horizontal Pod AutoScaler)
Suppose you have 2 nodes and the pods are running normally.
Then one of the pods receives heavy traffic. The pod's CPU/memory usage metrics are collected by the Metrics Server, and the HPA controller uses the collected metrics to control the pods.
The HPA controller changes the replica count of the RS (ReplicaSet) to create new pods.
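A minimal HPA manifest targeting the nginx Deployment from earlier might look like this sketch (autoscaling/v2 API; the replica bounds and the 70% threshold are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70%
```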
Stability 3.b. - CAS(Cluster Auto Scaler)
If CPU/memory is insufficient, the pods created by the HPA controller become Pending.
In this case, the node group needs more nodes to turn the Pending pod(s) into running pods. CAS changes the Auto Scaling Group's desired capacity to create a new node.
After the new node is provisioned, the Pending pod(s) are scheduled and start running.
Stability 3.c. - Cons of CAS
CAS has the following cons.
Not fast to scale out.
In an MSA, you need to create a node group for each instance type
(one CAS can handle one ASG, and one ASG can use only one instance type). If you want to use Spot instances, you need to deploy a new node group.
Stability 3.d. Karpenter
Karpenter creates nodes directly.
If a Pending pod exists, Karpenter launches EC2 instances (via fleet).
Provisioning through an ASG is slower than provisioning a single EC2 instance.
Because Karpenter launches right-sized instance types, scaling is cheaper than with CAS. And when nodes are underutilized, Karpenter terminates the surplus EC2 instances.
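Karpenter's CRDs have changed across releases, so treat this as a sketch assuming the v1beta1 API: a NodePool that lets Karpenter pick right-sized Spot or On-Demand instances and consolidate underutilized nodes. The names `default` and the referenced EC2NodeClass are assumptions for illustration.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # let Karpenter choose capacity type
      nodeClassRef:
        name: default                   # references an EC2NodeClass (not shown)
  disruption:
    consolidationPolicy: WhenUnderutilized  # terminate surplus nodes
```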
Amazon EKS Version Update
Major release cycle : about 150 days
Minor release cycle : about 3 months
Technical support : 14 months
Update EKS Version in-place
An in-place update can only move to the next version:
Update the EKS cluster
Update the EKS node group
The in-place update process for EKS:
Here is an example to aid understanding.
Suppose 3 worker nodes are running.
Create a new worker node alongside each running worker node.
Send a Cordon command to each running worker node.
The Cordon command means "don't schedule new pods on this worker node";
the scheduler no longer targets this node. Then send a Drain command to each running worker node.
The pods on the old worker nodes are moved to the new worker nodes.
This process alone doesn't ensure availability,
so you must set up PDB (Pod Disruption Budget) options.
A drain request doesn't guarantee that the same number of pods stays running, so the service may fail to handle its previous traffic. Therefore, you must set a minAvailable amount:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: nginx
Cons of in-place update
Can't roll back to the previous version.
Version compatibility issues can cause unexpected failures.
Requires updating one version at a time.
If you missed an update cycle, you'll have to do it N times.
Update EKS Version as Blue Green
You can perform a blue/green update using Route 53's weighted routing feature.
Manage EKS Cluster
Finally, you can integrate several tools to automate EKS cluster management.
Amazon CDK
ArgoCD
Add-ons
GitHub