In the context of the CKA exam and Workloads & Scheduling, the Horizontal Pod Autoscaler (HPA) is a control loop that automatically scales the number of Pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metric utilization. Its primary purpose is to ensure applications have sufficient resources during peak traffic while reducing waste during low usage.
HPA relies heavily on the **Metrics Server** to aggregate resource data (such as CPU and memory) from the kubelets. If the Metrics Server is not deployed, the HPA will fail to retrieve data, often displaying `<unknown>` in the TARGETS column.
The scaling logic follows a ratio formula: `desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]`. For example, if the target CPU utilization is set to 50% and the current utilization is 100%, the HPA will attempt to double the number of replicas.
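As another worked example with assumed numbers, 3 replicas currently averaging 80% CPU against a 50% target would scale to 5:

```
desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5
```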
Key takeaways for the CKA exam include:
1. **Prerequisites:** You must have the Metrics Server installed, and your Pod containers **must have resource requests defined** (e.g., `resources.requests.cpu: 100m`). Without requests, HPA cannot calculate percentage-based utilization.
2. **Configuration:** HPAs define `minReplicas` and `maxReplicas` to constrain scaling limits. They also include stabilization windows to prevent 'thrashing' (rapidly scaling up and down due to metric noise).
3. **Commands:** You should know how to create an HPA imperatively using `kubectl autoscale deployment <name> --cpu-percent=50 --min=1 --max=10` and how to inspect its status using `kubectl get hpa` and `kubectl describe hpa` (a declarative equivalent is sketched just after this list).
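For reference, a minimal declarative equivalent of that imperative command, assuming the `autoscaling/v2` API and a placeholder Deployment named `web`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # placeholder Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target average CPU utilization across Pods
```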
**Mastering Horizontal Pod Autoscaler (HPA) for the CKA Exam**
**What is the Horizontal Pod Autoscaler (HPA)?**
The Horizontal Pod Autoscaler automatically scales the number of Pods in a ReplicationController, Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization (or, with custom metrics support, on other application-provided metrics). Unlike vertical scaling, which increases the resources allocated to each Pod, HPA increases the number of Pods.
**Why is it Important?**
In a production Kubernetes environment, traffic patterns are rarely static. HPA ensures that your application remains responsive during traffic spikes by adding replicas, and saves costs during low-traffic periods by removing unnecessary ones. For the CKA exam, understanding this demonstrates your ability to manage cluster capacity and application availability dynamically.
**How it Works**
HPA is implemented as a control loop whose period is controlled by the controller manager's `--horizontal-pod-autoscaler-sync-period` flag (default 15 seconds).
1. **Metrics Collection:** HPA queries resource utilization against the metrics specified in the HPA definition. Note: the Metrics Server must be installed in the cluster for HPA to work with resource metrics.
2. **Calculation:** The controller calculates the desired number of replicas using the formula `desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]`.
3. **Scaling:** If the calculated number differs from the current number, the HPA updates the `replicas` field of the target resource.
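To observe this loop during the exam, it can help to watch both the HPA and its target resource. A minimal sketch, assuming a Deployment named `web` (a placeholder):

```bash
# Watch the HPA recompute its targets and desired replica count
kubectl get hpa web --watch

# In a second terminal, watch the Deployment's replica count follow along
kubectl get deployment web --watch
```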
**How to Answer HPA Questions in the Exam**
CKA questions about HPA usually involve creating an autoscaler for an existing Deployment or troubleshooting why one isn't working.
1. **Creating an HPA:** The fastest way to create an HPA in the exam is the imperative command `kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10`. This creates an HPA that maintains an average CPU utilization of 50% across all Pods, scaling between 1 and 10 replicas (a combined sketch of both steps follows this list).
2. **Verifying HPA Status:** Use `kubectl get hpa` to see the current state. If you see `<unknown>/50%` under TARGETS, the HPA cannot read the metrics yet (wait a few moments, or check whether the Metrics Server is running).
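A minimal end-to-end sketch covering both steps, assuming a Deployment named `web` (a placeholder); `kubectl autoscale` typically also accepts `--dry-run=client -o yaml` if you prefer to review the generated manifest before creating it:

```bash
# Create the HPA imperatively
kubectl autoscale deployment web --cpu-percent=50 --min=1 --max=10

# Check its status; TARGETS may read <unknown>/50% until metrics arrive
kubectl get hpa web
kubectl describe hpa web
```

The `kubectl get hpa` output lists columns along the lines of NAME, REFERENCE, TARGETS, MINPODS, MAXPODS, REPLICAS, and AGE; the exact layout varies by kubectl version.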
**Exam Tips: Answering Questions on Horizontal Pod Autoscaler (HPA)**
**Tip #1: The Metrics Server Prerequisite** If an HPA task fails or reports `<unknown>` metrics, check whether the Metrics Server is installed (`kubectl top node` or `kubectl top pod`). If these commands fail, HPA cannot function.
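A quick way to verify the metrics pipeline, assuming the common setup where the Metrics Server runs as a Deployment named `metrics-server` in `kube-system` (names and namespaces can vary by cluster):

```bash
# If these return data, the metrics pipeline is working
kubectl top node
kubectl top pod

# Check whether a Metrics Server Deployment exists (name/namespace may differ)
kubectl -n kube-system get deployment metrics-server
```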
**Tip #2: Resource Requests are Mandatory** For HPA to calculate a CPU percentage, the Pods must have resource requests defined in their spec. If the Deployment spec does not define `resources.requests.cpu`, the HPA will not function because it lacks a baseline against which to calculate the percentage.
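A minimal sketch of the relevant part of the Deployment's Pod template, using a placeholder `nginx` container:

```yaml
# Inside the Deployment manifest, under spec.template.spec
containers:
- name: web              # placeholder container name
  image: nginx           # placeholder image
  resources:
    requests:
      cpu: 100m          # the baseline HPA uses to compute CPU utilization %
    limits:
      cpu: 200m          # optional; only requests are required for HPA
```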
**Tip #3: The Cooldown Period** HPA has a built-in delay (default 5 minutes) before scaling down to prevent 'thrashing' (rapidly scaling up and down). If the replica count does not drop immediately after load decreases, this is normal behavior, not a bug.
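If a task asks you to tune this delay, the `autoscaling/v2` API exposes it under `spec.behavior`. A sketch (300 seconds matches the default scale-down delay):

```yaml
# Excerpt of an autoscaling/v2 HorizontalPodAutoscaler spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of lower load before scaling down
```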
**Tip #4: Editing an HPA** If asked to modify an existing HPA (e.g., change `maxReplicas`), use `kubectl edit hpa <hpa-name>`; this is faster than deleting and recreating it.
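For a quick scripted change, `kubectl patch` can also update a single field; a sketch assuming an HPA named `web` (a placeholder):

```bash
# Raise the upper scaling bound without opening an editor
kubectl patch hpa web -p '{"spec":{"maxReplicas":15}}'
```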