Horizontal Pod Autoscaling (HPA) is a crucial feature in Google Kubernetes Engine (GKE) that automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed metrics such as CPU utilization, memory usage, or custom metrics.
When you configure HPA, you specify minimum and maximum replica counts along with target metric thresholds. The HPA controller continuously monitors the specified metrics and calculates the desired number of replicas needed to maintain your target utilization. For example, if you set a target CPU utilization of 50% and your pods are running at 80%, HPA will scale out by adding more replicas to distribute the load.
The scaling process works in both directions: scaling out when demand increases and scaling in when demand decreases. This ensures optimal resource utilization and cost efficiency while maintaining application performance. The controller evaluates metrics every 15 seconds by default and makes scaling decisions based on the average metric values across all pods.
To implement HPA in GKE, you can use the kubectl autoscale command or define a HorizontalPodAutoscaler resource in YAML. You must ensure your pods have resource requests defined, as HPA uses these values to calculate utilization percentages.
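As an illustration, a minimal HorizontalPodAutoscaler manifest using the autoscaling/v2 API might look like the following sketch (the Deployment name web-app and the HPA name are placeholders, not values from this guide):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:          # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: web-app          # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target 50% of the pods' CPU requests
```

The roughly equivalent imperative command is kubectl autoscale deployment web-app --cpu-percent=50 --min=2 --max=10.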
Key considerations for successful HPA implementation include setting appropriate minimum replicas to handle baseline traffic, configuring maximum replicas to control costs, choosing relevant metrics that reflect actual application load, and allowing sufficient time for new pods to become ready before scaling decisions are made.
HPA integrates well with Cluster Autoscaler, which handles node-level scaling. When HPA requests more pods than current nodes can accommodate, Cluster Autoscaler provisions additional nodes. This combination provides comprehensive autoscaling for containerized workloads, ensuring applications remain responsive during traffic spikes while optimizing infrastructure costs during low-demand periods.
Horizontal Pod Autoscaling (HPA) - Complete Guide
Why Horizontal Pod Autoscaling is Important
Horizontal Pod Autoscaling is a critical feature in Google Kubernetes Engine (GKE) that enables your applications to automatically adapt to changing workload demands. It ensures optimal resource utilization by scaling the number of pod replicas based on observed metrics, which helps maintain application performance during traffic spikes while reducing costs during low-demand periods.
What is Horizontal Pod Autoscaling?
Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed CPU utilization, memory usage, or custom metrics. The term horizontal refers to scaling out (adding more pods) or scaling in (removing pods), as opposed to vertical scaling which would increase resources for existing pods.
Key Components:
- HPA Controller: Monitors metrics and makes scaling decisions
- Metrics Server: Collects resource metrics from pods
- Target Metric: The threshold that triggers scaling actions
- Min/Max Replicas: Boundaries for scaling operations
How Horizontal Pod Autoscaling Works
1. Metric Collection: The Metrics Server continuously collects CPU and memory utilization data from all pods
2. Evaluation Loop: The HPA controller checks metrics every 15 seconds by default
3. Calculation: HPA calculates the desired number of replicas using the formula: desiredReplicas = ceil(currentReplicas × (currentMetricValue / desiredMetricValue))
4. Scaling Decision: If the calculated replicas differ from current replicas and fall within min/max bounds, scaling occurs
5. Cooldown Period: After scaling, there is a stabilization window (300 seconds by default for scale-down) to prevent rapid fluctuations
Configuration Example:
- Target CPU utilization: 80%
- Minimum replicas: 2
- Maximum replicas: 10
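The calculation in step 3 can be sketched in Python using the example configuration above (the function name and min/max clamping helper are illustrative, not part of any Kubernetes API):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=10):
    """Apply the HPA scaling formula, clamped to the min/max bounds."""
    desired = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, desired))

# With a target of 80% CPU utilization, min 2, max 10:
# 4 pods averaging 120% CPU -> ceil(4 * 120/80) = 6 replicas (scale out)
print(desired_replicas(4, 120, 80))  # 6
# 4 pods averaging 30% CPU -> ceil(4 * 30/80) = 2 replicas (scale in)
print(desired_replicas(4, 30, 80))   # 2
```

Note that the result is always clamped to the configured bounds, so a sustained spike can never push the replica count past maxReplicas.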
Exam Tips: Answering Questions on Horizontal Pod Autoscaling
1. Understand the Scaling Direction: Remember that HPA scales horizontally by adding or removing pods, not by changing pod resource limits. Questions may try to confuse horizontal and vertical scaling concepts.
2. Know the Default Metrics: CPU utilization is the most common metric tested. Memory-based HPA requires the autoscaling/v2 API. Custom metrics require the Custom Metrics API.
3. Remember Key Constraints:
- HPA requires a Metrics Server to be running
- Resource requests must be defined for CPU-based autoscaling to work
- HPA cannot scale to zero pods (minimum is 1 unless using KEDA)
4. Distinguish from Cluster Autoscaler: HPA scales pods within nodes, while Cluster Autoscaler adds or removes nodes. Exam questions often test whether you understand this distinction.
5. Common Scenario Recognition:
- Traffic spikes requiring more capacity → HPA
- Cost optimization during off-peak hours → HPA with lower min replicas
- Maintaining SLAs during variable load → HPA with appropriate targets
6. Configuration Details to Remember:
- minReplicas: Ensures minimum availability
- maxReplicas: Prevents runaway scaling and cost overruns
- targetCPUUtilizationPercentage: Commonly set between 50-80%
7. Watch for Trick Questions:
- HPA requires pods to have resource requests defined
- Scaling happens based on the average utilization across all pods
- HPA works with Deployments, ReplicaSets, and StatefulSets, but not DaemonSets
8. Command Line Knowledge: Be familiar with:
kubectl autoscale deployment [name] --cpu-percent=80 --min=2 --max=10