In the CKA context, troubleshooting resource constraints revolves around distinguishing between Pod-level definitions and Namespace-level policies.
**1. Pod-Level Troubleshooting (Requests vs. Limits)**
* **OOMKilled (Exit Code 137):** This occurs when a container attempts to use more memory than its configured **limit**. To diagnose, run `kubectl describe pod <pod-name>` and check the 'LastState' or 'State' for 'OOMKilled'.
* **Fix:** Increase the memory limit in the manifest or debug the application for memory leaks.
* **CPU Throttling:** If a container hits its CPU limit, Kubernetes throttles it (performance degrades) rather than killing it. Use `kubectl top pod` to monitor usage (requires the metrics-server).
* **Pending State (Insufficient Capacity):** If the cluster nodes do not have enough unallocated CPU/Memory to match a Pod's **requests**, the Pod remains 'Pending'. `kubectl describe pod` will show 'FailedScheduling' with 'Insufficient cpu/memory'.
* **Fix:** Add nodes, scale down other workloads, or reduce the Pod's resource requests (see the manifest sketch after this list).
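For orientation, here is a minimal Pod manifest showing where requests and limits live; the name, image, and figures are illustrative placeholders, not prescribed values:

```yaml
# resource-demo.yaml -- illustrative values only
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:          # what the scheduler reserves when placing the Pod
        cpu: "250m"
        memory: "128Mi"
      limits:            # the ceiling: CPU is throttled, memory triggers OOMKilled
        cpu: "500m"
        memory: "256Mi"
```

If this Pod sits in 'Pending', no node had 250m of CPU and 128Mi of memory left unallocated to satisfy the requests.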
**2. Namespace-Level Troubleshooting (ResourceQuota)**
ResourceQuotas enforce hard limits on the aggregate resource usage within a Namespace.
* **Symptoms:** You receive a 'Forbidden' error upon creation (e.g., `exceeded quota: compute-quota`), or a ReplicaSet silently fails to create Pods (the rejection appears in the ReplicaSet's Events via `kubectl describe replicaset`, since the Pods are never created).
* **Diagnosis:** Inspect the current usage against the hard limits using `kubectl describe resourcequota -n <namespace>`. This displays columns for 'Used' vs. 'Hard' limits for CPU, memory, and object counts (e.g., number of Pods).
* **Fix:** Increase the ResourceQuota limits, delete unused resources in that Namespace to free up capacity, or lower the requests/limits on the specific Pods you are trying to deploy (a sample quota manifest follows).
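A minimal ResourceQuota sketch for reference; the name, namespace, and figures are placeholders:

```yaml
# compute-quota.yaml -- caps aggregate usage across the whole namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "10"
```

Note that once a quota covers CPU or memory, every new Pod in the namespace must declare those requests/limits (or inherit defaults from a LimitRange), or it is rejected outright.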
**3. LimitRange:** If a Pod is rejected for violating per-container minimum/maximum constraints, check `kubectl describe limitrange -n <namespace>`. A LimitRange can also inject default requests/limits into containers that omit them, as in the sketch below.
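A minimal LimitRange sketch (again, name and values are illustrative):

```yaml
# container-limits.yaml -- per-container bounds and defaults for one namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: dev
spec:
  limits:
  - type: Container
    min:
      memory: 64Mi
    max:
      memory: 1Gi
    defaultRequest:      # request applied when a container omits one
      memory: 128Mi
    default:             # limit applied when a container omits one
      memory: 256Mi
```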
**Troubleshooting Resource Quotas and Limits**
**Why is it important?** In a shared Kubernetes cluster, unconstrained workloads can consume all available CPU and memory, causing a "noisy neighbor" effect that crashes other critical applications. Resource Quotas and Limits are the primary mechanisms to govern capacity usage. Troubleshooting these issues is a core skill for the CKA exam because it proves you can diagnose why workloads are failing to schedule or running out of memory.
**What is it?**
* **ResourceQuota:** An object defined at the Namespace level that limits the total aggregate resource consumption (CPU, memory, storage, object counts) for that namespace.
* **LimitRange:** An object that defines default, minimum, and maximum resource constraints for individual containers/pods within a namespace.
* **Requests vs. Limits:** Requests guarantee resources for scheduling; limits define the ceiling that triggers throttling (CPU) or termination (memory).
**How to troubleshoot**
**1. Pod creation fails (Forbidden):** If you try to create a Pod and get an error stating "forbidden: exceeded quota", the namespace has reached its hard limit.
* **Action:** Run `kubectl describe resourcequota -n <namespace>`. Compare the 'Used' column against the 'Hard' column to identify which resource (e.g., `limits.cpu`) is exhausted. The same data lives on the object itself, as shown below.
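An illustrative status excerpt (figures invented) from `kubectl get resourcequota compute-quota -n dev -o yaml`:

```yaml
# ResourceQuota status: 'used' is tracked against 'hard' per resource
status:
  hard:
    limits.cpu: "4"
    pods: "10"
  used:
    limits.cpu: "4"    # exhausted -- new Pods that set CPU limits are rejected
    pods: "7"
```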
**2. OOMKilled status:** If a Pod crashes with status `OOMKilled` (Exit Code 137), the container tried to use more memory than its specified limit.
* **Action:** Run `kubectl describe pod <name>` and look under 'Containers > State' (or 'Last State' if it has restarted). You must either increase the memory limit in the Pod manifest or debug the application's memory leak.
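What that looks like in the raw object; an illustrative excerpt from `kubectl get pod <name> -o yaml`:

```yaml
# status excerpt for a container killed by the OOM killer
containerStatuses:
- name: app
  restartCount: 3
  lastState:
    terminated:
      exitCode: 137
      reason: OOMKilled
```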
**3. CPU throttling:** If an application is running slowly but not crashing, it may be hitting its CPU limit. This is hard to see directly without metrics (`kubectl top pod` helps when the metrics-server is installed); the usual fix is to raise the CPU limit.
**Exam tips: answering questions on resource quota and limit troubleshooting**
* **Step 1: Diagnose.** Do not guess. Immediately run `kubectl describe pod <pod-name>` to see the Events. Look for messages like 'FailedScheduling' (often due to requests exceeding node capacity) or 'OOMKilled'.
* **Step 2: Check namespace constraints.** If the Pod events don't show the issue, check the namespace: `kubectl get resourcequota,limitrange -n <namespace>`.
* **Step 3: Solve.** If a quota is full: `kubectl edit resourcequota <name> -n <ns>` and increase the hard limit (if the question allows changing infrastructure), OR lower the Pod's requests (if the question asks you to fit the Pod). If OOMKilled: you must increase the memory limit in the Deployment/Pod YAML.
* **Step 4: Persistence.** Remember that you cannot update resources on a running Pod. You must edit the Deployment (see the sketch below) or delete and re-create the Pod definition.
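A sketch of the Step 3/4 fix applied to a Deployment; the name, namespace, and sizes are placeholders. Editing the Pod template triggers a rolling replacement of the Pods, which is why the change persists:

```yaml
# One-liner alternative: kubectl set resources deployment/web -n dev --limits=memory=512Mi
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx
        resources:
          requests:
            memory: 256Mi
          limits:
            memory: 512Mi    # raised ceiling to stop the OOMKills
```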