In the context of the CKA exam, troubleshooting a worker node frequently centers on the **Kubelet**, the primary agent running on every node. When a node is marked `NotReady`, your initial step should be `kubectl describe node <node-name>` to identify conditions such as `DiskPressure`, `MemoryPressure`, or network unavailability.
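For a quicker view than the full `describe` output, the node conditions can also be pulled out with a JSONPath query. The following is a minimal sketch, assuming `kubectl` is already configured for the cluster; the function names `node_conditions` and `unhealthy_conditions` are hypothetical helpers, not part of `kubectl`.

```shell
# Print each condition of a node as TYPE=STATUS, one per line.
node_conditions() {
    kubectl get node "$1" \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Keep only the conditions that indicate trouble: Ready should be True,
# every other condition (DiskPressure, MemoryPressure, and so on) should be False.
# A status of Unknown is reported in both cases.
unhealthy_conditions() {
    node_conditions "$1" | awk -F= '
        NF && (($1 == "Ready" && $2 != "True") || ($1 != "Ready" && $2 != "False"))'
}
```

Calling `unhealthy_conditions node01` (with `node01` standing in for the real node name) prints nothing for a healthy node.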
If the issue persists, SSH into the affected node. First, verify the service status using `systemctl status kubelet`. If it is inactive or crashing, examine the logs using `journalctl -u kubelet` to find specific error messages.
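The two checks above can be wrapped in one small helper to run right after logging in to the node. This is only a sketch, assuming a systemd-managed Kubelet; the function name `kubelet_triage` and the choice of 50 log lines are illustrative.

```shell
# Hypothetical helper: report the kubelet's state and, if it is not active,
# print the most recent log entries. Returns non-zero when the kubelet is down.
kubelet_triage() {
    # "systemctl is-active" prints the state (active, inactive, failed, ...)
    # and exits non-zero unless the unit is active, hence the "|| true".
    state=$(systemctl is-active kubelet 2>/dev/null) || true
    echo "kubelet state: ${state:-unknown}"
    if [ "$state" != "active" ]; then
        # -n 50 limits output to the last 50 entries; --no-pager prints directly.
        journalctl -u kubelet --no-pager -n 50
        return 1
    fi
}
```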
Common failure scenarios include:
1. **Certificate Mismatches:** The Kubelet requires valid certificates to authenticate with the API server. Ensure the paths in the `kubeconfig` file (usually `/etc/kubernetes/kubelet.conf`) point to valid, non-expired client certificates and the correct CA.
2. **Configuration Errors:** Check `/var/lib/kubelet/config.yaml`. Misconfigured paths to the CNI (Container Network Interface) binaries or the container runtime endpoint will prevent the Kubelet from starting pods.
3. **Container Runtime Issues:** The Kubelet depends on a runtime (like `containerd` or `CRI-O`). If the runtime service is stopped (`systemctl status containerd`), the Kubelet cannot operate. Furthermore, ensure that the cgroup driver matches on both sides: the `cgroupDriver` field in the Kubelet configuration and, for containerd, the `SystemdCgroup` setting in the runtime configuration.
4. **Swap Memory:** By default, the Kubelet refuses to start when swap is enabled. Run `swapoff -a` to turn swap off immediately, and comment out any swap entry in `/etc/fstab` so that it stays off after a reboot.
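Several of the scenarios above can be checked with a few lines of shell. The helpers below are a sketch rather than a complete diagnostic: all function names are hypothetical, the default paths assume a kubeadm-provisioned node running containerd (with `/etc/containerd/config.toml` being an assumption not stated above), and each function accepts an alternative file path as its first argument.

```shell
# Print the client certificate path referenced by the kubelet's kubeconfig.
# Prints nothing if the certificate is embedded as client-certificate-data.
kubelet_client_cert() {
    awk '$1 == "client-certificate:" { print $2 }' "${1:-/etc/kubernetes/kubelet.conf}"
}

# Print the kubelet's cgroupDriver value (quotes stripped), if it is set.
kubelet_cgroup_driver() {
    awk '$1 == "cgroupDriver:" { gsub(/"/, "", $2); print $2 }' \
        "${1:-/var/lib/kubelet/config.yaml}"
}

# Map containerd's SystemdCgroup boolean to the matching driver name.
containerd_cgroup_driver() {
    if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' \
        "${1:-/etc/containerd/config.toml}"; then
        echo systemd
    else
        echo cgroupfs
    fi
}

# Succeed when at least one swap device is active.
# /proc/swaps starts with a header line, so real entries begin on line 2.
swap_is_active() {
    awk 'NR > 1 && NF { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Print uncommented fstab entries whose type field is "swap".
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap"' "${1:-/etc/fstab}"
}
```

On a node, `openssl x509 -noout -enddate -in "$(kubelet_client_cert)"` then shows the certificate's expiry date, and `[ "$(kubelet_cgroup_driver)" = "$(containerd_cgroup_driver)" ]` succeeds only when both sides name the same driver.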
A systematic approach (checking the service status, analyzing logs for certificate and configuration errors, and verifying runtime dependencies) is the standard way to resolve node issues.
Node Troubleshooting: Kubelet
Why it is Important

The kubelet is the primary "node agent" that runs on each node. It is arguably the most critical component on a worker node because it is responsible for registering the node with the Kubernetes API server and managing the lifecycle of Pods and containers on that specific machine. If the kubelet fails, the control plane cannot communicate with the node, the node status becomes `NotReady`, and Pods cannot be scheduled or managed effectively. Troubleshooting kubelet issues is a core competency for the CKA exam as it tests your ability to diagnose system-level services underlying the cluster.
What it is

The kubelet is a binary that runs as a system service (usually via systemd) on every node in the cluster. Unlike Pods, which run inside containers, the kubelet runs directly on the host operating system. It works by taking a set of PodSpecs (provided by the API server or static file paths) and ensuring that the containers described in those PodSpecs are running and healthy.
How it works

1. **Registration:** Upon startup, the kubelet registers the node with the API server.
2. **Monitoring:** It constantly watches for new Pod assignments from the API server.
3. **Execution:** It instructs the container runtime (like `containerd` or `CRI-O`) to pull images and start containers.
4. **Reporting:** It reports the status of the node and the Pods back to the control plane.
How to Troubleshoot Kubelet Issues

To diagnose a failed node, follow this logical flow:

1. **Check Node Status:** Run `kubectl get nodes`. If a node is `NotReady`, it usually implies a kubelet or networking issue.
2. **Access the Node:** SSH into the problematic node (e.g., `ssh node01`).
3. **Check Service Status:** Use `systemctl status kubelet`. Look for states like `inactive`, `failed`, or `activating`.
4. **Check Logs:** This is the most important step. Run `journalctl -u kubelet -f` or `journalctl -u kubelet --no-pager` to read specific error messages (e.g., certificate errors, swap enabled, misconfiguration).
5. **Verify Configuration:** Check the systemd unit file (often at `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`) and the config file (usually `/var/lib/kubelet/config.yaml`). Ensure paths to certificates and binaries are correct.
6. **Fix and Restart:** After correcting the configuration or binary path, run `systemctl daemon-reload` and `systemctl restart kubelet`.
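The "Verify Configuration" step can be partly automated by listing every file path that the systemd drop-in passes to the kubelet and flagging the ones that do not exist. This is a rough sketch: the function name `check_unit_paths` is hypothetical, and it only recognizes arguments written in the `--flag=/absolute/path` form.

```shell
# List each --flag=/path argument found in a unit file or drop-in and report
# whether the path exists on this machine.
check_unit_paths() {
    unit_file=${1:-/etc/systemd/system/kubelet.service.d/10-kubeadm.conf}
    # -o prints only the matching part of each line, one match per output line.
    grep -Eo -e '--[a-z-]+=/[^" ]+' "$unit_file" |
        while IFS='=' read -r flag path; do
            if [ -e "$path" ]; then
                echo "ok $flag $path"
            else
                echo "MISSING $flag $path"
            fi
        done
}
```

A `MISSING` line for `--config` or `--kubeconfig` points at the kind of path mismatch that the logs would also report.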
Exam Tips: Answering Questions on Node troubleshooting and kubelet issues

1. **Always Check Status First:** Do not jump into fixing things blindly. Run `systemctl status kubelet` immediately after SSH-ing into the node. If it is stopped, try starting it.
2. **Read the Logs Carefully:** The logs will tell you exactly why it failed. Common exam scenarios include:
   - **Binary Path Mismatch:** The systemd file points to `/usr/bin/kubelet` but the binary is in `/usr/local/bin/kubelet`.
   - **Config Path Mismatch:** The configuration file argument is pointing to a file that does not exist.
   - **CA Certificate Errors:** The kubelet cannot authenticate with the API server due to wrong CA paths.
3. **Service Restart Sequence:** If you change a configuration file, just restart the service. If you change the service unit file (systemd file), you must run `systemctl daemon-reload` before restarting the service.
4. **Verify:** After fixing, run `systemctl status kubelet` to ensure it is `active (running)`, then exit the node and run `kubectl get nodes` on the control plane to ensure the node turns to `Ready`.
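The binary path check and the restart rule from the tips above can be expressed as two short functions. Both names are hypothetical, and the sketch assumes a plain `ExecStart=/path/to/kubelet ...` line without systemd's special prefixes such as `-` or `@`.

```shell
# Print the binary named by the last non-empty ExecStart= line in a unit file.
# An empty "ExecStart=" only resets earlier entries, so it is skipped.
kubelet_exec_path() {
    awk -F= '/^ExecStart=./ { split($2, words, " "); bin = words[1] }
             END { if (bin != "") print bin }' "$1"
}

# Restart the kubelet after editing a file. Unit files and drop-ins need
# "systemctl daemon-reload" first; plain configuration files do not.
restart_kubelet_after_edit() {
    case "$1" in
        *.service|*.service.d/*) systemctl daemon-reload || return 1 ;;
    esac
    systemctl restart kubelet
}
```

For example, `[ -x "$(kubelet_exec_path /etc/systemd/system/kubelet.service.d/10-kubeadm.conf)" ]` fails when the unit points at a binary that is not there, and `restart_kubelet_after_edit` takes the path of the file that was just edited.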