Troubleshooting

Diagnose and resolve cluster, node, and application issues in Kubernetes environments (30% of exam).

Troubleshooting is a significant domain in the Certified Kubernetes Administrator (CKA) exam, accounting for approximately 30% of the curriculum. It requires a hierarchical approach to diagnosis, moving from the Application layer to the Worker Nodes, and finally to the Control Plane. At the **Appl…

Concepts covered

- Manage and evaluate container output streams
- Troubleshoot services and networking
- Debugging pods and containers
- Analyzing control plane logs
- Node troubleshooting and kubelet issues
- Resource quota and limit troubleshooting
- kubectl debug and ephemeral containers
- etcd troubleshooting and backup
- Certificate and authentication issues
- Troubleshoot clusters and nodes
- Troubleshoot cluster components
- Monitor cluster and application resource usage

CKA - Troubleshooting Example Questions

Test your knowledge of Troubleshooting

Question 1

A Kubernetes administrator wants to determine which pods are using the most CPU resources in the 'development' namespace to optimize workload distribution. The metrics-server has been running for over an hour. What kubectl command would retrieve the CPU and memory metrics for pods in that specific namespace?

- kubectl top pods -n development
- kubectl top pods --namespace=development --sort-by=cpu
- kubectl describe pods -n development --show-metrics
- kubectl get pods -n development -o wide --resources

Correct Answer: kubectl top pods -n development

The kubectl top pods command with the namespace flag is the correct way to retrieve CPU and memory metrics for pods, as it queries the metrics-server API to display current resource consumption.

The describe command has no --show-metrics flag; it displays pod specifications and events rather than live metrics. The get pods command has no --resources flag for retrieving metrics. The --sort-by=cpu variant is a valid command (kubectl top accepts --sort-by=cpu and --sort-by=memory) and also returns the metrics, ordered by CPU usage. The question, however, asks only for the command that retrieves the metrics for the namespace, so the simpler form without sorting is the expected answer.

This tests understanding of kubectl top for monitoring resource usage via metrics-server.
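As a rough illustration of ranking pods by CPU, the sketch below sorts rows in the shape that `kubectl top pods -n development --no-headers` prints. The pod names and values are invented so the snippet runs without a cluster, and `sample_top_output` and `rank_by_cpu` are hypothetical helper names, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical stand-in for: kubectl top pods -n development --no-headers
# Pod names and usage values are invented for illustration.
sample_top_output() {
  cat <<'EOF'
api-7d9f     250m   128Mi
worker-x     900m   512Mi
cache-1      40m    64Mi
EOF
}

# Order rows by the CPU column (millicores), highest first.
# sort -n reads the leading digits of values such as "250m".
rank_by_cpu() {
  sort -k2,2nr
}

sample_top_output | rank_by_cpu
```

On a live cluster with metrics-server running, `kubectl top pods -n development --sort-by=cpu` produces the same ordering directly, so the manual sort is only needed when post-processing saved output.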

Question 2

A Pod named 'client-app' in namespace 'frontend' is attempting to connect to a Service named 'data-api' in namespace 'backend'. The Service is of type ClusterIP with port 443 and targetPort 8443. Running 'kubectl exec client-app -n frontend -- curl -k https://data-api.backend.svc.cluster.local:443' returns 'connection refused'. However, 'kubectl get endpoints data-api -n backend' shows three healthy endpoints. Running 'kubectl exec client-app -n frontend -- curl -k https://10.244.1.50:8443' (one of the endpoint IPs) successfully returns a response. kube-proxy logs show no errors and iptables rules appear correct. What should you investigate to resolve this Service connectivity issue?

- Investigate if the ClusterIP has been released and reassigned due to Service recreation causing stale iptables rules in the kube-proxy chain
- Examine whether the Service port definition uses TCP protocol while the backend Pods are configured to listen on UDP for the specified port
- Verify that the Service port definition specifies the correct protocol and that the targetPort matches the container port in the Pod spec
- Check if the Service has been configured with externalTrafficPolicy set to Local which restricts traffic routing to node-local endpoints only

Correct Answer: Verify that the Service port definition specifies the correct protocol and that the targetPort matches the container port in the Pod spec

Let's analyze the scenario step by step:

- The curl command to the Service DNS name (data-api.backend.svc.cluster.local:443) returns 'connection refused'
- However, curling the endpoint IP on port 8443 works successfully
- The endpoints are healthy and kube-proxy/iptables appear correct

This behavior points to a port mapping or protocol problem in the Service definition. The key clue is that connecting to the endpoint IP on port 8443 works, while the Service on port 443 fails. Traffic arriving on port 443 is therefore likely not being forwarded correctly to targetPort 8443, possibly due to:
- A protocol mismatch in the Service port definition (the protocol field accepts TCP, UDP, or SCTP and defaults to TCP)
- The targetPort in the Service spec not matching the port the container actually listens on
- Another error in the port configuration, such as a wrong port value
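The port comparison described above can be sketched as a small shell helper. This is only an illustration: `compare_ports` is a hypothetical function name, `<pod-name>` in the comments is a placeholder, and the values passed in are taken from the scenario rather than read from a cluster.

```shell
#!/bin/sh
# On a live cluster the two values could be read with, for example:
#   kubectl get svc data-api -n backend -o jsonpath='{.spec.ports[0].targetPort}'
#   kubectl get pod <pod-name> -n backend -o jsonpath='{.spec.containers[0].ports[0].containerPort}'
# <pod-name> is a placeholder. targetPort may also be a port name instead of a
# number, in which case it has to be resolved against the container's named ports.
compare_ports() {
  svc_target_port="$1"
  container_port="$2"
  if [ "$svc_target_port" = "$container_port" ]; then
    echo "match: targetPort=$svc_target_port"
  else
    echo "mismatch: targetPort=$svc_target_port containerPort=$container_port"
    return 1
  fi
}

compare_ports 8443 8443
compare_ports 443 8443 || true
```

`kubectl describe svc data-api -n backend` shows the Port, TargetPort, and Endpoints fields together, which is often the quickest way to spot such a mismatch by eye.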

The other options are incorrect for the following reasons:

The externalTrafficPolicy option is wrong because externalTrafficPolicy only applies to LoadBalancer and NodePort Services, not ClusterIP Services. Since this is a ClusterIP Service, this setting would have no effect.

The TCP/UDP protocol mismatch option is less likely because the error is 'connection refused' not 'connection timed out'. A TCP/UDP mismatch would typically result in no response at all (timeout), not an active refusal. Additionally, HTTPS connections use TCP, so if the curl to the endpoint IP works, the pods are clearly listening on TCP.

The stale iptables rules option is incorrect because the question states that iptables rules appear correct and kube-proxy logs show no errors. If the ClusterIP had been reassigned with stale rules, this would show up in iptables inspection or cause different symptoms like routing to wrong endpoints.

Question 3

You are troubleshooting an etcd backup operation that consistently produces a snapshot file of 0 bytes. The etcdctl command completes with exit code 0 and no error messages are displayed. The etcd cluster is healthy with all three members responding properly. Certificate files are readable and endpoint connectivity is confirmed. What is the most probable cause of this empty snapshot file?

- The snapshot is being created from a follower node that requires leader redirection to complete the operation
- The snapshot command is missing the required --endpoints flag which defaults to an empty data source
- The snapshot is being written to a directory where the user lacks write permissions, causing silent failure
- The snapshot process is interrupted by etcd compaction running concurrently on the cluster during backup execution

Correct Answer: The snapshot is being written to a directory where the user lacks write permissions, causing silent failure

The most probable cause of a 0-byte snapshot file when etcdctl completes successfully with exit code 0 is a permission issue on the target directory.

When the user running the etcdctl snapshot save command lacks write permissions on the destination directory, the command may create an empty file (0 bytes) but fail to write the actual snapshot data. This behavior can occur because:

- The file creation succeeds (creating an empty file)
- The actual write operation fails silently
- etcdctl may still report success (exit code 0) in certain scenarios where the command execution completes but the data write fails

This is a common gotcha in etcd backup operations, especially when running backups as different users or in containerized environments where volume mount permissions may not match the executing user.
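Because an exit code of 0 alone may not prove the backup is usable, a post-backup check along these lines can help. It is a minimal sketch: `check_snapshot` is a hypothetical helper, and the certificate paths and backup location in the comment are kubeadm defaults and an invented path, assumed here for illustration.

```shell
#!/bin/sh
# A typical save command on a kubeadm cluster (paths assumed, adjust as needed):
#   ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
#     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#     --cert=/etc/kubernetes/pki/etcd/server.crt \
#     --key=/etc/kubernetes/pki/etcd/server.key \
#     snapshot save /var/backups/etcd-snapshot.db

# Fail if the target directory is not writable or the snapshot is missing or 0 bytes.
check_snapshot() {
  snapshot_file="$1"
  snapshot_dir=$(dirname "$snapshot_file")
  if [ ! -w "$snapshot_dir" ]; then
    echo "directory not writable: $snapshot_dir"
    return 1
  fi
  if [ ! -s "$snapshot_file" ]; then
    echo "snapshot missing or empty: $snapshot_file"
    return 1
  fi
  echo "snapshot is non-empty: $snapshot_file"
}

# Demonstration against throwaway files instead of a real backup.
demo_dir=$(mktemp -d)
: > "$demo_dir/empty.db"
check_snapshot "$demo_dir/empty.db" || true
printf 'data' > "$demo_dir/full.db"
check_snapshot "$demo_dir/full.db"
rm -rf "$demo_dir"
```

A non-empty file is only a first sanity check; `etcdctl snapshot status <file>` can additionally report the hash, revision, and key count of a saved snapshot.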

The other options are less likely:

- Running a snapshot from a follower node would not produce a 0-byte file. etcdctl handles follower-to-leader communication internally, and if there were an issue, it would typically result in an error message rather than an empty file.
- Missing the --endpoints flag would not cause this behavior. etcdctl defaults to 127.0.0.1:2379 when no endpoint is specified, and if the endpoint were unreachable, the command would fail with an error rather than produce an empty file. The question also states that endpoint connectivity is confirmed.
- Concurrent etcd compaction does not interfere with snapshot operations in a way that produces 0-byte files. A snapshot captures a consistent view of the data, so compaction running at the same time would not cause an empty snapshot; at worst, it might cause a brief delay or a specific error message.
