Troubleshooting and Optimization
Perform root cause analysis, instrument code for observability, and optimize application performance (~18% of exam).
Troubleshooting and Optimization are critical skills for the AWS Certified Developer - Associate certification, focusing on identifying issues and improving application performance within AWS environments. **Troubleshooting** involves systematically diagnosing and resolving problems in AWS application…
Concepts covered: X-Ray segments and subsegments, Distributed tracing, CloudWatch alarms and notifications, Amazon SNS for alerting, Quota limit notifications, Deployment completion notifications, AWS CloudTrail for API logging, Structured logging for applications, JSON logging format, Application health checks, Readiness probes, Liveness probes, Concurrency concepts, Lambda concurrency management, Reserved concurrency, Application performance profiling, Amazon CodeGuru Profiler, Determining optimal memory allocation, Compute power optimization, Lambda Power Tuning, SNS subscription filter policies, SQS message filtering, CloudFront cache behavior, Caching based on request headers, Application-level caching, Cache invalidation strategies, Resource usage optimization, Cold start optimization, Analyzing performance issues, Identifying performance bottlenecks, Debugging code defects, Interpreting application metrics, Interpreting application logs, Interpreting application traces, Amazon CloudWatch Logs Insights, Querying logs for relevant data, CloudWatch embedded metric format (EMF), Custom CloudWatch metrics, CloudWatch dashboards, CloudWatch Container Insights, CloudWatch Application Insights, Troubleshooting deployment failures, Service output logs analysis, Debugging service integration issues, Logging vs monitoring vs observability, Effective logging strategies, Log levels and log aggregation, Emitting custom metrics from code, AWS X-Ray tracing, X-Ray annotations and metadata
DVA-C02 - Troubleshooting and Optimization Example Questions
Test your knowledge of Troubleshooting and Optimization
Question 1
Which Kubernetes liveness probe type sends a request to a specified port to verify the container is accepting connections?
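For intuition, a port-level check of this kind amounts to trying to open a TCP connection and treating success as healthy. The following is a minimal Python sketch of that idea, not the kubelet's implementation; the function name `port_accepts_connections` and the default timeout are assumptions made for illustration.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only; it mimics the idea of a
    port-level health check, not any specific orchestrator's code.
    """
    try:
        # create_connection performs the TCP handshake; success means
        # something is listening and accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as unhealthy.
        return False
```

A check like this says nothing about whether the application behind the port is returning correct responses, only that the port is open.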
Question 2
A travel booking platform operates on AWS and uses a microservices architecture with multiple Lambda functions processing reservations. The DevOps team has configured CloudWatch alarms to monitor application health. They created an alarm on a custom metric called 'BookingErrors' with a threshold of 10 errors, using the Sum statistic with a 5-minute period and 1 evaluation period. The alarm is connected to an SNS topic with both email and SMS subscriptions. During a database connectivity issue, the application experienced errors but the alarm remained in OK state. Investigation revealed that the Lambda functions were publishing the BookingErrors metric using the PutMetricData API with a dimension of 'Environment=Production'. However, when the DevOps engineer created the alarm, they specified the metric name correctly but configured the alarm with a dimension of 'Env=Production'. The metric data shows clear spikes above the threshold during the incident timeframe. What is the primary reason the alarm failed to transition to ALARM state during this incident?
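CloudWatch treats each unique combination of namespace, metric name, and dimensions as a separate metric, which is the detail this scenario turns on. The sketch below compares the two dimension sets from the question without calling AWS. The dictionaries are shaped like the keyword arguments of boto3's `put_metric_data` and `put_metric_alarm`; the namespace, alarm name, comparison operator, and the helper `metric_identity` are assumptions, since the scenario does not specify them.

```python
def metric_identity(namespace, metric_name, dimensions):
    """Build a comparable key from namespace, name, and the full dimension set.

    Hypothetical helper: it models how a metric is identified, nothing more.
    """
    dims = frozenset((d["Name"], d["Value"]) for d in dimensions)
    return (namespace, metric_name, dims)


NAMESPACE = "TravelBooking"  # assumed; the scenario does not name a namespace

# What the Lambda functions publish (shape of put_metric_data arguments).
published = {
    "Namespace": NAMESPACE,
    "MetricData": [{
        "MetricName": "BookingErrors",
        "Dimensions": [{"Name": "Environment", "Value": "Production"}],
        "Value": 1,
        "Unit": "Count",
    }],
}

# What the alarm watches (shape of put_metric_alarm arguments).
alarm = {
    "AlarmName": "booking-errors-high",  # assumed name
    "Namespace": NAMESPACE,
    "MetricName": "BookingErrors",
    "Dimensions": [{"Name": "Env", "Value": "Production"}],
    "Statistic": "Sum",
    "Period": 300,             # 5-minute period, in seconds
    "EvaluationPeriods": 1,
    "Threshold": 10,
    "ComparisonOperator": "GreaterThanThreshold",  # assumed operator
}

datum = published["MetricData"][0]
published_key = metric_identity(
    published["Namespace"], datum["MetricName"], datum["Dimensions"]
)
alarm_key = metric_identity(
    alarm["Namespace"], alarm["MetricName"], alarm["Dimensions"]
)

# False: the alarm's dimension name differs, so it refers to a different metric.
alarm_watches_published_metric = published_key == alarm_key
print(alarm_watches_published_metric)
```

Comparing the two keys before deploying an alarm is one cheap way to catch this class of mismatch; the same comparison also fails if the dimension value or the namespace differs.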
Question 3
A payment processing company has deployed an AWS Lambda function using the .NET 8 runtime that validates credit card transactions. The function imports Entity Framework Core for data access, Newtonsoft.Json for serialization, and custom fraud detection modules, creating a 42 MB deployment package. The function operates within a VPC to communicate with an on-premises fraud detection system through AWS Direct Connect. Transaction logs reveal that merchants processing their first transaction after store opening (which varies by timezone across 15 countries) experience 13-16 second authorization delays, while transactions during active shopping periods complete in 750 ms. The function is configured with 2560 MB memory. Network analysis confirms VPC Elastic Network Interface creation contributes 5-6 seconds, while .NET runtime initialization and dependency loading account for 7-8 seconds. The company processes 2,000 transactions per second during global peak hours but only 20-30 per minute during the quietest global period (around 4 AM UTC). Management requires all transaction authorizations to complete in under 3 seconds to meet payment network SLA requirements. Given the global merchant distribution with unpredictable store opening times, which optimization strategy would most effectively address the cold start latency requirements?
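The figures in this scenario can be checked with a little arithmetic. The sketch below uses a simple additive model (warm execution time plus each cold start component), which is an assumption; the constants come from the question, and the helper names are hypothetical.

```python
# All values in milliseconds, taken from the scenario.
WARM_MS = 750                 # latency during active shopping periods
SLA_MS = 3000                 # authorizations must complete in under 3 seconds
ENI_MS = (5000, 6000)         # VPC network interface creation (low, high)
INIT_MS = (7000, 8000)        # .NET runtime init and dependency loading


def latency_range(*components):
    """Return (low, high) request latency when the given components are paid."""
    low = WARM_MS + sum(c[0] for c in components)
    high = WARM_MS + sum(c[1] for c in components)
    return low, high


def meets_sla(latency):
    """True only if the worst case is strictly under the SLA."""
    return latency[1] < SLA_MS


scenarios = {
    "full cold start": latency_range(ENI_MS, INIT_MS),
    "ENI delay only": latency_range(ENI_MS),
    "runtime init only": latency_range(INIT_MS),
    "no cold start": latency_range(),
}

for name, latency in scenarios.items():
    print(f"{name}: {latency[0]}-{latency[1]} ms, meets SLA: {meets_sla(latency)}")
```

Under this model, removing only one of the two components still leaves first requests well above the 3-second limit, so a candidate answer has to keep both costs off the request path.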