Lambda Concurrency Management
Why is Lambda Concurrency Management Important?
Lambda concurrency management is crucial for building scalable, reliable, and cost-effective serverless applications. Understanding how AWS Lambda handles concurrent executions allows you to prevent throttling, control costs, protect downstream resources from being overwhelmed, and ensure consistent application performance. For the AWS Developer Associate exam, this topic frequently appears in questions about performance optimization and troubleshooting.
What is Lambda Concurrency?
Concurrency in AWS Lambda refers to the number of function instances serving requests at any given time. When your function is invoked, Lambda allocates an instance to process the event. If the function is invoked again while the first request is still being processed, another instance is allocated, resulting in two concurrent executions.
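A useful rule of thumb from AWS's documentation is that required concurrency is roughly the invocation rate multiplied by the average execution duration. A minimal sketch of that arithmetic:

```python
def estimate_concurrency(invocations_per_second: float, avg_duration_seconds: float) -> float:
    """Estimate concurrent executions needed: request rate x average duration.

    For example, 100 requests/second with a 500 ms average duration
    needs roughly 50 concurrent instances.
    """
    return invocations_per_second * avg_duration_seconds

print(estimate_concurrency(100, 0.5))  # 50.0
```

This estimate is what you compare against your account limit or a function's reserved concurrency when sizing a workload.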
There are three types of concurrency controls in Lambda:
1. Unreserved Concurrency - The pool of concurrency available to all functions in your account that have not been assigned reserved concurrency. By default, your account has a soft limit of 1,000 concurrent executions across all functions in a region.
2. Reserved Concurrency - Guarantees a maximum number of concurrent instances for a specific function. This concurrency is carved out from the account's total available concurrency. If you set reserved concurrency to 100, that function can never exceed 100 concurrent executions, but it also ensures those 100 are always available for that function.
3. Provisioned Concurrency - Pre-initializes a requested number of execution environments so they are prepared to respond to invocations. This eliminates cold starts and ensures consistent low-latency performance.
How Lambda Concurrency Works
When a request comes in, Lambda checks if there is an available warm instance. If not, it creates a new instance (cold start). The function processes the request and then waits for additional requests (warm instance). If no requests come within a timeout period, the instance is terminated.
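The warm/cold lifecycle above can be illustrated with a toy model. This is purely an illustration of the allocation behavior described, not how the real service is implemented:

```python
class LambdaPoolModel:
    """Toy model of Lambda instance allocation (illustration only)."""

    def __init__(self):
        self.warm = 0   # idle, already-initialized instances
        self.busy = 0   # instances currently handling a request

    def invoke(self) -> str:
        if self.warm > 0:
            self.warm -= 1          # reuse an existing warm instance
            self.busy += 1
            return "warm start"
        self.busy += 1              # no warm instance available: initialize a new one
        return "cold start"

    def finish(self):
        self.busy -= 1              # request done; the instance stays warm for reuse
        self.warm += 1

pool = LambdaPoolModel()
print(pool.invoke())   # cold start (no warm instance yet)
pool.finish()
print(pool.invoke())   # warm start (reuses the idle instance)
```

Note that two overlapping invocations always need two instances: a warm instance can only be reused once it has finished its current request.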
Key Behaviors:
- When concurrency limit is reached, additional invocations are throttled
- Synchronous invocations return a 429 error (TooManyRequestsException) when throttled; the caller is responsible for retrying
- Asynchronous invocations are retried automatically and then sent to a dead-letter queue if configured
- Event source mappings (like SQS, Kinesis) handle throttling by retrying
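Because throttled synchronous invocations return a 429 and are not retried by Lambda, the caller typically retries with exponential backoff. A hedged sketch, where `fake_invoke` is a stand-in for the real invoke call (e.g. boto3's `invoke`):

```python
import time

def invoke_with_backoff(invoke_fn, max_retries=5, base_delay=0.1):
    """Retry a synchronous invocation when it is throttled (HTTP 429).

    invoke_fn is a placeholder for the real call; here it returns a
    dict with a "status_code" key.
    """
    for attempt in range(max_retries + 1):
        response = invoke_fn()
        if response["status_code"] != 429:
            return response
        # Exponential backoff before retrying the throttled request
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("invocation still throttled after retries")

# Simulated invoker: throttled twice, then succeeds.
attempts = {"n": 0}
def fake_invoke():
    attempts["n"] += 1
    return {"status_code": 429 if attempts["n"] <= 2 else 200}

print(invoke_with_backoff(fake_invoke)["status_code"])  # 200
```

The AWS SDKs implement similar backoff-and-retry behavior for you by default; the sketch just makes the mechanism explicit.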
Burst Concurrency: Lambda provides an initial burst of concurrency in each region (between 500 and 3,000, depending on the region), after which concurrency scales at 500 instances per minute.
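Under the burst model just described (an initial regional burst, then 500 additional instances per minute), you can estimate how long a scale-up takes. A small sketch of that calculation:

```python
import math

def minutes_to_reach(target: int, burst: int = 3000, rate_per_min: int = 500) -> int:
    """Whole minutes needed to scale from the initial regional burst
    up to `target` concurrency, adding `rate_per_min` instances/minute."""
    if target <= burst:
        return 0  # covered entirely by the initial burst
    return math.ceil((target - burst) / rate_per_min)

print(minutes_to_reach(5000))  # 4 (5000 needs 2000 beyond the 3000 burst)
```

So in a region with a 3,000 burst limit, reaching 5,000 concurrent executions takes roughly four minutes of sustained traffic.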
Configuring Concurrency
To set Reserved Concurrency:
- Navigate to your Lambda function in the console
- Go to Configuration > Concurrency
- Set the reserved concurrency value
- Setting it to 0 throttles all invocations, effectively disabling the function
To set Provisioned Concurrency:
- Configure on a specific function version or alias (not $LATEST)
- Incurs additional charges even when not in use
- Can use Application Auto Scaling to adjust based on utilization
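The console steps above can also be done programmatically. A minimal boto3 sketch, where "my-function" and the "prod" alias are placeholder names (this is a configuration fragment, so it assumes valid AWS credentials and an existing function):

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserved concurrency: caps this function at 100 concurrent executions
# and carves those 100 out of the account's shared pool.
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=100,
)

# Provisioned concurrency: must target a published version or alias,
# never $LATEST. Keeps 10 environments initialized and billed while idle.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="prod",
    ProvisionedConcurrentExecutions=10,
)
```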
Common Use Cases
- Reserved Concurrency: Protecting downstream databases from too many connections, limiting costs, isolating critical functions
- Provisioned Concurrency: Customer-facing APIs requiring consistent response times, functions with heavy initialization code, applications where cold starts are unacceptable
Exam Tips: Answering Questions on Lambda Concurrency Management
Key Points to Remember:
1. Throttling scenarios: When you see 429 errors or ThrottlingException, think concurrency limits. The solution often involves increasing reserved concurrency or requesting a service limit increase.
2. Cold start elimination: If a question mentions reducing latency for Lambda functions, provisioned concurrency is typically the answer. Remember it must be configured on a version or alias, not $LATEST.
3. Protecting downstream resources: When questions describe databases or APIs being overwhelmed by Lambda, reserved concurrency is the solution to cap the number of concurrent connections.
4. Reserved vs Provisioned: Reserved sets a maximum limit and reserves capacity. Provisioned keeps instances warm and ready. Know the difference - reserved is about limiting, provisioned is about performance.
5. Account-level limits: Remember the default 1,000 concurrent executions per region. If one function consumes all concurrency, other functions will be throttled.
6. Asynchronous behavior: Throttled async invocations go to an internal queue and retry for up to 6 hours before being sent to a DLQ.
7. Cost implications: Provisioned concurrency costs money even when idle. If a question emphasizes cost optimization, reserved concurrency or accepting some cold starts may be preferred.
8. Auto Scaling with Provisioned Concurrency: For variable but predictable traffic patterns, Application Auto Scaling can dynamically adjust provisioned concurrency based on schedule or utilization metrics.
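The account-level math behind tips 5 and 7 is worth being able to do quickly in the exam. A sketch of the unreserved-pool calculation, using placeholder function names:

```python
def unreserved_pool(account_limit: int, reserved: dict) -> int:
    """Concurrency left for functions with no reserved concurrency,
    after carving out each function's reservation."""
    return account_limit - sum(reserved.values())

reserved = {"critical-fn": 300, "batch-fn": 200}   # placeholder names
pool = unreserved_pool(1000, reserved)
print(pool)  # 500, shared by every other function in the region

# A traffic spike needing 600 concurrent executions from the shared
# pool would see the excess throttled:
demand = 600
throttled = max(0, demand - pool)
print(throttled)  # 100
```

This is why reserving concurrency for a critical function both protects it from starvation and shrinks the pool available to everything else.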
Common Exam Scenarios:
- Application experiencing intermittent 429 errors → Increase reserved concurrency or account limits
- Need consistent sub-second response times → Use provisioned concurrency
- Database connection pool exhaustion → Set reserved concurrency to limit concurrent Lambda instances
- Critical function being starved by other functions → Configure reserved concurrency for the critical function