
AWS Lambda Concurrency: Optimization and Scaling Guide

Learning Objectives

After studying this guide, you will be able to:

  • Define concurrency in the context of AWS Lambda.
  • Distinguish between Reserved Concurrency and Provisioned Concurrency.
  • Calculate required concurrency for a function based on request volume and duration.
  • Identify the causes of throttling and how to mitigate them.

Key Terms & Glossary

  • Concurrency: The number of simultaneous requests that a Lambda function is serving at any given time.
  • Reserved Concurrency: A setting that guarantees a maximum number of concurrent instances for a specific function, also acting as a quota to prevent it from exhausting the account's total pool.
  • Provisioned Concurrency: A setting that initializes a requested number of execution environments so they are prepared to respond immediately to function invocations, eliminating cold starts.
  • Cold Start: The latency experienced when Lambda must initialize a new execution environment (download code, start runtime) before processing a request.
  • Throttling: The process by which AWS rejects incoming requests (Error 429) when concurrency limits are exceeded.
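Throttling is usually handled client-side with retries and exponential backoff. Below is a minimal Python sketch of that pattern; `ThrottledError`, `invoke_with_backoff`, and `flaky_invoke` are illustrative names, not part of any AWS SDK:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the 429 TooManyRequests error a real client would raise."""

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.1):
    """Retry a throttled invocation with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Sleep 0.1s, 0.2s, 0.4s, ... plus random jitter to avoid
            # synchronized retries from many clients.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))

# Simulated invocation that is throttled twice, then succeeds.
calls = {"n": 0}
def flaky_invoke():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("429 Too Many Requests")
    return "ok"

print(invoke_with_backoff(flaky_invoke))  # → ok
```

Real AWS SDKs apply a similar backoff automatically; the sketch only makes the mechanism visible.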

The "Big Idea"

Concurrency is the width of your application's pipe. While timeout and memory control how deep or powerful a single execution is, concurrency determines how many of those executions can happen at the exact same moment. Mastering this ensures your application scales to meet demand without accidentally starving other functions or crashing due to unexpected spikes.

Formula / Concept Box

| Concept | Formula / Rule |
| --- | --- |
| Concurrency Calculation | $\text{Concurrency} = \text{Average Requests Per Second} \times \text{Average Execution Duration (seconds)}$ |
| Unreserved Pool | $\text{Account Limit} - \sum(\text{Reserved Concurrency of all functions})$ |
| Default Limit | Most new accounts start with a regional limit of 1,000 concurrent executions. |
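Both formulas in the box translate directly into code. A small sketch (function names are illustrative, not an AWS API):

```python
def required_concurrency(requests_per_second, avg_duration_seconds):
    """Concurrency = average requests per second x average execution duration."""
    return requests_per_second * avg_duration_seconds

def unreserved_pool(account_limit, reserved_by_function):
    """Unreserved pool = account limit - sum of all reserved concurrency."""
    return account_limit - sum(reserved_by_function.values())

print(required_concurrency(500, 0.2))                         # → 100.0
print(unreserved_pool(1000, {"orders": 100, "reports": 50}))  # → 850
```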

Hierarchical Outline

  • I. Core Mechanics of Concurrency
    • Execution Environments: Single instances of your code.
    • Request Handling: One environment processes one request at a time.
  • II. Concurrency Controls
    • Reserved Concurrency: Dedicated capacity + Upper limit.
    • Provisioned Concurrency: Pre-warmed capacity for low-latency needs.
  • III. Scaling and Limits
    • Burst Limits: Regional limits on how fast concurrency can ramp up.
    • Throttling Behavior: How synchronous vs. asynchronous invocations handle 429 errors.

Visual Anchors

Scaling Logic Flow

[Diagram not available in this export]

Provisioned Concurrency vs. Demand

\begin{tikzpicture}
  \draw[->] (0,0) -- (6,0) node[right] {Time};
  \draw[->] (0,0) -- (0,4) node[above] {Concurrency};
  \draw[dashed, blue, thick] (0,2.5) -- (5.5,2.5) node[right] {Provisioned Limit};
  \draw[red, thick] plot [smooth, tension=0.8] coordinates {(0,0.5) (1,2) (2,3) (3,1.5) (4,3.5) (5,1)};
  \node at (4.5,3.8) {Traffic Demand};
  \node[blue] at (1.5,2.8) {Pre-warmed Capacity};
\end{tikzpicture}

Definition-Example Pairs

  • Reserved Concurrency: Reserving a specific slice of the account limit for a function.
    • Example: An "Order Processing" function is assigned 100 Reserved Concurrency. It can never use more than 100, but it is guaranteed to have up to 100 available even if other functions are busy.
  • Burst Limit: The immediate increase in concurrency allowed before steady scaling kicks in.
    • Example: In US East (N. Virginia), if traffic suddenly spikes, Lambda can immediately jump to an additional 3,000 instances before scaling at a rate of 500 per minute.
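The burst example can be modeled as a simple ceiling function: an immediate jump to the burst limit, then a linear ramp, capped by the account limit. This is an illustrative simplification of the classic behavior described above, not an AWS API:

```python
def scaling_ceiling(minutes_elapsed, burst_limit=3000,
                    ramp_per_minute=500, account_limit=10000):
    """Maximum concurrency reachable after a sudden spike:
    immediate burst, then +ramp_per_minute each minute,
    never exceeding the regional account limit."""
    return min(burst_limit + ramp_per_minute * minutes_elapsed, account_limit)

for m in (0, 1, 4):
    print(m, scaling_ceiling(m))
# → 0 3000 / 1 3500 / 4 5000
```

Defaults use the US East figures from the example; the account limit of 10,000 is an assumed value for raised-limit accounts.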

Worked Examples

Problem: Calculating Concurrency Needs

A Lambda function receives an average of 500 requests per second. Each request takes an average of 200 milliseconds to process. What is the average concurrency needed?

Step-by-Step Solution:

  1. Identify Variables:
    • $R$ (Requests/sec) $= 500$
    • $D$ (Duration in seconds) $= 0.2$ (since 200 ms = 0.2 s)
  2. Apply Formula: $\text{Concurrency} = R \times D$
  3. Calculate: $500 \times 0.2 = 100$
  4. Conclusion: The function requires a concurrency of 100 to handle the steady-state load.

Checkpoint Questions

  1. Q: What is the primary difference between Reserved and Provisioned concurrency?
    • A: Reserved sets a maximum limit and guarantees capacity; Provisioned keeps environments "warm" to eliminate cold starts.
  2. Q: If a function has no Reserved Concurrency, where does it get its capacity?
    • A: It draws from the account's unreserved concurrency pool.
  3. Q: What HTTP status code does Lambda return when a function is throttled?
    • A: Status Code 429 (Too Many Requests).
  4. Q: Does increasing Lambda memory directly increase concurrency limits?
    • A: No. Memory increases CPU/RAM for a single instance, but Concurrency is a separate setting for total simultaneous instances.

[!IMPORTANT] Always leave at least 100 unreserved concurrency in your account. If you reserve all 1,000 (default limit), functions without specific reservations will be unable to execute.
