Selecting an Appropriate Throttling Strategy
This guide covers the architectural patterns and AWS-specific implementations for throttling, with the goal of keeping systems stable and cost-efficient under variable workloads.
Learning Objectives
- Define the purpose of throttling in a distributed architecture.
- Compare different throttling strategies across AWS services (DynamoDB, API Gateway, WAF).
- Evaluate when to use Provisioned vs. On-Demand capacity modes.
- Implement client-side strategies like exponential backoff and jitter to handle throttled requests.
Key Terms & Glossary
- Throttling: The process of limiting the number of requests a service will accept within a specific time window to prevent resource exhaustion.
- RCU / WCU: Read Capacity Units and Write Capacity Units; the measure of throughput for Amazon DynamoDB.
- Rate-based Rule: A rule in AWS WAF that tracks the rate of requests from a single IP address and triggers an action if it exceeds a threshold.
- Exponential Backoff: An algorithm that uses progressively longer waits between retries for consecutive error responses.
- Jitter: Random noise added to backoff intervals to prevent "thundering herd" problems.
- Burst Capacity: Temporary throughput allowed above the provisioned limit. DynamoDB, for example, retains up to 5 minutes (300 seconds) of unused read and write capacity and spends it to absorb short spikes.
The "Big Idea"
Throttling is not a failure of the system; it is a self-preservation mechanism. By intentionally rejecting requests that exceed a defined threshold, a system protects its core resources (database, compute, memory) from complete collapse during traffic spikes or DDoS attacks. In the AWS ecosystem, selecting the right strategy is a balancing act between Availability (serving every request) and Cost/Stability (protecting the backend).
Formula / Concept Box
| Throttling Layer | Strategy | Best For |
|---|---|---|
| Edge (WAF) | Rate-based Rules | Preventing DDoS or brute-force from specific IPs. |
| API Layer | API Gateway Throttling | Managing multi-tenant access and protecting downstream Lambdas. |
| Data Layer | DynamoDB Provisioned | Predictable workloads with stable traffic patterns. |
| Data Layer | DynamoDB On-Demand | Highly unpredictable or spiky workloads. |
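The API-layer row can be made concrete with a token bucket, the model API Gateway's documentation uses to describe its rate and burst limits. The sketch below is a minimal, single-threaded illustration, not API Gateway's actual implementation; the class name `TokenBucket` and the values `rate=2` and `burst=5` are assumptions chosen for the example.

```python
class TokenBucket:
    """Minimal token bucket: `rate` tokens refill per second, capped at `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` is admitted, False if throttled."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, never beyond the burst size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits: 2 requests/second steady state, bursts of up to 5.
bucket = TokenBucket(rate=2, burst=5)

# Seven requests arrive at the same instant: the burst absorbs five of them.
burst_results = [bucket.allow(now=0.0) for _ in range(7)]

# One second later, two tokens have been refilled, so a request is admitted again.
later_result = bucket.allow(now=1.0)
```

The steady-state rate bounds long-run load on the backend, while the burst size decides how much of a short spike is served before requests start being rejected.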
Hierarchical Outline
- Service-Level Throttling
  - DynamoDB Throughput: Managing RCU/WCU limits.
  - API Gateway Usage Plans: Assigning different rate limits to different API keys/clients.
  - Lambda Concurrency: Limiting the number of simultaneous executions to protect downstream databases.
- Network-Level Throttling (WAF)
  - IP-based limiting: Blocking or rate-limiting specific malicious actors.
  - Path-based limiting: Protecting high-resource endpoints (e.g., `/search` vs. `/home`).
- Client-Side Resilience
  - SDK Default Behavior: AWS SDKs automatically retry throttled requests.
  - Custom Backoff: Implementing specific wait times in non-SDK applications.
Visual Anchors
[Figure: Request Flow and Throttling Layers]
[Figure: Exponential Backoff Visualization. Wait times increase between retries to allow the system to recover.]
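The growth in wait times can also be sketched in code. The example below is one possible custom backoff loop using the "full jitter" variant, where each wait is drawn uniformly between zero and an exponentially growing ceiling. Everything named here (`ThrottledError`, `backoff_delay`, `call_with_backoff`, and the `base` and `cap` defaults) is hypothetical and chosen for illustration; the AWS SDKs ship their own retry logic.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response such as HTTP 429."""


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full jitter: a uniform random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(operation, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap))


# Demo: an operation that is throttled twice and then succeeds.
attempts = {"count": 0}


def flaky_operation():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError()
    return "ok"


# Record the waits instead of sleeping so the demo finishes instantly.
recorded_waits = []
result = call_with_backoff(flaky_operation, sleep=recorded_waits.append)
```

Passing `sleep` as a parameter keeps the loop testable. The `cap` argument bounds the ceiling so that a long run of failures does not produce multi-minute waits.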
Definition-Example Pairs
- Provisioned Capacity Mode: You pre-allocate a specific number of reads/writes per second.
  - Example: A retail site expects 1,000 users per hour consistently and sets DynamoDB to 50 RCU to minimize costs while meeting demand.
- Rate-Based Rules: AWS WAF rules that aggregate requests based on a property (like IP address).
  - Example: Setting a limit of 1,000 requests per 5 minutes per IP to prevent a single bot from scraping the entire product catalog.
- Adaptive Capacity: DynamoDB's ability to shift RCU/WCU between partitions to handle uneven data access.
  - Example: During a flash sale, one specific product ID gets 90% of the traffic; DynamoDB moves capacity to that partition to prevent throttling.
Worked Examples
Example 1: Calculating RCU for Throttling Prevention
Scenario: You have an application that needs to read 10 items per second. Each item is 8 KB in size. You are using Strongly Consistent Reads.
- Rule: 1 RCU = one strongly consistent read per second for an item up to 4 KB.
- Size per item: 8 KB ÷ 4 KB = 2 RCU per item.
- Total Needed: 10 items/sec × 2 RCU/item = 20 RCU.
- Strategy: If you provisioned only 10 RCU, roughly 50% of your requests would be throttled with a `ProvisionedThroughputExceededException` once any burst capacity is used up.
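The arithmetic in Example 1 can be wrapped in a small helper. This is a sketch rather than an official sizing tool: the function name `required_rcu` is hypothetical, and the eventually consistent branch (half the cost of a strongly consistent read) goes beyond the scenario above.

```python
import math


def required_rcu(items_per_second, item_size_kb, strongly_consistent=True):
    """Estimate the RCUs needed for a steady read rate.

    One RCU covers one strongly consistent read per second of an item up to
    4 KB; item sizes are rounded up to the next 4 KB boundary. Eventually
    consistent reads cost half as much.
    """
    units_per_item = math.ceil(item_size_kb / 4)
    total = items_per_second * units_per_item
    if not strongly_consistent:
        total = math.ceil(total / 2)
    return total


# The scenario from Example 1: 10 reads/sec of 8 KB items, strongly consistent.
needed = required_rcu(10, 8)

# The same workload with eventually consistent reads (not part of the scenario).
eventual = required_rcu(10, 8, strongly_consistent=False)

# Share of steady-state demand left unserved if only 10 RCU are provisioned,
# ignoring burst capacity.
shortfall = 1 - 10 / needed
```

Because sizes round up per item, a 5 KB item costs the same 2 RCU as an 8 KB item, so trimming items to fit under a 4 KB boundary can halve read cost.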
Example 2: WAF Rate-Limit Configuration
Scenario: Protect a login endpoint from brute force attacks.
- Strategy: Create a WAF Rate-based rule.
- Threshold: 100 requests per 5 minutes.
- Action: Block.
- Result: If an attacker attempts 500 logins in 1 minute, the WAF blocks requests from that specific IP until its request rate falls back below the threshold, protecting the Cognito/Identity pool from being overwhelmed.
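The behaviour in Example 2 can be approximated with a per-IP sliding-window counter. The sketch below is a simplified model for reasoning about thresholds, not AWS WAF's implementation: the real service evaluates rates on its own schedule, so blocking may start slightly later than this model suggests. The class name `RateBasedRule` and the IP addresses (taken from documentation ranges) are assumptions.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Simplified model of a per-IP request limit over a sliding time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests

    def evaluate(self, ip, now):
        """Record a request from `ip` at time `now`; return "ALLOW" or "BLOCK"."""
        timestamps = self.history[ip]
        # Drop requests that have aged out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        timestamps.append(now)
        return "BLOCK" if len(timestamps) > self.limit else "ALLOW"


# The threshold from Example 2: 100 requests per 5 minutes (300 seconds).
rule = RateBasedRule(limit=100, window_seconds=300)

# One IP sends 500 login attempts spread over roughly one minute.
attacker = [rule.evaluate("203.0.113.7", now=i * 0.12) for i in range(500)]

# A different IP is counted separately and is not affected.
bystander = rule.evaluate("198.51.100.4", now=60.0)

# Once the earlier requests age out of the window, the first IP is allowed again.
attacker_later = rule.evaluate("203.0.113.7", now=400.0)
```

In this model the first 100 attempts get through before blocking starts, which is worth remembering when choosing a threshold for a sensitive endpoint such as login.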
Checkpoint Questions
- Which HTTP status code is typically returned by Amazon API Gateway when a request is throttled due to a usage plan limit?
- Why is "Jitter" added to an exponential backoff algorithm?
- In DynamoDB, what is the main advantage of using Auto Scaling over fixed Provisioned Capacity?
- If a client receives a `ProvisionedThroughputExceededException` from DynamoDB, what is the first recommended action at the application layer?
Answers
- 429 Too Many Requests.
- To prevent multiple clients from retrying at the exact same time (Thundering Herd), which could cause another immediate spike and subsequent throttling.
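- Auto Scaling automatically adjusts RCU/WCU based on actual consumption, reducing costs during low traffic and reducing throttling during sustained high traffic. Because it reacts to observed load, very sudden spikes can still be throttled before capacity catches up.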
- Implement (or rely on) exponential backoff and retries.