Selecting an Appropriate Throttling Strategy

This guide covers the architectural patterns and AWS-specific implementations for throttling, ensuring system stability and cost-efficiency under variable workloads.

Learning Objectives

Define the purpose of throttling in a distributed architecture.
Compare different throttling strategies across AWS services (DynamoDB, API Gateway, WAF).
Evaluate when to use Provisioned vs. On-Demand capacity modes.
Implement client-side strategies like exponential backoff and jitter to handle throttled requests.

Key Terms & Glossary

Throttling: The process of limiting the number of requests a service accepts within a specific time window to prevent resource exhaustion.
RCU / WCU: Read Capacity Units and Write Capacity Units; the measure of throughput for Amazon DynamoDB.
Rate-based Rule: A rule in AWS WAF that tracks the request rate per aggregation key (most commonly the source IP address) and triggers an action when that rate exceeds a threshold.
Exponential Backoff: An algorithm that uses progressively longer waits between retries for consecutive error responses.
Jitter: Random noise added to backoff intervals to prevent "thundering herd" problems.
Burst Capacity: Temporary throughput allowed above the provisioned limit, drawn from recently unused capacity. DynamoDB, for example, retains up to 5 minutes (300 seconds) of unused read and write capacity for this purpose.

The "Big Idea"

Throttling is not a failure of the system; it is a self-preservation mechanism. By intentionally rejecting requests that exceed a defined threshold, a system protects its core resources (database, compute, memory) from complete collapse during traffic spikes or DDoS attacks. In the AWS ecosystem, selecting the right strategy is a balancing act between Availability (serving every request) and Cost/Stability (protecting the backend).
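To make "intentionally rejecting requests that exceed a defined threshold" concrete, here is a minimal token-bucket sketch. The token bucket is one common way to implement throttling, and it is the algorithm API Gateway documents for its own throttling. The class name, parameters, and the injected fake clock are illustrative assumptions, not an AWS API.

```python
import time


class TokenBucket:
    """Minimal token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock            # injectable so the demo is deterministic
        self.tokens = float(burst)    # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # request admitted
        return False                  # request throttled (an API would answer 429)


# Demo with a fake clock (hypothetical numbers: 5 req/s steady, burst of 10).
fake_now = [0.0]
bucket = TokenBucket(rate=5, burst=10, clock=lambda: fake_now[0])
admitted = sum(bucket.allow() for _ in range(15))        # 15 requests at once
fake_now[0] += 1.0                                       # one second passes
admitted_later = sum(bucket.allow() for _ in range(15))  # another 15 requests
print(admitted, admitted_later)
```

The bucket admits the initial burst, rejects the excess, and then admits only what the steady refill rate allows. The backend therefore sees a bounded load however large the spike is.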

Formula / Concept Box

Throttling Layer	Strategy	Best For
Edge (WAF)	Rate-based Rules	Preventing DDoS or brute-force from specific IPs.
API Layer	API Gateway Throttling	Managing multi-tenant access and protecting downstream Lambdas.
Data Layer	DynamoDB Provisioned	Predictable workloads with stable traffic patterns.
Data Layer	DynamoDB On-Demand	Highly unpredictable or spiky workloads.

Hierarchical Outline

Service-Level Throttling
- DynamoDB Throughput: Managing RCU/WCU limits.
- API Gateway Usage Plans: Assigning different rate limits to different API keys/clients.
- Lambda Concurrency: Limiting the number of simultaneous executions to protect downstream databases.
Network-Level Throttling (WAF)
- IP-based limiting: Blocking or rate-limiting specific malicious actors.
- Path-based limiting: Protecting high-resource endpoints (e.g., /search vs /home).
Client-Side Resilience
- SDK Default Behavior: AWS SDKs automatically retry throttled requests, using exponential backoff between attempts.
- Custom Backoff: Implementing specific wait times in non-SDK applications.
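For the custom backoff case in the outline above, the following is a minimal sketch of exponential backoff with "full jitter" for a non-SDK client. `ThrottledError`, `call_with_retries`, and the default base, cap, and attempt count are all hypothetical choices for illustration. Real values should follow the target service's guidance.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling response such as HTTP 429."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0, rng=random) -> float:
    """Full jitter: pick a delay uniformly in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(operation, max_attempts: int = 5, sleep=time.sleep, rng=random):
    """Call `operation`, retrying with jittered exponential backoff on throttling."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(backoff_delay(attempt, rng=rng))


# Demo: an operation that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("slow down")
    return "ok"


waits = []                                 # record delays instead of sleeping
result = call_with_retries(flaky, sleep=waits.append)
print(result, len(waits))
```

The upper bound of each delay doubles per attempt, and the random draw spreads clients out so they do not all retry at the same instant.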

Visual Anchors

Request Flow and Throttling Layers

[Diagram not rendered: request flow and throttling layers.]

Exponential Backoff Visualization

[Diagram not rendered: wait times increase between successive retries, giving the system time to recover.]

Definition-Example Pairs

Provisioned Capacity Mode: You pre-allocate a specific number of reads/writes per second.
- Example: A retail site expects 1,000 users per hour consistently and sets DynamoDB to 50 RCU to minimize costs while meeting demand.
Rate-Based Rules: AWS WAF rules that aggregate requests based on a property (like IP address).
- Example: Setting a limit of 1,000 requests per 5 minutes per IP to prevent a single bot from scraping the entire product catalog.
Adaptive Capacity: DynamoDB's ability to shift RCU/WCU between partitions to handle uneven data access.
- Example: During a flash sale, one specific product ID gets 90% of the traffic; DynamoDB moves capacity to that partition to prevent throttling.
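The rate-based rule example above (1,000 requests per 5 minutes per IP) can be illustrated with a small per-key sliding-window counter. This is only a conceptual sketch of "aggregate by IP, compare against a limit". It does not reproduce how AWS WAF evaluates rules internally. The class name and the documentation-range IP addresses are made up for the demo.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Counts requests per IP over a trailing window and flags IPs over the limit."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)     # ip -> timestamps inside the window

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        # Drop timestamps that have aged out of the trailing window.
        while q and q[0] <= now - self.window:
            q.popleft()
        # Sketch choice: count every request, including rejected ones, so a
        # client that keeps hammering stays over the limit.
        q.append(now)
        return len(q) <= self.limit


limiter = PerIpRateLimiter(limit=1000, window_seconds=300)

# A scraper sends 1,200 requests in two minutes from one address.
scraper_allowed = sum(limiter.allow("203.0.113.7", i * 0.1) for i in range(1200))

# A different client is unaffected because counts are kept per IP.
other_allowed = limiter.allow("198.51.100.20", 120.0)

# Once the scraper has been quiet long enough, it is admitted again.
scraper_after_pause = limiter.allow("203.0.113.7", 500.0)
print(scraper_allowed, other_allowed, scraper_after_pause)
```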

Worked Examples

Example 1: Calculating RCU for Throttling Prevention

Scenario: You have an application that needs to read 10 items per second. Each item is 8 KB in size. You are using Strongly Consistent Reads.

Rule: 1 RCU = one strongly consistent read per second for an item up to 4 KB.
Size per item: 8 KB ÷ 4 KB = 2 RCU per item.
Total Needed: 10 items/sec × 2 RCU/item = 20 RCU.
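Strategy: Provision at least 20 RCU. With only 10 RCU, sustained demand is twice the provisioned throughput, so roughly 50% of your reads would be throttled with a ProvisionedThroughputExceededException once any burst capacity is used up.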
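The arithmetic in this example can be captured in a small helper. This is a sketch: the function name is hypothetical, and it covers only plain (non-transactional) reads. It uses the standard rule that an eventually consistent read costs half as much as a strongly consistent one.

```python
import math


def required_rcu(item_size_kb: float, reads_per_second: int,
                 strongly_consistent: bool = True) -> int:
    """RCU needed for a steady read rate of same-sized items."""
    # Item size is rounded up to the next 4 KB boundary per read.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads consume half the capacity.
        total = math.ceil(total / 2)
    return total


needed = required_rcu(item_size_kb=8, reads_per_second=10)
provisioned = 10
# Share of sustained demand that exceeds what was provisioned.
throttled_fraction = max(0, needed - provisioned) / needed
print(needed, throttled_fraction)
```

Under these assumptions, switching the same workload to eventually consistent reads would halve the requirement to 10 RCU.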

Example 2: WAF Rate-Limit Configuration

Scenario: Protect a login endpoint from brute force attacks.

Strategy: Create a WAF Rate-based rule.
Threshold: 100 requests per 5 minutes.
Action: Block.
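Result: If an attacker attempts 500 logins in 1 minute, AWS WAF starts blocking requests from that IP once its rate exceeds the threshold, and keeps blocking until the rate over the trailing 5-minute window drops back below the limit. Detection is not instantaneous, so a few requests above the threshold may get through first. This protects the authentication backend (for example, Amazon Cognito) from being overwhelmed.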
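As a rough sketch, the rule described above could be written as a WAFv2-style rule definition, built here as a Python dictionary and serialized to JSON. The rule name, metric name, and the `/login` path are assumptions for illustration. The field names follow the WAFv2 rule JSON shape as commonly documented and should be checked against the current API reference before use. The evaluation window is left at the 5-minute default.

```python
import json

login_rate_rule = {
    "Name": "login-rate-limit",                  # hypothetical rule name
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 100,                        # requests per window, per key
            "AggregateKeyType": "IP",            # count per source IP address
            # Only count requests whose path starts with the assumed login path.
            "ScopeDownStatement": {
                "ByteMatchStatement": {
                    "SearchString": "/login",
                    "FieldToMatch": {"UriPath": {}},
                    "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                    "PositionalConstraint": "STARTS_WITH",
                }
            },
        }
    },
    "Action": {"Block": {}},                     # block while over the limit
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "LoginRateLimit",          # hypothetical metric name
    },
}

rule_json = json.dumps(login_rate_rule, indent=2)
print(rule_json)
```

Scoping the rule to the login path keeps the low threshold from affecting ordinary browsing traffic from the same IP.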

Checkpoint Questions

Which HTTP status code is typically returned by Amazon API Gateway when a request is throttled due to a usage plan limit?
Why is "Jitter" added to an exponential backoff algorithm?
In DynamoDB, what is the main advantage of using Auto Scaling over fixed Provisioned Capacity?
If a client receives a ProvisionedThroughputExceededException from DynamoDB, what is the first recommended action at the application layer?

Answers

429 Too Many Requests.
To prevent multiple clients from retrying at the exact same time (Thundering Herd), which could cause another immediate spike and subsequent throttling.
Auto Scaling automatically adjusts RCU/WCU based on actual consumption, reducing costs during low traffic and preventing throttling during high traffic.
Implement (or rely on) exponential backoff and retries.