Mastering Service Quotas and Throttling for High Availability
Service quotas and throttling (for example, how to configure the service quotas for a workload in a standby environment)
This study guide covers the administrative and operational limits within AWS that affect high availability and disaster recovery, with a focus on service quotas and throttling in standby environments.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Service Quotas (limits) and Throttling (rate limiting).
- Explain the difference between Soft Limits and Hard Limits.
- Design a quota management strategy for Standby Environments to ensure successful failover.
- Implement Exponential Backoff and jitter to handle throttling exceptions.
- Utilize the Service Quotas Dashboard for proactive monitoring.
Key Terms & Glossary
- Service Quotas: Limits on the number of resources or operations available to an AWS account (formerly called "service limits"). Most quotas are Regional; a few, such as IAM quotas, apply account-wide.
- Throttling: The process of limiting the number of API requests a user can perform within a specific time period to protect service performance.
- Soft Limit: A quota that can be increased by submitting a request to AWS Support or through the Service Quotas console.
- Hard Limit: A fixed quota that cannot be increased (e.g., the 15-minute maximum timeout for an AWS Lambda function).
- Burst Capacity: The ability to temporarily exceed a steady-state rate limit for a short duration.
- Standby Environment: A secondary infrastructure setup (Pilot Light or Warm Standby) used for disaster recovery.
The "Big Idea"
[!IMPORTANT] High availability is not just about having redundant servers; it is about having the administrative headroom to launch them. In a disaster recovery scenario, if your primary region fails and you attempt to spin up 500 instances in your standby region without the necessary service quotas pre-approved, every launch beyond the quota fails and your recovery stalls. Keep the standby region's quotas at least as high as the primary region's.
Formula / Concept Box
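| Concept | Type | Scope | Adjustability |
|---|---|---|---|
| Service Quotas | Resource count | Regional (a few, such as IAM, are global) | Soft (adjustable) / Hard (fixed) |
| API Throttling | Request rate | Per account, per Region, per API | Usually fixed per service; some can be raised through AWS Support |
| Default Quota | Initial value | Per account, per Region; applied to all new accounts | Starting point; soft limits can be raised to an applied value by request |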
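Common Error Codes:
Throttled requests commonly return HTTP 429 (Too Many Requests), although many AWS APIs instead return HTTP 400 or 503 with an error code such as ThrottlingException, RequestLimitExceeded (Amazon EC2), or SlowDown (Amazon S3). Exceeding a resource quota typically returns LimitExceededException or a service-specific code such as VcpuLimitExceeded (Amazon EC2). Some services also use LimitExceededException for rate limits, so read the error message as well.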
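Telling throttling apart from a quota error matters for retry logic: a throttled call is usually worth retrying with backoff, while a quota error keeps failing until the quota is raised. The following is a minimal sketch; the helper name `classify_error` and the code sets are hypothetical and illustrative, not an exhaustive or official list.

```python
from typing import Optional

# Illustrative, not exhaustive: codes AWS services commonly use for throttling.
THROTTLING_CODES = {
    "Throttling",                              # many query-protocol APIs
    "ThrottlingException",                     # many JSON-protocol APIs
    "RequestLimitExceeded",                    # Amazon EC2 API throttling
    "TooManyRequestsException",                # e.g., AWS Lambda (HTTP 429)
    "SlowDown",                                # Amazon S3 (HTTP 503)
    "ProvisionedThroughputExceededException",  # DynamoDB / Kinesis data plane
}

# Codes that signal a resource quota was hit; retrying will not help.
QUOTA_CODES = {"VcpuLimitExceeded", "InstanceLimitExceeded"}


def classify_error(status: int, code: Optional[str]) -> str:
    """Return 'quota', 'throttled', 'ambiguous', or 'other' for an AWS error."""
    if code in QUOTA_CODES:
        return "quota"      # request a quota increase instead of retrying
    if code in THROTTLING_CODES or status == 429:
        return "throttled"  # retry with exponential backoff and jitter
    if code == "LimitExceededException":
        return "ambiguous"  # used for both quotas and rate limits; read the message
    return "other"
```

With boto3, a failed call raises `botocore.exceptions.ClientError`; the code is in `err.response["Error"]["Code"]` and the status in `err.response["ResponseMetadata"]["HTTPStatusCode"]`.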
Hierarchical Outline
- Understanding Service Quotas
  - Resource Quotas: Limits on "things" (e.g., number of VPCs, internet gateways, EC2 instances).
  - API Rate Quotas: Limits on "actions" (e.g., RunInstances calls per second).
  - Default vs. Applied: AWS starts accounts with default quotas; you must request increases to reach higher "applied quotas."
- Throttling & API Management
  - Token Bucket Algorithm: The model many AWS APIs (for example, the Amazon EC2 API) use to rate-limit requests.
  - Steady State vs. Burst: Managing baseline traffic vs. sudden spikes.
  - Client-Side Handling: Implementing retry logic with exponential backoff and jitter.
- Quotas in Standby Environments
  - Regional Isolation: Quotas are regional. Increasing a limit in us-east-1 does not increase it in us-west-2.
  - Pre-warming Quotas: Requesting limit increases in the DR region before a disaster occurs.
  - Automation: Using CloudWatch alarms on the usage metrics that Service Quotas publishes to monitor quota consumption.
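The client-side handling in the outline can be sketched as a retry loop with capped exponential backoff and "full jitter" (a random wait between zero and the current exponential ceiling). This is a minimal sketch, not an SDK implementation: `ThrottledError` and `call_with_backoff` are hypothetical names, and the 100 ms base and 5 s cap are assumed values.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical exception standing in for a throttling error (e.g., HTTP 429)."""


def call_with_backoff(
    op: Callable[[], T],
    max_retries: int = 5,
    base: float = 0.1,  # first ceiling: 100 ms (assumed)
    cap: float = 5.0,   # never wait longer than 5 s (assumed)
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op(), retrying on ThrottledError with capped exponential backoff and full jitter."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_retries + 1):
        try:
            return op()
        except ThrottledError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Exponential ceiling: base, 2*base, 4*base, ... capped at `cap`.
            ceiling = min(cap, base * 2 ** attempt)
            # Full jitter: a random wait in [0, ceiling] spreads clients apart.
            sleep(rng.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

In production, prefer the SDK's built-in retries (in boto3, `botocore.config.Config(retries={"mode": "standard"})`) and reserve a hand-written loop for custom cases. The `sleep` and `rng` parameters exist only so the loop can be tested without real waiting.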
Visual Anchors
Figure: Failover Quota Check Flow
Figure: API Throttling, Token Bucket Concept (tokens refill the bucket at the steady-state rate; each API request consumes one token; the tokens currently in the bucket are the available burst)
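The token bucket concept can be made concrete with a small simulation. This is a simplified model under stated assumptions: the `TokenBucket` class is hypothetical, time is passed in explicitly, and real AWS bucket sizes and refill rates vary by service and API.

```python
class TokenBucket:
    """Minimal token bucket: `capacity` tokens of burst, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0) -> None:
        self.capacity = capacity  # maximum burst size
        self.rate = rate          # steady-state refill rate
        self.tokens = capacity    # start full: the whole burst is available
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False          # bucket empty: the request is throttled
```

With `TokenBucket(capacity=3, rate=1.0)`, five requests at the same instant yield three successes (the burst) and two throttles; one second later, exactly one more request succeeds because one token has been refilled.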
Definition-Example Pairs
- Term: Exponential Backoff
  - Definition: A standard error-handling strategy for network applications in which the client waits progressively longer (typically doubling the delay) between retries of a failed request.
  - Example: If an S3 PUT request is throttled, the client might wait 100 ms, then 200 ms, then 400 ms before trying again, reducing pressure on the API. AWS SDKs apply this automatically, usually with added jitter.
- Term: Quota Drift
  - Definition: When the resource requirements in a primary environment increase over time, but the limits in the standby environment are not updated to match.
  - Example: You increase your EC2 limit to 100 in Production (Region A) but forget to update the DR region (Region B), which is still capped at 20. During a disaster, 80% of your fleet won't launch.
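Quota drift can be detected by comparing quota snapshots from the two Regions. This is a minimal sketch: the function names and quota keys are hypothetical, and the dictionaries are assumed to be filled elsewhere (for example, from the Service Quotas console or API).

```python
from typing import Dict, List, Tuple


def find_quota_drift(
    primary: Dict[str, float], standby: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """List (quota name, primary value, standby value) wherever standby is lower."""
    drift = []
    for name, primary_value in sorted(primary.items()):
        # A quota missing from the standby snapshot is treated as 0 so it is flagged.
        standby_value = standby.get(name, 0.0)
        if standby_value < primary_value:
            drift.append((name, primary_value, standby_value))
    return drift


def launch_shortfall(required: float, standby_quota: float) -> float:
    """Fraction of the required capacity that cannot launch in the standby Region."""
    launchable = min(required, standby_quota)
    return (required - launchable) / required
```

For the example above, `launch_shortfall(100, 20)` returns 0.8, the 80% of the fleet that would fail to launch.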
Worked Examples
Scenario: Multi-Region Web App
Problem: A company runs a fleet of 50 m5.large instances in us-east-1. They use a Warm Standby strategy in us-west-2 where they normally run only 2 instances. During a failover, they need to scale to 50 instances in us-west-2 within 10 minutes.
Step-by-Step Breakdown:
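- Audit Current Quotas: Check the Service Quotas console in both regions.
- Identify Gap: EC2 On-Demand quotas are measured in vCPUs per instance-family group, not per instance type. The 50 m5.large instances (2 vCPUs each) need 100 vCPUs under the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" quota. Primary has an applied quota of 100 vCPUs; Standby is still at a lower value, e.g., 20 vCPUs, which covers only 10 m5.large instances.
- The Fix: Submit a quota increase request for us-west-2 to match or exceed the primary region's expected peak (e.g., set Standby to 100 vCPUs).
- Verification: Once approved, use an AWS SDK script to describe quotas in both regions periodically to ensure parity.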
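The verification step can be sketched with boto3's Service Quotas client. EC2 On-Demand quotas are counted in vCPUs, so the check multiplies instances by vCPUs per instance. Assumptions: `L-1216C47A` is the quota code for "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" (confirm it in the Service Quotas console), and `fetch_applied_quota` and `failover_headroom` are hypothetical helper names.

```python
from typing import Callable

# Assumed quota code for EC2 "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z)
# instances", measured in vCPUs. Confirm it in the Service Quotas console.
EC2_STANDARD_ONDEMAND_VCPUS = "L-1216C47A"


def fetch_applied_quota(region: str, quota_code: str = EC2_STANDARD_ONDEMAND_VCPUS) -> float:
    """Read the applied EC2 quota value for one Region via the Service Quotas API."""
    import boto3  # imported lazily so the pure check below runs without boto3

    client = boto3.client("service-quotas", region_name=region)
    response = client.get_service_quota(ServiceCode="ec2", QuotaCode=quota_code)
    return response["Quota"]["Value"]


def failover_headroom(
    instances: int,
    vcpus_per_instance: int,
    region: str,
    fetch: Callable[[str], float] = fetch_applied_quota,
) -> float:
    """Quota minus the vCPUs the full failover fleet needs; negative means a shortfall."""
    required_vcpus = instances * vcpus_per_instance
    return fetch(region) - required_vcpus


# Usage (hypothetical): 50 m5.large instances x 2 vCPUs = 100 vCPUs needed.
# for region in ("us-east-1", "us-west-2"):
#     print(region, failover_headroom(50, 2, region))
```

The quota covers every running instance in those families in the Region, so anything else already running there reduces the headroom. For quotas with no applied value, the API may not return one; the default can be read with `get_aws_default_service_quota`.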
Checkpoint Questions
- Why is it insufficient to simply have the same CloudFormation templates in both regions for Disaster Recovery?
- What HTTP status code commonly signals that an AWS API request has been throttled?
- True or False: Service Quotas are global and once increased, apply to all AWS Regions.
- How does adding "Jitter" to an exponential backoff algorithm help service recovery?
- Which AWS service allows you to view and manage your limits from a central dashboard?
Answers
- Because CloudFormation will fail to provision resources if the target region's Service Quotas are lower than the template's requirements.
- HTTP 429 (Too Many Requests). Many AWS APIs instead return HTTP 400 or 503 with an error code such as ThrottlingException, RequestLimitExceeded, or SlowDown, so check the error code too.
- False. Most Service Quotas are Regional; increasing one in us-east-1 does not change it in us-west-2.
- Jitter randomizes each client's wait, preventing a "thundering herd" in which all clients retry at the same moment and overload the recovering service again.
- AWS Service Quotas.