Mastering Service Quotas and Throttling for High Availability

This study guide covers the administrative and operational limits in AWS that affect high availability and disaster recovery, with a focus on service quotas and throttling mechanisms in standby environments.

Learning Objectives

After studying this guide, you should be able to:

Differentiate between Service Quotas (limits) and Throttling (rate limiting).
Explain the difference between Soft Limits and Hard Limits.
Design a quota management strategy for Standby Environments to ensure successful failover.
Implement Exponential Backoff and jitter to handle throttling exceptions.
Utilize the Service Quotas Dashboard for proactive monitoring.

Key Terms & Glossary

Service Quotas: Limits, in most cases Region-specific, on the number of resources or operations available for an AWS account (formerly called "service limits").
Throttling: The process of limiting the number of API requests a user can perform within a specific time period to protect service performance.
Soft Limit: A quota that can be increased by submitting a request to AWS Support or through the Service Quotas console.
Hard Limit: A fixed quota that cannot be increased (e.g., the 15-minute maximum timeout of a Lambda function).
Burst Capacity: The ability to temporarily exceed a steady-state rate limit for a short duration.
Standby Environment: A secondary infrastructure setup (Pilot Light or Warm Standby) used for disaster recovery.

The "Big Idea"

[!IMPORTANT] High availability is not just about having redundant servers; it is about having the administrative headroom to launch them. In a disaster recovery scenario, if your primary region fails and you attempt to spin up 500 instances in your standby region without having the necessary service quotas pre-approved, your recovery will fail immediately. Quotas must be synchronized across regions.

Formula / Concept Box

Concept	Type	Scope	Adjustability
Service Quotas	Resource Count	Regional	Soft (Changeable) / Hard (Fixed)
API Throttling	Request Rate	Regional/Account	Usually Fixed per Service
Default Quota	Initial Value	Standard	Applied to all new accounts

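Common Error Codes: Throttled requests typically surface as HTTP 429 Too Many Requests or as a service-specific error such as ThrottlingException or RequestLimitExceeded, while LimitExceededException usually means a resource quota has been reached.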

Hierarchical Outline

Understanding Service Quotas
- Resource Quotas: Limits on "things" (e.g., number of VPCs, IGWs, EC2 Instances).
- API Rate Quotas: Limits on "actions" (e.g., RunInstances calls per second).
- Default vs. Applied: AWS starts accounts with default values; the applied quota is the value currently in effect for your account, which equals the default until an increase is approved.
Throttling & API Management
- Token Bucket Algorithm: How AWS measures request rates.
- Steady State vs. Burst: Managing baseline traffic vs. sudden spikes.
- Client-Side Handling: Implementing retry logic with Exponential Backoff.
Quotas in Standby Environments
- Regional Isolation: Quotas are regional. Increasing a limit in us-east-1 does not increase it in us-west-2.
- Pre-warming Quotas: The practice of requesting limit increases in the DR region before a disaster occurs.
- Automation: Using CloudWatch alarms on usage metrics or Trusted Advisor service limit checks to monitor quota usage.
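
The monitoring idea in the outline above can be sketched as a simple utilization check. This is a minimal illustration, not an AWS API: the function names and the 80% threshold are assumptions chosen for the example, and the usage and quota values would come from whatever monitoring source you use.

```python
def quota_utilization(used, quota):
    """Return the fraction of a quota that is consumed (1.0 means fully used)."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return used / quota


def needs_increase(used, quota, threshold=0.8):
    """Flag a quota once utilization reaches the (assumed) alert threshold."""
    return quota_utilization(used, quota) >= threshold


# For a standby Region, pass the projected failover demand rather than
# today's usage: a standby that runs 2 instances but must absorb 50 should
# be checked against 50.
print(needs_increase(used=2, quota=20))   # current standby usage
print(needs_increase(used=50, quota=20))  # projected failover demand
```

Checking projected demand instead of current usage is what exposes a standby Region that looks healthy today but cannot absorb a failover.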

Visual Anchors

Failover Quota Check Flow

(Diagram not rendered in this copy.)

API Throttling: Token Bucket Concept

(Diagram not rendered in this copy.)
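
The token bucket concept can be illustrated with a small simulation. This is a simplified sketch of the general algorithm, not AWS's actual implementation: the class name, the parameters, and the injectable clock are assumptions made for the example. The bucket holds up to `capacity` tokens (the burst allowance) and refills at `rate` tokens per second (the steady-state rate).

```python
import time


class TokenBucket:
    """Simplified token bucket: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # start full, which permits an initial burst
        self.last = clock()

    def allow(self, cost=1):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: a bucket of 5 tokens refilling at 2 per second.
bucket = TokenBucket(rate=2, capacity=5)
results = [bucket.allow() for _ in range(6)]
print(results)  # the first 5 calls fit in the burst; the 6th is likely throttled
```

A full bucket absorbs a short burst, after which requests are admitted only as fast as tokens refill. That is the distinction between burst capacity and the steady-state rate.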

Definition-Example Pairs

Term: Exponential Backoff
- Definition: A standard error handling strategy for network applications where the client progressively waits longer between retries of a failed request.
- Example: If an S3 PUT request is throttled, the SDK waits 100ms, then 200ms, then 400ms before trying again, reducing the pressure on the API.
Term: Quota Drift
- Definition: When the resource requirements in a primary environment increase over time, but the limits in the standby environment are not updated to match.
- Example: You increase your EC2 limit to 100 in Production (Region A) but forget to update the DR Region (Region B), which is still capped at 20. During a disaster, a fleet that needs the full 100 will see 80% of its capacity fail to launch.
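
The exponential backoff definition above, combined with jitter, can be sketched as follows. This is a minimal illustration under stated assumptions: `ThrottledError` is a hypothetical stand-in for whatever throttling exception your client raises, the 100 ms base matches the example above, and the 5-second cap is arbitrary. The AWS SDKs generally ship their own retry behaviour, so hand-written logic like this is mainly useful for understanding the technique or for wrapping non-SDK calls.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by a service call."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delay: a random value between 0 and min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_backoff(operation, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with jittered exponential delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap))
```

Without jitter, the delay ceilings would be 100 ms, 200 ms, 400 ms, and so on. Drawing a random delay below each ceiling spreads retries from many clients across the window instead of letting them all arrive at the same instant.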

Worked Examples

Scenario: Multi-Region Web App

Problem: A company runs a fleet of 50 m5.large instances in us-east-1. They use a Warm Standby strategy in us-west-2 where they normally run only 2 instances. During a failover, they need to scale to 50 instances in us-west-2 within 10 minutes.

Step-by-Step Breakdown:

Audit Current Quotas: Check Service Quotas console in both regions.
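Identify Gap: Primary has an applied quota of 100; Standby is still at a default of 20. Note that EC2 On-Demand quotas are now measured in vCPUs per instance family group rather than as per-type instance counts, so 50 m5.large instances at 2 vCPUs each require 100 vCPUs.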
The Fix: Submit a quota increase request for us-west-2 to match or exceed the primary region's expected peak (e.g., set Standby to 100).
Verification: Once approved, use an AWS SDK script to describe quotas in both regions periodically to ensure parity.
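
The verification step could be scripted along these lines. This is a sketch under assumptions: `find_quota_gaps` and `fetch_applied_quota` are hypothetical helper names, and the quota code you pass must be looked up in the Service Quotas console for the quota you care about. The boto3 call used is `get_service_quota` on the `service-quotas` client. The comparison logic is kept separate so it can run without AWS access.

```python
def find_quota_gaps(primary, standby):
    """Return {quota_name: (standby_value, primary_value)} where standby is lower.

    Both arguments map a quota name to its applied value in that Region.
    A quota missing from `standby` is treated as 0.
    """
    gaps = {}
    for name, primary_value in primary.items():
        standby_value = standby.get(name, 0)
        if standby_value < primary_value:
            gaps[name] = (standby_value, primary_value)
    return gaps


def fetch_applied_quota(region, service_code, quota_code):
    """Read one applied quota value from the Service Quotas API (needs credentials)."""
    import boto3  # imported lazily so find_quota_gaps works without boto3 installed

    client = boto3.client("service-quotas", region_name=region)
    response = client.get_service_quota(ServiceCode=service_code, QuotaCode=quota_code)
    return response["Quota"]["Value"]


# Values from the worked example above: primary at 100, standby at 20.
print(find_quota_gaps({"ec2-on-demand": 100}, {"ec2-on-demand": 20}))
```

Running a check like this on a schedule, and alerting whenever the result is non-empty, is one way to catch quota drift before a failover does.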

Checkpoint Questions

Why is it insufficient to simply have the same CloudFormation templates in both regions for Disaster Recovery?
What is the specific HTTP status code returned when an AWS API request is throttled?
True or False: Service Quotas are global and once increased, apply to all AWS Regions.
How does adding "Jitter" to an exponential backoff algorithm help service recovery?
Which AWS service allows you to view and manage your limits from a central dashboard?

Answers

Because CloudFormation will fail to provision resources if the target region's Service Quotas are lower than the template's requirements.
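HTTP 429 (Too Many Requests) is the standard code, though some AWS services signal throttling differently (for example, Amazon S3 returns 503 Slow Down).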
False. Most service quotas are Region-specific, so an increase approved in one Region does not carry over to another.
Jitter prevents "thundering herd" problems where all clients retry at the exact same millisecond, potentially crashing the service again.
AWS Service Quotas.