AWS Auto Scaling Policies and Events: Master Study Guide
This guide explores the mechanisms provided by AWS to ensure workloads adapt to changing demand through automated scaling policies, covering EC2, ECS, serverless services, and Kubernetes environments.
Learning Objectives
After studying this guide, you should be able to:
- Distinguish between dynamic scaling and predictive scaling.
- Identify which AWS services support native Auto Scaling versus built-in serverless scaling.
- Configure effective scaling thresholds based on resource utilization metrics.
- Understand the specific scaling mechanisms for Kubernetes (EKS) including Karpenter and HPA/VPA.
- Apply historical data analysis (up to 14 days of metrics) for predictive capacity planning.
Key Terms & Glossary
- ASG (Auto Scaling Group): A logical grouping of EC2 instances for purposes of management and scaling.
- Scale-Out: The process of adding resources (e.g., instances) to handle increased demand.
- Scale-In: The process of removing resources to save costs when demand decreases.
- Target Tracking Policy: A scaling policy that increases or decreases capacity to maintain a specific metric at a target value (e.g., maintain average CPU at 50%).
- Step Scaling: A policy that scales capacity based on the size of the alarm breach (e.g., add 2 instances if CPU is 70%, but add 4 if it hits 90%).
- Predictive Scaling: A mechanism that uses machine learning to forecast future traffic and schedule capacity changes in advance.
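The step scaling entry above can be sketched in a few lines of Python. This is a simplified illustration, not the AWS implementation: the threshold of 70%, the step table, and the function name `step_adjustment` are assumptions taken from (or invented around) the glossary example, with step bounds written as offsets from the alarm threshold.

```python
# Hypothetical step table mirroring the glossary example.
# Bounds are offsets from the alarm threshold, in percentage points.
ALARM_THRESHOLD = 70.0
STEPS = [
    (0.0, 20.0, 2),   # 70% <= CPU < 90%  -> add 2 instances
    (20.0, None, 4),  # CPU >= 90%        -> add 4 instances
]


def step_adjustment(cpu_percent: float) -> int:
    """Return how many instances to add for a given CPU reading."""
    breach = cpu_percent - ALARM_THRESHOLD
    if breach < 0:
        return 0  # alarm not breached, no scaling action
    for lower, upper, change in STEPS:
        if breach >= lower and (upper is None or breach < upper):
            return change
    return 0
```

The point of the table is that a larger breach selects a larger adjustment, which is what separates step scaling from a single fixed-size response.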
The "Big Idea"
In cloud architecture, elasticity is the goal. Scaling is not just about "growing big" but about "matching the curve." By automating scaling events, organizations avoid both over-provisioning (wasted money) and under-provisioning (dropped traffic), so that the infrastructure footprint closely tracks real-time user demand.
Formula / Concept Box
| Concept | Metric Requirement | Predictive Window |
|---|---|---|
| Dynamic Scaling | Real-time (CloudWatch) | None; reacts after a metric breach |
| Predictive Scaling | Up to 14 days of history (at least 24 hours to start forecasting) | Forecasts next 48 hours |
| Lambda Scaling | Concurrency Quotas | Automatic / Internal |
[!IMPORTANT] Predictive scaling is currently only available for Amazon EC2 Auto Scaling Groups.
Hierarchical Outline
- Scaling Methodologies
  - Manual Scaling: Human intervention (rarely recommended for production).
  - Dynamic Scaling: Reactionary; responds to CloudWatch alarms.
  - Predictive Scaling: Proactive; uses ML for cyclic patterns.
- Service-Specific Scaling
  - EC2/ECS: Uses Auto Scaling Groups and Service Auto Scaling.
  - DynamoDB: Supports both On-Demand (serverless) and Provisioned (with Auto Scaling).
  - EKS (Kubernetes):
    - Cluster Level: Cluster Autoscaler or Karpenter.
    - Pod Level: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).
- Serverless Scaling (Built-in)
  - S3 & Lambda: Scale automatically without manual policy configuration.
  - Quotas: Critical to monitor Service Quotas to prevent scaling caps.
Visual Anchors
Scaling Logic Flow
Demand vs. Capacity (TikZ)
\begin{tikzpicture}[scale=0.8]
  % Axes
  \draw[->] (0,0) -- (6,0) node[right] {Time};
  \draw[->] (0,0) -- (0,4) node[above] {Capacity/Demand};
  % Demand curve (sine-like)
  \draw[thick, blue] plot [domain=0.4:5.6, samples=100] (\x, {2 + sin(\x*120)});
  \node[blue] at (5.5, 3.2) {Demand};
  % Provisioned capacity steps (dynamic scaling)
  \draw[thick, red, dashed] (0.4, 2.5) -- (2, 2.5) -- (2, 3.5) -- (3.5, 3.5) -- (3.5, 2.2) -- (5, 2.2) -- (5, 1.5);
  \node[red] at (1.5, 3.8) {Provisioned Capacity};
\end{tikzpicture}
Definition-Example Pairs
- Cooldown Period: A configurable period during which the ASG waits for a previous scaling action to take effect before starting another one.
  - Example: After launching an EC2 instance, waiting 300 seconds for the application to boot before checking whether CPU is still high.
- Scale-In Protection: A setting that prevents specific instances from being terminated during a scale-in event.
  - Example: Preventing the termination of an instance that is currently processing a long-running batch job, even if the average CPU is low.
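The cooldown idea can be illustrated with a small gate function. This is a minimal sketch under stated assumptions: `can_scale` and its parameters are hypothetical names, times are plain seconds, and the 300-second default simply reuses the figure from the example above. It does not model how the ASG tracks cooldowns internally.

```python
from typing import Optional


def can_scale(now_s: float,
              last_scaling_s: Optional[float],
              cooldown_s: float = 300.0) -> bool:
    """Return True when a new scaling action is allowed.

    now_s          -- current time in seconds (any monotonic clock)
    last_scaling_s -- time of the previous scaling action, or None if none yet
    cooldown_s     -- cooldown length in seconds (assumed default: 300)
    """
    if last_scaling_s is None:
        return True  # nothing to wait for
    return now_s - last_scaling_s >= cooldown_s
```

A reading taken 120 seconds after a launch would be ignored here, which stops the group from adding more instances before the first one has had a chance to lower the average CPU.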
Worked Examples
Example 1: Calculating Scale-Out
Scenario: An ASG has a minimum of 2 instances and a maximum of 10. The policy is to add 50% more capacity when CPU exceeds 70%.
- Current State: 4 instances running.
- Event: CPU hits 85%.
- Calculation: $4 \times 0.50 = 2$ additional instances.
- Result: ASG scales out to 6 instances.
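The arithmetic in Example 1 can be written as a short function that also enforces the group's minimum and maximum. This is a sketch, not AWS code: the function name is hypothetical, and the rounding rules (fractions above 1 rounded down, fractions between 0 and 1 rounded up to 1, mirrored for negatives) follow my reading of how percentage-based adjustments are described in the EC2 Auto Scaling documentation, so treat them as an assumption to verify.

```python
import math


def apply_percent_change(current: int, percent: float,
                         min_size: int, max_size: int) -> int:
    """Return the new group size after a percentage-based adjustment."""
    raw = current * percent / 100.0
    if raw == 0:
        delta = 0
    elif 0 < raw < 1:
        delta = 1            # small positive changes still add one instance
    elif -1 < raw < 0:
        delta = -1           # small negative changes still remove one instance
    else:
        delta = math.trunc(raw)  # drop the fractional part (toward zero)
    # The group never leaves its configured bounds.
    return max(min_size, min(max_size, current + delta))


# Example 1 from the text: 4 instances, +50%, bounds 2..10.
new_size = apply_percent_change(4, 50, min_size=2, max_size=10)
```

With 9 running instances the same policy would ask for 13, and the maximum of 10 would cap the result, which is why the bounds matter as much as the percentage.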
Example 2: Predictive Scaling Setup
Scenario: A retail site sees massive spikes every Monday at 9:00 AM.
- Analysis: Predictive scaling analyzes up to 14 days of the site's historical load metrics (at least 24 hours of history is needed before it can forecast).
- Forecast: It predicts the 9:00 AM spike for the upcoming Monday.
- Action: It starts launching instances at 8:45 AM so capacity is warm before the traffic arrives.
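The timing in Example 2 comes down to subtracting a lead time from the forecast spike. The snippet below is only an illustration: the date (a Monday picked arbitrarily), the 15-minute lead, and the name `launch_time` are assumptions, and the real service derives its schedule from the forecast rather than from a hand-written date.

```python
from datetime import datetime, timedelta


def launch_time(forecast_spike: datetime, lead: timedelta) -> datetime:
    """Return when to start launching so capacity is ready before the spike."""
    return forecast_spike - lead


# Illustrative values: Monday 8 January 2024, 09:00, with a 15-minute lead.
spike = datetime(2024, 1, 8, 9, 0)
start = launch_time(spike, timedelta(minutes=15))
print(start.strftime("%A %H:%M"))
```

The lead should cover instance boot and application start-up; if it is shorter than that, capacity arrives after the traffic does.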
Checkpoint Questions
- What is the minimum amount of historical data required for Predictive Scaling to generate a forecast?
- Which tool is used in EKS to scale EC2 nodes efficiently by bypassing the standard ASG overhead?
- If an application is "Serverless," do you still need to configure Auto Scaling policies?
- Why might you combine Predictive and Dynamic scaling policies?
Muddy Points & Cross-Refs
- EKS Scaling Confusion: Users often confuse HPA (scaling pods) with Karpenter (scaling nodes). Think of HPA as "buying more groceries" and Karpenter as "buying a bigger fridge."
- Serverless Limits: While Lambda scales automatically, it is subject to an account-level concurrency quota (1,000 concurrent executions per Region by default). Cross-reference with the "Service Quotas" documentation.
- Cooldown vs. Warm-up: Cooldown is for the whole group; warm-up is for the individual instance being added.
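For the serverless limits point, a rough way to see whether a workload approaches the concurrency quota is to multiply the request rate by the average invocation duration. The sketch below assumes that estimate is good enough for planning; the function names are hypothetical and the quota constant is the figure quoted in this guide, so check the actual value for your account.

```python
import math

# Figure quoted in this guide; the real quota is account- and Region-specific.
ASSUMED_CONCURRENCY_QUOTA = 1000


def estimated_concurrency(requests_per_second: float,
                          avg_duration_s: float) -> int:
    """Estimate concurrent executions as request rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_s)


def exceeds_quota(requests_per_second: float, avg_duration_s: float,
                  quota: int = ASSUMED_CONCURRENCY_QUOTA) -> bool:
    """Return True when the estimate is above the concurrency quota."""
    return estimated_concurrency(requests_per_second, avg_duration_s) > quota
```

For instance, 5,000 requests per second at 0.5 seconds each works out to about 2,500 concurrent executions, well above the assumed quota, so requests would be throttled even though no scaling policy was ever configured.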
Comparison Tables
Dynamic vs. Predictive Scaling
| Feature | Dynamic Scaling | Predictive Scaling |
|---|---|---|
| Mechanism | Reactive (Alarms) | Proactive (ML Forecast) |
| Data Source | Real-time CloudWatch Metrics | Up to 14 days of historical metrics |
| Best For | Unpredictable bursts | Cyclic, repeating patterns |
| Service Support | EC2, ECS, DynamoDB, Aurora | EC2 ASGs only |
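When both policy types are attached to the same group, their outputs are not summed. As I understand the EC2 Auto Scaling behavior, the group follows whichever policy asks for more capacity, within its minimum and maximum. The function below is a sketch of that rule with hypothetical names, not a reproduction of the service logic.

```python
def combined_desired_capacity(predictive: int, dynamic: int,
                              min_size: int, max_size: int) -> int:
    """Pick the larger of the two policy outputs, clamped to the group bounds."""
    desired = max(predictive, dynamic)
    return max(min_size, min(max_size, desired))
```

This is one reason to combine them: the forecast sets a baseline for the expected cycle, and the dynamic policy raises capacity above that baseline when an unforecast burst arrives.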
EKS Scaling Tools
| Tool | Level | What it Scales |
|---|---|---|
| HPA | Pod | Horizontal count of Pods based on CPU/RAM |
| VPA | Pod | Vertical sizing (CPU/RAM) of existing Pods |
| Karpenter | Infrastructure | Provisions right-sized EC2 nodes directly |
| Cluster Autoscaler | Infrastructure | Adjusts ASG sizes to fit pending Pods |
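The HPA row can be made concrete with the replica formula given in the Kubernetes documentation: desired replicas equal the current count scaled by the ratio of the observed metric to the target, rounded up. The sketch below uses hypothetical names and a 10% tolerance (the documented default, as far as I know), and it leaves out details such as min/max replica bounds and stabilization windows.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """Return the replica count the HPA formula would ask for."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, no change
    return math.ceil(current_replicas * ratio)
```

If the extra Pods do not fit on existing nodes they stay pending, and that is the signal Karpenter or the Cluster Autoscaler acts on at the infrastructure level.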