AWS Auto Scaling Policies and Events

This guide explores the mechanisms provided by AWS to ensure workloads adapt to changing demand through automated scaling policies, covering EC2, ECS, serverless services, and Kubernetes environments.

Learning Objectives

After studying this guide, you should be able to:

Distinguish between dynamic scaling and predictive scaling.
Identify which AWS services support native Auto Scaling versus built-in serverless scaling.
Configure effective scaling thresholds based on resource utilization metrics.
Understand the specific scaling mechanisms for Kubernetes (EKS) including Karpenter and HPA/VPA.
Apply 14-day historical data analysis for predictive capacity planning.

Key Terms & Glossary

ASG (Auto Scaling Group): A logical grouping of EC2 instances for purposes of management and scaling.
Scale-Out: The process of adding resources (e.g., instances) to handle increased demand.
Scale-In: The process of removing resources to save costs when demand decreases.
Target Tracking Policy: A scaling policy that increases or decreases capacity to maintain a specific metric at a target value (e.g., maintain average CPU at 50%).
Step Scaling: A policy that scales capacity based on the size of the alarm breach (e.g., add 2 instances if CPU is 70%, but add 4 if it hits 90%).
Predictive Scaling: A mechanism that uses machine learning to forecast future traffic and schedule capacity changes in advance.
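
To make the step scaling entry concrete, here is a minimal sketch of how a breach size could map to an adjustment. The thresholds and instance counts are the illustrative ones from the definition above, and the function and constant names are hypothetical; a real policy is configured on the ASG rather than coded by hand.

```python
from typing import List, Tuple

# (lower CPU bound in percent, instances to add).
# Illustrative values taken from the glossary entry, not AWS defaults.
STEP_ADJUSTMENTS: List[Tuple[float, int]] = [(70.0, 2), (90.0, 4)]


def step_adjustment(cpu_percent: float,
                    steps: List[Tuple[float, int]] = STEP_ADJUSTMENTS) -> int:
    """Return how many instances to add for the given CPU reading."""
    added = 0
    for lower_bound, instances in sorted(steps):
        if cpu_percent >= lower_bound:
            added = instances  # the highest breached step wins
    return added


print(step_adjustment(65.0))  # 0, below every step
print(step_adjustment(75.0))  # 2
print(step_adjustment(95.0))  # 4
```

A target tracking policy needs no such table: you state the target value and the service works out the adjustment.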

The "Big Idea"

In cloud architecture, elasticity is the goal. Scaling is not just about "growing big" but about "matching the curve." By automating scaling events, organizations avoid both over-provisioning (wasted money) and under-provisioning (dropped traffic), so that the infrastructure footprint closely tracks real-time user demand.

Formula / Concept Box

Concept	Data Requirement	Time Horizon
Dynamic Scaling	Real-time CloudWatch metrics	Reacts after a metric breach (minutes, not instant)
Predictive Scaling	At least 24 hours of history (uses up to 14 days)	Forecasts the next 48 hours
Lambda Scaling	Concurrency quotas	Automatic / internal

Important: Predictive scaling is currently only available for Amazon EC2 Auto Scaling Groups.

Hierarchical Outline

Scaling Methodologies
- Manual Scaling: Human intervention (rarely recommended for production).
- Dynamic Scaling: Reactive; responds to CloudWatch alarms.
- Predictive Scaling: Proactive; uses ML for cyclic patterns.
Service-Specific Scaling
- EC2/ECS: EC2 uses Auto Scaling Groups; ECS uses Service Auto Scaling.
- DynamoDB: Supports both On-Demand (serverless) and Provisioned (with Auto Scaling).
- EKS (Kubernetes):
  - Cluster Level: Cluster Autoscaler or Karpenter.
  - Pod Level: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA).
Serverless Scaling (Built-in)
- S3 & Lambda: Scale automatically without manual policy configuration.
- Quotas: Critical to monitor Service Quotas to prevent scaling caps.

Visual Anchors

Scaling Logic Flow

[Diagram not rendered in the source.]

Demand vs. Capacity (TikZ)

[TikZ figure not rendered in the source.]

Definition-Example Pairs

Cooldown Period: A configurable period during which the ASG waits for a previous scaling action to take effect before it scales again.
- Example: After launching an EC2 instance, waiting 300 seconds for the application to boot before checking if CPU is still high.
Scale-In Protection: A setting that prevents specific instances from being terminated during a scale-in event.
- Example: Preventing the termination of an instance currently processing a long-running batch job even if the average CPU is low.
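
The cooldown behaviour can be sketched as a simple time gate. This is only a model for intuition, using the 300-second figure from the example above; the class and method names are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Blocks new scaling actions until the cooldown has elapsed."""
    cooldown_seconds: float = 300.0  # value from the example above
    last_action_at: Optional[float] = None

    def may_scale(self, now: float) -> bool:
        # No previous action means nothing to wait for.
        if self.last_action_at is None:
            return True
        return now - self.last_action_at >= self.cooldown_seconds

    def record_action(self, now: float) -> None:
        self.last_action_at = now


gate = CooldownGate()
gate.record_action(now=0.0)        # a scale-out happened at t = 0 s
print(gate.may_scale(now=120.0))   # False, still inside the cooldown
print(gate.may_scale(now=300.0))   # True, cooldown has elapsed
```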

Worked Examples

Example 1: Calculating Scale-Out

Scenario: An ASG has a minimum of 2 instances and a maximum of 10. The policy is to add 50% more capacity when CPU exceeds 70%.

Current State: 4 instances running.
Event: CPU hits 85%.
Calculation: $$4 \times 0.50 = 2$$ additional instances.
Result: ASG scales out to 6 instances.
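
The same arithmetic, with the group's minimum and maximum applied, can be sketched as follows. The function name is hypothetical, and the rounding rule (fractions above 1 round down, fractions between 0 and 1 round up to 1) is an assumption based on how EC2 Auto Scaling documents percentage adjustments; check the current documentation before relying on it.

```python
import math


def scale_out_by_percent(current: int, percent: float,
                         minimum: int, maximum: int) -> int:
    """Return the new instance count after a percentage scale-out."""
    raw = current * percent / 100.0
    # Assumed rounding: a fraction between 0 and 1 still adds one instance,
    # larger values are rounded down.
    added = 1 if 0 < raw < 1 else math.floor(raw)
    # The group never leaves its [minimum, maximum] bounds.
    return max(minimum, min(maximum, current + added))


print(scale_out_by_percent(4, 50, minimum=2, maximum=10))  # 6, as in Example 1
print(scale_out_by_percent(8, 50, minimum=2, maximum=10))  # 10, capped at the maximum
```

The second call shows why the maximum matters: 8 instances plus 50% would be 12, but the group stops at 10.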

Example 2: Predictive Scaling Setup

Scenario: A retail site sees massive spikes every Monday at 9:00 AM.

Analysis: Predictive scaling analyzes up to 14 days of the group's historical load.
Forecast: It predicts the 9:00 AM spike for the upcoming Monday.
Action: It starts launching instances at 8:45 AM so capacity is warm before the traffic arrives.
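
The pre-launch step is just a time offset from the forecast peak. A minimal sketch, assuming a 15-minute buffer to match the 8:45 AM figure above; the function name and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta


def prelaunch_time(forecast_peak: datetime, buffer_minutes: int = 15) -> datetime:
    """Return when capacity should start launching ahead of a forecast peak."""
    return forecast_peak - timedelta(minutes=buffer_minutes)


# A hypothetical Monday at 9:00 AM.
peak = datetime(2024, 1, 8, 9, 0)
start = prelaunch_time(peak)
print(start.strftime("%H:%M"))  # 08:45
```

The right buffer depends on how long an instance takes to boot and warm up, so a slow-starting application would need a larger value.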

Checkpoint Questions

What is the minimum amount of historical data required for Predictive Scaling to generate a forecast?
Which tool is used in EKS to scale EC2 nodes efficiently by bypassing the standard ASG overhead?
If an application is "Serverless," do you still need to configure Auto Scaling policies?
Why might you combine Predictive and Dynamic scaling policies?

Muddy Points & Cross-Refs

EKS Scaling Confusion: Users often confuse HPA (scaling pods) with Karpenter (scaling nodes). Think of HPA as "buying more groceries" and Karpenter as "buying a bigger fridge."
Serverless Limits: While Lambda scales automatically, it is subject to Account Concurrency Limits (usually 1,000 per region). Cross-reference with "Service Quotas" documentation.
Cooldown vs. Warm-up: Cooldown is for the whole group; warm-up is for the individual instance being added.
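
For the serverless limit, a rough way to see how close a workload is to the concurrency quota is to multiply request rate by average duration. This is a back-of-the-envelope sketch: the function names are hypothetical, the 1,000 default comes from the note above, and real accounts may have a different quota.

```python
def estimated_concurrency(requests_per_second: float,
                          avg_duration_seconds: float) -> float:
    """Approximate concurrent executions as rate times average duration."""
    return requests_per_second * avg_duration_seconds


def quota_headroom(requests_per_second: float,
                   avg_duration_seconds: float,
                   quota: float = 1000.0) -> float:
    """Return remaining concurrency; a negative value suggests throttling risk."""
    return quota - estimated_concurrency(requests_per_second, avg_duration_seconds)


print(estimated_concurrency(400, 0.5))  # 200.0
print(quota_headroom(400, 0.5))         # 800.0
print(quota_headroom(2500, 0.5))        # -250.0, over the assumed quota
```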

Comparison Tables

Dynamic vs. Predictive Scaling

Feature	Dynamic Scaling	Predictive Scaling
Mechanism	Reactive (Alarms)	Proactive (ML Forecast)
Data Source	Real-time CloudWatch Metrics	Historical CloudWatch Metrics (up to 14 days)
Best For	Unpredictable bursts	Cyclic, repeating patterns
Service Support	EC2, ECS, DynamoDB, Aurora	EC2 ASGs only
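
To show how the two policy types differ in configuration, the sketch below builds the parameter dictionaries that, as far as I understand the boto3 EC2 Auto Scaling client, would be passed to `put_scaling_policy`. The policy names, group name, and helper functions are hypothetical, and the field names should be checked against the current API reference. Nothing here calls AWS.

```python
from typing import Any, Dict


def target_tracking_params(asg_name: str, target_cpu: float) -> Dict[str, Any]:
    """Dynamic scaling: hold average CPU near a target value."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


def predictive_params(asg_name: str, target_cpu: float,
                      buffer_seconds: int = 900) -> Dict[str, Any]:
    """Predictive scaling: forecast load and launch ahead of it."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-predictive",  # hypothetical name
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [{
                "TargetValue": target_cpu,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization",
                },
            }],
            "Mode": "ForecastOnly",  # review forecasts before letting them scale
            "SchedulingBufferTime": buffer_seconds,
        },
    }


dynamic = target_tracking_params("web-asg", 50.0)
predictive = predictive_params("web-asg", 50.0)
# With credentials configured, each dict would be applied roughly as:
#   boto3.client("autoscaling").put_scaling_policy(**dynamic)
print(dynamic["PolicyType"], predictive["PolicyType"])
```

Attaching both policies to the same group is one way to combine them: the forecast covers the repeating pattern, and target tracking handles whatever the forecast missed.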

EKS Scaling Tools

Tool	Level	What it Scales
HPA	Pod	Horizontal count of Pods based on CPU/memory (or custom metrics)
VPA	Pod	Vertical sizing (CPU/RAM) of existing Pods
Karpenter	Infrastructure	Provisions right-sized EC2 nodes directly
Cluster Autoscaler	Infrastructure	Adjusts ASG sizes to fit pending Pods
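
To make the HPA row concrete, the Kubernetes documentation describes the core calculation as a ratio of the current metric to the target, rounded up. The sketch below implements only that ratio; the function name is hypothetical, and the real controller also applies a tolerance band, readiness checks, and min/max replica bounds that are left out here.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """desired = ceil(current_replicas * current_metric / target_metric)"""
    return math.ceil(current_replicas * current_metric / target_metric)


print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
print(hpa_desired_replicas(4, current_metric=30, target_metric=60))  # 2
```

If the extra Pods do not fit on existing nodes they stay pending, which is the signal Karpenter or Cluster Autoscaler acts on.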
