Mastering AWS Scalability: EC2 and AWS Auto Scaling

This study guide explores the mechanisms AWS provides to ensure applications remain responsive under varying loads while optimizing costs through elasticity.

Learning Objectives

After studying this guide, you should be able to:

Differentiate between Vertical Scaling and Horizontal Scaling.
Explain the core components of an Amazon EC2 Auto Scaling Group (ASG).
Identify the appropriate Scaling Policy for various business use cases (e.g., Target Tracking, Scheduled).
Understand how Health Checks from ELB and EC2 trigger instance replacement.
Describe the difference between Amazon EC2 Auto Scaling and the broader AWS Auto Scaling service.

Key Terms & Glossary

Scalability: The ability of a system to handle increased load by adding resources.
Elasticity: The ability to automatically scale resources up and down to match current demand.
Horizontal Scaling (Scaling Out/In): Adding or removing instances of a resource (e.g., adding more EC2 instances to a fleet).
Vertical Scaling (Scaling Up/Down): Increasing or decreasing the capacity of an existing resource (e.g., changing an EC2 instance type from t3.micro to t3.large).
Cooldown Period: A configurable setting for your Auto Scaling group that helps to ensure that it doesn't launch or terminate additional instances before the previous scaling activity takes effect.
Desired Capacity: The number of instances that the ASG attempts to maintain at all times.
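
The Cooldown Period can be pictured as a gate that rejects a new scaling activity while the previous one is still settling. The sketch below is a toy model for study purposes, not how the service is implemented; the class name `CooldownGate` and its fields are invented for this example, and the 300-second value is only an illustrative setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CooldownGate:
    """Toy model of a cooldown: block new scaling until the last one settles."""
    cooldown_seconds: float = 300.0           # assumed value for illustration
    last_activity_at: Optional[float] = None  # time of the last accepted activity

    def try_scale(self, now: float) -> bool:
        """Return True and record the activity if no cooldown is in progress."""
        if (self.last_activity_at is not None
                and now - self.last_activity_at < self.cooldown_seconds):
            return False  # still cooling down: ignore this trigger
        self.last_activity_at = now
        return True


gate = CooldownGate()
print(gate.try_scale(now=0))    # True: first trigger is accepted
print(gate.try_scale(now=120))  # False: only 120 s have passed
print(gate.try_scale(now=300))  # True: the cooldown has elapsed
```

Triggers that arrive during the cooldown are simply dropped in this model, which is what keeps the group from reacting twice to the same spike.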

The "Big Idea"

In traditional on-premises environments, you must "provision for the peak," which leaves resources idle during low-traffic periods. The Big Idea of AWS Scalability is Elasticity: treating infrastructure like a thermostat. Instead of manually turning servers on and off, you define a "desired state" (e.g., "Keep CPU at 60%"), and AWS handles the heavy lifting of provisioning and terminating resources as demand changes. This supports high availability while keeping costs in line with actual usage.
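
The thermostat analogy can be made concrete with a little arithmetic. The sketch below is a simplified model, not the algorithm AWS actually uses: it assumes total load stays constant, so the average metric falls in proportion as instances are added. The function name `thermostat_capacity` and its parameters are invented for this example.

```python
import math


def thermostat_capacity(current: int, metric: float, target: float,
                        min_size: int, max_size: int) -> int:
    """Estimate the capacity that would bring the average metric back to target."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Assume total load is constant, so per-instance load scales with 1/capacity.
    needed = math.ceil(current * metric / target)
    # Never go below the minimum or above the maximum size of the group.
    return max(min_size, min(max_size, needed))


print(thermostat_capacity(4, 90.0, 60.0, min_size=2, max_size=10))    # 6
print(thermostat_capacity(4, 30.0, 60.0, min_size=2, max_size=10))    # 2
print(thermostat_capacity(10, 120.0, 60.0, min_size=2, max_size=12))  # 12
```

The last call shows the boundary at work: the model asks for 20 instances, but the group's maximum caps it at 12.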

Formula / Concept Box

Concept	Description	Typical Use Case
Horizontal Scaling	Add/Remove instances	Web application fleets, distributed processing
Vertical Scaling	Increase/Decrease instance size	Databases (RDS), legacy monolithic apps
Target Tracking	Maintain a specific metric (e.g., 50% CPU)	Most general-purpose workloads
Scheduled Scaling	Scale based on known time patterns	Weekly reports, predictable marketing events
Dynamic Scaling	Scale in response to CloudWatch alarms (covers Target Tracking, Step, and Simple policies)	Unpredictable traffic spikes

[!IMPORTANT] Amazon EC2 Auto Scaling only supports Horizontal Scaling. To scale vertically, you typically must stop the instance and change its type manually or via script.

Hierarchical Outline

Auto Scaling Group (ASG) Components
- Launch Template (or legacy Launch Configuration): Defines what to launch (AMI, Instance Type, Key Pair, Security Groups).
- Group Settings: Define where to launch (VPC, Subnets) and boundaries (Min, Max, Desired capacity).
- Scaling Policies: Define when to launch or terminate instances (Metrics and Alarms).
Scaling Mechanisms
- Manual Scaling: Manually adjusting the "Desired Capacity."
- Scheduled Scaling: Predictable load changes (e.g., Friday at 5:00 PM).
- Dynamic Scaling:
  - Target Tracking: Simplest (e.g., "Keep Average CPU at 40%").
  - Step Scaling: Adjust based on the size of the alarm breach.
  - Simple Scaling: One-time adjustment followed by a cooldown.
High Availability Features
- Self-Healing: Automatic replacement of unhealthy instances.
- AZ Rebalancing: Attempting to keep an equal number of instances in each enabled Availability Zone.
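
The "what, where, and boundaries" split of the ASG components maps onto the parameters of a create request. The sketch below only assembles the keyword arguments that boto3's `create_auto_scaling_group` accepts, so it runs without AWS credentials. The helper name `build_asg_request`, the group name, the template name, and the subnet IDs are all hypothetical.

```python
from typing import Any, Dict, List


def build_asg_request(name: str, launch_template_name: str,
                      subnet_ids: List[str], min_size: int,
                      max_size: int, desired: int) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's create_auto_scaling_group."""
    if not min_size <= desired <= max_size:
        raise ValueError("need MinSize <= DesiredCapacity <= MaxSize")
    return {
        "AutoScalingGroupName": name,
        # What to launch: AMI, instance type, etc. live in the launch template.
        "LaunchTemplate": {"LaunchTemplateName": launch_template_name,
                           "Version": "$Latest"},
        # Where to launch: a comma-separated list of subnet IDs.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Boundaries for the group.
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


# Hypothetical names and IDs, used only for illustration.
request = build_asg_request("web-asg", "web-template",
                            ["subnet-aaa", "subnet-bbb"], 2, 10, 4)
print(request["VPCZoneIdentifier"])  # subnet-aaa,subnet-bbb
```

With credentials configured, the dictionary could be passed as `boto3.client("autoscaling").create_auto_scaling_group(**request)`. Listing subnets in different Availability Zones is what lets the group spread instances across zones.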

Visual Anchors

[Figure: Scaling Logic Flow]

[Figure: ASG Capacity Boundaries]

Definition-Example Pairs

Target Tracking Policy
- Definition: A policy that increases or decreases the current capacity of the group based on a target value for a specific metric.
- Example: A web service has a target of 50% average CPU utilization. When a marketing campaign starts and average CPU rises to 70%, the ASG automatically adds instances until the average drops back toward 50%.
Health Check Replacement
- Definition: The process where ASG terminates instances that fail EC2 status checks or ELB health checks and launches new ones.
- Example: An application on one EC2 instance crashes (Segmentation Fault), so the ELB health check fails. If ELB health checks are enabled for the ASG, it marks the instance unhealthy, terminates that specific instance, and launches a fresh one to maintain the "Desired Capacity."
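
Health check replacement amounts to a reconciliation step: remove what is unhealthy, then launch until the fleet matches the Desired Capacity again. The sketch below simulates that step in plain Python. It is a study aid, not the service's implementation; the function name `reconcile` and the `i-new…` instance IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(instances: Dict[str, bool], desired: int,
              next_id: int) -> Tuple[Dict[str, bool], List[str], List[str]]:
    """Drop unhealthy instances and launch replacements up to desired capacity.

    instances maps an instance id to True (healthy) or False (unhealthy).
    Returns the new fleet, the terminated ids, and the launched ids.
    """
    terminated = [iid for iid, healthy in instances.items() if not healthy]
    fleet = {iid: True for iid, healthy in instances.items() if healthy}
    launched: List[str] = []
    while len(fleet) < desired:
        new_id = f"i-new{next_id}"  # hypothetical id format for the sketch
        fleet[new_id] = True
        launched.append(new_id)
        next_id += 1
    return fleet, terminated, launched


fleet, terminated, launched = reconcile(
    {"i-a": True, "i-b": False, "i-c": True}, desired=3, next_id=1)
print(terminated)  # ['i-b']
print(launched)    # ['i-new1']
```

The fleet size ends where it started, which is the point: self-healing restores capacity rather than changing it.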

Worked Examples

Scenario 1: The Predictable Spike

Problem: A news site experiences a 500% increase in traffic every Monday morning at 8:00 AM when the weekly newsletter is sent out. Dynamic scaling reacts too slowly because new instances need time to launch and initialize.

Solution:

Analyze: Since the spike is predictable, use Scheduled Scaling.
Implementation: Create a Scheduled Action in the ASG to set the Min and Desired capacity to a higher value (e.g., 20 instances) every Monday at 7:45 AM. The group's Max capacity must be at least that value.
Cleanup: Create a second Scheduled Action for Monday at 11:00 AM to return the capacity to the baseline (e.g., 2 instances).
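
The two Scheduled Actions from this scenario can be written down as request parameters. The sketch below builds the keyword arguments that boto3's `put_scheduled_update_group_action` accepts, without calling AWS. The helper name `scheduled_action`, the group name, and the action names are hypothetical. Recurrence uses cron syntax and is interpreted in UTC unless a `TimeZone` parameter is also supplied, so a real schedule would need to account for the site's local time.

```python
from typing import Any, Dict


def scheduled_action(group: str, name: str, cron: str,
                     min_size: int, desired: int) -> Dict[str, Any]:
    """Keyword arguments for boto3's put_scheduled_update_group_action."""
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": cron,  # cron fields: minute hour day-of-month month weekday
        "MinSize": min_size,
        "DesiredCapacity": desired,
    }


# Hypothetical group and action names; in cron numbering, weekday 1 is Monday.
scale_out = scheduled_action("news-asg", "monday-newsletter-out",
                             "45 7 * * 1", min_size=20, desired=20)
scale_in = scheduled_action("news-asg", "monday-newsletter-in",
                            "0 11 * * 1", min_size=2, desired=2)

print(scale_out["Recurrence"])  # 45 7 * * 1
print(scale_in["Recurrence"])   # 0 11 * * 1
```

Each dictionary could then be passed as `boto3.client("autoscaling").put_scheduled_update_group_action(**scale_out)`. Starting 15 minutes early gives the new instances time to boot before the 8:00 AM spike.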

Scenario 2: Maintaining User Experience

Problem: You want to ensure that no user ever experiences a latency of more than 200ms, but you don't know when traffic will arrive.

Solution:

Analyze: Use Target Tracking based on the ALBRequestCountPerTarget metric or a custom latency metric from CloudWatch.
Implementation: Set the target tracking policy to a value that historically correlates with 200ms latency. The ASG will then add instances whenever the request volume per instance rises above that target. This keeps latency near the goal but cannot guarantee it for every request.
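
One way to find "a value that historically correlates with 200ms latency" is to look at past samples and take the highest request rate per target that still met the limit. The sketch below does that on made-up data and then builds the `TargetTrackingConfiguration` structure that boto3's `put_scaling_policy` accepts. The helper names and the sample history are assumptions for illustration. In practice the resource label follows the form `app/<load-balancer-name>/<load-balancer-id>/targetgroup/<target-group-name>/<target-group-id>`.

```python
from typing import Any, Dict, List, Tuple


def pick_request_target(samples: List[Tuple[float, float]],
                        latency_limit_ms: float) -> float:
    """Highest observed requests-per-target whose latency stayed within the limit.

    samples holds (requests_per_target, latency_ms) pairs from past traffic.
    """
    ok = [req for req, latency in samples if latency <= latency_limit_ms]
    if not ok:
        raise ValueError("no sample met the latency limit")
    return max(ok)


def target_tracking_config(target: float, resource_label: str) -> Dict[str, Any]:
    """TargetTrackingConfiguration for boto3's put_scaling_policy."""
    return {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            "ResourceLabel": resource_label,  # identifies the ALB and target group
        },
        "TargetValue": target,
    }


# Made-up history: (requests per target, observed latency in ms).
history = [(200.0, 90.0), (400.0, 140.0), (600.0, 190.0), (800.0, 320.0)]
target = pick_request_target(history, latency_limit_ms=200.0)
print(target)  # 600.0
```

A more careful analysis would leave headroom below the chosen value, since new instances take time to come into service while traffic keeps rising.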

Checkpoint Questions

What happens if an instance is marked 'Unhealthy' by an Application Load Balancer?
- Answer: If the ASG is configured to use ELB health checks, it will terminate the instance and launch a new one to maintain the desired capacity.
Which scaling policy is best for a workload that changes based on a specific, non-linear metric like 'Number of messages in an SQS queue'?
- Answer: Target Tracking (using a custom BacklogPerInstance metric, i.e., the number of queued messages divided by the number of running instances) or Step Scaling.
True or False: An Auto Scaling group can span multiple AWS Regions.
- Answer: False. An ASG is regional and can span multiple Availability Zones within that region, but not multiple regions.
How does the 'Cooldown Period' prevent flapping?
- Answer: It prevents the ASG from launching or terminating more instances until the previous group of instances has had time to start up and begin handling traffic, preventing over-compensation.
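
The BacklogPerInstance idea from the SQS question above is simple arithmetic: divide the queue depth (the SQS `ApproximateNumberOfMessagesVisible` metric) by the number of in-service instances, and compare it with the backlog one instance can absorb within the latency goal. The function names and the numbers below are assumptions for illustration, not part of any AWS API.

```python
import math


def backlog_per_instance(visible_messages: int, in_service: int) -> float:
    """Queue depth divided by the number of in-service instances."""
    if in_service <= 0:
        raise ValueError("need at least one in-service instance")
    return visible_messages / in_service


def acceptable_backlog(max_latency_s: float, seconds_per_message: float) -> float:
    """Messages one instance may have queued while still meeting the latency goal."""
    return max_latency_s / seconds_per_message


def instances_needed(visible_messages: int, target_backlog: float) -> int:
    """Smallest fleet size that keeps backlog per instance at or below target."""
    return max(1, math.ceil(visible_messages / target_backlog))


# Assumed workload: 1200 queued messages, 10 instances,
# 0.25 s to process a message, 10 s acceptable latency.
print(backlog_per_instance(1200, 10))  # 120.0
print(acceptable_backlog(10.0, 0.25))  # 40.0
print(instances_needed(1200, 40.0))    # 30
```

Here the current backlog per instance (120) is three times the acceptable value (40), so a target tracking policy with a target of 40 would push the group toward 30 instances.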

--- study guide end ---
