Mastering AWS Scalability: EC2 and AWS Auto Scaling
Scalability capabilities with appropriate use cases (for example, Amazon EC2 Auto Scaling, AWS Auto Scaling)
This study guide explores the mechanisms AWS provides to ensure applications remain responsive under varying loads while optimizing costs through elasticity.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Vertical Scaling and Horizontal Scaling.
- Explain the core components of an Amazon EC2 Auto Scaling Group (ASG).
- Identify the appropriate Scaling Policy for various business use cases (e.g., Target Tracking, Scheduled).
- Understand how Health Checks from ELB and EC2 trigger instance replacement.
- Describe the difference between Amazon EC2 Auto Scaling and the broader AWS Auto Scaling service.
Key Terms & Glossary
- Scalability: The ability of a system to handle increased load by adding resources.
- Elasticity: The ability to automatically scale resources up and down to match current demand.
- Horizontal Scaling (Scaling Out/In): Adding more instances of a resource (e.g., adding more EC2 instances).
- Vertical Scaling (Scaling Up/Down): Increasing the capacity of an existing resource (e.g., changing an EC2 instance type from t3.micro to t3.large).
- Cooldown Period: A configurable setting for your Auto Scaling group that helps ensure it doesn't launch or terminate additional instances before the previous scaling activity takes effect.
- Desired Capacity: The number of instances that the ASG attempts to maintain at all times.
The "Big Idea"
In traditional on-premises environments, you must "provision for the peak," which leaves resources idle during low-traffic periods. The Big Idea of AWS Scalability is Elasticity: treating infrastructure like a thermostat. Instead of manually turning servers on and off, you define a "desired state" (e.g., "Keep CPU at 60%"), and AWS handles the heavy lifting of provisioning and terminating resources in near real time. This supports high availability while keeping costs in line with actual demand.
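The cost argument can be made concrete with some rough arithmetic. The sketch below compares instance-hours for a fleet sized for the peak against a fleet that follows demand hour by hour. The demand figures are hypothetical and chosen only for illustration; real savings depend on the workload and on how quickly instances can be added and removed.

```python
# Hypothetical demand: number of instances needed during each hour of one day.
hourly_demand = [2] * 8 + [6] * 4 + [10] * 2 + [6] * 6 + [2] * 4  # 24 values

# Fixed fleet sized for the busiest hour ("provision for the peak").
peak_fleet_hours = max(hourly_demand) * len(hourly_demand)

# Elastic fleet that runs only what each hour requires.
elastic_fleet_hours = sum(hourly_demand)

savings = 1 - elastic_fleet_hours / peak_fleet_hours
print(f"peak-provisioned: {peak_fleet_hours} instance-hours")   # 240
print(f"elastic:          {elastic_fleet_hours} instance-hours")  # 104
print(f"reduction:        {savings:.0%}")                       # 57%
```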
Formula / Concept Box
| Concept | Description | Typical Use Case |
|---|---|---|
| Horizontal Scaling | Add/Remove instances | Web application fleets, distributed processing |
| Vertical Scaling | Increase/Decrease instance size | Databases (RDS), legacy monolithic apps |
| Target Tracking | Maintain a specific metric (e.g., 50% CPU) | Most general-purpose workloads |
| Scheduled Scaling | Scale based on known time patterns | Weekly reports, predictable marketing events |
| Dynamic Scaling | Scale based on real-time CloudWatch Alarms | Unpredictable traffic spikes |
[!IMPORTANT] Amazon EC2 Auto Scaling only supports Horizontal Scaling. To scale vertically, you typically must stop the instance and change its type manually or via script.
Hierarchical Outline
- Auto Scaling Group (ASG) Components
  - Launch Template/Configuration: Defines what to launch (AMI, Instance Type, Key Pair, Security Groups).
  - Group Settings: Defines where to launch (VPC, Subnets) and boundaries (Min, Max, Desired capacity).
  - Scaling Policies: Defines when to launch (Metrics and Alarms).
- Scaling Mechanisms
  - Manual Scaling: Manually adjusting the "Desired Capacity."
  - Scheduled Scaling: Predictable load changes (e.g., Friday at 5:00 PM).
  - Dynamic Scaling:
    - Target Tracking: Simplest (e.g., "Keep Average CPU at 40%").
    - Step Scaling: Adjust based on the size of the alarm breach.
    - Simple Scaling: One-time adjustment followed by a cooldown.
- High Availability Features
  - Self-Healing: Automatic replacement of unhealthy instances.
  - AZ Rebalancing: Attempting to keep an equal number of instances in each enabled Availability Zone.
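The self-healing behavior in the outline can be pictured as a reconciliation loop: remove whatever is unhealthy, then launch until the group is back at its desired capacity. The following is a toy in-memory model, not the service's actual implementation; the `Group` and `Instance` classes and the instance ID format are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical instance ID generator


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class Group:
    desired_capacity: int
    instances: list = field(default_factory=list)

    def launch(self) -> Instance:
        inst = Instance(f"i-{next(_ids):04d}")
        self.instances.append(inst)
        return inst

    def reconcile(self) -> list:
        """Drop unhealthy instances, then launch until desired capacity is met."""
        replaced = [i.instance_id for i in self.instances if not i.healthy]
        self.instances = [i for i in self.instances if i.healthy]
        while len(self.instances) < self.desired_capacity:
            self.launch()
        return replaced


group = Group(desired_capacity=3)
group.reconcile()                    # initial launch: i-0001 .. i-0003
group.instances[1].healthy = False   # simulate a failed health check
terminated = group.reconcile()
remaining = [i.instance_id for i in group.instances]
print(terminated)   # ['i-0002']
print(remaining)    # ['i-0001', 'i-0003', 'i-0004']
```

The point of the model is that the group converges on a count, not on specific instances: the failed instance is not repaired, it is replaced.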
Visual Anchors
[Figure: Scaling Logic Flow]
[Figure: ASG Capacity Boundaries]
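To make the capacity boundaries concrete: whatever adjustment a scaling policy asks for, the resulting desired capacity stays between the group's Min and Max. The sketch below pairs that clamp with a simplified step scaling rule, where a larger alarm breach triggers a larger adjustment. The step table, threshold, and group sizes are hypothetical, and the interval handling (inclusive lower bound, exclusive upper bound) is an assumption of this sketch.

```python
def clamp(value: int, minimum: int, maximum: int) -> int:
    """Keep a requested capacity inside the group's Min/Max boundaries."""
    return max(minimum, min(value, maximum))


# Hypothetical step adjustments: (lower bound, upper bound, instances to add).
# Bounds are offsets above the alarm threshold; None means "no upper bound".
STEPS = [
    (0, 10, 1),
    (10, 20, 2),
    (20, None, 4),
]


def step_adjustment(metric: float, threshold: float) -> int:
    """Pick the adjustment whose interval contains the size of the breach."""
    breach = metric - threshold
    if breach < 0:
        return 0  # alarm not breached, no scale-out
    for lower, upper, change in STEPS:
        if breach >= lower and (upper is None or breach < upper):
            return change
    return 0


def new_desired(current: int, metric: float, threshold: float,
                minimum: int, maximum: int) -> int:
    return clamp(current + step_adjustment(metric, threshold), minimum, maximum)


print(new_desired(4, 65, 60, minimum=2, maximum=8))  # breach 5  -> +1 -> 5
print(new_desired(4, 75, 60, minimum=2, maximum=8))  # breach 15 -> +2 -> 6
print(new_desired(6, 95, 60, minimum=2, maximum=8))  # breach 35 -> +4 -> capped at 8
print(new_desired(4, 40, 60, minimum=2, maximum=8))  # no breach -> 4
```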
Definition-Example Pairs
- Target Tracking Policy
  - Definition: A policy that increases or decreases the current capacity of the group based on a target value for a specific metric.
  - Example: A web service has a target of 50% average CPU utilization. When a marketing campaign starts and CPU hits 70%, the ASG automatically adds instances until the average drops back to 50%.
- Health Check Replacement
  - Definition: The process where the ASG terminates instances that fail EC2 status checks or ELB health checks and launches new ones.
  - Example: An application on one EC2 instance crashes with a segmentation fault, so the ELB health check fails. With ELB health checks enabled, the ASG marks that instance unhealthy, terminates it, and launches a fresh one to maintain the "Desired Capacity."
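The target tracking example (50% target, 70% observed) can be turned into a back-of-the-envelope capacity estimate. The function below is a simplified proportional model, not the algorithm the service actually runs: it assumes load spreads evenly across instances and that the metric falls in proportion as instances are added. The starting fleet size of 10 is hypothetical.

```python
import math


def estimate_capacity(current_instances: int, observed: float, target: float) -> int:
    """Capacity at which the same total load would average out at the target."""
    return math.ceil(current_instances * observed / target)


# 10 instances at 70% CPU carry the same load as 14 instances at 50% CPU.
print(estimate_capacity(10, observed=70.0, target=50.0))  # 14

# The same model suggests scaling in when the fleet is under-used.
print(estimate_capacity(10, observed=30.0, target=50.0))  # 6
```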
Worked Examples
Scenario 1: The Predictable Spike
Problem: A news site experiences a 500% increase in traffic every Monday morning at 8:00 AM when the weekly newsletter is sent out. Dynamic scaling reacts only after the load has arrived, and the new instances take too long to launch and initialize.
Solution:
- Analyze: Since the spike is predictable, use Scheduled Scaling.
- Implementation: Create a Scheduled Action in the ASG to set the Min and Desired capacity to a higher value (e.g., 20 instances) every Monday at 7:45 AM.
- Cleanup: Create a second Scheduled Action for Monday at 11:00 AM to return the capacity to the baseline (e.g., 2 instances).
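The two scheduled actions in this scenario could be expressed as parameter sets for the EC2 Auto Scaling `PutScheduledUpdateGroupAction` API. The sketch below only builds and prints the parameters, so it runs without AWS access. The group name, action names, and time zone are hypothetical, and it assumes the group's Max capacity is already at least 20.

```python
BASELINE = 2   # normal capacity from the scenario
SPIKE = 20     # capacity for the Monday newsletter

# Recurrence uses cron fields: minute hour day-of-month month day-of-week.
scale_out = {
    "AutoScalingGroupName": "news-site-asg",            # hypothetical name
    "ScheduledActionName": "monday-newsletter-scale-out",
    "Recurrence": "45 7 * * 1",                         # 07:45 every Monday
    "TimeZone": "America/New_York",                     # assumption; default is UTC
    "MinSize": SPIKE,
    "DesiredCapacity": SPIKE,
}

scale_in = {
    "AutoScalingGroupName": "news-site-asg",            # hypothetical name
    "ScheduledActionName": "monday-newsletter-scale-in",
    "Recurrence": "0 11 * * 1",                         # 11:00 every Monday
    "TimeZone": "America/New_York",                     # assumption; default is UTC
    "MinSize": BASELINE,
    "DesiredCapacity": BASELINE,
}

for action in (scale_out, scale_in):
    print(action["ScheduledActionName"], action["Recurrence"], action["DesiredCapacity"])

# With boto3 installed and credentials configured, each dict could be passed as:
#   boto3.client("autoscaling").put_scheduled_update_group_action(**scale_out)
```

Raising Min along with Desired matters here: if only Desired were raised, a dynamic scale-in policy could remove the extra instances before the traffic arrives.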
Scenario 2: Maintaining User Experience
Problem: You want to ensure that no user ever experiences a latency of more than 200ms, but you don't know when traffic will arrive.
Solution:
- Analyze: Use Target Tracking based on the ALBRequestCountPerTarget metric or a custom latency metric from CloudWatch.
- Implementation: Set the target tracking policy to a value that historically correlates with 200ms latency. The ASG will then add instances whenever the request volume per instance gets too high.
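A sketch of what such a policy might look like as parameters for the `PutScalingPolicy` API, together with the arithmetic that target implies. The target value of 1000 requests per target is a hypothetical stand-in for "the value that historically correlates with 200ms latency"; the group name, policy name, and resource label are placeholders as well.

```python
import math

REQUESTS_PER_TARGET = 1000.0  # hypothetical; derive the real value from load tests

policy = {
    "AutoScalingGroupName": "web-asg",            # hypothetical name
    "PolicyName": "keep-requests-per-target",     # hypothetical name
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # Placeholder: the real label identifies the load balancer and target group.
            "ResourceLabel": "app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>",
        },
        "TargetValue": REQUESTS_PER_TARGET,
    },
}


def instances_needed(total_requests: float, per_target: float) -> int:
    """Smallest fleet that keeps the per-target request count at or below the target."""
    return math.ceil(total_requests / per_target)


print(instances_needed(12_500, REQUESTS_PER_TARGET))  # 13

# With boto3 installed and credentials configured, the policy could be created with:
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```

Request count per target tends to be the easier metric to track because it falls predictably as instances are added; latency often does not, so a latency-based target needs more careful validation.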
Checkpoint Questions
- What happens if an instance is marked 'Unhealthy' by an Application Load Balancer?
  - Answer: If the ASG is configured to use ELB health checks, it will terminate the instance and launch a new one to maintain the desired capacity.
- Which scaling policy is best for a workload that changes based on a specific, non-linear metric like 'Number of messages in an SQS queue'?
  - Answer: Target Tracking (using a custom BacklogPerInstance metric) or Step Scaling.
- True or False: An Auto Scaling group can span multiple AWS Regions.
  - Answer: False. An ASG is regional and can span multiple Availability Zones within that Region, but not multiple Regions.
- How does the 'Cooldown Period' prevent flapping?
  - Answer: It prevents the ASG from launching or terminating more instances until the previous group of instances has had time to start up and begin handling traffic, preventing over-compensation.
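The cooldown answer can be illustrated with a small simulation. It is a simplified model: the cooldown here starts at the moment a scaling activity is triggered, and any alarm arriving before it expires is ignored. The alarm timestamps are hypothetical, and the 300-second value is used only as an example setting.

```python
def apply_with_cooldown(alarm_times, cooldown_seconds):
    """Return the alarm times that actually result in a scaling activity."""
    acted = []
    cooldown_ends = float("-inf")
    for t in sorted(alarm_times):
        if t >= cooldown_ends:
            acted.append(t)
            cooldown_ends = t + cooldown_seconds
        # Otherwise the alarm falls inside the cooldown and is ignored.
    return acted


alarms = [0, 60, 120, 300, 330, 700]  # hypothetical alarm times, in seconds

print(apply_with_cooldown(alarms, 300))  # [0, 300, 700]
print(apply_with_cooldown(alarms, 0))    # [0, 60, 120, 300, 330, 700]
```

Without a cooldown, the alarms at 60 and 120 seconds would each add capacity even though the instances launched at 0 seconds have not yet started absorbing load.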