AWS Scaling Strategies: Metrics, Policies, and Conditions

This guide explores how to identify the right metrics and configure conditions for AWS Auto Scaling to ensure high availability and cost-efficiency.

Learning Objectives

Identify standard and custom CloudWatch metrics suitable for triggering scaling actions.
Differentiate between Simple, Step, and Target Tracking scaling policies.
Configure scaling adjustments using specific types such as ChangeInCapacity and PercentChangeInCapacity.
Understand the roles of cooldown periods and warm-up times in stabilizing environments.

Key Terms & Glossary

Auto Scaling Group (ASG): A collection of EC2 instances treated as a logical unit for purposes of automatic scaling and management.
CloudWatch Alarm: A mechanism that watches a single metric over a specified time period and performs actions based on the value of the metric relative to a threshold.
Horizontal Scaling: Adding or removing instances from a resource pool (scaling out/in).
Vertical Scaling: Increasing or decreasing the power (CPU, RAM) of an existing instance (scaling up/down).
Cooldown Period: A configurable setting that ensures the ASG does not launch or terminate additional instances before the previous scaling activity takes effect.

The "Big Idea"

Dynamic Scaling automates the elasticity that the cloud promises. Instead of over-provisioning for peak load (which wastes money) or under-provisioning (which degrades performance or causes downtime), AWS lets you build a feedback loop: CloudWatch monitors Service-Level Indicators (SLIs), and the system adjusts infrastructure supply in near real time to match current demand.

Formula / Concept Box

Adjustment Type	Description	Example (Current Capacity = 4)
ChangeInCapacity	Adds/removes a specific number of instances.	Add 2 $\rightarrow$ Result: 6
ExactCapacity	Sets the group to a specific absolute value.	Set to 10 $\rightarrow$ Result: 10
PercentChangeInCapacity	Adjusts capacity based on a percentage of current size.	Add 50% $\rightarrow$ Result: 6

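[!IMPORTANT] When using PercentChangeInCapacity, AWS does not round the increment to the nearest integer. Results with a magnitude greater than 1 are truncated toward zero (12.7 becomes 12; -6.67 becomes -6). Results between 0 and 1 become 1, and results between 0 and -1 become -1, so the policy always changes capacity by at least one instance.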
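The three adjustment types can be sketched as a small Python function. This is a minimal illustration, not AWS code: the name `apply_adjustment` is hypothetical, the group's minimum and maximum size limits are ignored, and the percent branch assumes the documented AWS rounding behavior (truncate toward zero, but never less than one instance).

```python
import math


def apply_adjustment(current: int, adjustment_type: str, value: float) -> int:
    """Return the desired capacity after one scaling adjustment.

    Hypothetical helper for illustration; min/max group size is not enforced.
    """
    if adjustment_type == "ChangeInCapacity":
        # Add (or remove, if negative) a fixed number of instances.
        return current + int(value)
    if adjustment_type == "ExactCapacity":
        # Ignore the current size and jump to an absolute value.
        return int(value)
    if adjustment_type == "PercentChangeInCapacity":
        delta = current * value / 100.0
        if delta == 0:
            return current
        if abs(delta) < 1:
            # Fractions between -1 and 1 become a full step of +1 or -1.
            step = 1 if delta > 0 else -1
        else:
            # Larger magnitudes are truncated toward zero (12.7 -> 12, -6.67 -> -6).
            step = math.trunc(delta)
        return current + step
    raise ValueError(f"unknown adjustment type: {adjustment_type}")


# The rows of the table above, with a current capacity of 4.
print(apply_adjustment(4, "ChangeInCapacity", 2))          # 6
print(apply_adjustment(4, "ExactCapacity", 10))            # 10
print(apply_adjustment(4, "PercentChangeInCapacity", 50))  # 6
```

A +10% adjustment on 4 instances computes 0.4, which this sketch bumps to a full instance, giving 5.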

Hierarchical Outline

I. Monitoring the Workload
- Standard Metrics: CPU Utilization, Network In/Out, Disk I/O.
- Custom Metrics: Memory utilization (requires the CloudWatch Agent) or application-specific values extracted from logs (e.g., "Processing Time").
- Collection Interval: Basic (5-minute) vs. Detailed (1-minute) monitoring.
II. Scaling Policies
- Simple Scaling: Single adjustment based on a single alarm breaching a threshold.
- Step Scaling: Multiple adjustments based on the magnitude of the alarm breach.
- Target Tracking: Automatically maintains a metric at a specific target (e.g., "Keep CPU at 60%").
III. Lifecycle Parameters
- Cooldowns: Prevents "thrashing" (rapid, unnecessary scaling actions) in Simple Scaling.
- Warm-up: Used in Step/Target Tracking to ignore metrics from instances not yet fully initialized.
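To see how a cooldown prevents thrashing, the sketch below replays a list of alarm breach times and keeps only the ones that would start a scaling activity. It is a simplified model under stated assumptions: the function name `scaling_start_times` is hypothetical, times are in seconds, and each scaling activity is treated as instantaneous, so the cooldown starts at the moment of the breach rather than when the activity completes.

```python
def scaling_start_times(breach_times, cooldown_seconds):
    """Return the breach times that would start a scaling activity.

    Simplified model: a breach is ignored if it arrives while the cooldown
    from the previous scaling activity is still running.
    """
    started = []
    cooldown_ends = None
    for t in sorted(breach_times):
        if cooldown_ends is None or t >= cooldown_ends:
            started.append(t)
            cooldown_ends = t + cooldown_seconds
    return started


# Six breaches, but only three scaling activities with a 300-second cooldown.
print(scaling_start_times([0, 60, 120, 300, 330, 700], cooldown_seconds=300))
# [0, 300, 700]
```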

Visual Anchors

Scaling Feedback Loop

[Diagram not rendered: the feedback loop between CloudWatch monitoring and capacity adjustments.]

Step Scaling Visualization

This diagram represents how Step Scaling responds more aggressively as the breach magnitude increases.

[Diagram not rendered: step scaling adjustments plotted against breach magnitude.]

Definition-Example Pairs

Target Tracking Policy
- Definition: A policy that adjusts capacity to maintain a specific metric value.
- Example: Setting an ASG to maintain an Average CPU Utilization of 50%. AWS handles the creation of the underlying alarms automatically.
Custom Metric Filters
- Definition: Using CloudWatch Logs to extract data points for scaling.
- Example: An application logs "Order Processing Time." A metric filter extracts this value as a metric, and a CloudWatch Alarm on that metric triggers a scale-out when the time exceeds 2 seconds, distributing the load across more instances.
Predictive Scaling
- Definition: Using machine learning to forecast demand and scale before a breach occurs.
- Example: A retail site that always sees a traffic spike at 8:00 AM every Monday. The system learns this pattern and launches instances at 7:45 AM.
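The Target Tracking example above (hold average CPU at 50%) could be expressed as the keyword arguments for the Auto Scaling `put_scaling_policy` API call. The sketch only builds the request so it runs without AWS credentials; the helper name `target_tracking_policy`, the group name `web-asg`, the policy name, and the 120-second warm-up are all assumptions chosen for illustration.

```python
def target_tracking_policy(asg_name: str, target_cpu: float, warmup_seconds: int) -> dict:
    """Build put_scaling_policy arguments for a CPU target tracking policy.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",  # illustrative name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Predefined metric: average CPU across the group.
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
        # Seconds before a new instance's metrics count toward the aggregate.
        "EstimatedInstanceWarmup": warmup_seconds,
    }


policy = target_tracking_policy("web-asg", target_cpu=50.0, warmup_seconds=120)
print(policy["TargetTrackingConfiguration"]["TargetValue"])  # 50.0

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```

No alarm appears in the request: with Target Tracking, AWS creates and manages the underlying CloudWatch alarms itself.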

Worked Examples

Example 1: Percent Change Calculation

Scenario: An ASG currently has 8 instances. A scaling policy is triggered with a PercentChangeInCapacity of +25%.

Current Capacity: 8
Adjustment: $8 \times 0.25 = 2$ instances.
New Capacity: $8 + 2 = 10$ instances.
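The numbers above divide evenly. As a contrasting case, assume the group has 7 instances (a hypothetical figure, not part of the scenario) and the same +25% policy fires. Under the AWS rule that percent results greater than 1 are rounded down, the fractional part is dropped:

$$7 \times 0.25 = 1.75 \;\Rightarrow\; \lfloor 1.75 \rfloor = 1, \qquad 7 + 1 = 8 \text{ instances}$$

Rounding to the nearest integer would have given 9, so the distinction matters when sizing percent-based steps for small groups.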

Example 2: Step Scaling Configuration

Scenario: You need a graduated response for an e-commerce site.

Threshold: 60% CPU.
Step 1: Lower Bound = 0, Upper Bound = 10 (i.e., 60-70% CPU). Action: +1 instance.
Step 2: Lower Bound = 10, Upper Bound = $\infty$ (i.e., >70% CPU). Action: +3 instances.
Result: If CPU hits 75%, the ASG immediately adds 3 instances rather than waiting for multiple Simple Scaling cycles.
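The step selection in this example can be sketched in a few lines of Python. The function name `step_adjustment` and the tuple layout are hypothetical. The sketch assumes the alarm fires at or above the threshold, and it follows the AWS convention for upper-threshold breaches: each step's lower bound is inclusive and its upper bound is exclusive, both measured as offsets from the alarm threshold.

```python
import math


def step_adjustment(metric_value, threshold, steps):
    """Return the instance change for a metric value, or 0 if no step matches.

    `steps` is a list of (lower_offset, upper_offset, adjustment) tuples,
    with offsets measured from the alarm threshold.
    """
    offset = metric_value - threshold
    if offset < 0:
        return 0  # below the threshold: the alarm has not breached
    for lower, upper, adjustment in steps:
        if lower <= offset < upper:
            return adjustment
    return 0


# Steps from Example 2: 60-70% CPU adds 1 instance, 70% and above adds 3.
STEPS = [(0, 10, 1), (10, math.inf, 3)]

print(step_adjustment(65, threshold=60, steps=STEPS))  # 1
print(step_adjustment(75, threshold=60, steps=STEPS))  # 3
```

In the Auto Scaling API, these bounds correspond to the `MetricIntervalLowerBound` and `MetricIntervalUpperBound` fields of a step adjustment, with the open-ended upper bound left unset rather than written as infinity.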

Checkpoint Questions

What is the main difference between a Cooldown and a Warm-up period?
Which scaling policy is recommended for most use cases because it is the easiest to set up and manage?
If your application is limited by RAM rather than CPU, how do you trigger a scaling action?
True or False: Simple Scaling policies can have multiple step adjustments.

Answers

Cooldown (Simple Scaling) stops new scaling actions for a set time after one finishes. Warm-up (Step/Target Tracking) allows new instances to finish booting before their data is counted in the group's aggregate metrics.
Target Tracking is the simplest and usually most effective for standard metrics.
You must install the CloudWatch Agent on the EC2 instances to report Memory utilization as a Custom Metric, then link that metric to a CloudWatch Alarm.
False. Only Step Scaling policies support multiple adjustments based on the range of the breach.

[!TIP] For exam questions, if the workload is "erratic" or "spiky," look for Step Scaling. If the workload is "stable" or "predictable," look for Target Tracking or Scheduled Scaling.