AWS Scaling Methodologies: Load Balancing & Auto Scaling
Scaling methodologies (for example, load balancing, auto scaling)
This study guide covers the essential scaling methodologies required for the AWS Certified Solutions Architect - Professional (SAP-C02) exam, focusing on elasticity, automation, and cost-optimized performance.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Vertical and Horizontal scaling.
- Configure Dynamic and Predictive scaling policies for Auto Scaling Groups (ASG).
- Design architectures that leverage Elastic Load Balancing (ELB) to distribute traffic.
- Select appropriate scaling mechanisms for serverless, containerized (EKS/ECS), and database workloads.
- Optimize scaling for cost efficiency using Spot Instances and Rightsizing.
Key Terms & Glossary
- Scaling Out / In: Adding or removing resource instances (Horizontal scaling).
- Scaling Up / Down: Increasing or decreasing the capacity (CPU/RAM) of an existing resource (Vertical scaling).
- Predictive Scaling: Uses machine learning to analyze historical traffic (up to 14 days of data; a minimum of 24 hours is required to start forecasting) and schedule capacity changes up to 48 hours in advance.
- Target Tracking Policy: A scaling policy that keeps a specific metric (e.g., Average CPU Utilization) at a target value.
- Cooldown Period: A configurable time where Auto Scaling suspends scaling activities to allow the last action to take effect.
- Elasticity: The ability of a system to grow or shrink its resource consumption to match demand automatically.
The "Big Idea"
In traditional infrastructure, you over-provision to meet peak demand, leading to wasted costs. In the cloud, Elasticity allows you to match supply to demand in real time. The goal is a "Goldilocks" architecture: never too much (wasted money) and never too little (poor performance). Scaling is not just about EC2; it encompasses every layer from the DNS (Route 53) to the database (Aurora/DynamoDB).
Formula / Concept Box
| Concept | Metric / Rule | Application |
|---|---|---|
| Scaling Threshold | Actual Metric > Threshold | Trigger a scale-out event via CloudWatch Alarms. |
| Predictive Window | Up to 14 Days History (min. 24 hrs) → 48-Hour Forecast | Requirement for AWS Predictive Scaling to function. |
| Availability Rule | N + 1 | Maintain one more instance than required so the loss of a single instance (or, with instances spread across AZs, a single AZ) does not drop capacity below demand. |
| Unit of Scale | Small & Many > Large & Few | It is cheaper and faster to scale multiple small instances than one giant instance. |
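The scaling-threshold and target-tracking ideas in the box above can be sketched numerically. Target tracking effectively solves for the capacity that brings the metric back to its target; the model below is a simplified illustration (assuming the metric is inversely proportional to instance count), not the exact AWS algorithm, which also accounts for warm-up and cooldown:

```python
import math

def target_tracking_capacity(current_capacity, metric_value, target_value):
    """Simplified target-tracking model: assuming the average metric is
    inversely proportional to instance count, solve for the capacity
    that restores the target value. Illustrative only."""
    return max(1, math.ceil(current_capacity * metric_value / target_value))

# A fleet of 4 instances averaging 80% CPU with a 50% target scales out to 7.
print(target_tracking_capacity(4, 80, 50))  # 7
```

Note that the result is rounded up, matching the general rule that scale-out should err on the side of extra capacity rather than too little.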
Hierarchical Outline
- Horizontal vs. Vertical Scaling
  - Vertical Scaling: Limited by hardware caps; requires downtime (the instance must be stopped, resized, and restarted).
- Horizontal Scaling: Preferred for high availability; uses Load Balancers.
- AWS Auto Scaling Components
- Launch Templates: Defines what to launch (AMI, Instance Type, Security Groups).
- Auto Scaling Group (ASG): Defines where and how many (Min/Max/Desired capacity).
- Scaling Policies
- Target Tracking: Easiest to manage (e.g., "Keep CPU at 50%").
- Step Scaling: Responds to the magnitude of the alarm (e.g., "If CPU > 80%, add 3 instances").
- Predictive Scaling: Proactive adjustment for cyclic workloads.
- Service-Specific Scaling
- Serverless: S3, Lambda, and DynamoDB (On-Demand) scale automatically without management.
  - Containers: EKS scales nodes with Karpenter or the Cluster Autoscaler and pods with the Horizontal Pod Autoscaler (HPA).
- Databases: Aurora Auto Scaling adds Read Replicas; DynamoDB adjusts Read/Write Capacity Units (RCU/WCU).
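The step-scaling idea above ("respond to the magnitude of the alarm") can be sketched as a first-match lookup over breach brackets. The thresholds and adjustments below are illustrative examples, not AWS defaults:

```python
def step_adjustment(cpu_percent):
    """Illustrative step-scaling policy: the bigger the breach, the more
    instances are added. Brackets are example values, not AWS defaults."""
    steps = [(90, 3), (80, 2), (70, 1)]  # (lower bound %, instances to add)
    for bound, add in steps:
        if cpu_percent >= bound:
            return add
    return 0  # below every threshold: no scale-out

print(step_adjustment(85))  # adds 2 instances
```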
Visual Anchors
The Scaling Loop: CloudWatch metric → alarm breaches threshold → scaling policy fires → ASG adjusts capacity → new metrics feed back into CloudWatch.
Vertical vs. Horizontal Scaling
\begin{tikzpicture}
% Vertical Scaling
\draw[thick] (0,0) rectangle (1.5,1.5) node[midway] {Small};
\draw[->, ultra thick] (0.75, 1.6) -- (0.75, 2.4) node[midway, right] {Scale Up};
\draw[thick] (0,2.5) rectangle (1.5,4.5) node[midway] {Large};
\node at (0.75, -0.5) {Vertical};
% Horizontal Scaling
\draw[thick] (4,0) rectangle (5.5,1.5) node[midway] {Instance};
\draw[->, ultra thick] (5.6, 0.75) -- (6.4, 0.75) node[midway, above] {Scale Out};
\draw[thick] (6.5,0) rectangle (8,1.5) node[midway] {Instance};
\draw[thick] (8.5,0) rectangle (10,1.5) node[midway] {Instance};
\node at (7, -0.5) {Horizontal};
\end{tikzpicture}
Definition-Example Pairs
- Self-Healing: The ability of an ASG to replace unhealthy instances.
- Example: If an EC2 instance fails a status check, the ASG terminates it and launches a new one to maintain the "Desired Capacity."
- Loosely Coupled Dependencies: Using queues to buffer requests during scaling lag.
- Example: An SQS queue sits between a web front-end and a processing back-end so that if the back-end hasn't scaled out yet, messages are stored rather than lost.
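The buffering pattern above can be sketched with an in-memory queue standing in for SQS (illustrative only; a real system would use boto3 against an actual SQS queue). The point is that when the back-end's capacity lags behind arrivals, messages wait rather than disappear:

```python
from queue import Queue

buffer = Queue()  # stands in for the SQS queue between the tiers

def front_end(requests):
    """The web tier enqueues work instead of calling the back-end directly."""
    for r in requests:
        buffer.put(r)

def back_end(batch_size):
    """Drains up to batch_size messages per pass; capacity may lag arrivals."""
    processed = []
    while not buffer.empty() and len(processed) < batch_size:
        processed.append(buffer.get())
    return processed

front_end(["req1", "req2", "req3"])
print(back_end(batch_size=2))  # ['req1', 'req2'] -- 'req3' waits, nothing lost
print(buffer.qsize())          # 1
```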
- Rightsizing: Evaluating performance data to ensure instances aren't over-provisioned before scaling.
- Example: Using AWS Compute Optimizer to see that an m5.large is only using 10% CPU, then changing it to a t3.medium before setting up Auto Scaling.
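The rightsizing check above can be sketched as a simple utilization test in the spirit of Compute Optimizer. The 20% threshold and the message wording are assumptions for illustration, not the service's actual criteria:

```python
def rightsize_hint(instance_type, avg_cpu_percent, threshold=20.0):
    """Illustrative rightsizing check: flag instances whose average CPU
    sits below the threshold. Threshold is an assumed example value."""
    if avg_cpu_percent < threshold:
        return f"{instance_type} averages {avg_cpu_percent}% CPU: consider a smaller type"
    return f"{instance_type} looks appropriately sized"

print(rightsize_hint("m5.large", 10.0))
```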
Worked Examples
Scenario: Handling a predictable Monday morning traffic spike
Challenge: A company experiences a 400% increase in traffic every Monday at 9:00 AM. Dynamic scaling is too slow, causing latency for the first 15 minutes.
Solution Steps:
- Analyze: Historical CloudWatch data confirms the cyclic pattern.
- Implement: Enable Predictive Scaling on the EC2 Auto Scaling Group.
- Configure: Set the policy to "Forecast and Scale."
- Result: AWS analyzes up to the last 14 days of data and, ahead of the forecast spike, schedules a capacity increase so instances are running before 9:00 AM Monday.
- Safety Net: Keep a Target Tracking policy active alongside it to handle any unexpected deviations from the forecast.
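The forecast-and-scale flow above can be sketched as deriving a scheduled capacity from historical same-hour observations. This is a toy model with assumed numbers (requests per instance, a 10% buffer); real predictive scaling trains an ML forecast over up to 14 days of metrics:

```python
import math

def forecast_capacity(monday_9am_peaks, requests_per_instance, buffer=1.1):
    """Toy forecast: size for the largest observed Monday 9 AM peak plus a
    10% buffer. Real predictive scaling uses ML, not a simple max."""
    peak = max(monday_9am_peaks)
    return math.ceil(peak * buffer / requests_per_instance)

# Peak request rates (req/s) observed on the last three Mondays at 9 AM.
print(forecast_capacity([400, 380, 420], requests_per_instance=50))  # 10
```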
Checkpoint Questions
- What is the minimum amount of historical data required for Predictive Scaling to function?
- Which scaling policy is best suited for maintaining a constant 40% memory utilization across a fleet?
- True or False: Vertical scaling is the best approach for achieving High Availability (HA).
- How does an Application Load Balancer (ALB) handle a newly launched instance from an ASG?
Answers
- 24 hours of metric data to start forecasting (the forecast then uses up to 14 days of history).
- Target Tracking Policy.
- False (Horizontal scaling allows for Multi-AZ distribution).
- The ASG automatically registers the new instance with the ALB Target Group once it passes health checks.
Muddy Points & Cross-Refs
- Warm-up vs. Cooldown: Beginners often confuse these. Warm-up is the time an instance needs to start serving traffic before its metrics are included in the group average. Cooldown is the "rest period" after a scaling action happens.
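The warm-up/cooldown distinction above can be sketched as two independent timeline checks (a simplified model with assumed example durations):

```python
def allow_scaling(now, last_action_time, cooldown_s=300):
    """Cooldown: suppress new scaling actions until cooldown_s seconds
    have elapsed since the last action (simplified model)."""
    return (now - last_action_time) >= cooldown_s

def include_in_metrics(now, launch_time, warmup_s=120):
    """Warm-up: exclude a new instance from the group's metric average
    until it has run for warmup_s seconds (simplified model)."""
    return (now - launch_time) >= warmup_s

# 200 seconds after a scale-out: still cooling down, but the new
# instance is already past its warm-up and counted in the metrics.
print(allow_scaling(now=1200, last_action_time=1000))    # False
print(include_in_metrics(now=1200, launch_time=1000))    # True
```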
- Scaling for EKS: Standard ASG scaling isn't enough for Kubernetes because you have to scale both the Pods (HPA) and the Nodes (Karpenter/Cluster Autoscaler).
- Deep Dive: See "Task 3.5: Cost Optimization" in the SAP-C02 guide for more on using Spot Instances within ASGs.
Comparison Tables
Dynamic vs. Predictive Scaling
| Feature | Dynamic Scaling | Predictive Scaling |
|---|---|---|
| Trigger | Real-time CloudWatch Alarms | Historical ML Patterns |
| Reaction Time | Reactive (after load hits) | Proactive (before load hits) |
| Use Case | Random spikes/Unpredictable | Cyclic/Daily/Weekly patterns |
| Availability | All ASG-supported resources | EC2 ASGs only |
ELB Types for Scaling
| Balancer Type | Layer | Best Use Case |
|---|---|---|
| Application (ALB) | Layer 7 (HTTP/S) | Path-based routing, Microservices |
| Network (NLB) | Layer 4 (TCP/UDP) | Ultra-low latency, Static IPs |
| Gateway (GWLB) | Layer 3 (IP) | Third-party virtual appliances (Firewalls) |
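The ALB's Layer 7 path-based routing from the table above can be sketched as ordered first-match rules over path prefixes. This is illustrative only; real ALB listener rules also support host headers, HTTP methods, query strings, and weighted target groups:

```python
def route(path, rules, default="web-tg"):
    """First matching prefix wins, like ordered ALB listener rules;
    unmatched requests fall through to the default target group."""
    for prefix, target_group in rules:
        if path.startswith(prefix):
            return target_group
    return default

rules = [("/api/", "api-tg"), ("/images/", "images-tg")]
print(route("/api/orders", rules))  # api-tg
print(route("/home", rules))        # web-tg (default rule)
```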
[!TIP] For the exam, remember that Target Tracking is almost always the recommended dynamic scaling policy unless you have complex, multi-step scaling requirements.