AWS Scaling Methodologies: Load Balancing & Auto Scaling
Scaling methodologies (for example, load balancing, auto scaling)
This study guide covers the essential scaling methodologies required for the AWS Certified Solutions Architect - Professional (SAP-C02) exam, focusing on elasticity, automation, and cost-optimized performance.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Vertical and Horizontal scaling.
- Configure Dynamic and Predictive scaling policies for Auto Scaling Groups (ASG).
- Design architectures that leverage Elastic Load Balancing (ELB) to distribute traffic.
- Select appropriate scaling mechanisms for serverless, containerized (EKS/ECS), and database workloads.
- Optimize scaling for cost efficiency using Spot Instances and Rightsizing.
Key Terms & Glossary
- Scaling Out / In: Adding or removing resource instances (Horizontal scaling).
- Scaling Up / Down: Increasing or decreasing the capacity (CPU/RAM) of an existing resource (Vertical scaling).
- Predictive Scaling: Uses machine learning to analyze historical traffic (up to 14 days of data; a minimum of 24 hours is required to start forecasting) and schedule capacity changes up to 48 hours in advance.
- Target Tracking Policy: A scaling policy that keeps a specific metric (e.g., Average CPU Utilization) at a target value.
- Cooldown Period: A configurable time where Auto Scaling suspends scaling activities to allow the last action to take effect.
- Elasticity: The ability of a system to grow or shrink its resource consumption to match demand automatically.
The "Big Idea"
In traditional infrastructure, you over-provision to meet peak demand, leading to wasted costs. In the cloud, Elasticity allows you to match supply to demand in real time. The goal is a "Goldilocks" architecture: never too much (wasted money) and never too little (poor performance). Scaling is not just about EC2; it encompasses every layer from the DNS (Route 53) to the database (Aurora/DynamoDB).
Formula / Concept Box
| Concept | Metric / Rule | Application |
|---|---|---|
| Scaling Threshold | Actual Metric > Threshold | Trigger a scale-out event via CloudWatch Alarms. |
| Predictive Window | Up to 14 Days History (min. 24 hrs) → 48-Hour Forecast | Requirement for AWS Predictive Scaling to function. |
| Availability Rule | N + 1 | Maintain one more instance than required so the loss of a single instance (or, with instances spread across AZs, a single AZ) does not drop capacity below demand. |
| Unit of Scale | Small & Many > Large & Few | It is cheaper and faster to scale multiple small instances than one giant instance. |
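The scaling-threshold and target-tracking ideas in the box above can be sketched numerically. Target tracking effectively solves for the capacity that brings the metric back to its target; the model below is a simplified illustration (assuming the metric is inversely proportional to instance count), not the exact AWS algorithm, which also accounts for warm-up and cooldown:

```python
import math

def target_tracking_capacity(current_capacity, metric_value, target_value):
    """Simplified target-tracking model: assuming the average metric is
    inversely proportional to instance count, solve for the capacity
    that restores the target value. Illustrative only."""
    return max(1, math.ceil(current_capacity * metric_value / target_value))

# A fleet of 4 instances averaging 80% CPU with a 50% target scales out to 7.
print(target_tracking_capacity(4, 80, 50))  # 7
```

Note that the result is rounded up, matching the general rule that scale-out should err on the side of extra capacity rather than too little.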
Hierarchical Outline
- Horizontal vs. Vertical Scaling
  - Vertical Scaling: Limited by hardware caps; requires downtime (the instance must be stopped, resized, and restarted).
- Horizontal Scaling: Preferred for high availability; uses Load Balancers.
- AWS Auto Scaling Components
- Launch Templates: Defines what to launch (AMI, Instance Type, Security Groups).
- Auto Scaling Group (ASG): Defines where and how many (Min/Max/Desired capacity).
- Scaling Policies
- Target Tracking: Easiest to manage (e.g., "Keep CPU at 50%").
- Step Scaling: Responds to the magnitude of the alarm (e.g., "If CPU > 80%, add 3 instances").
- Predictive Scaling: Proactive adjustment for cyclic workloads.
- Service-Specific Scaling
- Serverless: S3, Lambda, and DynamoDB (On-Demand) scale automatically without management.
  - Containers: EKS scales nodes with Karpenter or the Cluster Autoscaler and pods with the Horizontal Pod Autoscaler (HPA).
- Databases: Aurora Auto Scaling adds Read Replicas; DynamoDB adjusts Read/Write Capacity Units (RCU/WCU).
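The step-scaling idea above ("respond to the magnitude of the alarm") can be sketched as a first-match lookup over breach brackets. The thresholds and adjustments below are illustrative examples, not AWS defaults:

```python
def step_adjustment(cpu_percent):
    """Illustrative step-scaling policy: the bigger the breach, the more
    instances are added. Brackets are example values, not AWS defaults."""
    steps = [(90, 3), (80, 2), (70, 1)]  # (lower bound %, instances to add)
    for bound, add in steps:
        if cpu_percent >= bound:
            return add
    return 0  # below every threshold: no scale-out

print(step_adjustment(85))  # adds 2 instances
```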
Visual Anchors
The Scaling Loop: CloudWatch metric → alarm breaches threshold → scaling policy fires → ASG adjusts capacity → new metrics feed back into CloudWatch.
Vertical vs. Horizontal Scaling
\begin{tikzpicture}
% Vertical Scaling
\draw[thick] (0,0) rectangle (1.5,1.5) node[midway] {Small};
\draw[->, ultra thick] (0.75, 1.6) -- (0.75, 2.4) node[midway, right] {Scale Up};
\draw[thick] (0,2.5) rectangle (1.5,4.5) node[midway] {Large};
\node at (0.75, -0.5) {Vertical};
% Horizontal Scaling
\draw[thick] (4,0) rectangle (5.5,1.5) node[midway] {Instance};
\draw[->, ultra thick] (5.6, 0.75) -- (6.4, 0.75) node[midway, above] {Scale Out};
\draw[thick] (6.5,0) rectangle (8,1.5) node[midway] {Instance};
\draw[thick] (8.5,0) rectangle (10,1.5) node[midway] {Instance};
\node at (7, -0.5) {Horizontal};
\end{tikzpicture}
Definition-Example Pairs
- Self-Healing: The ability of an ASG to replace unhealthy instances.
- Example: If an EC2 instance fails a status check, the ASG terminates it and launches a new one to maintain the "Desired Capacity."
- Loosely Coupled Dependencies: Using queues to buffer requests during scaling lag.
- Example: An SQS queue sits between a web front-end and a processing back-end so that if the back-end hasn't scaled out yet, messages are stored rather than lost.
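The buffering pattern above can be sketched with an in-memory queue standing in for SQS (illustrative only; a real system would use boto3 against an actual SQS queue). The point is that when the back-end's capacity lags behind arrivals, messages wait rather than disappear:

```python
from queue import Queue

buffer = Queue()  # stands in for the SQS queue between the tiers

def front_end(requests):
    """The web tier enqueues work instead of calling the back-end directly."""
    for r in requests:
        buffer.put(r)

def back_end(batch_size):
    """Drains up to batch_size messages per pass; capacity may lag arrivals."""
    processed = []
    while not buffer.empty() and len(processed) < batch_size:
        processed.append(buffer.get())
    return processed

front_end(["req1", "req2", "req3"])
print(back_end(batch_size=2))  # ['req1', 'req2'] -- 'req3' waits, nothing lost
print(buffer.qsize())          # 1
```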
- Rightsizing: Evaluating performance data to ensure instances aren't over-provisioned before scaling.
- Example: Using AWS Compute Optimizer to see that an m5.large is only using 10% CPU, then changing it to a t3.medium before setting up Auto Scaling.
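The rightsizing check above can be sketched as a simple utilization test in the spirit of Compute Optimizer. The 20% threshold and the message wording are assumptions for illustration, not the service's actual criteria:

```python
def rightsize_hint(instance_type, avg_cpu_percent, threshold=20.0):
    """Illustrative rightsizing check: flag instances whose average CPU
    sits below the threshold. Threshold is an assumed example value."""
    if avg_cpu_percent < threshold:
        return f"{instance_type} averages {avg_cpu_percent}% CPU: consider a smaller type"
    return f"{instance_type} looks appropriately sized"

print(rightsize_hint("m5.large", 10.0))
```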
Worked Examples
Scenario: Handling a predictable Monday morning traffic spike
Challenge: A company experiences a 400% increase in traffic every Monday at 9:00 AM. Dynamic scaling is too slow, causing latency for the first 15 minutes.
Solution Steps:
- Analyze: Historical CloudWatch data confirms the cyclic pattern.
- Implement: Enable Predictive Scaling on the EC2 Auto Scaling Group.
- Configure: Set the policy to "Forecast and Scale."
- Result: AWS analyzes up to the last 14 days of data and, ahead of the forecast spike, schedules a capacity increase so instances are running before 9:00 AM Monday.
- Safety Net: Keep a Target Tracking policy active alongside it to handle any unexpected deviations from the forecast.
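The forecast-and-scale flow above can be sketched as deriving a scheduled capacity from historical same-hour observations. This is a toy model with assumed numbers (requests per instance, a 10% buffer); real predictive scaling trains an ML forecast over up to 14 days of metrics:

```python
import math

def forecast_capacity(monday_9am_peaks, requests_per_instance, buffer=1.1):
    """Toy forecast: size for the largest observed Monday 9 AM peak plus a
    10% buffer. Real predictive scaling uses ML, not a simple max."""
    peak = max(monday_9am_peaks)
    return math.ceil(peak * buffer / requests_per_instance)

# Peak request rates (req/s) observed on the last three Mondays at 9 AM.
print(forecast_capacity([400, 380, 420], requests_per_instance=50))  # 10
```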
Checkpoint Questions
- What is the minimum amount of historical data required for Predictive Scaling to function?
- Which scaling policy is best suited for maintaining a constant 40% memory utilization across a fleet?
- True or False: Vertical scaling is the best approach for achieving High Availability (HA).
- How does an Application Load Balancer (ALB) handle a newly launched instance from an ASG?
Answers
- 24 hours of metric data to start forecasting (the forecast then uses up to 14 days of history).
- Target Tracking Policy.
- False (Horizontal scaling allows for Multi-AZ distribution).
- The ASG automatically registers the new instance with the ALB Target Group once it passes health checks.
Muddy Points & Cross-Refs
- Warm-up vs. Cooldown: Beginners often confuse these. Warm-up is the time an instance needs to start serving traffic before its metrics are included in the group average. Cooldown is the "rest period" after a scaling action happens.
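The warm-up/cooldown distinction above can be sketched as two independent timeline checks (a simplified model with assumed example durations):

```python
def allow_scaling(now, last_action_time, cooldown_s=300):
    """Cooldown: suppress new scaling actions until cooldown_s seconds
    have elapsed since the last action (simplified model)."""
    return (now - last_action_time) >= cooldown_s

def include_in_metrics(now, launch_time, warmup_s=120):
    """Warm-up: exclude a new instance from the group's metric average
    until it has run for warmup_s seconds (simplified model)."""
    return (now - launch_time) >= warmup_s

# 200 seconds after a scale-out: still cooling down, but the new
# instance is already past its warm-up and counted in the metrics.
print(allow_scaling(now=1200, last_action_time=1000))    # False
print(include_in_metrics(now=1200, launch_time=1000))    # True
```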
- Scaling for EKS: Standard ASG scaling isn't enough for Kubernetes because you have to scale both the Pods (HPA) and the Nodes (Karpenter/Cluster Autoscaler).
- Deep Dive: See "Task 3.5: Cost Optimization" in the SAP-C02 guide for more on using Spot Instances within ASGs.
Comparison Tables
Dynamic vs. Predictive Scaling
| Feature | Dynamic Scaling | Predictive Scaling |
|---|---|---|
| Trigger | Real-time CloudWatch Alarms | Historical ML Patterns |
| Reaction Time | Reactive (after load hits) | Proactive (before load hits) |
| Use Case | Random spikes/Unpredictable | Cyclic/Daily/Weekly patterns |
| Availability | All ASG-supported resources | EC2 ASGs only |
ELB Types for Scaling
| Balancer Type | Layer | Best Use Case |
|---|---|---|
| Application (ALB) | Layer 7 (HTTP/S) | Path-based routing, Microservices |
| Network (NLB) | Layer 4 (TCP/UDP) | Ultra-low latency, Static IPs |
| Gateway (GWLB) | Layer 3 (IP) | Third-party virtual appliances (Firewalls) |
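The ALB's Layer 7 path-based routing from the table above can be sketched as ordered first-match rules over path prefixes. This is illustrative only; real ALB listener rules also support host headers, HTTP methods, query strings, and weighted target groups:

```python
def route(path, rules, default="web-tg"):
    """First matching prefix wins, like ordered ALB listener rules;
    unmatched requests fall through to the default target group."""
    for prefix, target_group in rules:
        if path.startswith(prefix):
            return target_group
    return default

rules = [("/api/", "api-tg"), ("/images/", "images-tg")]
print(route("/api/orders", rules))  # api-tg
print(route("/home", rules))        # web-tg (default rule)
```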
[!TIP] For the exam, remember that Target Tracking is almost always the recommended dynamic scaling policy unless you have complex, multi-step scaling requirements.