Integrating AWS Auto Scaling with Elastic Load Balancing

This guide explores the architectural synergy between Elastic Load Balancing (ELB) and Auto Scaling Groups (ASG). Understanding how these services interact is critical for the AWS Advanced Networking Specialty exam, specifically regarding high availability, cost optimization, and performance management.

Learning Objectives

After studying this guide, you should be able to:

Explain the mechanism by which ELB and ASG coordinate to handle varying traffic loads.
Differentiate between Dynamic and Predictive scaling policies.
Identify the role of CloudWatch Alarms in triggering scaling events.
Configure Target Groups so that traffic is routed only to healthy instances, including newly scaled-out instances once they pass health checks.
Optimize for either Performance or Cost within a scaling plan.

Key Terms & Glossary

Auto Scaling Group (ASG): A collection of EC2 instances treated as a logical grouping for the purposes of automatic scaling and management.
Desired Capacity: The number of instances the ASG attempts to maintain. It always lies between the group's minimum and maximum capacity and is adjusted by scaling policies.
Target Group: A logical grouping of targets (EC2, Lambda, IP) that receive traffic from a load balancer.
CloudWatch Alarm: A mechanism that watches a single metric over a specified time period and performs actions based on the value of the metric relative to a threshold.
Predictive Scaling: A scaling policy that uses machine learning to predict future traffic patterns and provision capacity in advance.
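To make the CloudWatch Alarm definition concrete, here is a minimal Python model of how an alarm might compare recent datapoints to a threshold. It assumes a "greater than threshold" comparison and an "M out of N" rule, where `evaluation_periods` and `datapoints_to_alarm` mirror CloudWatch's EvaluationPeriods and DatapointsToAlarm settings. The function name and the CPU samples are hypothetical, and real CloudWatch adds details (such as missing-data handling) that this sketch ignores.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Return "ALARM" if enough of the most recent datapoints exceed the threshold.

    Hypothetical helper modelling an 'M out of N' alarm with a
    greater-than-threshold comparison; missing data is not modelled.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]  # only the last N periods count
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical average CPU (%) per period, oldest first.
cpu = [55.0, 62.0, 74.0, 81.0, 68.0, 77.0]
print(alarm_state(cpu, threshold=70.0, evaluation_periods=3, datapoints_to_alarm=2))  # prints ALARM
```

In a scaling setup, the ALARM state is what invokes the attached scaling policy; requiring 2 of 3 breaching periods keeps a single noisy sample from triggering a scale-out.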

The "Big Idea"

The "Big Idea" is Elastic Decoupling. By placing an ELB in front of an ASG, the infrastructure abstracts the individual servers from the end user. The ELB provides a single, stable DNS name, while the ASG manages the "churn" of adding and removing instances behind the scenes. This keeps the application available through individual instance failures and lets it absorb traffic spikes, up to the ASG's maximum capacity.

Formula / Concept Box

Scaling Type	Trigger Mechanism	Primary Use Case
Dynamic	CloudWatch Alarms (e.g., CPU > 70%)	Reactive response to real-time traffic changes.
Predictive	Machine Learning (Daily/Weekly patterns)	Proactive scaling for anticipated spikes.
Scheduled	Time-based (e.g., 9:00 AM Monday)	Known, recurring events (e.g., business hours start).
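The ML model behind predictive scaling is not reproduced here. To illustrate the proactive idea only, the sketch below swaps in a much simpler technique, a seasonal-naive forecast: predict the load for an upcoming hour as the average of the same hour on previous days, then convert that load into an instance count. The function names, the 500 requests/minute per-instance capacity, and the history values are all hypothetical.

```python
import math
from statistics import mean


def forecast_next_hour(history: dict[int, list[float]], hour: int) -> float:
    """Seasonal-naive forecast: average load seen at this hour on previous days.

    `history` maps hour-of-day (0-23) to loads observed at that hour, one per
    past day. This is a simple stand-in, not the model AWS uses.
    """
    return mean(history[hour])


def capacity_for(load: float, per_instance: float) -> int:
    """Instances needed to serve `load` if each handles `per_instance`."""
    return math.ceil(load / per_instance)


# Hypothetical requests/minute observed at 09:00 over the last three days.
history = {9: [4200.0, 4600.0, 4400.0]}
load = forecast_next_hour(history, 9)
print(load, capacity_for(load, per_instance=500.0))  # prints 4400.0 9
```

The point is the timing: the capacity is computed from history before 09:00 arrives, whereas a dynamic policy only reacts after the metric has already crossed its threshold.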

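[!IMPORTANT] Health Check Rule: The ELB health check is the source of truth for traffic routing, while the ASG health check determines instance replacement. Configure the ASG to use ELB health checks (not just the default EC2 status checks) so that instances failing the load balancer's health check are replaced instead of lingering as "zombie" instances.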
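The health check rule can be modelled as two separate decisions: whether the load balancer routes to an instance, and whether the ASG replaces it. In this sketch the ASG's health check type is either `"EC2"` (status checks only, the default) or `"ELB"` (status checks plus load balancer health checks). The `Instance` class, its fields, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    ec2_status_ok: bool  # passes EC2 status checks (the VM itself is fine)
    elb_healthy: bool    # passes the target group health check (the app responds)


def receives_traffic(inst: Instance) -> bool:
    # The load balancer routes only to targets passing its own health check.
    return inst.elb_healthy


def asg_replaces(inst: Instance, health_check_type: str = "EC2") -> bool:
    # With "EC2" the ASG looks only at status checks; with "ELB" a failed
    # load balancer health check also marks the instance unhealthy.
    if health_check_type == "EC2":
        return not inst.ec2_status_ok
    if health_check_type == "ELB":
        return not (inst.ec2_status_ok and inst.elb_healthy)
    raise ValueError(f"unknown health check type: {health_check_type}")


# A "zombie": the VM is up, but the application fails the ELB check.
zombie = Instance("i-hypothetical", ec2_status_ok=True, elb_healthy=False)
print(receives_traffic(zombie), asg_replaces(zombie, "EC2"), asg_replaces(zombie, "ELB"))
# prints False False True
```

With the default `"EC2"` type the zombie gets no traffic but is never replaced, so it keeps costing money while adding no capacity; switching to `"ELB"` closes that gap.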

Hierarchical Outline

I. Core Integration Components
- Elastic Load Balancer (ELB): Acts as the entry point; performs health checks.
- Target Groups: Links the ASG to the ELB; manages traffic distribution.
- Launch Templates: Define "what" to scale (AMI, Instance Type).
II. Scaling Policy Mechanics
- Predictive Scaling: Uses ML algorithms to analyze historical patterns; adjusts capacity before the load hits.
- Dynamic Scaling: Real-time adjustments based on metric thresholds (e.g., average CPU utilization, request count per target).
III. Management & Optimization
- Minimum/Maximum Capacity: Safety rails to ensure redundancy and control costs.
- Automatic Discovery: AWS Auto Scaling can discover scalable resources (for example, by CloudFormation stack or tags) so that they are managed together in one scaling plan.
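The minimum/maximum safety rails can be pictured as a clamp on whatever capacity a scaling policy asks for. A minimal sketch, with a hypothetical helper name, hypothetical limits of 2 and 10 instances, and a hypothetical runaway request of 500 instances (such as one driven by a software bug or DDoS traffic):

```python
def apply_desired(requested: int, minimum: int, maximum: int) -> int:
    """Clamp a policy's requested desired capacity into [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(maximum, requested))


print(apply_desired(500, minimum=2, maximum=10))  # prints 10: runaway scale-out is capped
print(apply_desired(0, minimum=2, maximum=10))    # prints 2: redundancy floor is kept
```

The maximum bounds cost; the minimum preserves redundancy (for example, at least one instance per AZ).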

Visual Anchors

[Diagram: Traffic and Scaling Flow]

[Diagram: Scaling Threshold Logic]
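Scaling threshold logic can be sketched as a band: scale out when the metric is above an upper threshold, scale in when it is below a lower threshold, and hold in between so capacity does not oscillate. The 70% upper threshold matches the CPU example in the Concept Box; the 30% lower threshold, the one-instance step, and the function name are hypothetical.

```python
def scaling_decision(cpu_percent: float, scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0, step: int = 1) -> int:
    """Return the change in instance count for one evaluation.

    Positive means scale out, negative means scale in, 0 means hold.
    The gap between the two thresholds keeps capacity from flapping.
    """
    if scale_in_below >= scale_out_above:
        raise ValueError("scale-in threshold must be below scale-out threshold")
    if cpu_percent > scale_out_above:
        return step
    if cpu_percent < scale_in_below:
        return -step
    return 0


for cpu in (85.0, 50.0, 20.0):
    print(cpu, scaling_decision(cpu))
```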

Definition-Example Pairs

Cooldown Period: A configurable period after a scaling activity during which the ASG stays at its current capacity, giving the new instances time to stabilize before metrics are evaluated again.
- Example: After adding 2 instances due to a CPU spike, the ASG waits 300 seconds before evaluating another scale-out to prevent over-provisioning.
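The cooldown example can be modelled as a gate on scaling activities: a new activity is allowed only once the cooldown has elapsed since the last one. The class below is a hypothetical illustration using the 300-second value from the example, with time tracked as plain seconds.

```python
from typing import Optional


class CooldownGate:
    """Hypothetical model: block new scaling activities during the cooldown."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown_seconds = cooldown_seconds
        self.last_activity: Optional[float] = None

    def try_scale(self, now: float) -> bool:
        """Record and allow a scaling activity if no cooldown is in effect."""
        if (self.last_activity is not None
                and now - self.last_activity < self.cooldown_seconds):
            return False  # still cooling down: ignore this alarm
        self.last_activity = now
        return True


gate = CooldownGate(300.0)
print(gate.try_scale(0.0), gate.try_scale(120.0), gate.try_scale(300.0))  # prints True False True
```

The alarm at 120 seconds is ignored because the instances added at 0 seconds may not yet be pulling the CPU average down; reacting to it would over-provision.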
Cross-Zone Load Balancing: A feature where the ELB distributes traffic evenly across all registered instances in all enabled Availability Zones (AZs).
- Example: If AZ-A has 2 instances and AZ-B has 8, cross-zone balancing ensures each of the 10 instances receives 10% of the traffic. Without it, each AZ receives 50%, so each AZ-A instance handles 25% while each AZ-B instance handles only 6.25%.
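The arithmetic in the cross-zone example can be checked with a short calculation. The sketch assumes that, with cross-zone load balancing off, traffic is split evenly between the enabled AZs first and then evenly among the instances in each AZ; with it on, traffic is split evenly across all instances. The function name is hypothetical, and every AZ is assumed to have at least one instance.

```python
def per_instance_share(instances_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic each instance in a given AZ receives."""
    total = sum(instances_per_az.values())
    if cross_zone:
        return {az: 1 / total for az in instances_per_az}
    az_share = 1 / len(instances_per_az)  # each AZ's LB node gets an equal slice
    return {az: az_share / n for az, n in instances_per_az.items()}


azs = {"AZ-A": 2, "AZ-B": 8}
print(per_instance_share(azs, cross_zone=True))   # {'AZ-A': 0.1, 'AZ-B': 0.1}
print(per_instance_share(azs, cross_zone=False))  # {'AZ-A': 0.25, 'AZ-B': 0.0625}
```

Without cross-zone balancing, each AZ-A instance carries four times the load of an AZ-B instance, which is why uneven AZ capacity and disabled cross-zone balancing together create hot spots.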

Worked Examples

Scenario: The Flash Sale

Problem: A retail site expects a 500% traffic increase at midnight for a 1-hour sale. Dynamic scaling takes 5 minutes to spin up new instances, which is too slow.

Solution:

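Scheduled Scaling: Create a scheduled action that raises the Desired Capacity to 20 instances at 11:45 PM (the Maximum capacity must be at least 20). This gives the new instances 15 minutes to launch and pass health checks before the sale starts.
Predictive Scaling: Enable predictive scaling, which needs at least 24 hours of metric history and analyzes up to 14 days to learn daily and weekly patterns. If sales recur on such a cycle, the forecast can prepare capacity for subsequent sale days; a one-off event is better covered by the scheduled action.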
Dynamic Scaling: Keep a CPU-based policy (e.g., 70% threshold) as a safety net if the 20 instances are still insufficient.
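The timing argument behind this solution can be sketched as a small timeline model. It assumes, as in the scenario, that an instance is in service about 5 minutes after capacity is requested. The baseline of 4 instances, the date, and the helper name are hypothetical, and only scale-out is modelled.

```python
from datetime import datetime, timedelta

LAUNCH_DELAY = timedelta(minutes=5)  # from the scenario: ~5 minutes to spin up


def in_service_capacity(baseline: int, actions: list[tuple[datetime, int]],
                        at: datetime) -> int:
    """Capacity serving traffic at `at`, given (requested_at, desired) actions.

    Simplification: only scale-out is modelled, and each request is fully
    in service exactly LAUNCH_DELAY after it is made.
    """
    capacity = baseline
    for requested_at, desired in actions:
        if requested_at + LAUNCH_DELAY <= at:
            capacity = max(capacity, desired)
    return capacity


midnight = datetime(2024, 1, 1, 0, 0)                 # hypothetical sale start
scheduled = [(midnight - timedelta(minutes=15), 20)]  # 11:45 PM scheduled action
reactive = [(midnight, 20)]                           # alarm fires once traffic arrives

print(in_service_capacity(4, scheduled, midnight))                # prints 20
print(in_service_capacity(4, reactive, midnight))                 # prints 4
print(in_service_capacity(4, reactive, midnight + LAUNCH_DELAY))  # prints 20
```

The scheduled action has all 20 instances serving at midnight, while a purely reactive policy leaves the first 5 minutes of the sale on the baseline fleet.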

Checkpoint Questions

What happens if an instance fails an ELB health check but passes the ASG health check?
Which scaling policy uses Machine Learning to adjust capacity based on weekly load patterns?
True or False: Auto Scaling is a paid feature of the ELB service.
What is the benefit of setting a 'Maximum' capacity in an ASG?

Answers

The ELB stops sending traffic to the instance, but the ASG does not replace it: by default the ASG uses only EC2 status checks, so replacement requires setting the ASG's health check type to include ELB health checks.
Predictive Scaling.
False. Auto Scaling itself is free; you only pay for the underlying EC2 resources and CloudWatch alarms.
It acts as a cost control mechanism, preventing runaway scaling during a DDoS attack or software bug.

Muddy Points & Cross-Refs

ELB Scaling vs. ASG Scaling: This is a common point of confusion. The ELB service itself scales automatically (managed by AWS) to handle more connections. ASG scales your backend instances. You only manage the ASG scaling policies.
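Connection Draining: Also known as "Deregistration Delay" (the term used for target groups). When an instance is being scaled in (removed), the ELB stops sending it new requests but lets in-flight requests complete, up to the configured delay (300 seconds by default), before the instance is fully deregistered.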
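Deregistration delay can be modelled as follows: once a target is deregistered it receives no new requests, and deregistration completes when its in-flight requests finish or the delay expires, whichever comes first. The sketch uses a 300-second delay; the function name and request durations are hypothetical.

```python
def drain(remaining_seconds: list[float], delay_seconds: float = 300.0) -> tuple[float, int, int]:
    """Model draining one target.

    remaining_seconds: time each in-flight request still needs.
    Returns (seconds until deregistration completes, requests completed,
    requests cut off when the delay expires).
    """
    completed = [r for r in remaining_seconds if r <= delay_seconds]
    cut_off = len(remaining_seconds) - len(completed)
    if cut_off:
        finished_at = delay_seconds  # delay expired with requests still open
    else:
        finished_at = max(completed, default=0.0)  # everything drained early
    return finished_at, len(completed), cut_off


print(drain([2.0, 45.0, 12.5]))   # prints (45.0, 3, 0)
print(drain([2.0, 45.0, 900.0]))  # prints (300.0, 2, 1)
```

A longer delay protects long-running requests but slows every scale-in; a shorter one frees instances sooner at the risk of cutting requests off.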

Comparison Tables

Internal vs. External Load Balancers

Feature	External (Internet-facing)	Internal
DNS Resolution	Public IP addresses (of the LB nodes)	Private IP addresses
Listener IP	Public	Private
Typical Target	Web Servers / Front-end	Database / Application Tier
Security Group	Open to 0.0.0.0/0 (Port 80/443)	Open to Front-end Security Group