AWS Elastic Load Balancing: Scaling Factors and Performance Optimization
Scaling factors for load balancers
This study guide focuses on the critical scaling factors, performance metrics, and architectural considerations for AWS Elastic Load Balancers (ELB) as required for the Advanced Networking Specialty (ANS-C01) exam.
Learning Objectives
- Identify the specific CloudWatch metrics used as scaling triggers for different ELB types (ALB, NLB, GWLB).
- Contrast dynamic scaling, predictive scaling, and manual capacity management.
- Explain the integration between ELB target groups and EC2 Auto Scaling Groups (ASG).
- Analyze the impact of features like cross-zone load balancing and sticky sessions on scaling efficiency.
Key Terms & Glossary
- LCU (Load Balancer Capacity Unit): A relative metric measuring the resources consumed by an ALB or NLB. Billing (and the load balancer's own scaling) is driven by the highest of the LCU dimensions: new connections, active connections, processed bytes, and (for ALB) rule evaluations.
- Target Group: A logical grouping of resources (Instances, IP addresses, Lambda functions) that receive traffic from the load balancer.
- Predictive Scaling: An Auto Scaling feature that uses Machine Learning to forecast future traffic based on historical patterns (daily/weekly) and provisions capacity in advance.
- Connection Draining (Deregistration Delay): The time a load balancer allows in-flight requests to complete after a target is marked for removal or becomes unhealthy.
- Sticky Sessions (Session Affinity): A mechanism that routes all requests from a single client to the same backend target for a specific duration.
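The "highest dimension wins" behaviour of LCU billing is easy to get wrong, so here is a minimal numeric sketch. The per-LCU capacities below match AWS's published ALB dimensions at the time of writing, but treat them as illustrative constants rather than a price sheet.

```python
# Sketch: ALB LCU usage is the MAX across four dimensions, not the sum.
# Capacities per LCU are illustrative; check the current ELB pricing page.
LCU_CAPACITY = {
    "new_connections_per_sec": 25,    # 25 new connections/sec per LCU
    "active_connections": 3000,       # 3,000 active connections per LCU
    "processed_gb_per_hour": 1,       # 1 GB/hour per LCU
    "rule_evaluations_per_sec": 1000, # 1,000 rule evaluations/sec per LCU
}

def alb_lcus(usage: dict) -> float:
    """Return LCUs consumed: the highest-utilized dimension drives the bill."""
    return max(usage[dim] / cap for dim, cap in LCU_CAPACITY.items())

# Example: a chatty API with many small, short-lived requests.
usage = {
    "new_connections_per_sec": 100,   # 100 / 25   = 4.0 LCUs  <- dominant
    "active_connections": 3000,       # 3000 / 3000 = 1.0 LCU
    "processed_gb_per_hour": 0.5,     # 0.5 / 1     = 0.5 LCU
    "rule_evaluations_per_sec": 500,  # 500 / 1000  = 0.5 LCU
}
print(alb_lcus(usage))  # → 4.0 (new connections dominate)
```

Note how reducing processed bytes would not lower this bill at all; only the dominant dimension (new connections) matters.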
The "Big Idea"
Scaling in AWS is not just about adding more servers; it is about the harmonious coordination between the entry point (the Load Balancer) and the compute fleet (Auto Scaling Group). The load balancer acts as the sensor, reporting metrics (like RequestCount or TargetResponseTime) to CloudWatch, which then triggers the ASG to adjust capacity. Mastering scaling means balancing high availability (never dropping a request) against cost-optimization (never over-provisioning).
Formula / Concept Box
| Concept | Metric / Rule | Application |
|---|---|---|
| Scaling Threshold | e.g., average CPU > 70% for two consecutive periods | Dynamic scaling policy to handle sudden load. |
| Cooldown Period | Default: 300 seconds | Prevents "flapping," where the ASG adds/removes capacity too quickly. |
| ALB Scaling Metric | TargetResponseTime | Best for latency-sensitive web applications (L7). |
| NLB Scaling Metric | ActiveFlowCount | Best for high-throughput, long-lived TCP/UDP connections (L4). |
Hierarchical Outline
- I. Scaling Mechanisms
- Dynamic Scaling: Responds to real-time CloudWatch alarms (e.g., CPU, Memory, Request Count).
- Predictive Scaling: Uses ML to "warm up" the fleet before expected spikes (e.g., 9:00 AM login rush).
- Scheduled Scaling: Manual time-based adjustments (e.g., scaling up for a known Black Friday event).
- II. ELB Scaling Factors
- L7 Factors (ALB): Request rate, latency, HTTP error rates, and header-based routing complexity.
- L4 Factors (NLB): New/Active connections per second, bandwidth (Gbps), and source IP affinity.
- III. Auto Scaling Integration
- Health Checks: Moving from EC2-status checks to ELB-level health checks for better accuracy.
- Target Tracking: Simplifying scaling by defining a "target value" (e.g., keep average CPU at 50%).
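The target-tracking idea in the outline can be sketched with simple proportional arithmetic. This is an approximation of the convergence behaviour, not AWS's actual algorithm, which also applies warm-up, cooldown, and min/max capacity limits.

```python
import math

def target_tracking_capacity(current_instances: int,
                             current_metric: float,
                             target_value: float) -> int:
    """Approximate the capacity a target-tracking policy converges toward.

    For a metric that scales inversely with fleet size (like average CPU
    or RequestCountPerTarget), the policy effectively solves:
        target ≈ current_metric * current_instances / desired_instances
    Rounding up biases toward availability over cost.
    """
    return math.ceil(current_instances * current_metric / target_value)

# A fleet of 4 instances averaging 90% CPU, with a 50% target,
# should scale out to 8 instances (4 * 90 / 50 = 7.2, rounded up).
print(target_tracking_capacity(4, 90.0, 50.0))  # → 8
```

The same arithmetic explains the RequestCountPerTarget example later in this guide: 500 RPS total at a target of 100 per instance converges to 5 instances.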
Visual Anchors
The Scaling Feedback Loop
Scaling Response Curve
\begin{tikzpicture}[scale=0.8]
  % Axes
  \draw[->] (0,0) -- (6,0) node[right] {Time};
  \draw[->] (0,0) -- (0,5) node[above] {Capacity};
  % Demand curve (dashed)
  \draw[dashed, blue, thick] (0.5,1) .. controls (2,1) and (3,4) .. (5,4.5) node[right] {Traffic Demand};
  % Step-scaling line
  \draw[red, ultra thick] (0.5,1.2) -- (2.5,1.2) -- (2.5,2.5) -- (4,2.5) -- (4,4.7) -- (5.5,4.7) node[below] {Provisioned Instances};
  % Legend
  \draw[blue, dashed] (1,-1) -- (2,-1) node[right] {Traffic Load};
  \draw[red] (4,-1) -- (5,-1) node[right] {Fleet Size};
\end{tikzpicture}
Definition-Example Pairs
- Metric: RequestCountPerTarget
- Definition: The average number of requests received by each target in a target group during a specified time interval.
- Example: If you want each web server to handle exactly 100 requests per second, you set a Target Tracking policy for this metric at 100. If total traffic is 500 RPS, the ASG maintains 5 instances.
- Metric: TargetResponseTime
- Definition: The time (in seconds) elapsed after the request leaves the load balancer until a response is received from the target.
- Example: In a payment processing app, if latency exceeds 2 seconds, the ELB triggers a scale-out to reduce the burden on individual overloaded nodes.
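The latency-driven scale-out described above is typically implemented as a step-scaling policy: successively higher TargetResponseTime bands trigger larger capacity adjustments. The thresholds and step sizes below are illustrative, not AWS defaults.

```python
def step_scaling_adjustment(latency_s: float) -> int:
    """Hypothetical step-scaling policy keyed on TargetResponseTime.

    Returns the number of instances to add. Bands are illustrative:
    a real policy would be defined as CloudWatch alarm thresholds
    attached to the Auto Scaling Group.
    """
    if latency_s < 1.0:
        return 0   # within SLA: no change
    if latency_s < 2.0:
        return 1   # mild breach: add one instance
    if latency_s < 4.0:
        return 2   # SLA at risk: add two instances
    return 4       # severe degradation: add four instances

print(step_scaling_adjustment(2.5))  # → 2
```

Step scaling reacts proportionally to how badly the metric is breached, unlike a simple scaling policy that always adds a fixed amount.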
Worked Examples
Scenario 1: The "Flash Sale" Spike
Problem: A retail site expects a 10x traffic spike at midnight. They currently rely on Dynamic Scaling alone.
Result: Dynamic Scaling reacts too slowly because new EC2 instances take 3-5 minutes to boot and pass health checks, so requests are dropped during the ramp.
Solution: Implement Predictive Scaling to begin launching instances at 11:45 PM based on historical Friday-night data, and ensure Cross-Zone Load Balancing is enabled to distribute traffic evenly across all AZs during the ramp-up.
Scenario 2: Sticky Session Bottleneck
Problem: A developer enables Sticky Sessions (Duration: 1 hour). One specific user starts a massive data upload.
Result: The sticky session pins that user's traffic to a single instance, so it cannot be re-balanced onto the new instances the ASG launches. One server hits 100% CPU while the others sit idle.
Solution: Reduce the cookie duration, or move session state to ElastiCache (Redis) so the load balancer can distribute requests freely across the scaling fleet.
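The hot-spotting effect in Scenario 2 can be reproduced with a toy router. This is a deliberately simplified model: a stable hash of the client ID stands in for the ALB's stickiness cookie, and round-robin stands in for free distribution.

```python
from collections import Counter
import hashlib

def route(client_id: str, targets: list, sticky: bool, i: int) -> str:
    """Toy router: sticky pins a client to one target; otherwise round-robin."""
    if sticky:
        # Deterministic hash plays the role of the stickiness cookie.
        h = int(hashlib.sha256(client_id.encode()).hexdigest(), 16)
        return targets[h % len(targets)]
    return targets[i % len(targets)]

targets = ["i-a", "i-b", "i-c"]

def load(sticky: bool) -> Counter:
    """One heavy uploader sends 90 requests; 10 light clients send 1 each."""
    c = Counter()
    for i in range(90):
        c[route("heavy-uploader", targets, sticky, i)] += 1
    for j in range(10):
        c[route(f"client-{j}", targets, sticky, j)] += 1
    return c

print(load(sticky=True))   # one target absorbs all 90 heavy requests
print(load(sticky=False))  # heavy load spreads evenly, ~34/33/33
```

With stickiness on, launching more instances does not help the overloaded target, which is exactly the failure mode in the scenario.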
Checkpoint Questions
- Which ELB type is best suited for scaling to millions of requests per second with ultra-low latency? (Answer: NLB)
- What is the difference between a "Cooldown" and a "Warm-up" period in Auto Scaling? (Answer: Cooldown happens after a scaling activity to prevent further changes; Warm-up is the time a new instance needs before contributing to metrics).
- True/False: If an ALB target group has instances in three AZs but Cross-Zone Load Balancing is disabled, traffic is split evenly across the AZs (roughly 33% each) regardless of the instance count in each. (Answer: True. DNS distributes requests evenly across the load balancer nodes, and each node serves only its own AZ's targets.)
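The cross-zone question above can be checked with a little arithmetic. This sketch computes each instance's share of total traffic under both settings, assuming DNS spreads clients evenly across the per-AZ load balancer nodes.

```python
def per_target_share(instances_per_az: list, cross_zone: bool) -> list:
    """Traffic share received by ONE instance in each AZ (fractions of total).

    Assumption: DNS spreads clients evenly across the load balancer
    nodes (one per AZ). With cross-zone OFF, each node distributes
    only to targets in its own AZ; with cross-zone ON, every node
    distributes across all registered targets.
    """
    n_az = len(instances_per_az)
    if cross_zone:
        total = sum(instances_per_az)
        return [1 / total for _ in instances_per_az]
    return [(1 / n_az) / n for n in instances_per_az]

# 3 AZs holding 1, 1, and 8 instances respectively.
# Cross-zone OFF: each AZ still receives 1/3 of the traffic, so the lone
# instances in AZ-1 and AZ-2 each carry ~33%, while each of the 8
# instances in AZ-3 carries only ~4.2%.
print(per_target_share([1, 1, 8], cross_zone=False))
# Cross-zone ON: every instance carries an equal 10%.
print(per_target_share([1, 1, 8], cross_zone=True))
```

The imbalanced case is why AWS recommends either enabling cross-zone load balancing or keeping instance counts symmetric across AZs.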
Muddy Points & Cross-Refs
- LCU vs. Instances: Remember that the ELB itself scales automatically (managed by AWS), but you are responsible for scaling the targets (EC2/ECS). You pay for the ELB's LCU scaling AND the target's compute scaling.
- Pre-Warming: For massive, immediate spikes (e.g., Super Bowl ad), AWS Support can "pre-warm" your load balancer. This is a common exam topic—standard scaling is gradual.
Comparison Tables
| Feature | Application Load Balancer (L7) | Network Load Balancer (L4) |
|---|---|---|
| Primary Scaling Metric | RequestCountPerTarget | ActiveFlowCount / ProcessedBytes |
| Protocols | HTTP, HTTPS, gRPC | TCP, UDP, TLS |
| Static IP Support | No (Uses DNS name) | Yes (Elastic IP per Subnet) |
| Scaling Speed | Fast, but has slight overhead for L7 | Ultra-Fast (optimized for spikes) |
| Sticky Sessions | Supported (App/Duration based) | Supported (Source IP based) |