High-Performing Systems Architectures: Elasticity, Fleets, and Placement Groups

This study guide covers the core components of performance-efficient architectures on AWS, focusing on how to design systems that automatically adapt to demand and optimize compute and network resources.

Learning Objectives

By the end of this module, you should be able to:

Differentiate between Dynamic and Predictive scaling strategies.
Design mixed-instance strategies using EC2 Instance Fleets to optimize cost and performance.
Select the appropriate Placement Group based on network latency and availability requirements.
Evaluate global service offerings (Global Accelerator, CloudFront) to reduce latency for distributed users.

Key Terms & Glossary

Auto Scaling Group (ASG): A logical collection of EC2 instances that are treated as a single unit for purposes of scaling and management.
Instance Fleet: A configuration for EC2 Fleet or Spot Fleet that allows you to specify multiple instance types and purchase options (Spot, On-Demand) to meet a target capacity.
Placement Group: A logical grouping of instances that influences where they are physically placed on hardware: packed close together within a single Availability Zone (Cluster), or distributed across distinct hardware, optionally across multiple Availability Zones in one Region (Spread/Partition).
Vertical Scaling (Scaling Up): Increasing the specifications (CPU, RAM) of an individual resource.
Horizontal Scaling (Scaling Out): Adding more instances of a resource to distribute the load.

The "Big Idea"

Performance efficiency in the cloud is not a static state but a dynamic process. High-performing architectures move away from "guessing capacity" and instead rely on Automation and Specialization. With managed scaling and purpose-built placement, the infrastructure tracks the demand curve closely, which reduces waste, raises throughput, and lowers latency.

Formula / Concept Box

Concept	Suggested Value	Logic
Scale Out (High Load)	> 70-75% Utilization	Add capacity before the system reaches a saturation point.
Scale In (Low Load)	< 30% Utilization	Remove capacity to save costs when resources are idle.
Predictive Window	Up to 14 Days History	Predictive scaling analyzes up to 14 days of history (at least 24 hours is required) to forecast the next 48 hours.
Cluster Placement	Up to 10 Gbps per flow / Low Latency	Best for tightly coupled node-to-node communication.
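The scale-out and scale-in rows above can be read as a simple decision rule. The following is a minimal Python sketch, assuming a single utilization percentage as input. The function name `scaling_decision` and its default thresholds (70 and 30, taken from the table) are illustrative and not part of any AWS API; real Auto Scaling policies also apply cooldowns and minimum/maximum group sizes, which are omitted here.

```python
def scaling_decision(utilization_pct: float,
                     scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0) -> str:
    """Return 'scale_out', 'scale_in', or 'hold' for one utilization reading."""
    if not 0.0 <= utilization_pct <= 100.0:
        raise ValueError("utilization_pct must be between 0 and 100")
    if utilization_pct > scale_out_above:
        # Add capacity before the system saturates.
        return "scale_out"
    if utilization_pct < scale_in_below:
        # Remove idle capacity to save cost.
        return "scale_in"
    # Between the two thresholds: leave the group alone.
    return "hold"


if __name__ == "__main__":
    for reading in (82.0, 55.0, 12.0):
        print(reading, scaling_decision(reading))
```

The gap between the two thresholds is deliberate in this sketch: it keeps the group from oscillating when utilization hovers near a single cutoff.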

Hierarchical Outline

Elasticity & Auto Scaling
- Dynamic Scaling: Responds to real-time metrics (CPU, request count, or custom metrics such as memory, which must be published to CloudWatch by an agent).
- Predictive Scaling: Uses Machine Learning to forecast future traffic and provision early.
- Serverless Scaling: Services like AWS Lambda and Amazon S3 scale natively without manual configuration.
Compute Optimization
- Instance Fleets: Combining On-Demand and Spot instances for cost-effective performance.
- Rightsizing: Selecting the correct instance family (Compute, Memory, or I/O optimized) for the specific workload.
Network & Placement Optimization
- Placement Groups: Manipulating physical placement for performance or durability.
- Global Services: Using CloudFront (Edge Caching) and Global Accelerator (Anycast IP/Network Path optimization).
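To make the Dynamic and Predictive items in the outline concrete, here is a hedged Python sketch that builds the request parameters for one policy of each kind. The field names follow my understanding of the Auto Scaling `PutScalingPolicy` request and should be checked against the current API reference; the helper names, the policy-name suffixes, and the group name `web-asg` are hypothetical. Nothing here calls AWS.

```python
import json


def target_tracking_policy(asg_name: str, target_cpu: float) -> dict:
    """Parameters for a dynamic (target tracking) policy on average CPU."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }


def predictive_policy(asg_name: str, target_cpu: float,
                      forecast_only: bool = True) -> dict:
    """Parameters for a predictive scaling policy on CPU.

    forecast_only=True produces forecasts without changing capacity,
    which is a cautious way to evaluate the forecast first.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-predictive",  # hypothetical name
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [{
                "TargetValue": target_cpu,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization"
                },
            }],
            "Mode": "ForecastOnly" if forecast_only else "ForecastAndScale",
        },
    }


if __name__ == "__main__":
    print(json.dumps(target_tracking_policy("web-asg", 50.0), indent=2))
    print(json.dumps(predictive_policy("web-asg", 50.0), indent=2))
```

With boto3 installed and credentials configured, each dictionary could be passed as keyword arguments to `boto3.client("autoscaling").put_scaling_policy(**policy)`.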

Visual Anchors

Scaling Decision Flow

[Diagram not rendered in this copy.]

Cluster vs. Spread Placement

[Diagram not rendered in this copy.]

Definition-Example Pairs

Predictive Scaling: Using historical patterns to prepare capacity ahead of time.
- Example: An e-commerce site knows traffic spikes every Monday at 9:00 AM; Predictive scaling spins up instances at 8:45 AM so they are warm before the surge.
Cluster Placement Group: A grouping of instances packed close together inside a single Availability Zone to get low-latency, high-throughput networking between them.
- Example: A High-Performance Computing (HPC) cluster performing complex fluid dynamics simulations that require constant, low-latency node-to-node communication.
Instance Fleet: A mechanism to manage various instance types and purchase models.
- Example: A batch processing job that uses a mix of m5.large, c5.large, and r5.large Spot instances to process data at the lowest possible price.
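The mixed-instance idea in the last example can be sketched as a configuration object. The Python below builds a dictionary shaped like the `MixedInstancesPolicy` parameter of an Auto Scaling group, as I understand that request format; verify the field names against the current API reference. The helper name, the launch template name `batch-workers`, and the choice of the `price-capacity-optimized` Spot allocation strategy are assumptions for illustration. Nothing here calls AWS.

```python
def mixed_instances_policy(launch_template_name: str,
                           instance_types: list,
                           on_demand_base: int = 0,
                           on_demand_pct_above_base: int = 0) -> dict:
    """Build a MixedInstancesPolicy-style dict for several instance types."""
    if not instance_types:
        raise ValueError("at least one instance type is required")
    if not 0 <= on_demand_pct_above_base <= 100:
        raise ValueError("on_demand_pct_above_base must be 0..100")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per acceptable instance type.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Instances that must always be On-Demand.
            "OnDemandBaseCapacity": on_demand_base,
            # Share of capacity beyond the base that is On-Demand;
            # 0 means everything above the base is Spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


if __name__ == "__main__":
    policy = mixed_instances_policy(
        "batch-workers",  # hypothetical launch template name
        ["m5.large", "c5.large", "r5.large"],
    )
    print(policy["LaunchTemplate"]["Overrides"])
```

Listing several interchangeable instance types gives the Spot request more capacity pools to draw from, which is the main reason to diversify in a batch workload like the one described.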

Worked Examples

Scenario: Handling Flash Sales

Problem: A retailer has a 1-hour flash sale. Traffic jumps from 1,000 to 50,000 users in 5 minutes. Dynamic scaling is too slow to react, leading to 503 errors during the first 10 minutes.

Solution Strategy:

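Analyze History: Check the historical load data (predictive scaling draws on up to 14 days of it) to determine whether the sale follows a recurring pattern.
Enable Predictive Scaling: If the pattern recurs, configure the ASG to use predictive scaling so that capacity is launched ahead of the start of the sale. For a one-off sale with a known start time, a scheduled scaling action achieves the same pre-warming.
Optimize Network: Use Amazon CloudFront to cache static assets (images/CSS) at the edge, reducing the load reaching the EC2 origin.
Result: Capacity is already provisioned when the sale starts, and edge caching absorbs a large share of the traffic (80% in this scenario) before it reaches the origin.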
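The arithmetic behind this result can be checked with a short Python sketch. The 50,000-user peak and the 80% edge offload come from the scenario; the figure of 500 users per instance is a made-up assumption used only to turn load into an instance count, and both function names are illustrative.

```python
def origin_users(peak_users: int, edge_hit_pct: int) -> int:
    """Users whose requests still reach the origin after edge caching."""
    if not 0 <= edge_hit_pct <= 100:
        raise ValueError("edge_hit_pct must be 0..100")
    remaining = peak_users * (100 - edge_hit_pct)
    return -(-remaining // 100)  # ceiling division using integers only


def instances_needed(users: int, users_per_instance: int) -> int:
    """Instances required to serve the given users, rounded up."""
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    return -(-users // users_per_instance)


if __name__ == "__main__":
    peak = 50_000
    per_instance = 500  # hypothetical capacity of one instance
    at_origin = origin_users(peak, 80)
    print("users reaching origin:", at_origin)
    print("instances with caching:", instances_needed(at_origin, per_instance))
    print("instances without caching:", instances_needed(peak, per_instance))
```

Under these assumptions the fleet that must be warm at the start of the sale shrinks from 100 instances to 20, which is why edge caching and pre-provisioning are applied together.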

Checkpoint Questions

Which placement group should you use for an application that requires low-latency, high-throughput networking between instances (up to 10 Gbps per flow)? (Answer: Cluster)
How much historical data does AWS Predictive Scaling analyze, and what is the minimum it needs to begin making forecasts? (Answer: Up to 14 days; at least 24 hours)
True or False: Instance Fleets allow you to combine On-Demand and Spot instances in a single request. (Answer: True)
How does Global Accelerator differ from CloudFront? (Answer: Global Accelerator optimizes the network path using Anycast IPs for TCP/UDP, while CloudFront is a CDN focused on caching content.)

Muddy Points & Cross-Refs

Dynamic vs. Predictive: People often get confused about which to use. Rule of thumb: Use both. Predictive handles the known cycles, while Dynamic (Step/Target Tracking) handles the unexpected spikes.
Placement Group Limits: You cannot move a running instance into a placement group. Either launch new instances into the group (for example, from an AMI of the existing instance), or stop the instance, change its placement, and start it again.
Cross-Ref: For more on how to measure these impacts, see the CloudWatch Performance Metrics chapter.
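On the placement group point: an instance in the stopped state can have its placement changed. The Python sketch below shows the stop, modify, start sequence, assuming `ec2` is a boto3 EC2 client (or any object with the same methods). The function name `move_to_placement_group` and the identifiers in the usage note are hypothetical; check instance-type compatibility with the target group before trying this on a real workload.

```python
def move_to_placement_group(ec2, instance_id: str, group_name: str) -> None:
    """Stop an instance, change its placement group, and start it again.

    `ec2` is assumed to behave like boto3.client("ec2").
    """
    # 1. Placement can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # 2. Point the stopped instance at the target placement group.
    ec2.modify_instance_placement(InstanceId=instance_id,
                                  GroupName=group_name)

    # 3. Start it again; it will be placed according to the group's strategy.
    ec2.start_instances(InstanceIds=[instance_id])
```

A call might look like `move_to_placement_group(boto3.client("ec2"), "i-0123456789abcdef0", "hpc-cluster")`, where both identifiers are placeholders. The instance incurs downtime while stopped, so relaunching from an AMI behind a load balancer may be preferable for production traffic.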

Comparison Tables

Feature	Cluster Placement	Spread Placement	Partition Placement
Primary Goal	Low Network Latency	Maximize Reliability	High Availability for Large Clusters
AZ Coverage	Single AZ Only	Can span multiple AZs in a Region	Can span multiple AZs in a Region
Instance Limit	Restricted by AZ Capacity	7 running instances per AZ per group	Up to 7 partitions per AZ; hundreds of instances
Use Case	HPC, Big Data, Video Encoding	Critical DB Nodes	HDFS, HBase, Cassandra
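The table can be condensed into a small selection helper. This Python sketch is a simplification under stated assumptions: it considers only two inputs (whether node-to-node latency dominates, and how many instances are needed per Availability Zone) and uses the 7-per-AZ spread limit from the table. The function name is illustrative; the returned strings match the strategy names `cluster`, `spread`, and `partition`.

```python
SPREAD_MAX_PER_AZ = 7  # spread placement limit taken from the table above


def choose_placement_strategy(needs_low_latency: bool,
                              instances_per_az: int) -> str:
    """Pick a placement strategy name from two coarse requirements."""
    if instances_per_az < 1:
        raise ValueError("instances_per_az must be at least 1")
    if needs_low_latency:
        # Tightly coupled workloads (HPC and similar): pack nodes together.
        return "cluster"
    if instances_per_az <= SPREAD_MAX_PER_AZ:
        # A few critical nodes: put each on distinct hardware.
        return "spread"
    # Large distributed systems (HDFS, HBase, Cassandra): isolate by partition.
    return "partition"


if __name__ == "__main__":
    print(choose_placement_strategy(True, 32))
    print(choose_placement_strategy(False, 3))
    print(choose_placement_strategy(False, 200))
```

Real decisions involve more factors (instance-type support, capacity availability, replication topology), so treat this as a memory aid for the table rather than a sizing tool.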