Cloud Scalability: Horizontal vs. Vertical Scaling
This guide explores the fundamental strategies for handling workload growth in the AWS cloud environment, focusing on the differences between adding resources to existing instances versus adding more instances in parallel.
Learning Objectives
- Define Horizontal Scaling (Scaling Out) and Vertical Scaling (Scaling Up).
- Identify specific AWS services and features used for each scaling type.
- Compare the trade-offs regarding downtime, cost, and high availability.
- Determine the appropriate scaling strategy based on specific application bottlenecks (CPU, RAM, Network).
Key Terms & Glossary
- Scaling Out (Horizontal): Adding more instances to a system to share the workload (e.g., adding more web servers).
- Scaling Up (Vertical): Increasing the capacity of a single resource (e.g., upgrading an EC2 instance to a larger size).
- Elasticity: The ability of a system to automatically scale resources up or down in response to real-time demand.
- Read Replica: A copy of a database instance used specifically for read-only queries to offload the primary database.
- Sharding/Partitioning: A method of horizontal scaling for databases where data is split across multiple independent instances.
- AMI (Amazon Machine Image): A template that contains a software configuration (OS, application server, and applications) used to launch horizontal copies of an instance.
The "Big Idea"
In traditional on-premises environments, "Scaling Up" was the norm because buying a bigger server was easier than networking many small ones. In the cloud, Horizontal Scaling is the "Gold Standard." It enables Elasticity—allowing systems to grow and shrink automatically—and High Availability, ensuring that the failure of one instance doesn't bring down the entire application.
Formula / Concept Box
| Feature | Vertical Scaling (Scale Up) | Horizontal Scaling (Scale Out) |
|---|---|---|
| Action | Change instance type (e.g., t3.micro → m5.large) | Add more instances (e.g., 1 → 5 instances) |
| Downtime | Typically required (Stop/Start) | Zero downtime (using Load Balancers) |
| Limit | Hard ceiling (max instance size) | Virtually limitless |
| Use Case | Monolithic apps, database RAM upgrades | Web tiers, distributed processing, microservices |
| AWS Service | EC2 Instance Type change, RDS Instance Class | Auto Scaling Groups (ASG), RDS Read Replicas |
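The trade-offs in the table can be reduced to a small decision rule. Below is a minimal Python sketch, not an AWS API; the two inputs (statelessness and availability requirements) are illustrative deciding factors drawn from the comparison above:

```python
# Toy decision helper mirroring the comparison table.
# The inputs and rules are illustrative, not AWS-defined logic.

def recommend_scaling(app_is_stateless: bool, needs_high_availability: bool) -> str:
    """Suggest a scaling strategy from two common deciding factors."""
    if not app_is_stateless:
        return "vertical"    # stateful monoliths usually scale up first
    if needs_high_availability:
        return "horizontal"  # add instances behind a load balancer
    return "horizontal"      # cloud default: scale out for elasticity

# A stateful legacy app -> scale up; a stateless web tier -> scale out
print(recommend_scaling(app_is_stateless=False, needs_high_availability=False))
print(recommend_scaling(app_is_stateless=True, needs_high_availability=True))
```

In practice the "hard ceiling" row is the tiebreaker: once the largest instance class is reached, horizontal scaling is the only remaining option.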
Hierarchical Outline
- Vertical Scaling (Scaling Up)
- Mechanism: Modifying resource attributes (CPU, RAM, I/O).
- AWS Implementation: Stopping an EC2 instance and selecting a larger instance class.
- Pros: Simple to manage; no architectural changes required.
- Cons: Single point of failure; limited by hardware maximums; requires downtime.
- Horizontal Scaling (Scaling Out)
- Mechanism: Adding identical resources in parallel.
- AWS Implementation: Using Auto Scaling Groups and Elastic Load Balancing (ELB).
- Pros: High availability; no downtime; granular cost control through elasticity.
- Cons: Requires stateless application design; more complex networking.
- Scaling in Database Services (RDS)
- Vertical: Changing the DB Instance Class.
- Horizontal: Creating Read Replicas for read-heavy workloads or Sharding for write-heavy workloads.
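The Read Replica pattern in the outline can be sketched as a tiny query router: writes go to the primary, reads are spread round-robin across replicas. The endpoint hostnames below are placeholders, not real RDS endpoints, and the SELECT-prefix check is a deliberate simplification:

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary and reads round-robin across replicas.

    A real application would use the endpoints RDS reports for the
    primary instance and each read replica.
    """

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def endpoint_for(self, query: str) -> str:
        # Naive read detection: treat SELECT statements as read-only.
        is_read = query.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary

router = ReadWriteRouter(
    primary="primary.example.rds.amazonaws.com",
    replicas=["replica-1.example.rds.amazonaws.com",
              "replica-2.example.rds.amazonaws.com"],
)
print(router.endpoint_for("SELECT * FROM articles"))         # a replica
print(router.endpoint_for("UPDATE articles SET views = 1"))  # the primary
```

Note that replicas receive changes asynchronously, so this pattern suits read-heavy workloads that tolerate slight replication lag.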
Visual Anchors
- Scaling Logic Decision Flow (diagram)
- Comparison of Resource Distribution (diagram)
Definition-Example Pairs
- Scale Up (Vertical): Increasing the resources of a single server.
  - Example: An RDS database running on a db.t3.medium instance experiences 100% CPU utilization. The administrator changes the instance type to db.r5.large to handle the query load.
- Scale Out (Horizontal): Increasing the number of servers.
- Example: A web application receives a spike in traffic due to a sale. AWS Auto Scaling detects the load and launches four additional EC2 instances to distribute the incoming requests via an Application Load Balancer.
- Statelessness: A design where no client data is stored on the local server, facilitating horizontal scaling.
- Example: Storing user session data in Amazon ElastiCache (Redis) instead of on the web server's local hard drive so any server in the fleet can handle any request.
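The statelessness example above can be shown in a few lines. In this sketch a plain dict stands in for Amazon ElastiCache, and the session fields are illustrative; the point is that any server in the fleet can serve any session because state lives outside the server:

```python
# External session store shared by every web server (a dict standing
# in for ElastiCache/Redis; keys and fields are illustrative).
session_store: dict[str, dict] = {}

def handle_request(server_id: str, session_id: str) -> dict:
    """Any server can serve any session, because state is external."""
    session = session_store.setdefault(session_id, {"visits": 0})
    session["visits"] += 1
    session["last_server"] = server_id
    return session

handle_request("web-1", "sess-42")
result = handle_request("web-2", "sess-42")  # different server, same session
print(result["visits"])  # 2 -- the visit count survived the server switch
```

If the session dict instead lived on web-1's local disk, the second request hitting web-2 would start from an empty session, which is exactly the logout problem the tip at the end of this guide warns about.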
Worked Examples
Example 1: Resolving a Memory Leak in a Legacy App
Problem: You have a legacy monolithic application that is running out of RAM every 4 hours. The code is stateful and cannot be easily refactored.
Solution:
- Identify: Since it is stateful and legacy, horizontal scaling is difficult.
- Action: Stop the instance and upgrade the instance class from a t3.large (8 GB RAM) to an r5.large (16 GB RAM).
- Result: The application has more "breathing room," but the underlying issue remains, and there is still no redundancy.
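The upgrade step can be sketched as a simple capacity lookup. The RAM figures for t3.large and r5.large match AWS's published specifications; the tiny two-entry catalog is an illustrative subset, not a real pricing or instance API:

```python
# Illustrative subset of the EC2 instance catalog (RAM figures match
# AWS specs; a real tool would query the full instance-type list).
INSTANCE_RAM_GIB = {"t3.large": 8, "r5.large": 16}

def pick_upgrade(required_gib: int) -> str:
    """Return the smallest listed instance type meeting the RAM requirement."""
    candidates = [t for t, ram in INSTANCE_RAM_GIB.items() if ram >= required_gib]
    if not candidates:
        raise ValueError("no instance type large enough; consider scaling out")
    return min(candidates, key=INSTANCE_RAM_GIB.get)

print(pick_upgrade(required_gib=12))  # r5.large
```

The ValueError branch is the "hard ceiling" from the comparison table: when no larger instance exists, vertical scaling is exhausted.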
Example 2: Scaling a Global News Website
Problem: A news site expects a massive traffic surge during an election night. The site consists of static content and a read-heavy database.
Solution:
- Web Tier: Set up an Auto Scaling Group with a policy to add instances when CPU exceeds 60%.
- Database Tier: Create five RDS Read Replicas and route the application's read-only queries across the replica endpoints.
- Result: The system scales horizontally to meet the millions of viewers without downtime.
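The CPU-based policy from step 1 can be simulated in a few lines. This is a simplified sketch, not the ASG API: the 60% threshold comes from the example, while the scale-in threshold, one-instance step size, and instance cap are illustrative assumptions (real policies also involve cooldowns and health checks):

```python
# Toy simulation of a step-scaling decision: add an instance when
# average CPU exceeds 60%. Thresholds and limits are illustrative.
THRESHOLD = 60.0     # scale out above this average CPU %
MAX_INSTANCES = 10   # assumed ASG maximum capacity

def desired_capacity(current: int, avg_cpu: float) -> int:
    """One evaluation step of a simplified scaling policy."""
    if avg_cpu > THRESHOLD and current < MAX_INSTANCES:
        return current + 1   # launch one more instance
    if avg_cpu < THRESHOLD / 2 and current > 1:
        return current - 1   # scale in when load drops well below threshold
    return current

print(desired_capacity(2, avg_cpu=85.0))  # 3 -- traffic spike scales out
print(desired_capacity(3, avg_cpu=20.0))  # 2 -- quiet period scales back in
```

The gap between the scale-out threshold (60%) and the scale-in threshold (30%) prevents "flapping," where capacity oscillates up and down on every metric evaluation.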
Checkpoint Questions
- Which scaling strategy is most closely associated with the concept of "High Availability"? Why?
- True or False: Vertical scaling usually requires a period of downtime for the resource.
- In Amazon RDS, what is the primary method for scaling horizontally to support more read traffic?
- What AWS tool is used to automate the horizontal scaling of EC2 instances based on CloudWatch metrics?
> [!TIP]
> Always check if your application is stateless before choosing horizontal scaling. If the app stores sessions locally, users will be logged out every time the Load Balancer sends them to a different instance! Use ElastiCache or DynamoDB to store state externally.