Mastering Database Replication: RDS Read Replicas & Scaling Strategies
This guide covers the essentials of database replication within the AWS ecosystem, specifically focusing on Amazon RDS and Aurora. Understanding how to scale database performance and ensure high availability is a core competency for the AWS Solutions Architect Associate (SAA-C03) exam.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Vertical Scaling (Scaling Up) and Horizontal Scaling (Scaling Out).
- Explain the architectural purpose and benefits of Read Replicas.
- Identify the replication limits for RDS and Aurora instances.
- Contrast Read Replicas with Multi-AZ Deployments for high availability.
- Understand the implications of Asynchronous Replication on data consistency and disaster recovery.
Key Terms & Glossary
- Read Replica: A read-only copy of a primary database instance used to offload query traffic.
- Asynchronous Replication: A data transfer method where the primary database does not wait for the replica to acknowledge receipt of data before completing a write transaction.
- Scaling Out: Adding more resource units (e.g., more instances) to a system to handle increased load.
- Scaling Up: Increasing the capacity of a single resource (e.g., adding more RAM or CPU to an instance).
- Promote: The process of converting a read-only replica into a standalone, primary read/write database instance.
- Replication Lag: The time delay between a write committing on the primary and that write becoming visible on the replica.
The "Big Idea"
In a standard database setup, the primary instance handles both Writes (INSERT, UPDATE, DELETE) and Reads (SELECT). As traffic grows, the primary instance often becomes a bottleneck. The "Big Idea" behind replication is to separate these concerns. By delegating read-heavy queries to one or more replicas, the primary instance is freed up to focus on data modifications, effectively increasing the overall throughput and responsiveness of the application.
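The separation of reads and writes described above can be sketched as a small routing helper. This is a minimal illustration, not an AWS API: the class name, the hostnames, and the "anything that is not a SELECT goes to the primary" rule are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Sends SELECT statements to replicas and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Only statements that start with SELECT are treated as reads;
        # unknown or empty statements take the safe path to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._readers)
        return self.primary


# Hypothetical hostnames, for illustration only.
router = ReadWriteRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
write_target = router.endpoint_for("UPDATE users SET name = 'a' WHERE id = 1")
read_target = router.endpoint_for("SELECT * FROM users")
```

Round-robin is only one way to spread reads; a real application might instead rely on a DNS name that already balances across replicas.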
Formula / Concept Box
| Concept | Metric / Rule |
|---|---|
| RDS Max Replicas | Up to 5 per primary instance (the figure this guide uses; current limits vary by engine) |
| Aurora Max Replicas | Up to 15 per cluster |
| Replication Type | Asynchronous (typically) |
| Endpoint Type | Read-only DNS Endpoint |
| Connectivity | Replicas can be Cross-AZ or Cross-Region |
Hierarchical Outline
- Scaling Methodologies
  - Vertical Scaling (Up): Increasing instance class (e.g., db.t3.medium to db.m5.large). Best for compute/memory bottlenecks.
  - Horizontal Scaling (Out): Adding Read Replicas. Best for read-heavy application bottlenecks.
- Read Replica Architecture
  - Function: Handles SELECT queries only.
  - Write Flow: App $\rightarrow$ Master $\rightarrow$ Async Copy $\rightarrow$ Replica.
  - Read Flow: App $\rightarrow$ Replica.
- Operational Mechanics
  - Asynchronicity: Leads to potential "Eventual Consistency."
  - Promotion: Replicas can be promoted to Master for manual failover or testing.
  - Availability: Replicas can exist in different Availability Zones or even different Regions to reduce latency for global users.
- Comparison: Read Replicas vs. Multi-AZ
  - Multi-AZ: Focuses on High Availability (Synchronous, automatic failover).
  - Read Replicas: Focuses on Performance/Scalability (Asynchronous, manual promotion).
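As a rough back-of-the-envelope illustration of why scaling out helps read-heavy workloads, the sketch below splits a query load between the primary and its replicas. The traffic figures (1,000 queries per second, 90% reads, 3 replicas) are made-up numbers, and the model assumes reads spread evenly, which real traffic rarely does exactly.

```python
def load_after_offload(total_qps, read_percent, replica_count):
    """Split query load between the primary and its read replicas."""
    reads = total_qps * read_percent // 100
    writes = total_qps - reads
    if replica_count == 0:
        # No replicas: the primary serves every read and write.
        return {"primary_qps": total_qps, "per_replica_read_qps": 0}
    return {
        # The primary keeps only the write traffic.
        "primary_qps": writes,
        # Reads are assumed to be spread evenly across replicas.
        "per_replica_read_qps": reads // replica_count,
    }


before = load_after_offload(1000, 90, 0)
after = load_after_offload(1000, 90, 3)
print(before)  # {'primary_qps': 1000, 'per_replica_read_qps': 0}
print(after)   # {'primary_qps': 100, 'per_replica_read_qps': 300}
```

The model ignores the work each replica does to apply the replicated write stream, so it overstates the benefit for write-heavy workloads.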
Visual Anchors
Database Traffic Distribution
Scaling: Up vs. Out
\begin{tikzpicture}[scale=0.8]
% Scaling Up
\draw[fill=blue!10] (0,0) rectangle (1.5,1.5);
\node at (0.75, 0.75) {S};
\draw[->, thick] (2,0.75) -- (3,0.75);
\draw[fill=blue!30] (3.5,-0.25) rectangle (5.5,1.75);
\node at (4.5, 0.75) {XL};
\node[below] at (2.5,-0.5) {Scaling Up (Vertical)};
% Scaling Out
\begin{scope}[xshift=8cm]
\draw[fill=green!10] (0,0) rectangle (1,1);
\draw[->, thick] (1.5,0.5) -- (2.5,0.5);
\draw[fill=green!20] (3,1.2) rectangle (4,2.2);
\draw[fill=green!20] (3,0) rectangle (4,1);
\draw[fill=green!20] (3,-1.2) rectangle (4,-0.2);
\node[below] at (2,-1.5) {Scaling Out (Horizontal)};
\end{scope}
\end{tikzpicture}
Definition-Example Pairs
- Read-Heavy Workload: An application that performs significantly more data retrievals than updates.
  - Example: A news website where thousands of users read articles, but only a few editors post new content.
- Promotion: Changing the status of a database from a subordinate copy to a primary leader.
  - Example: During a regional outage, an architect manually promotes a Cross-Region Read Replica in us-west-2 to become the new primary database to restore service.
- Eventual Consistency: A consistency model where it is guaranteed that if no new updates are made to a data item, eventually all accesses will return the last updated value.
  - Example: A user updates their profile picture; their friend might see the old picture for a few seconds because the friend's request was routed to a Read Replica that hadn't received the update yet.
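The profile-picture scenario can be reproduced with a toy model of asynchronous replication. This is a simplified in-memory sketch, not how RDS ships changes internally: the `Primary` and `Replica` classes and the explicit `apply_from` step are assumptions that stand in for the replication stream.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.data = {}
        self.pending = deque()  # committed changes not yet shipped

    def write(self, key, value):
        # Commit locally and return at once, without waiting for the replica.
        self.data[key] = value
        self.pending.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def apply_from(self, primary):
        # Apply shipped changes in the order they were committed.
        while primary.pending:
            key, value = primary.pending.popleft()
            self.data[key] = value


primary, replica = Primary(), Replica()
primary.write("avatar", "old.png")
replica.apply_from(primary)

primary.write("avatar", "new.png")
stale = replica.data["avatar"]   # replica has not caught up: "old.png"
replica.apply_from(primary)
fresh = replica.data["avatar"]   # after catching up: "new.png"
```

The window between the second `write` and the second `apply_from` is the replication lag; a read routed to the replica inside that window returns the old value.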
Worked Examples
Case 1: The Reporting Bottleneck
Scenario: A company runs a production MySQL RDS database. Every Friday, the analytics team runs heavy SQL queries to generate weekly reports. This causes the production website to slow down significantly.
Solution:
- Create an RDS Read Replica.
- Provide the analytics team with the Read-Only Endpoint.
- The reporting queries now run against the replica, leaving the master instance's resources available for the website's transactions.
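The first step of the solution could be scripted roughly as follows. This sketch assumes a boto3 RDS client is passed in (for example one built with `boto3.client("rds")`) and uses its `create_db_instance_read_replica` call; the function name and the instance identifiers are hypothetical.

```python
def create_reporting_replica(rds_client, source_id, replica_id):
    """Request a read replica of source_id and return the new instance's identifier."""
    response = rds_client.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,        # name for the new replica
        SourceDBInstanceIdentifier=source_id,   # existing production instance
    )
    return response["DBInstance"]["DBInstanceIdentifier"]


# Example call (identifiers are placeholders, not from this guide):
# import boto3
# rds = boto3.client("rds")
# create_reporting_replica(rds, "prod-mysql", "prod-mysql-reporting")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account.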
Case 2: Promotion for Recovery
Scenario: The master database instance in us-east-1a fails. You have a Read Replica in us-east-1b.
Step-by-Step Breakdown:
- Verification: Confirm the master is unavailable via the RDS Console.
- Action: Select the Read Replica in the console.
- Command: Choose Actions > Promote.
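- Result: The replica reboots and becomes a standalone DB instance that accepts both reads and writes. Point the application at the promoted instance's endpoint to restore service. Note: Because replication is asynchronous, any writes the failed master committed but had not yet replicated are lost.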
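The console steps above have a scripted equivalent. The sketch below assumes a boto3 RDS client is passed in and uses its `promote_read_replica` call, the `db_instance_available` waiter, and `describe_db_instances`; the function name and the replica identifier are hypothetical.

```python
def promote_and_get_endpoint(rds_client, replica_id):
    """Promote a read replica and return the (address, port) to repoint the app at."""
    rds_client.promote_read_replica(DBInstanceIdentifier=replica_id)

    # Block until RDS reports the promoted instance as available.
    waiter = rds_client.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=replica_id)

    described = rds_client.describe_db_instances(DBInstanceIdentifier=replica_id)
    endpoint = described["DBInstances"][0]["Endpoint"]
    return endpoint["Address"], endpoint["Port"]


# Example call (identifier is a placeholder, not from this guide):
# import boto3
# rds = boto3.client("rds")
# address, port = promote_and_get_endpoint(rds, "prod-mysql-replica-1b")
```

The returned address is what the application's connection string needs to change to, since the promoted instance does not take over the failed master's endpoint.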
Checkpoint Questions
- What is the maximum number of Read Replicas allowed for an Amazon Aurora cluster?
- True or False: Read Replicas use synchronous replication to ensure zero data loss.
- Which database engine is the only one that does NOT support standard RDS Read Replicas (unless using Enterprise Edition)?
- How does an application connect specifically to a Read Replica instead of the Master?
- What is the primary difference between a Multi-AZ standby and a Read Replica?
Answers
- 15 (RDS allows 5, Aurora allows 15).
- False. They use asynchronous replication.
- Microsoft SQL Server (though Enterprise Edition is supported).
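- Via the Read-Only Endpoint (DNS name) provided by RDS. Each RDS replica has its own endpoint; Aurora also provides a reader endpoint that spreads connections across its replicas.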
- Multi-AZ is for Disaster Recovery/High Availability (Synchronous, no manual intervention for failover); Read Replicas are for Scalability (Asynchronous, requires manual promotion to become a master).