AWS Data Recovery and Backup Strategies: SAA-C03 Study Guide
This guide covers the essential mechanisms for data protection and disaster recovery across primary AWS storage and database services, focusing on the metrics and tools required for the SAA-C03 exam.
Learning Objectives
After studying this guide, you should be able to:
- Define and differentiate between Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
- Explain how S3 Versioning and Cross-Region Replication (CRR) protect against data loss.
- Describe backup and snapshot mechanisms for EBS, EFS, and RDS.
- Compare automated vs. manual recovery processes for Availability Zone and Regional failures.
Key Terms & Glossary
- RTO (Recovery Time Objective): The maximum acceptable time to restore a system after a failure.
- RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time (e.g., losing 5 minutes of data).
- Snapshot: A point-in-time, incremental backup of a storage volume (EBS, RDS) stored in Amazon S3.
- Delete Marker: A placeholder version that S3 creates in a versioning-enabled bucket when an object is deleted without specifying a version ID; removing the marker restores the previous version.
- Point-in-Time Recovery (PITR): A feature (primarily in RDS) that allows restoring a database to any specific second within a retention period.
The "Big Idea"
In the AWS Shared Responsibility Model, AWS is responsible for the resilience of the underlying infrastructure, while you are responsible for using it resiliently (for example, enabling Multi-AZ) and for protecting and recovering your data. Resilience is not recovery: Multi-AZ protects against hardware and AZ failures, but it does not protect against accidental deletion or data corruption, because those changes are replicated to the standby as well. Recovery strategies must be designed to meet specific business needs defined by RTO and RPO.
Formula / Concept Box
| Metric | Focus | Question to Ask |
|---|---|---|
| RTO | Time/Downtime | "How quickly do we need to be back online?" |
| RPO | Data/Loss | "How much data can we afford to lose?" |
[!IMPORTANT] RDS PITR RPO: Enabling automated backups for RDS gives a worst-case RPO of about 5 minutes, because RDS uploads transaction logs to S3 every 5 minutes.
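The worst-case RPO is set by the largest gap between recovery points: a failure just before the next capture loses almost that whole gap. A minimal sketch under stated assumptions: the helper names, the 02:00 start date, and the 15-minute RPO target are hypothetical, and the 5-minute interval stands in for RDS transaction-log uploads. This is plain arithmetic, not an AWS API.

```python
from datetime import datetime, timedelta


def capture_schedule(start: datetime, interval: timedelta, count: int) -> list[datetime]:
    """Hypothetical helper: recovery points created every `interval`, beginning at `start`."""
    return [start + i * interval for i in range(count)]


def worst_case_rpo(recovery_points: list[datetime]) -> timedelta:
    """Largest gap between consecutive recovery points.

    A failure just before the next recovery point loses (almost) the whole gap,
    so the largest gap bounds the data-loss window.
    """
    if len(recovery_points) < 2:
        raise ValueError("need at least two recovery points")
    points = sorted(recovery_points)
    return max(later - earlier for earlier, later in zip(points, points[1:]))


# Hypothetical schedules starting at 02:00 (dates and counts are illustrative).
start = datetime(2024, 1, 1, 2, 0)
daily_snapshots = capture_schedule(start, timedelta(days=1), 3)  # one snapshot per day
log_uploads = capture_schedule(start, timedelta(minutes=5), 12)  # log upload every 5 minutes

rpo_target = timedelta(minutes=15)  # hypothetical business requirement
for name, points in [("daily snapshots", daily_snapshots), ("5-minute log uploads", log_uploads)]:
    rpo = worst_case_rpo(points)
    print(f"{name}: worst-case RPO {rpo}, meets target: {rpo <= rpo_target}")
```

Daily snapshots alone miss a 15-minute target by a wide margin, while 5-minute log uploads meet it.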
Hierarchical Outline
- Object Storage (Amazon S3)
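- Versioning: Protects against accidental overwrites and deletes. A simple DELETE (no version ID) does not remove data; it adds a Delete Marker as the current version. Deleting a specific version ID, however, removes that version permanently.
- Cross-Region Replication (CRR): Asynchronous replication to a bucket in another Region for regional disaster recovery. Requires versioning on both the source and destination buckets.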
- Block and File Storage
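- EBS (Elastic Block Store): Volumes are replicated within a single AZ. Snapshots are stored at the Region level, so a volume can be restored into any AZ in the Region; copy snapshots to another Region for regional DR. Use Amazon Data Lifecycle Manager (DLM) to automate snapshots.
- EFS (Elastic File System): Standard storage classes store data across multiple AZs in a Region (One Zone classes use a single AZ). Back up with AWS Backup (to a backup vault), and use EFS Replication to maintain a copy in another file system, including one in another Region.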
- Relational Databases (Amazon RDS)
- Automated Backups: Daily snapshots plus transaction logs. Retention: 1 to 35 days (console default 7).
- Manual Snapshots: User-initiated; persist even after the RDS instance is deleted.
- Restoration: Always creates a new database instance with a new endpoint.
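The versioning rules in the outline (overwrites add versions, a simple delete adds a marker, a versioned delete is permanent) can be modeled with a toy in-memory bucket. This is a hypothetical sketch, not the S3 API: `VersionedBucket`, its methods, and the `v1`, `v2` version IDs are illustrative stand-ins (real S3 version IDs are opaque strings). In the real API, removing a delete marker is a DeleteObject call that specifies the marker's version ID.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None marks a delete marker

    @property
    def is_delete_marker(self) -> bool:
        return self.data is None


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled S3 bucket (not the real S3 API)."""

    def __init__(self) -> None:
        self._history: dict[str, list[Version]] = {}  # key -> versions, oldest first
        self._ids = itertools.count(1)

    def _new_version(self, key: str, data: Optional[bytes]) -> str:
        version = Version(f"v{next(self._ids)}", data)
        self._history.setdefault(key, []).append(version)
        return version.version_id

    def put(self, key: str, data: bytes) -> str:
        """An overwrite adds a new current version; older versions are kept."""
        return self._new_version(key, data)

    def delete(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        if version_id is None:
            # Simple delete: no data is removed; a delete marker becomes current.
            return self._new_version(key, None)
        # Versioned delete: permanently removes exactly that version (or marker).
        self._history[key] = [v for v in self._history.get(key, []) if v.version_id != version_id]
        return None

    def current_version_id(self, key: str) -> Optional[str]:
        history = self._history.get(key, [])
        return history[-1].version_id if history else None

    def get(self, key: str) -> Optional[bytes]:
        """Return current data, or None where S3 would answer 404 Not Found."""
        history = self._history.get(key, [])
        if not history or history[-1].is_delete_marker:
            return None
        return history[-1].data


bucket = VersionedBucket()
bucket.put("config.json", b'{"debug": false}')
marker_id = bucket.delete("config.json")  # accidental delete: adds a delete marker
print(bucket.get("config.json"))          # None: the object looks deleted
bucket.delete("config.json", marker_id)   # delete the marker itself to "undelete"
print(bucket.get("config.json"))          # b'{"debug": false}'
```

Keeping every version in an ordered list makes "current version" simply the last entry, which is why deleting the marker exposes the previous version again.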
Visual Anchors
[Figure: S3 Versioning Logic]
RTO and RPO Timeline
\begin{tikzpicture}[node distance=2cm, every node/.style={font=\small}]
\draw [thick, ->] (0,0) -- (10,0) node[right] {Time};
\draw [red, ultra thick] (6,-0.5) -- (6,0.5) node[above] {FAILURE EVENT};
\draw [blue, <->] (2,-1) -- (6,-1);
\node at (4,-1.3) {RPO (Data Loss Window)};
\draw [green!60!black, <->] (6,-1) -- (9,-1);
\node at (7.5,-1.3) {RTO (Downtime)};
\filldraw [black] (2,0) circle (2pt) node[above] {Last Backup};
\filldraw [black] (9,0) circle (2pt) node[above] {Service Restored};
\end{tikzpicture}
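The timeline maps directly to two subtractions: data loss is the failure time minus the last recovery point, and downtime is the restoration time minus the failure time. A minimal sketch; the function name, the incident timestamps, and the 15-minute and 4-hour targets are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_recovery_point: datetime, failure: datetime,
                     restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (data_loss, downtime) for one incident.

    data_loss: time between the last recovery point and the failure (achieved RPO).
    downtime:  time between the failure and service restoration (achieved RTO).
    """
    if not last_recovery_point <= failure <= restored:
        raise ValueError("expected last_recovery_point <= failure <= restored")
    return failure - last_recovery_point, restored - failure


# Hypothetical incident and targets, for illustration only.
data_loss, downtime = recovery_metrics(
    last_recovery_point=datetime(2024, 1, 1, 13, 55),
    failure=datetime(2024, 1, 1, 14, 0),
    restored=datetime(2024, 1, 1, 16, 30),
)
rpo_target, rto_target = timedelta(minutes=15), timedelta(hours=4)
print(f"data loss {data_loss} (RPO met: {data_loss <= rpo_target})")
print(f"downtime {downtime} (RTO met: {downtime <= rto_target})")
```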
Definition-Example Pairs
- Delete Marker
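- Definition: A placeholder version that becomes the current version of an object key, so standard GET and list requests behave as if the object were deleted, while earlier versions remain stored.
- Example: A developer accidentally deletes config.json from a versioning-enabled bucket. Instead of removing the data, S3 adds a delete marker. The developer deletes the marker (by its version ID) to "undelete" the file, which makes the previous version current again.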
- Amazon Data Lifecycle Manager (DLM)
- Definition: An automated service to manage the creation, retention, and deletion of EBS snapshots.
- Example: A policy snapshots all volumes tagged "Production" every 12 hours and keeps the last 7, so the oldest available restore point is between 3 and 3.5 days old (6 to 7 intervals of 12 hours).
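For a fixed-interval, count-based retention rule like the DLM example, the age of the oldest retained snapshot moves between two bounds. A minimal sketch; `retention_window` is a hypothetical helper for the arithmetic, not part of the DLM API.

```python
from datetime import timedelta


def retention_window(interval: timedelta, retain_count: int) -> tuple[timedelta, timedelta]:
    """Age range of the oldest retained snapshot under a fixed-interval, count-based policy.

    Right after a new snapshot, the oldest retained one is (retain_count - 1) intervals old.
    Just before the next snapshot runs, it is one interval older than that.
    """
    if retain_count < 1:
        raise ValueError("retain_count must be at least 1")
    shortest = (retain_count - 1) * interval
    return shortest, shortest + interval


low, high = retention_window(timedelta(hours=12), 7)
print(low, high)
```

With a 12-hour interval and 7 retained snapshots, the window ranges from 72 to 84 hours, matching the 3 to 3.5 days stated above.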
Worked Examples
Example 1: Calculating RDS Recovery
Scenario: A company has a 30-minute backup window at 02:00. At 14:00, the database becomes corrupted.
- Question: What is the maximum data loss if PITR is enabled?
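- Solution: PITR replays transaction logs on top of the most recent daily snapshot, so the 02:00 backup window does not limit recovery to 02:00. Because logs are uploaded to S3 every 5 minutes, the worst-case RPO is about 5 minutes: the company can restore to at least 13:55, and to any second just before the corruption once the logs covering it have been uploaded.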
- Note: Restoring the actual instance may take hours (RTO), depending on data volume and provisioned IOPS.
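Choosing the restore target comes down to taking the newest moment that is both restorable and before the corruption. A minimal sketch; the helper name and the 2024 dates are hypothetical, and times are treated as UTC. In practice the chosen time would be passed as `RestoreTime` to the RDS RestoreDBInstanceToPointInTime API, which creates a new DB instance.

```python
from datetime import datetime, timedelta


def choose_restore_time(corruption_at: datetime, latest_restorable: datetime) -> datetime:
    """Newest point in time that is both restorable and before the corruption.

    RDS can restore to any second up to the instance's latest restorable time,
    which typically lags the present by up to about 5 minutes.
    """
    just_before_corruption = corruption_at - timedelta(seconds=1)
    return min(just_before_corruption, latest_restorable)


corruption = datetime(2024, 1, 1, 14, 0)  # hypothetical date for the scenario

# Restore started at once: logs only cover up to 13:55 (worst case).
print(choose_restore_time(corruption, latest_restorable=datetime(2024, 1, 1, 13, 55)))
# Restore started a few minutes later: logs now cover up to 14:03.
print(choose_restore_time(corruption, latest_restorable=datetime(2024, 1, 1, 14, 3)))
```

Waiting briefly for the next log upload shrinks the data loss from about 5 minutes to 1 second, at the cost of slightly more downtime.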
Example 2: S3 Cross-Region Replication (CRR)
Scenario: You need to replicate data from US-East-1 to EU-West-1 for compliance.
- Requirement: Versioning must be enabled on both the source and destination buckets.
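- Nuance: If you delete an object in the source bucket without a version ID (creating a delete marker), that marker is not replicated by default under the current replication configuration, although delete marker replication can be enabled per rule. Deleting a specific version ID is never replicated. This prevents a malicious or accidental delete in one Region from destroying data in the other.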
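The replication nuance can be summarized as a small decision function. This is a toy model, not an S3 API: the enum and function names are hypothetical, and it assumes the object matches the rule's filter, was written after replication was configured, and that the rule uses the current (Filter-based) configuration.

```python
from enum import Enum


class SourceChange(Enum):
    NEW_VERSION = "PUT creates a new object version"
    DELETE_MARKER = "DELETE without version ID adds a delete marker"
    VERSION_DELETE = "DELETE with a version ID removes that version"


def is_replicated(change: SourceChange, delete_marker_replication: bool = False) -> bool:
    """Toy model of which source-bucket changes a replication rule copies to the destination.

    Assumes the object matches the rule's filter, was written after replication was
    configured, and that the rule uses the current (Filter-based) configuration.
    """
    if change is SourceChange.NEW_VERSION:
        return True
    if change is SourceChange.DELETE_MARKER:
        return delete_marker_replication  # not replicated unless enabled on the rule
    return False  # version-specific deletes are never replicated


for change in SourceChange:
    print(f"{change.value}: replicated={is_replicated(change)}")
```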
Checkpoint Questions
- How does RDS Multi-AZ affect snapshot performance?
- Answer: In Multi-AZ (except SQL Server), snapshots are taken from the standby instance, avoiding I/O suspension on the primary.
- What happens to automated RDS snapshots when the instance is deleted?
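- Answer: By default, they are deleted along with the instance (unless you choose to retain automated backups when deleting it). Manual snapshots, including an optional final snapshot, are retained.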
- If an EC2 instance with an EBS volume is terminated by Auto Scaling, what happens to the logs stored on that volume?
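- Answer: The root volume has DeleteOnTermination enabled by default, so if the logs are stored there, the volume is deleted and the logs are lost. Ship logs off the instance (for example, to CloudWatch Logs) so they persist.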
- True/False: S3 Cross-Region Replication is asynchronous.
- Answer: True. S3 replicates objects asynchronously, so there is a replication lag; S3 Replication Time Control (RTC) adds an SLA to replicate 99.99% of new objects within 15 minutes.
[!TIP] For the exam, remember: Snapshots = S3. Even though you manage them through EBS or RDS, snapshot data is stored in AWS-managed S3 storage (not visible in your own buckets) for high durability.