AWS Backup Practices & Methods: Comprehensive Study Guide
This guide covers the essential knowledge required for the AWS Certified Solutions Architect - Professional (SAP-C02) exam regarding operational excellence, business continuity, and data protection.
Learning Objectives
- Define Recovery Objectives: Differentiate between RTO and RPO and how they drive backup frequency and method.
- Analyze DR Strategies: Understand where "Backup & Restore" fits in the hierarchy of disaster recovery (DR) strategies.
- Implement Centralized Backups: Utilize AWS Backup for policy-based, automated data protection across an AWS Organization.
- Enhance Security: Implement cross-account and cross-region backups to protect against ransomware and regional failures.
- Verify Integrity: Establish a process for periodic recovery testing to ensure backups are functional.
Key Terms & Glossary
- RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time (e.g., "We can lose up to 15 minutes of data").
- RTO (Recovery Time Objective): The maximum acceptable time to restore the workload after a failure (e.g., "The system must be back up in 4 hours").
- AWS Backup: A fully managed, policy-based service that centralizes and automates the backup of data across AWS services.
- Immutable Infrastructure: An operational approach where servers are never patched in place; instead, they are replaced by new versions from a fresh image (AMI).
- Cross-Region Copy: The practice of storing backup data in a different geographic region than the source data to protect against regional disasters.
The "Big Idea"
[!IMPORTANT] "Everything fails all the time." (Werner Vogels)
Reliability in the cloud isn't about preventing failure; it is about designing systems that can recover gracefully. Backups are the ultimate safety net. However, a backup that hasn't been tested for recovery is merely "hope," not a strategy. True operational excellence requires moving from manual, ad-hoc backups to automated, policy-driven, and regularly tested recovery workflows.
Formula / Concept Box
| Metric/Concept | Definition | Impact on Cost |
|---|---|---|
| RPO | Maximum acceptable time between the last recoverable backup and the disaster. | Lower RPO = More frequent backups = Higher Cost. |
| RTO | Maximum acceptable time between the disaster and full restoration. | Lower RTO = More automation/warm resources = Higher Cost. |
| Backup & Restore | Recovery from off-site storage. | Lowest cost, highest RTO/RPO. |
Hierarchical Outline
- I. Recovery Objectives
- RPO: Focuses on data loss (The "Back in Time" window).
- RTO: Focuses on downtime (The "Waiting for Service" window).
- Resilience Hub: AWS service used to evaluate if infrastructure meets these targets.
- II. Backup Strategies
- Mutable Environments: Require backing up data and patching existing OS (SSM Patch Manager).
- Immutable Environments: Focus on backing up data and re-deploying infrastructure via IaC (Infrastructure as Code).
- III. AWS Backup Service Features
- Centralized Management: Manage backups across multiple AWS accounts via AWS Organizations.
- Policy-based: Use "Backup Plans" to define frequency and retention.
- Security: Supports encryption and cross-account/region copying.
- IV. Disaster Recovery (DR) Tiers
- Backup & Restore: Simple, cheap, high RTO/RPO.
- Pilot Light: Core data is live; other resources are "off" until needed.
- Warm Standby: Scaled-down version of environment is always running.
- Multi-site Active/Active: Near-zero RTO/RPO, most expensive.
Visual Anchors
The Relationship Between RPO and RTO
\begin{tikzpicture}[>=latex, font=\small]
\draw[->, thick] (0,0) -- (10,0) node[right] {Time};
\draw[fill=red!20] (5,0) circle (0.1) node[above=10pt, red] {Disaster Event};
\draw[thick, red] (5,-0.5) -- (5,1);
% RPO
\draw[<->, blue, thick] (2, -0.8) -- (5, -0.8);
\node[blue] at (3.5, -1.2) {RPO (Data Loss Window)};
\draw[dashed] (2,0) -- (2,-1.5);
\node at (2, 0.3) {Last Backup};
% RTO
\draw[<->, green!60!black, thick] (5, -0.8) -- (8, -0.8);
\node[green!60!black] at (6.5, -1.2) {RTO (Recovery Time Window)};
\draw[dashed] (8,0) -- (8,-1.5);
\node at (8, 0.3) {Service Restored};
\end{tikzpicture}
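The timeline above can also be checked numerically. The following is a minimal sketch, assuming three known timestamps (last good backup, disaster, service restored); the function names and sample values are illustrative and are not part of any AWS API.

```python
from datetime import datetime, timedelta


def recovery_windows(last_backup, disaster, restored):
    """Return (data_loss_window, downtime) for a single incident."""
    if not (last_backup <= disaster <= restored):
        raise ValueError("expected last_backup <= disaster <= restored")
    return disaster - last_backup, restored - disaster


def meets_objectives(last_backup, disaster, restored, rpo, rto):
    """True if actual data loss and downtime stay within the RPO and RTO."""
    data_loss, downtime = recovery_windows(last_backup, disaster, restored)
    return data_loss <= rpo and downtime <= rto


# Hypothetical incident: backup at 02:00, failure at 05:00, service back at 08:00.
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 1, 5, 0)
restored = datetime(2024, 1, 1, 8, 0)

data_loss, downtime = recovery_windows(last_backup, disaster, restored)
print(data_loss, downtime)  # 3:00:00 3:00:00
```

With these sample values, a 4-hour RPO and RTO would both be met, while a 15-minute RPO would not, because three hours of writes were made after the last backup.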
Regional Backup & Restore Flow
Definition-Example Pairs
- Cross-Account Backup: The process of copying a backup to a completely different AWS Account ID.
- Example: A financial firm copies all RDS snapshots to a restricted "Vault" account. If an admin account in Production is compromised by ransomware, the attacker cannot delete the backups in the Vault account.
- Infrastructure as Code (IaC): Using code (CloudFormation/Terraform) to define and deploy resources.
- Example: Instead of backing up an entire EC2 instance, you store its configuration as templates in a Git repository. During a disaster, you deploy those templates to recreate the fleet in a new region, then restore only the data from backup.
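One way the "Vault" account in the cross-account example might resist deletion is a vault access policy that denies `backup:DeleteRecoveryPoint`. The sketch below only builds the policy document; the account ID, role name, and the choice to exempt a single break-glass role are assumptions made for illustration, not details from this guide.

```python
import json


def build_vault_deny_delete_policy(exempt_role_arn):
    """Build a vault access policy that denies recovery point deletion
    to every principal except one designated role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRecoveryPointDeletion",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "backup:DeleteRecoveryPoint",
                "Resource": "*",
                # Assumption: one tightly controlled role keeps delete rights.
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": exempt_role_arn}
                },
            }
        ],
    }


# Hypothetical account ID and role name, for illustration only.
policy = build_vault_deny_delete_policy(
    "arn:aws:iam::111122223333:role/BackupBreakGlass"
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

A JSON string of this shape is what boto3's `put_backup_vault_access_policy(BackupVaultName=..., Policy=...)` takes as its `Policy` argument. Check the statement against the current AWS Backup documentation before relying on it.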
Worked Examples
Example 1: Calculating RPO Requirements
Scenario: A business updates its inventory database every hour. In the event of a failure, they can manually reconstruct 2 hours of data using paper receipts, but no more.
- Required RPO: 2 hours.
- Solution: Schedule an automated backup (using AWS Backup) at least every 2 hours. Hourly backups are safer, because a 2-hour schedule leaves no margin if a single backup job fails or runs late.
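The arithmetic behind this recommendation can be sketched as follows. It assumes a simplified model in which each successful backup captures all data up to its start time, so the worst-case loss equals the gap between two successful backups; the function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval, missed_backups=0):
    """Longest gap between successful backups if some scheduled runs fail."""
    return interval * (missed_backups + 1)


def tolerates_missed_backups(interval, rpo, missed_backups=1):
    """True if the RPO still holds after the given number of failed runs."""
    return worst_case_data_loss(interval, missed_backups) <= rpo


rpo = timedelta(hours=2)

# A 2-hour schedule meets the RPO only if every backup succeeds.
print(worst_case_data_loss(timedelta(hours=2)) <= rpo)        # True
print(tolerates_missed_backups(timedelta(hours=2), rpo))      # False

# An hourly schedule still meets the RPO after one failed run.
print(tolerates_missed_backups(timedelta(hours=1), rpo))      # True
```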
Example 2: Setting up a Cross-Region Backup
Task: Ensure an Amazon Aurora database in us-east-1 can survive a total regional outage.
- Create Backup Vault: Create a vault in `us-west-2` (the target DR region).
- Backup Plan: In `us-east-1`, create an AWS Backup plan.
- Copy Action: Within the plan rule, add a "Copy to destination" pointing to the `us-west-2` vault.
- Result: Snapshots are automatically generated and replicated, satisfying regional isolation.
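The steps above can be sketched as a backup plan document. This is a minimal illustration, not a definitive configuration: the plan name, rule name, vault names, account ID, daily schedule, and 35-day retention are all assumptions chosen for the example. The dictionary follows the `BackupPlan` structure that boto3's `create_backup_plan` accepts, but the sketch only builds and prints it and makes no AWS calls.

```python
import json


def build_cross_region_plan(source_vault, dest_vault_arn, schedule, retention_days):
    """Build an AWS Backup plan with one rule that also copies each
    recovery point to a vault in another region."""
    return {
        "BackupPlanName": "aurora-cross-region",  # hypothetical name
        "Rules": [
            {
                "RuleName": "daily-with-dr-copy",  # hypothetical name
                "TargetBackupVaultName": source_vault,
                "ScheduleExpression": schedule,
                "Lifecycle": {"DeleteAfterDays": retention_days},
                # The "Copy to destination" action from the steps above.
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": dest_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }


# Hypothetical vault names and account ID, for illustration only.
plan = build_cross_region_plan(
    source_vault="primary-vault",
    dest_vault_arn="arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",
    schedule="cron(0 5 ? * * *)",  # daily at 05:00 UTC
    retention_days=35,
)
print(json.dumps(plan, indent=2))
```

A plan on its own protects nothing until resources are assigned to it, which in boto3 is a separate `create_backup_selection` call. The destination vault in `us-west-2` must already exist before the copy action can succeed.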
Checkpoint Questions
- What is the main difference between RPO and RTO?
- Why is "Backup & Restore" considered the least expensive DR strategy?
- What tool can be used to verify if your current AWS architecture meets its RTO/RPO goals?
- In an immutable infrastructure model, do you need to back up the OS/Application files of an EC2 instance? Why or why not?
Muddy Points & Cross-Refs
- Backing Up Everything vs. Rebuilding: Students often get confused about whether to back up an EC2 image (AMI). If you use IaC, you usually only back up the data (EBS/RDS) and use the code to rebuild the "compute" layer.
- Backup vs. Pilot Light: In "Backup & Restore," nothing is running in the DR region until the disaster happens. In "Pilot Light," a live database is usually already running (the "pilot flame"), making recovery faster.
- Cross-Reference: See AWS Organizations for managing cross-account backup permissions and AWS KMS for encrypting backups during the copy process.
Comparison Tables
Disaster Recovery Strategy Comparison
| Strategy | RTO / RPO | Cost | Complexity |
|---|---|---|---|
| Backup & Restore | Hours / Days | $ (Lowest) | Simple |
| Pilot Light | Minutes / Hours | $$ | Moderate |
| Warm Standby | Minutes / Minutes | $$$ | High |
| Multi-Site | Near Zero | $$$$ (Highest) | Very High |
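The comparison above can be read as a simple decision rule: the tighter the recovery objectives, the further down the table you go. The sketch below illustrates that reasoning. The numeric cut-offs are assumptions invented for this example, not AWS guidance, and a real decision would also weigh cost and operational complexity.

```python
from datetime import timedelta

# Illustrative cut-offs only; these values are assumptions for this sketch.
THRESHOLDS = [
    (timedelta(hours=4), "Backup & Restore"),
    (timedelta(minutes=30), "Pilot Light"),
    (timedelta(minutes=5), "Warm Standby"),
]


def suggest_strategy(rto, rpo):
    """Suggest the cheapest strategy whose cut-off fits the tighter objective."""
    tightest = min(rto, rpo)
    for floor, name in THRESHOLDS:
        if tightest >= floor:
            return name
    return "Multi-Site"


print(suggest_strategy(timedelta(hours=24), timedelta(hours=12)))    # Backup & Restore
print(suggest_strategy(timedelta(hours=1), timedelta(minutes=45)))   # Pilot Light
print(suggest_strategy(timedelta(minutes=10), timedelta(minutes=10)))  # Warm Standby
print(suggest_strategy(timedelta(minutes=1), timedelta(seconds=0)))  # Multi-Site
```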