AWS Backup Practices & Methods: Comprehensive Study Guide

This guide covers the essential knowledge required for the AWS Certified Solutions Architect - Professional (SAP-C02) exam regarding operational excellence, business continuity, and data protection.

Learning Objectives

Define Recovery Objectives: Differentiate between RTO and RPO and how they drive backup frequency and method.
Analyze DR Strategies: Understand where "Backup & Restore" fits in the hierarchy of disaster recovery (DR) strategies.
Implement Centralized Backups: Utilize AWS Backup for policy-based, automated data protection across an AWS Organization.
Enhance Security: Implement cross-account and cross-region backups to protect against ransomware and regional failures.
Verify Integrity: Establish a process for periodic recovery testing to ensure backups are functional.

Key Terms & Glossary

RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time (e.g., "We can lose up to 15 minutes of data").
RTO (Recovery Time Objective): The maximum acceptable time to restore the workload after a failure (e.g., "The system must be back up in 4 hours").
AWS Backup: A fully managed, policy-based service that centralizes and automates the backup of data across AWS services.
Immutable Infrastructure: An operational approach where servers are never patched in place; instead, they are replaced by new versions from a fresh image (AMI).
Cross-Region Copy: The practice of storing backup data in a different geographic region than the source data to protect against regional disasters.

The "Big Idea"

[!IMPORTANT] "Everything fails all the time." (Werner Vogels)
Reliability in the cloud isn't about preventing failure; it is about designing systems that can recover gracefully. Backups are the ultimate safety net. However, a backup that hasn't been tested for recovery is merely "hope," not a strategy. True operational excellence requires moving from manual, ad-hoc backups to automated, policy-driven, and regularly tested recovery workflows.

Formula / Concept Box

Metric/Concept	Definition	Impact on Cost
RPO	Maximum tolerable gap between the last usable backup and the disaster.	Lower RPO = More frequent backups = Higher Cost.
RTO	Maximum tolerable time between the disaster and restored service.	Lower RTO = More automation/warm resources = Higher Cost.
Backup & Restore	Recovery from off-site storage.	Lowest cost, highest RTO/RPO.
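
The RTO row only means something once it is compared with measured restore times. The sketch below is a minimal illustration of that comparison; the function name `evaluate_restore_drills` and the drill durations are hypothetical and not part of any AWS API.

```python
from datetime import timedelta
from statistics import mean


def evaluate_restore_drills(drill_durations, rto):
    """Compare measured restore durations from recovery tests against an RTO.

    drill_durations: list of timedelta values, one per recovery test.
    rto: timedelta, the agreed Recovery Time Objective.
    """
    if not drill_durations:
        raise ValueError("at least one recovery drill is required")
    worst = max(drill_durations)
    average = timedelta(seconds=mean(d.total_seconds() for d in drill_durations))
    return {
        "worst": worst,
        "average": average,
        # Judge by the slowest drill, not the average: the RTO is a maximum.
        "meets_rto": worst <= rto,
    }


# Hypothetical drill results for a workload with a 4-hour RTO.
drills = [
    timedelta(hours=2, minutes=30),
    timedelta(hours=3),
    timedelta(hours=4, minutes=15),
]
report = evaluate_restore_drills(drills, rto=timedelta(hours=4))
print(report)
```

In this made-up data the average restore time is within the target, but the slowest drill is not, so the workload would be reported as missing its RTO.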

Hierarchical Outline

I. Recovery Objectives
- RPO: Focuses on data loss (The "Back in Time" window).
- RTO: Focuses on downtime (The "Waiting for Service" window).
- AWS Resilience Hub: AWS service used to assess whether a workload's architecture can meet its RTO/RPO targets.
II. Backup Strategies
- Mutable Environments: Require backing up data and patching the existing OS in place (e.g., with AWS Systems Manager Patch Manager).
- Immutable Environments: Focus on backing up data and re-deploying infrastructure via IaC (Infrastructure as Code).
III. AWS Backup Service Features
- Centralized Management: Manage backups across multiple AWS accounts via AWS Organizations.
- Policy-based: Use "Backup Plans" to define frequency and retention.
- Security: Supports encryption and cross-account/region copying.
IV. Disaster Recovery (DR) Tiers
- Backup & Restore: Simple, cheap, high RTO/RPO.
- Pilot Light: Core data is live; other resources are "off" until needed.
- Warm Standby: Scaled-down version of environment is always running.
- Multi-site Active/Active: Near-zero RTO/RPO, most expensive.

Visual Anchors

[Figure: The Relationship Between RPO and RTO]

[Figure: Regional Backup & Restore Flow]

Definition-Example Pairs

Cross-Account Backup: The process of copying a backup to a completely different AWS Account ID.
- Example: A financial firm copies all RDS snapshots to a restricted "Vault" account. If an admin account in Production is compromised by ransomware, the attacker cannot delete the backups in the Vault account.
Infrastructure as Code (IaC): Using code (CloudFormation/Terraform) to define and deploy resources.
- Example: Instead of backing up an entire EC2 instance, you store its configuration as templates in a Git repository. During a disaster, you deploy those templates to recreate the fleet in a new region quickly and repeatably.
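
For the Cross-Account Backup example above, the "Vault" account has to allow incoming copies. The sketch below builds a vault access policy that permits `backup:CopyIntoBackupVault` from principals in a single AWS Organization. The organization ID `o-exampleorgid` and the helper name `vault_access_policy` are placeholders chosen for illustration.

```python
import json


def vault_access_policy(org_id):
    """Build a vault access policy that accepts copies from one AWS Organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCopyFromOrganization",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "backup:CopyIntoBackupVault",
                "Resource": "*",
                # Restrict the wildcard principal to members of the organization.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


# "o-exampleorgid" is a placeholder, not a real organization ID.
policy_json = json.dumps(vault_access_policy("o-exampleorgid"), indent=2)
print(policy_json)
```

The resulting JSON string could then be attached to the destination vault, for example with the boto3 `put_backup_vault_access_policy` call. This policy only grants permission to copy in; it does not grant the source account permission to delete recovery points in the Vault account.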

Worked Examples

Example 1: Calculating RPO Requirements

Scenario: A business updates its inventory database every hour. In the event of a failure, they can manually reconstruct 2 hours of data using paper receipts, but no more.

Required RPO: 2 hours.
Solution: Schedule an automated backup (using AWS Backup) at least every 2 hours. Because a backup takes time to complete, an hourly schedule is recommended to leave a safety margin.
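
The reasoning in this example can be checked with a few lines of arithmetic. The sketch below uses a simplified model in which the worst case is a failure just before the in-progress backup completes, so the data lost is one full backup interval plus the backup duration. The 10-minute backup duration and the function names are assumptions made for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration=timedelta(0)):
    """Worst case: the failure hits just before the running backup completes,
    so recovery falls back to the previous backup."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval, rpo, backup_duration=timedelta(0)):
    """Return True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


rpo = timedelta(hours=2)
results = {}
for interval_hours in (1, 2, 3):
    interval = timedelta(hours=interval_hours)
    # Assumed backup duration of 10 minutes (hypothetical figure).
    results[interval_hours] = meets_rpo(
        interval, rpo, backup_duration=timedelta(minutes=10)
    )
    print(f"backup every {interval_hours}h -> meets 2h RPO: {results[interval_hours]}")
```

Under this model a 2-hour schedule slightly exceeds the 2-hour RPO once backup duration is counted, which is why the hourly schedule is the safer choice.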

Example 2: Setting up a Cross-Region Backup

Task: Ensure an Amazon Aurora database in us-east-1 can survive a total regional outage.

Create Backup Vault: Create a vault in us-west-2 (the target DR region).
Backup Plan: In us-east-1, create an AWS Backup plan.
Copy Action: Within the plan rule, add a "Copy to destination" pointing to the us-west-2 vault.
Result: Each scheduled snapshot is created in us-east-1 and automatically copied to the us-west-2 vault, so a usable recovery point survives a regional outage.
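
The steps above can be sketched as a backup plan document of the shape accepted by the boto3 `create_backup_plan` call. The plan name, vault names, account ID `111122223333`, daily schedule, and 35-day retention are all placeholder assumptions. The sketch also leaves out the separate resource assignment (backup selection) that tells the plan which database to protect.

```python
import json


def build_backup_plan(plan_name, source_vault, dest_vault_arn,
                      schedule="cron(0 5 ? * * *)", retention_days=35):
    """Return a BackupPlan document with one rule that copies to another vault."""
    lifecycle = {"DeleteAfterDays": retention_days}
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily-with-cross-region-copy",
                "TargetBackupVaultName": source_vault,  # vault in the source region
                "ScheduleExpression": schedule,         # daily at 05:00 UTC by default
                "Lifecycle": lifecycle,
                "CopyActions": [
                    {
                        # The "Copy to destination" step: the vault in the DR region.
                        "DestinationBackupVaultArn": dest_vault_arn,
                        "Lifecycle": dict(lifecycle),
                    }
                ],
            }
        ],
    }


def create_plan(plan, region="us-east-1"):
    """Submit the plan with boto3 (needs boto3 installed and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without it

    client = boto3.client("backup", region_name=region)
    return client.create_backup_plan(BackupPlan=plan)


# All names and the account ID below are placeholders.
plan = build_backup_plan(
    plan_name="aurora-dr-plan",
    source_vault="primary-vault",
    dest_vault_arn="arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",
)
print(json.dumps(plan, indent=2))
```

Keeping the document builder separate from the API call makes the plan easy to review or unit test before anything is created in an account.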

Checkpoint Questions

What is the main difference between RPO and RTO?
Why is "Backup & Restore" considered the least expensive DR strategy?
What tool can be used to verify if your current AWS architecture meets its RTO/RPO goals?
In an immutable infrastructure model, do you need to back up the OS/Application files of an EC2 instance? Why or why not?

Muddy Points & Cross-Refs

Backing Up Everything vs. Rebuilding: Students often get confused about whether to back up an EC2 image (AMI). If you use IaC, you usually only back up the data (EBS/RDS) and use the code to rebuild the "compute" layer.
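Backup vs. Pilot Light: In "Backup & Restore," nothing is running in the DR region until the disaster happens; only the backups are stored there. In "Pilot Light," the data layer (for example, a replicated database) is already live in the DR region (the "pilot flame") while the remaining resources stay off, making recovery faster.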
Cross-Reference: See AWS Organizations for managing cross-account backup permissions and AWS KMS for encrypting backups during the copy process.

Comparison Tables

Disaster Recovery Strategy Comparison

Strategy	Typical RTO / RPO	Cost	Complexity
Backup & Restore	Hours to days	$ (Lowest)	Simple
Pilot Light	Minutes to hours	$$	Moderate
Warm Standby	Minutes	$$$	High
Multi-Site	Near zero	$$$$ (Highest)	Very High
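
One way to read this table is as a lookup: choose the cheapest strategy whose typical recovery time still fits the requirement. The sketch below encodes that idea. The RTO figures attached to each strategy are illustrative assumptions for the example, not AWS guarantees.

```python
from datetime import timedelta

# Strategies ordered from cheapest to most expensive, each with an assumed
# typical RTO. The numbers are illustrative only.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=4)),
    ("Pilot Light", timedelta(minutes=30)),
    ("Warm Standby", timedelta(minutes=5)),
    ("Multi-Site", timedelta(0)),
]


def cheapest_strategy(required_rto):
    """Return the first (cheapest) strategy whose assumed RTO fits the target."""
    for name, typical_rto in STRATEGIES:
        if typical_rto <= required_rto:
            return name
    # Fallback for targets that nothing in the list satisfies.
    return STRATEGIES[-1][0]


for target in (timedelta(hours=8), timedelta(hours=1),
               timedelta(minutes=10), timedelta(minutes=1)):
    print(f"RTO {target} -> {cheapest_strategy(target)}")
```

A real decision would also weigh RPO, budget, and operational complexity, which this single-metric sketch deliberately ignores.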