
AWS SAP-C02: Designing for Business Continuity

Design a solution to ensure business continuity

Designing a Solution for Business Continuity

This guide focuses on the strategies and architectural patterns required to ensure business continuity on AWS, specifically for the SAP-C02 (Solutions Architect Professional) exam. It explores the transition from local high availability to geographic disaster recovery.

Learning Objectives

After studying this guide, you will be able to:

  • Differentiate between High Availability (HA) and Disaster Recovery (DR).
  • Define and calculate Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
  • Evaluate and select among the four primary AWS DR strategies based on business requirements.
  • Design a business continuity plan that aligns technical solutions with organizational risk assessments.

Key Terms & Glossary

  • Business Continuity Plan (BCP): A comprehensive document outlining how a business will continue to operate during an unplanned disruption.
  • RTO (Recovery Time Objective): The maximum acceptable delay between the interruption of service and restoration of service.
  • RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time (e.g., losing 15 minutes of transactions).
  • Failover: The process of switching to a redundant or standby computer server, system, or network upon the failure of the previously active application.
  • Pilot Light: A DR strategy where a minimal version of the environment is always running in another region, primarily the data and core configuration.

The "Big Idea"

[!IMPORTANT] Business Continuity is the art of geographic decoupling. High Availability (HA) protects you against a failing server or the loss of a single Availability Zone (an AZ comprises one or more data centers). Disaster Recovery (DR) protects you against a Region-wide catastrophe. The "Big Idea" is that resilience is a spectrum: as you move toward zero data loss and zero downtime, the cost and complexity of your architecture rise steeply.

Formula / Concept Box

| Metric | Definition | User Perspective |
| --- | --- | --- |
| RTO | Time to restore service | "How long until I can log back in?" |
| RPO | Maximum data loss, measured in time | "How much of my work was lost since the last save?" |
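The sketch below makes the two definitions concrete. It computes the RTO and RPO actually achieved in one incident from three timestamps. It also computes the worst-case RPO of a periodic backup schedule. This is a minimal illustration: the timestamps are hypothetical, and the `copy_lag` parameter is an assumption added for illustration. It stands for the time needed to copy a backup to the DR Region.

```python
from datetime import datetime, timedelta


def measured_rto_rpo(outage_start: datetime,
                     service_restored: datetime,
                     last_recovery_point: datetime) -> tuple[timedelta, timedelta]:
    """Return (achieved RTO, achieved RPO) for a single incident.

    RTO: elapsed time from the interruption to restoration of service.
    RPO: everything written after the last recovery point (backup,
    snapshot, or replicated write) is lost, so the loss window runs
    from that recovery point up to the moment of interruption.
    """
    if not (last_recovery_point <= outage_start <= service_restored):
        raise ValueError("expected recovery point <= outage start <= restore time")
    rto = service_restored - outage_start
    rpo = outage_start - last_recovery_point
    return rto, rpo


def worst_case_rpo(backup_interval: timedelta,
                   copy_lag: timedelta = timedelta(0)) -> timedelta:
    """Worst case for periodic backups (copy_lag is a hypothetical parameter).

    If a failure hits just before the newest backup finishes copying to the
    DR Region, the latest usable copy is one full interval older, so up to
    one interval plus the copy lag of data is lost.
    """
    return backup_interval + copy_lag


if __name__ == "__main__":
    # Hypothetical incident timeline.
    rto, rpo = measured_rto_rpo(
        outage_start=datetime(2024, 5, 1, 10, 0),
        service_restored=datetime(2024, 5, 1, 10, 45),
        last_recovery_point=datetime(2024, 5, 1, 9, 50),
    )
    print(rto, rpo)  # 0:45:00 0:10:00
    print(worst_case_rpo(timedelta(hours=12)))  # 12:00:00
```

Compare the measured values with the RTO and RPO targets set in the Business Impact Analysis. Any incident that exceeds a target means the chosen strategy needs revisiting.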

Hierarchical Outline

  1. HA vs. DR Fundamentals
    • High Availability (HA): Focuses on component-level redundancy within a region (Multi-AZ).
    • Disaster Recovery (DR): Focuses on site-level redundancy across regions (Cross-Region).
  2. The Planning Process
    • Risk Assessment: Evaluating the impact of AZ vs. Regional failures.
    • Business Impact Analysis (BIA): Determining the financial cost of downtime to set RTO/RPO.
  3. AWS Disaster Recovery Strategies
    • Backup and Restore: Low cost, high RTO/RPO (hours to days).
    • Pilot Light: Core data live; application tier "dark" (minutes to hours).
    • Warm Standby: Scaled-down version of full environment always running (minutes).
    • Multi-Site Active-Active: Near-zero downtime; traffic served from multiple Regions simultaneously (seconds to real time).

Visual Anchors

DR Strategy Spectrum (diagram not reproduced)

Regional Failover Architecture (diagram not reproduced)

Definition-Example Pairs

  • Warm Standby: A DR strategy where a "scaled down" but fully functional version of the environment is always running in the DR region.
    • Example: An e-commerce site runs on 2 small EC2 instances in the DR Region while the primary Region runs on 20 large instances. If the primary Region fails, traffic shifts to the DR Region. Its Auto Scaling group then scales out to full production capacity.
  • Pilot Light: A strategy where only the most critical data is replicated (like a database), while application servers are stopped or only exist as AMIs.
    • Example: Keep a cross-Region RDS Read Replica in a second Region. When a disaster is declared, promote the replica to a standalone primary and deploy the web server layer via CloudFormation.
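The two failover paths above can be sketched as runbook steps. This is a minimal, hypothetical sketch. Each function takes boto3 clients already created for the DR Region, for example `boto3.client("rds", region_name=...)`, so the logic can be exercised without AWS access.

The calls used are real boto3 client methods:
- `update_auto_scaling_group`
- `promote_read_replica`
- `describe_db_instances`
- `create_stack`
- the `db_instance_available` and `stack_create_complete` waiters

The following are all assumptions made for illustration:
- the Auto Scaling group name
- the replica identifier
- the stack name
- the template URL
- the `DbEndpoint` template parameter
- the 2x `MaxSize` headroom

```python
from typing import Any


def fail_over_warm_standby(autoscaling: Any, asg_name: str,
                           production_capacity: int) -> None:
    """Warm standby: the DR fleet is already running at reduced size,
    so failover means scaling it out to full production capacity."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=production_capacity,
        # Hypothetical headroom for traffic spikes; tune for the real workload.
        MaxSize=production_capacity * 2,
        DesiredCapacity=production_capacity,
    )


def fail_over_pilot_light(rds: Any, cloudformation: Any, replica_id: str,
                          stack_name: str, template_url: str) -> str:
    """Pilot light: only the data tier is live in the DR Region.
    Promote the cross-Region read replica, then deploy the app tier.
    Returns the StackId of the newly created application stack."""
    # 1. Turn the read replica into a standalone, writable primary.
    rds.promote_read_replica(DBInstanceIdentifier=replica_id)
    # Block until RDS reports the instance as "available".
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica_id)

    # 2. Look up the endpoint the new application tier should connect to.
    db = rds.describe_db_instances(DBInstanceIdentifier=replica_id)["DBInstances"][0]
    endpoint = db["Endpoint"]["Address"]

    # 3. Deploy the "dark" application tier from a template.
    #    "DbEndpoint" is a hypothetical parameter name in that template.
    response = cloudformation.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=[{"ParameterKey": "DbEndpoint", "ParameterValue": endpoint}],
    )
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

Passing the clients in, rather than creating them inside the functions, keeps the Region choice explicit and makes the steps easy to test with stand-in objects. A real runbook would also shift traffic, for example with Route 53 failover records. It would also handle errors and waiter timeouts. Those steps are omitted here.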

Worked Examples

Scenario: Choosing a DR Strategy

Company X has a mission-critical banking application. Their Business Impact Analysis shows that 1 hour of downtime costs $1,000,000, and they cannot lose more than 5 minutes of transaction data.

  • Requirement: RTO < 1 hour; RPO < 5 minutes.
  • Elimination:
    • Backup & Restore is out (RTO is usually hours/days).
    • Pilot Light is risky (provisioning app servers might take > 1 hour depending on complexity).
  • Solution: Warm Standby or Multi-Site.
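  • Selection: Given the $1M/hr cost of downtime, Warm Standby is the most cost-effective option that can reliably meet the 1-hour RTO. The environment is already running and only needs to scale out. The 5-minute RPO is met by continuous cross-Region data replication, not by the standby compute itself.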
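The elimination logic in this scenario can be expressed as a filter-then-minimize step. Discard every strategy whose worst case misses either objective, then take the cheapest survivor. The worst-case RTO/RPO figures in the sketch are hypothetical planning numbers that loosely follow the Comparison Tables section. Pilot Light's 2-hour bound reflects the caution above that provisioning may exceed 1 hour. None of these figures are AWS guarantees; replace them with values measured in your own DR tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrStrategy:
    name: str
    cost_rank: int        # 1 = cheapest ($) ... 4 = most expensive ($$$$)
    worst_rto_min: float  # hypothetical planning assumption, in minutes
    worst_rpo_min: float  # hypothetical planning assumption, in minutes


# Illustrative worst-case numbers only; not AWS guarantees.
STRATEGIES = [
    DrStrategy("Backup & Restore", 1, worst_rto_min=24 * 60, worst_rpo_min=24 * 60),
    DrStrategy("Pilot Light", 2, worst_rto_min=120, worst_rpo_min=1),
    DrStrategy("Warm Standby", 3, worst_rto_min=30, worst_rpo_min=1),
    DrStrategy("Multi-Site Active-Active", 4, worst_rto_min=1, worst_rpo_min=0.1),
]


def cheapest_strategy(max_rto_min: float, max_rpo_min: float) -> Optional[DrStrategy]:
    """Eliminate strategies whose worst case misses either objective,
    then return the cheapest survivor (None if nothing qualifies)."""
    viable = [s for s in STRATEGIES
              if s.worst_rto_min <= max_rto_min and s.worst_rpo_min <= max_rpo_min]
    return min(viable, key=lambda s: s.cost_rank, default=None)


if __name__ == "__main__":
    # Company X: RTO within 60 minutes, RPO within 5 minutes.
    choice = cheapest_strategy(max_rto_min=60, max_rpo_min=5)
    print(choice.name if choice else "no strategy qualifies")  # Warm Standby
```

Ranking by cost tier rather than by dollar amounts keeps the sketch independent of pricing. A fuller version would weigh each strategy's running cost against the expected cost of downtime from the BIA.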

Checkpoint Questions

  1. What is the main difference between HA and DR in an AWS context?
  2. If an organization uses Snapshot Replication every 12 hours, what is their RPO?
  3. Which DR strategy involves having a scaled-down version of the full environment always running?
  4. How does Route 53 support business continuity?
Answers
  1. HA handles local failures (AZ); DR handles large-scale/regional failures.
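  2. Up to 12 hours. In the worst case, a failure just before the next snapshot completes loses nearly 12 hours of data, plus any time needed to copy the snapshot to the DR Region.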
  3. Warm Standby.
  4. Through health checks and DNS failover routing policies.
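Answer 4 can be made concrete with a Route 53 failover routing configuration. The sketch only builds the `ChangeBatch` dictionary that Route 53's `ChangeResourceRecordSets` API accepts (`change_resource_record_sets` in boto3). The following values are hypothetical:
- the record name
- the IP addresses (taken from documentation ranges)
- the set identifiers
- the health check ID

The primary record carries a health check. The secondary has none, so Route 53 treats it as healthy and returns it when the primary's health check fails.

```python
from typing import Optional


def failover_change_batch(record_name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch holding a PRIMARY/SECONDARY failover pair."""

    def record(set_id: str, role: str, ip: str,
               health_check_id: Optional[str]) -> dict:
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,   # must be unique within the name/type pair
            "Failover": role,          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                # a low TTL lets resolvers see failover sooner
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "DR failover pair",
        "Changes": [
            record("primary-region", "PRIMARY", primary_ip, primary_health_check_id),
            record("dr-region", "SECONDARY", secondary_ip, None),
        ],
    }


# Usage sketch (requires boto3 and AWS credentials; zone ID is hypothetical):
# import boto3
# batch = failover_change_batch("app.example.com.", "192.0.2.10",
#                               "198.51.100.10", "hc-primary-id")
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z123EXAMPLE", ChangeBatch=batch)
```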

Muddy Points & Cross-Refs

  • HA vs DR Confusion: Students often think Multi-AZ is DR. Correction: Multi-AZ is HA. DR must involve separate geographic regions to protect against a regional event.
  • RPO vs RTO: As a mnemonic, read the P in RPO as "Past" (how far back in the data do we go?) and the T in RTO as "Time" (how long does it take to get back up?). Formally, the acronyms stand for Recovery Point Objective and Recovery Time Objective.
  • Cross-Ref: See Chapter 6: Meeting Reliability Requirements for details on Auto Scaling and self-healing systems, which form the foundation of HA.

Comparison Tables

| Strategy | RTO / RPO | Cost | Complexity | Description |
| --- | --- | --- | --- | --- |
| Backup & Restore | Hours/Days | $ | Low | Restore snapshots after a disaster. |
| Pilot Light | Minutes/Hours | $$ | Medium | Live data, idle/stopped app servers. |
| Warm Standby | Minutes | $$$ | High | Scaled-down but active environment. |
| Multi-Site | Seconds | $$$$ | Very High | Full active-active in two regions. |
