Lab: Implementing a Pilot Light Disaster Recovery Strategy on AWS
Design a solution to ensure business continuity
In this lab, you will design and implement a business continuity solution using the Pilot Light strategy. You will configure cross-region data replication and set up automated failover using Amazon Route 53 so that your application can recover from a regional disaster with minimal downtime.
[!WARNING] Remember to run the teardown commands at the end of the lab to avoid ongoing charges for multi-region resources.
Prerequisites
- An active AWS Account with Administrator access.
- AWS CLI installed and configured with your credentials.
- Basic knowledge of S3 and Route 53.
- Two AWS Regions: We will use us-east-1 (Primary) and us-west-2 (Secondary).
Learning Objectives
- Configure S3 Cross-Region Replication (CRR) for data durability.
- Create and manage IAM Roles for service-to-service replication.
- Implement Route 53 Health Checks to monitor service availability.
- Design a Failover Routing Policy to redirect traffic during a regional outage.
Architecture Overview
Step-by-Step Instructions
Step 1: Create the Primary and Secondary S3 Buckets
First, we need to create buckets in two different regions. Versioning must be enabled for Cross-Region Replication.
CLI Command:
# Create Primary Bucket
aws s3api create-bucket --bucket brainybee-dr-primary-<YOUR_ID> --region us-east-1
# Create Secondary Bucket
aws s3api create-bucket --bucket brainybee-dr-secondary-<YOUR_ID> --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
# Enable Versioning on both
aws s3api put-bucket-versioning --bucket brainybee-dr-primary-<YOUR_ID> --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket brainybee-dr-secondary-<YOUR_ID> --versioning-configuration Status=Enabled
Console alternative:
- Navigate to S3 in the AWS Console.
- Click Create bucket.
- Name: brainybee-dr-primary-<YOUR_ID>, Region: us-east-1.
- Ensure Bucket Versioning is set to Enable.
- Repeat for the secondary bucket in us-west-2.
Step 2: Configure Cross-Region Replication (CRR)
To ensure data exists in our "Pilot Light" region, we must automate replication.
CLI Command:
- Create an IAM Trust Policy file named trust-policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
- Create the role and apply the replication configuration (this requires a replication JSON; for brevity, use the Console for this complex step).
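If you would rather stay in the CLI, the role creation and replication rule that this step defers to the Console can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the lab's official method: the role name s3-replication-role is taken from the teardown section, while the LAB_ID and ACCOUNT_ID variables, the file names, the inline policy name, and the apply_replication helper are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Assumptions: LAB_ID is the suffix used in your bucket names and
# ACCOUNT_ID is your 12-digit AWS account ID. The defaults are placeholders.
LAB_ID="${LAB_ID:-example}"
ACCOUNT_ID="${ACCOUNT_ID:-111122223333}"
SRC="brainybee-dr-primary-${LAB_ID}"
DST="brainybee-dr-secondary-${LAB_ID}"
ROLE="s3-replication-role"

# Permissions for the role: read versions from the source bucket,
# write replicas to the destination bucket.
cat > replication-permissions.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
      "Resource": "arn:aws:s3:::${SRC}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObjectVersionForReplication", "s3:GetObjectVersionAcl", "s3:GetObjectVersionTagging"],
      "Resource": "arn:aws:s3:::${SRC}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"],
      "Resource": "arn:aws:s3:::${DST}/*"
    }
  ]
}
EOF

# Replication rule: copy every new object version to the secondary bucket.
cat > replication.json <<EOF
{
  "Role": "arn:aws:iam::${ACCOUNT_ID}:role/${ROLE}",
  "Rules": [
    {
      "ID": "PilotLightReplication",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::${DST}" }
    }
  ]
}
EOF

# Hypothetical helper: creates the role, attaches the permissions,
# and applies the rule. Expects trust-policy.json in the current directory.
apply_replication() {
  aws iam create-role --role-name "$ROLE" \
    --assume-role-policy-document file://trust-policy.json
  aws iam put-role-policy --role-name "$ROLE" \
    --policy-name s3-replication-permissions \
    --policy-document file://replication-permissions.json
  aws s3api put-bucket-replication --bucket "$SRC" \
    --replication-configuration file://replication.json
}
```

Call apply_replication once both buckets have versioning enabled. A newly created IAM role can take a few seconds to become usable, so if the last command rejects the role, wait briefly and rerun it.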
Console alternative:
- Go to the Primary Bucket > Management tab.
- Scroll to Replication rules and click Create replication rule.
- Rule name: PilotLightReplication.
- Destination: Choose the Secondary Bucket.
- IAM Role: Select Create new role.
- Click Save.
Step 3: Set Up Route 53 Health Checks
Route 53 needs a way to know if the primary region is down.
CLI Command:
aws route53 create-health-check --caller-reference "$(date +%s)" --health-check-config "IPAddress=1.1.1.1,Port=80,Type=HTTP,ResourcePath=/,RequestInterval=30,FailureThreshold=3"
(Note: In a real scenario, you would point this to your Application Load Balancer or S3 endpoint.)
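A health check on its own only reports status; traffic is redirected when it is attached to the PRIMARY record of a failover routing pair. The sketch below shows one way such a pair might look. Every value in it is a hypothetical placeholder (hosted zone ID, health check ID, record name, and both target hostnames), and apply_failover_records is an illustrative helper, not part of the lab.

```shell
#!/usr/bin/env bash
# All values below are hypothetical placeholders; substitute your own.
HOSTED_ZONE_ID="${HOSTED_ZONE_ID:-Z0000000EXAMPLE}"
HEALTH_CHECK_ID="${HEALTH_CHECK_ID:-00000000-0000-0000-0000-000000000000}"
RECORD_NAME="${RECORD_NAME:-app.example.com}"
PRIMARY_TARGET="${PRIMARY_TARGET:-primary.example.com}"
SECONDARY_TARGET="${SECONDARY_TARGET:-secondary.example.com}"

# Two records share one name. Route 53 answers with PRIMARY while its
# health check passes and with SECONDARY once the check fails.
cat > failover-records.json <<EOF
{
  "Comment": "Pilot Light failover pair",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${RECORD_NAME}",
        "Type": "CNAME",
        "SetIdentifier": "primary-us-east-1",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "${HEALTH_CHECK_ID}",
        "ResourceRecords": [{ "Value": "${PRIMARY_TARGET}" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${RECORD_NAME}",
        "Type": "CNAME",
        "SetIdentifier": "secondary-us-west-2",
        "Failover": "SECONDARY",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "${SECONDARY_TARGET}" }]
      }
    }
  ]
}
EOF

# Hypothetical helper: submits the change batch to your hosted zone.
apply_failover_records() {
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$HOSTED_ZONE_ID" \
    --change-batch file://failover-records.json
}
```

A short TTL such as 60 seconds keeps resolvers from caching the primary answer for long after a failover. If you create these records, delete them during teardown before deleting the health check they reference.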
Checkpoints
| Verification Task | Expected Result |
|---|---|
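| S3 Replication | Upload a file to the Primary bucket. Wait about a minute (S3 replicates most objects within 15 minutes, so allow longer if needed). The file should appear in the Secondary bucket. |
| Versioning | Check the 'Versions' toggle in S3; every object should list a version ID, and a replicated object keeps the same version ID as its source. |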
| Health Check | Navigate to Route 53 > Health Checks. The status should eventually show 'Healthy' (or 'Unhealthy' if using the dummy IP above). |
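The checkpoints above can also be run from the CLI. The following is a rough sketch, assuming the bucket naming used in Step 1; the function names, the probe file name, and the WAIT_SECONDS variable are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# check_replication LAB_ID [KEY]
# Uploads a small probe object to the primary bucket, waits, then prints
# the replication status seen on each side.
check_replication() {
  local id="$1" key="${2:-dr-probe.txt}"
  local src="brainybee-dr-primary-${id}" dst="brainybee-dr-secondary-${id}"
  local tmp
  tmp="$(mktemp)"
  echo "pilot light probe $(date -u +%FT%TZ)" > "$tmp"
  aws s3 cp "$tmp" "s3://${src}/${key}" --region us-east-1
  rm -f "$tmp"
  sleep "${WAIT_SECONDS:-60}"
  # On the source, the status moves from PENDING to COMPLETED (or FAILED).
  aws s3api head-object --bucket "$src" --key "$key" --region us-east-1 \
    --query ReplicationStatus --output text
  # On the destination, a replicated copy reports REPLICA.
  aws s3api head-object --bucket "$dst" --key "$key" --region us-west-2 \
    --query ReplicationStatus --output text
}

# check_health HEALTH_CHECK_ID
# Prints the status reported by each Route 53 checker location.
check_health() {
  aws route53 get-health-check-status --health-check-id "$1" \
    --query 'HealthCheckObservations[].StatusReport.Status' --output text
}
```

If the second head-object call fails with a 404, the object has not replicated yet; wait and rerun it before assuming the rule is misconfigured.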
Teardown / Clean-Up
To prevent unwanted costs, delete all resources in this order:
- Empty and Delete S3 Buckets:
# You must remove all object versions and delete markers first;
# on a versioned bucket, rb --force only removes current objects and then fails.
aws s3 rb s3://brainybee-dr-primary-<YOUR_ID> --force
aws s3 rb s3://brainybee-dr-secondary-<YOUR_ID> --force
- Delete Health Checks:
# Get your ID from: aws route53 list-health-checks
aws route53 delete-health-check --health-check-id <YOUR_HEALTH_CHECK_ID>
- Delete IAM Role: Remove the s3-replication-role created by the console.
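Because both buckets are versioned, they still hold old versions and delete markers after their current objects are removed, and the bucket delete fails until those are gone. One way to clear them is sketched below. purge_bucket is a hypothetical helper, and the sketch assumes object keys contain no tab or newline characters.

```shell
#!/usr/bin/env bash
# purge_bucket BUCKET REGION
# Deletes every object version and delete marker, then the bucket itself.
purge_bucket() {
  local bucket="$1" region="$2" kind key version
  for kind in Versions DeleteMarkers; do
    # Text output gives one "key<TAB>version-id" pair per line,
    # or the single word "None" when the list is empty.
    aws s3api list-object-versions --bucket "$bucket" --region "$region" \
      --query "${kind}[].[Key,VersionId]" --output text |
    while IFS=$'\t' read -r key version; do
      [ -z "$version" ] && continue   # skip the "None" placeholder line
      aws s3api delete-object --bucket "$bucket" --region "$region" \
        --key "$key" --version-id "$version" </dev/null
    done
  done
  aws s3api delete-bucket --bucket "$bucket" --region "$region"
}
```

For this lab, that would mean running purge_bucket on the primary bucket with us-east-1 first, then on the secondary bucket with us-west-2, so that replication cannot copy anything new into a bucket you have already emptied.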
Troubleshooting
| Error | Possible Cause | Fix |
|---|---|---|
| AccessDenied on Replication | IAM Role lacks permissions. | Ensure the IAM Role has s3:GetReplicationConfiguration and s3:GetObjectVersionForReplication on the source, and s3:ReplicateObject on the destination. |
| Objects not replicating | Versioning is disabled. | CRR requires versioning to be enabled on BOTH source and destination buckets. |
| Health check stays 'Unhealthy' | Firewall/Security Group. | Ensure your endpoint allows traffic from AWS Route 53 IP ranges. |
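For the last row, the address ranges that Route 53 health checkers connect from can be listed with the CLI and then allowed in your security group or firewall. A small sketch follows; list_checker_ranges is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Prints one CIDR block per line for the Route 53 health checkers.
list_checker_ranges() {
  aws route53 get-checker-ip-ranges \
    --query 'CheckerIpRanges[]' --output text | tr '\t' '\n'
}
```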
Stretch Challenge
Automated Database Recovery: Use Amazon Aurora Global Database. Set up a primary cluster in us-east-1 and a secondary (read-only) cluster in us-west-2. Practice promoting the secondary cluster from the RDS console to simulate a manual failover of the database layer in a Pilot Light scenario.
Cost Estimate
- S3 Storage: $0.023 per GB-month (Primary) + $0.023 per GB-month (Secondary).
- S3 Replication Data Transfer: $0.02 per GB transferred between regions.
- Route 53 Health Check: ~$0.50 per month per health check.
- Total for Lab: Generally <$1.00 if deleted within 1 hour.
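To see how the total stays under $1.00, the arithmetic can be worked through with the rates listed above. This is a rough sketch, assuming a month of about 730 hours and charging the health check for a full month as an upper bound; estimate_lab_cost is a hypothetical helper.

```shell
#!/usr/bin/env bash
# estimate_lab_cost GB HOURS
# Uses the rates from the list above; all results are in US dollars.
estimate_lab_cost() {
  awk -v gb="$1" -v hours="$2" 'BEGIN {
    storage  = 0.023 * gb * 2 * hours / 730   # two buckets, prorated by the hour
    transfer = 0.02 * gb                      # one cross-region copy of the data
    health   = 0.50                           # full month, not prorated (upper bound)
    printf "storage=%.4f transfer=%.4f health=%.2f total=%.2f\n",
           storage, transfer, health, storage + transfer + health
  }'
}

estimate_lab_cost 1 1
# prints: storage=0.0001 transfer=0.0200 health=0.50 total=0.52
```

For 1 GB kept for one hour, the health check dominates the bill; storage and transfer together add only about two cents.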
Concept Review
In this lab, we focused on the Pilot Light strategy. Compare it to other DR strategies below:
| Strategy | RTO (Time) | RPO (Data Loss) | Cost |
|---|---|---|---|
| Backup & Restore | Hours | 24 Hours | $ |
| Pilot Light | Minutes/Hours | Low (Minutes) | $$ |
| Warm Standby | Minutes | Near Zero | $$$ |
| Multi-Site (Active-Active) | Real-time | Zero | $$$$ |