
AWS Certified Data Engineer: Protecting Data with Resiliency and Availability

Protect data with appropriate resiliency and availability

This guide focuses on the critical task of ensuring data durability, availability, and cost-efficient protection within the AWS ecosystem, covering both high availability (HA) and disaster recovery (DR) strategies.

Learning Objectives

By the end of this study guide, you will be able to:

  • Define and calculate RPO (Recovery Point Objective) and RTO (Recovery Time Objective).
  • Differentiate between High Availability and Disaster Recovery architectures.
  • Implement S3 Lifecycle policies for automatic tiering and data deletion.
  • Configure Cross-Region Replication (CRR) and Multi-AZ deployments for resiliency.
  • Utilize Amazon Macie, AWS Config, and S3 Versioning for data governance.

Key Terms & Glossary

  • RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time (e.g., losing 1 hour of data).
  • RTO (Recovery Time Objective): The target time to restore a service after a disruption (e.g., service back online in 4 hours).
  • Resiliency: The ability of a workload to respond to and recover from failures.
  • Availability: The percentage of time that a workload is operational (e.g., "four nines" or 99.99%).
  • CRR (Cross-Region Replication): Automatically copying S3 objects across different AWS Regions for geographic redundancy.
  • WORM (Write Once, Read Many): A storage model in which data, once written, cannot be modified or deleted; on Amazon S3 it is enforced with S3 Object Lock.
  • TTL (Time to Live): A mechanism in DynamoDB to automatically expire items from a table to reduce storage costs.
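The availability percentages in the glossary translate directly into a yearly downtime budget. Here is a minimal Python sketch. It assumes a 365-day year (leap years ignored), and the helper name `downtime_minutes_per_year` is hypothetical:

```python
# Convert an availability percentage ("nines") into an allowed-downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored for simplicity


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the maximum minutes of downtime per year allowed by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_pct) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} min/year")
```

By this arithmetic, "four nines" (99.99%) allows roughly 52.6 minutes of downtime per year. Each additional nine cuts that budget by a factor of ten.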

The "Big Idea"

Data protection is not a "one size fits all" task. It is a strategic balance between Business Continuity and Cost Optimization. As a Data Engineer, you must classify data into tiers (Hot vs. Cold) and apply the appropriate level of resiliency (Multi-AZ vs. Multi-Region) based on the business's tolerance for downtime (RTO) and data loss (RPO). Priority Zero is Security, but Priority One is ensuring the data exists and is accessible when the business needs it.

Formula / Concept Box

Disaster Recovery Metrics

| Metric | Definition | Focus |
|---|---|---|
| RPO | $\text{Time}_{\text{Fail}} - \text{Time}_{\text{Last Backup}}$ | Data integrity / loss prevention |
| RTO | $\text{Time}_{\text{Recovery}} - \text{Time}_{\text{Fail}}$ | Service uptime / availability |

[!TIP] Pro Tip: Tighter (smaller) RPO/RTO targets reduce data loss and downtime, but they generally require significantly more infrastructure and therefore cost more.
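You can apply the two formulas to an incident timeline to check whether the actual data loss and downtime stayed within the agreed targets. The formulas measure what actually happened. RPO and RTO are the maximums those measurements are compared against. Here is a minimal sketch. The timestamps and the helper names (`data_loss_window`, `downtime`, `meets_objectives`) are hypothetical:

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Actual data-loss window: Time_Fail - Time_LastBackup (compare against RPO)."""
    if failure < last_backup:
        raise ValueError("failure must not precede the last backup")
    return failure - last_backup


def downtime(failure: datetime, recovery: datetime) -> timedelta:
    """Actual time to restore: Time_Recovery - Time_Fail (compare against RTO)."""
    if recovery < failure:
        raise ValueError("recovery must not precede the failure")
    return recovery - failure


def meets_objectives(last_backup: datetime, failure: datetime, recovery: datetime,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True if the incident stayed within both the RPO and the RTO."""
    return (data_loss_window(last_backup, failure) <= rpo
            and downtime(failure, recovery) <= rto)


if __name__ == "__main__":
    # Hypothetical incident timeline.
    last_backup = datetime(2024, 1, 1, 1, 0)
    failure = datetime(2024, 1, 1, 1, 45)
    recovery = datetime(2024, 1, 1, 4, 15)
    print(data_loss_window(last_backup, failure))  # 0:45:00
    print(downtime(failure, recovery))             # 2:30:00
    print(meets_objectives(last_backup, failure, recovery,
                           rpo=timedelta(hours=1), rto=timedelta(hours=4)))  # True
```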

Hierarchical Outline

  1. Foundational Resilience Concepts
    • Recovery Metrics: Establishing RPO and RTO with stakeholders.
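    • High Availability (HA): Using Multi-AZ deployments for RDS and MSK. An EBS volume is replicated only within its own AZ, so cross-AZ protection for EBS relies on snapshots.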
    • Disaster Recovery (DR): Regional failure protection using CRR and snapshots.
  2. Storage Resiliency Strategies
    • Amazon S3: Versioning (accidental deletion), Object Lock (compliance), and Replication.
    • Amazon RDS/EBS: RDS Multi-AZ deployments for automated failover; EBS snapshots, which can restore a volume into any AZ in the Region.
    • Amazon Redshift: Cross-Region snapshots for regional recovery.
  3. Data Lifecycle Management (DLM)
    • Automation: Using S3 Lifecycle Policies to transition data (Standard -> IA -> Glacier).
    • Archiving: Long-term storage in S3 Glacier for legal/compliance needs.
    • Cleanup: DynamoDB TTL and automated S3 expiration to minimize costs.
  4. Security & Governance
    • Discovery: Amazon Macie for identifying PII in S3.
    • Enforcement: AWS Config rules to ensure deletion and encryption policies are met.
    • Protection: AWS Shield for DDoS and AWS Backup for centralized management.
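For the DynamoDB TTL cleanup in the outline, the key detail is that the TTL attribute must hold a Number in Unix epoch seconds. The sketch below builds session items in plain Python. The attribute name `expires_at`, the item fields, and the helper functions are all hypothetical. In practice you would enable TTL on that attribute for the table, for example with boto3's `update_time_to_live`. You would then write items with a call such as `Table.put_item`.

```python
import time
from typing import Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as in the session example


def _now(now: Optional[float]) -> float:
    return time.time() if now is None else now


def session_item(session_id: str, user_id: str, now: Optional[float] = None) -> dict:
    """Build a session item whose hypothetical TTL attribute is 'expires_at'."""
    return {
        "session_id": session_id,
        "user_id": user_id,
        # DynamoDB TTL expects a Number holding Unix epoch *seconds* (not milliseconds).
        "expires_at": int(_now(now)) + SESSION_TTL_SECONDS,
    }


def touch(item: dict, now: Optional[float] = None) -> dict:
    """Return a copy with expiry pushed forward, so expiry means 24 h of inactivity."""
    return {**item, "expires_at": int(_now(now)) + SESSION_TTL_SECONDS}


def is_expired(item: dict, now: Optional[float] = None) -> bool:
    """Expired items can linger until background deletion, so filter them on read."""
    return item["expires_at"] <= _now(now)


if __name__ == "__main__":
    start = 1_700_000_000  # fixed epoch second for a repeatable example
    item = session_item("s-1", "u-42", now=start)
    item = touch(item, now=start + 3_600)  # user active one hour later
    print(item["expires_at"] - start)  # 90000
    print(is_expired(item, now=start + 86_400))  # False: activity extended the session
```

Refreshing `expires_at` on every request is what turns a fixed 24-hour lifetime into "24 hours of inactivity". DynamoDB deletes expired items in the background rather than at the exact expiry second. Readers should therefore still filter with a check like `is_expired`.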

Visual Anchors

[Diagram: S3 Lifecycle Flow]

[Diagram: Resiliency Architecture Patterns]

Definition-Example Pairs

  • S3 Versioning: Storing multiple iterations of the same object.
    • Example: If a script accidentally deletes sales_data.csv, versioning keeps the data. The delete only adds a delete marker, and removing that marker (or copying back the previous version) restores the file immediately.
  • S3 Object Lock: Prevents an object from being deleted or overwritten for a fixed amount of time.
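    • Example: A financial firm must keep records for 7 years per SEC rules. Object Lock in Compliance mode ensures that no user, including administrators and the root account, can delete or overwrite them before the retention period ends.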
  • DynamoDB TTL: Automatically deleting items based on a timestamp attribute.
    • Example: Deleting temporary session data for a web app after 24 hours of inactivity to keep the table size (and cost) lean.
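A toy, in-memory model makes the versioning and Object Lock examples concrete. This is a conceptual simulation under simplifying assumptions, not the S3 API. `ToyVersionedBucket` and its methods are hypothetical. Version IDs are plain integers. Retention is a single `retain_until` timestamp standing in for a Compliance-mode retention period.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Version:
    """One stored version of a key; body=None marks a delete marker."""
    version_id: int
    body: Optional[bytes]
    retain_until: Optional[datetime] = None  # WORM retention date (Object Lock stand-in)

    @property
    def is_delete_marker(self) -> bool:
        return self.body is None


@dataclass
class ToyVersionedBucket:
    """Hypothetical in-memory model of a versioned bucket (not the S3 API)."""
    history: Dict[str, List[Version]] = field(default_factory=dict)
    next_id: int = 1

    def put(self, key: str, body: Optional[bytes],
            retain_until: Optional[datetime] = None) -> int:
        """Append a new version and return its ID; older versions are kept."""
        version = Version(self.next_id, body, retain_until)
        self.next_id += 1
        self.history.setdefault(key, []).append(version)
        return version.version_id

    def delete(self, key: str) -> int:
        """A plain delete does not erase data: it adds a delete marker."""
        return self.put(key, None)

    def get(self, key: str) -> Optional[bytes]:
        """Return the current version's body, or None if the key looks deleted."""
        versions = self.history.get(key, [])
        if not versions or versions[-1].is_delete_marker:
            return None
        return versions[-1].body

    def delete_version(self, key: str, version_id: int, now: datetime) -> None:
        """Permanently remove one version, refusing while it is under retention."""
        versions = self.history.get(key, [])
        for i, version in enumerate(versions):
            if version.version_id == version_id:
                if version.retain_until is not None and now < version.retain_until:
                    raise PermissionError(
                        f"version {version_id} is locked until {version.retain_until}")
                del versions[i]
                return
        raise KeyError(f"{key!r} has no version {version_id}")


if __name__ == "__main__":
    bucket = ToyVersionedBucket()
    bucket.put("sales_data.csv", b"q1,100")
    marker = bucket.delete("sales_data.csv")  # accidental delete
    print(bucket.get("sales_data.csv"))  # None: the delete marker is current
    bucket.delete_version("sales_data.csv", marker, now=datetime(2024, 1, 1))
    print(bucket.get("sales_data.csv"))  # b'q1,100' (restored)

    locked = bucket.put("ledger.csv", b"...", retain_until=datetime(2031, 1, 1))
    try:
        bucket.delete_version("ledger.csv", locked, now=datetime(2025, 1, 1))
    except PermissionError as err:
        print("refused:", err)
```

The model mirrors two S3 behaviors. First, on a versioned bucket a plain delete only adds a delete marker, so removing the marker brings the previous version back. Second, a locked version cannot be permanently deleted until its retention date has passed.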

Worked Examples

Scenario 1: Determining the DR Strategy

Problem: A healthcare company needs data continuously replicated to a second Region. It can tolerate almost no data loss (RPO < 1 min) and must be back online within 10 minutes of a regional failure (RTO ≤ 10 min).

Solution:

  1. Architecture: Active-Active or Warm Standby (Active-Passive).
  2. S3: Enable Cross-Region Replication (CRR).
  3. RDS: Use Cross-Region Read Replicas or Aurora Global Database.
  4. Result: Low RPO via continuous replication; Low RTO via pre-provisioned resources in the second region.
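S3 CRR, cross-Region read replicas, and Aurora Global Database all replicate asynchronously. The data at risk when a Region fails is therefore roughly the replication lag at that moment. The achieved RTO is roughly the sum of the failover steps. The sketch below checks Scenario 1's targets, assuming you already collect lag samples in seconds (for example, from monitoring metrics). The lag values, the three failover stages, and their durations are all hypothetical:

```python
from datetime import timedelta
from typing import Iterable


def worst_case_lag(lag_samples_s: Iterable[float]) -> timedelta:
    """Largest observed replication lag, a rough proxy for the data at risk."""
    samples = list(lag_samples_s)
    if not samples:
        raise ValueError("need at least one lag sample")
    return timedelta(seconds=max(samples))


def replication_meets_rpo(lag_samples_s: Iterable[float], rpo: timedelta) -> bool:
    """Scenario 1 requires RPO < 1 min, so the comparison is strict."""
    return worst_case_lag(lag_samples_s) < rpo


def failover_time(detection: timedelta, promotion: timedelta, cutover: timedelta) -> timedelta:
    """Estimated RTO as the sum of hypothetical failover stages."""
    return detection + promotion + cutover


if __name__ == "__main__":
    lag_samples = [0.8, 1.2, 2.5, 14.0]  # hypothetical seconds of lag
    print(replication_meets_rpo(lag_samples, rpo=timedelta(minutes=1)))  # True
    rto_estimate = failover_time(
        detection=timedelta(minutes=2),  # noticing the regional failure
        promotion=timedelta(minutes=1),  # promoting the secondary database
        cutover=timedelta(minutes=3),    # redirecting clients (e.g., DNS)
    )
    print(rto_estimate <= timedelta(minutes=10))  # True
```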

Scenario 2: S3 Lifecycle Policy Configuration

Problem: You have logs that are accessed daily for the first 30 days, rarely accessed for the rest of the first year, and must be retained for 5 years for compliance.

Solution:

  • Transition 1: After 30 days, move from S3 Standard to S3 Standard-IA.
  • Transition 2: After 365 days, move to S3 Glacier Deep Archive.
  • Expiration: After 1,825 days (5 years), delete the object.
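Scenario 2's policy can be expressed as a single lifecycle rule. The dictionary below follows the shape of the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration`. The rule ID and the `logs/` prefix are hypothetical. The helper `storage_class_at`, also hypothetical, shows where an object of a given age would sit under the rule.

```python
import json
from typing import Optional

# Scenario 2 as an S3 lifecycle rule (rule ID and "logs/" prefix are hypothetical).
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "logs-tiering-and-retention",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},    # rarely accessed after day 30
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # compliance archive after 1 year
            ],
            "Expiration": {"Days": 1825},  # delete after 5 years
        }
    ]
}


def storage_class_at(age_days: int, rule: Optional[dict] = None) -> str:
    """Return the storage class an object of this age should be in, or 'EXPIRED'."""
    rule = rule if rule is not None else LIFECYCLE_CONFIGURATION["Rules"][0]
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"  # objects start in S3 Standard
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    print(json.dumps(LIFECYCLE_CONFIGURATION, indent=2))
    for age in (10, 45, 400, 1825):
        print(age, storage_class_at(age))
```

S3 applies lifecycle rules asynchronously, so an actual transition can occur somewhat later than the day count suggests. The helper models the intended schedule, not the exact timing.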

Checkpoint Questions

  1. What is the main difference between S3 Cross-Region Replication and S3 Versioning regarding data protection?
  2. Which service would you use to discover sensitive PII data before moving it to an archive?
  3. True or False: Serverless analytics solutions like AWS Glue have built-in high availability.
  4. If a company can tolerate losing 24 hours of data, what is their RPO?

Comparison Tables

Disaster Recovery Architecture Comparison

| Strategy | Cost | RTO/RPO | Complexity |
|---|---|---|---|
| Backup & Restore | Low | Hours/Days | Simple |
| Pilot Light | Medium | Minutes/Hours | Moderate |
| Warm Standby | High | Minutes | High |
| Active-Active | Very High | Near Zero | Very High |
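You can read the table as a cost-ordered menu: pick the cheapest strategy whose recovery characteristics meet the requirement. The sketch below encodes that rule. The numeric RTO/RPO values are illustrative placeholders loosely read from the table's qualitative ranges. They are not AWS figures and should be replaced with values measured in your own DR tests.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Strategy(NamedTuple):
    name: str
    typical_rto: timedelta
    typical_rpo: timedelta


# Ordered from cheapest to most expensive, as in the table. The durations are
# illustrative placeholders, not AWS figures; replace them with measured values.
STRATEGIES = [
    Strategy("Backup & Restore", timedelta(hours=24), timedelta(hours=24)),
    Strategy("Pilot Light", timedelta(hours=1), timedelta(minutes=10)),
    Strategy("Warm Standby", timedelta(minutes=5), timedelta(minutes=1)),
    Strategy("Active-Active", timedelta(0), timedelta(0)),
]


def cheapest_strategy(rto: timedelta, rpo: timedelta) -> Optional[str]:
    """Return the first (cheapest) strategy that meets both targets, else None."""
    for strategy in STRATEGIES:
        if strategy.typical_rto <= rto and strategy.typical_rpo <= rpo:
            return strategy.name
    return None


if __name__ == "__main__":
    print(cheapest_strategy(rto=timedelta(hours=4), rpo=timedelta(hours=1)))        # Pilot Light
    print(cheapest_strategy(rto=timedelta(minutes=10), rpo=timedelta(minutes=1)))   # Warm Standby
    print(cheapest_strategy(rto=timedelta(minutes=10), rpo=timedelta(seconds=59)))  # Active-Active
```

With these placeholder numbers, a 10-minute RTO and a 1-minute RPO select Warm Standby. Tightening the RPO below one minute pushes the choice to Active-Active, which matches the two options offered in Scenario 1.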

Hot vs. Cold Storage Services

| Requirement | Hot | Cold |
|---|---|---|
| Latency | Milliseconds (DynamoDB / ElastiCache) | Minutes to hours (S3 Glacier Flexible Retrieval / Deep Archive) |
| Access frequency | High (S3 Standard) | Low (S3 Standard-IA / S3 Glacier) |
| Cost per GB | High | Very low |

Muddy Points & Cross-Refs

  • Availability vs. Resiliency: Many students confuse these. Think of availability as the result (Is it up?) and resiliency as the method (How do we keep it up when things break?).
  • S3 Intelligent-Tiering: Lifecycle policies move objects on a fixed, age-based schedule that you define. Intelligent-Tiering instead moves each object between access tiers automatically, based on its observed access pattern. Use it when access patterns are unknown or unpredictable.
  • Multi-AZ vs. Read Replicas: Multi-AZ is for HA (failover); Read Replicas are primarily for scaling performance, though Cross-Region Read Replicas can be part of a DR strategy.
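The sketch below contrasts the two approaches to tiering. It places an age-based Lifecycle schedule (using the 30-day and 365-day thresholds from Scenario 2) next to Intelligent-Tiering's automatic access tiers. Under Intelligent-Tiering, an object moves to the Infrequent Access tier after 30 consecutive days without access and to the Archive Instant Access tier after 90. It returns to Frequent Access when it is read. The optional, opt-in archive tiers are not modeled, and the returned strings are descriptive labels rather than API values.

```python
def intelligent_tier(days_since_last_access: int) -> str:
    """Automatic S3 Intelligent-Tiering access tiers (opt-in archive tiers not modeled)."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= 90:
        return "ARCHIVE_INSTANT_ACCESS"
    if days_since_last_access >= 30:
        return "INFREQUENT_ACCESS"
    return "FREQUENT_ACCESS"  # any access resets the clock and returns the object here


def age_based_class(age_days: int) -> str:
    """Fixed Lifecycle schedule driven by object age, regardless of access."""
    if age_days >= 365:
        return "DEEP_ARCHIVE"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"


if __name__ == "__main__":
    # Hypothetical object: created 400 days ago, last read yesterday.
    print(age_based_class(400))  # DEEP_ARCHIVE
    print(intelligent_tier(1))   # FREQUENT_ACCESS
```

An object created 400 days ago but read yesterday shows the difference. The age-based rule has already moved it to Deep Archive, where it must be restored before it can be read. Intelligent-Tiering, by contrast, keeps it in the Frequent Access tier.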
