AWS Storage Characteristics: Durability, Availability, and Replication

Storage options and characteristics (for example, durability, replication)

This guide covers the fundamental characteristics of AWS storage services, focusing on how data is preserved (durability), accessed (availability), and protected against failure (replication).

Learning Objectives

By the end of this study guide, you should be able to:

  • Distinguish between data durability and data availability.
  • Identify the replication defaults for core AWS storage services like S3 and EBS.
  • Select appropriate S3 storage classes based on durability and availability requirements.
  • Explain the concept of "eventual consistency" in distributed storage.
  • Evaluate RAID configurations on EBS for performance and reliability.

Key Terms & Glossary

  • Durability: The likelihood that data written to storage can be successfully retrieved in the future. Measured in "nines" (e.g., 99.999999999%).
  • Availability: The percentage of time a storage system is online and ready to serve requests during a year.
  • Replication: The process of creating redundant copies of data across different physical locations (Availability Zones or Regions).
  • Eventually Consistent: A consistency model in which an update to a data object might not be visible to all readers immediately, but every copy converges to the latest value after a brief propagation delay.
  • Single Point of Failure (SPOF): A part of a system that, if it fails, will stop the entire system from working.

The "Big Idea"

[!IMPORTANT] Redundancy is the antidote to failure. AWS achieves reliability by assuming that everything will eventually fail. By decoupling data from specific hardware and replicating it across independent failure domains (Availability Zones), multi-AZ services such as S3 Standard are designed to keep your information intact and accessible even if an entire data center is destroyed.

Formula / Concept Box

Concept | Metric | Meaning
Durability | 11 Nines (99.999999999%) | Probability of data loss is nearly zero (e.g., losing 1 object out of 10,000,000 every 10,000 years).
Availability | 4 Nines (99.99%) | Max downtime of ~52 minutes per year.
Availability | 3 Nines (99.9%) | Max downtime of ~8.76 hours per year.
S3 Standard Replication | 3+ AZs | Data is automatically replicated across a minimum of three physically separate Availability Zones.
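
The downtime and durability figures in the table follow from simple arithmetic. The sketch below is a minimal illustration, not an official AWS calculation: the function names are hypothetical, it assumes a 365-day year, and it reads "11 nines" as an average annual per-object loss rate.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def max_downtime_hours(availability_percent: str) -> Fraction:
    """Maximum yearly downtime implied by an availability percentage."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * HOURS_PER_YEAR


def years_per_lost_object(nines: int, objects_stored: int) -> Fraction:
    """Average years between single-object losses at the given durability."""
    annual_loss_rate = Fraction(1, 10 ** nines)  # 11 nines -> 1e-11 per object
    return 1 / (annual_loss_rate * objects_stored)


print(float(max_downtime_hours("99.99") * 60))  # minutes per year at 4 nines
print(float(max_downtime_hours("99.9")))        # hours per year at 3 nines
print(years_per_lost_object(11, 10_000_000))    # years between losses
```

Exact fractions are used instead of floats because subtracting 0.99999999999 from 1 in floating point introduces rounding noise at this scale.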

Hierarchical Outline

  1. Data Resilience Metrics
    • Durability: Focuses on Data Integrity. Even if you can't access data right now (low availability), high durability means the data is not lost.
    • Availability: Focuses on Uptime. High availability means the system is reachable and able to serve requests whenever you need it.
  2. S3 Storage Classes
    • S3 Standard: High durability (11 nines) and high availability (99.99%). Replicated across 3+ AZs.
    • S3 Standard-IA: Lower availability (99.9%) but same durability; lower cost for infrequent access.
    • S3 One Zone-IA: Lower availability (99.5%) and lower resilience because data is in only one AZ.
    • S3 Intelligent-Tiering: Automatically moves data between tiers based on access patterns to optimize cost.
  3. Consistency Models
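    • Eventual Consistency: A store that replicates data across locations may briefly return a "stale read" after an overwrite or delete, until every copy has been updated. S3 historically behaved this way for overwrites and deletes; since December 2020 it provides strong read-after-write consistency for PUT and DELETE operations within a Region. Asynchronous features such as Cross-Region Replication still involve a propagation delay.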
  4. Block Storage (EBS) Performance & Reliability
    • RAID 0 (Striping): Increases performance but decreases reliability (no redundancy).
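    • RAID 1 (Mirroring): Increases reliability but is not recommended on AWS, because every write goes to multiple volumes and consumes extra EC2-to-EBS bandwidth. Rely on native EBS replication instead: each volume is already replicated within its Availability Zone.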
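
The S3 storage class trade-offs in the outline above can be expressed as a small lookup. This is a minimal sketch under stated assumptions: the `StorageClass` type and `pick_storage_class` function are hypothetical, the availability figures are the ones quoted in this guide, and the cheapest-first ordering is an assumption used only to choose among classes that qualify. It ignores retrieval fees and access frequency, and it leaves out Intelligent-Tiering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClass:
    name: str
    availability_percent: float  # design availability quoted in this guide
    multi_az: bool               # True if replicated across 3+ AZs


# Assumption: listed cheapest first, so the first match is the cheapest fit.
CLASSES_CHEAPEST_FIRST = [
    StorageClass("S3 One Zone-IA", 99.5, False),
    StorageClass("S3 Standard-IA", 99.9, True),
    StorageClass("S3 Standard", 99.99, True),
]


def pick_storage_class(min_availability: float, needs_multi_az: bool) -> str:
    """Return the first listed class that meets both requirements."""
    for sc in CLASSES_CHEAPEST_FIRST:
        available_enough = sc.availability_percent >= min_availability
        resilient_enough = sc.multi_az or not needs_multi_az
        if available_enough and resilient_enough:
            return sc.name
    raise ValueError("no listed storage class meets the requirements")


print(pick_storage_class(99.9, needs_multi_az=True))   # S3 Standard-IA
print(pick_storage_class(99.0, needs_multi_az=False))  # S3 One Zone-IA
print(pick_storage_class(99.99, needs_multi_az=True))  # S3 Standard
```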

Visual Anchors

Data Replication Hierarchy

[Diagram: Data Replication Hierarchy (not rendered in this copy)]

Durability vs. Availability Visualization

\begin{tikzpicture}
  \draw[thick,->] (0,0) -- (6,0) node[anchor=north] {Availability (Uptime)};
  \draw[thick,->] (0,0) -- (0,6) node[anchor=east] {Durability (Safety)};
  % S3 Standard
  \filldraw[blue] (5,5) circle (4pt) node[anchor=south west] {S3 Standard};
  % S3 Standard-IA
  \filldraw[red] (3,5) circle (4pt) node[anchor=south west] {S3 Standard-IA};
  % S3 One Zone-IA
  \filldraw[orange] (1,3) circle (4pt) node[anchor=south west] {S3 One Zone-IA};
  \node[draw, dashed] at (3,-1) {Standard-IA has high safety but lower immediate uptime};
\end{tikzpicture}

Definition-Example Pairs

  • Durability: The structural integrity of stored bits.
    • Example: If a hard drive in an AWS data center catches fire, high durability ensures two other copies exist elsewhere so you don't lose your photos.
  • Availability: The "open for business" status of the service.
    • Example: If the S3 API is down for maintenance, your data has 0% availability, even though its durability remains 11 nines (the data is safe, just currently unreachable).
  • Eventual Consistency: The delay in synchronizing data copies.
    • Example: You update a profile picture on a website. Because the change takes time to propagate to a replica in another region, your friend there sees the old picture for a moment before it refreshes to the new one.
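
The eventual-consistency example above can be reproduced with a toy model. This is only an illustrative sketch, not how any AWS service is implemented: the `ReplicatedStore` class and its method names are hypothetical, and replication is triggered by hand rather than by a background process.

```python
from collections import deque
from typing import Optional


class ReplicatedStore:
    """Toy key-value store: writes hit the primary, the replica catches up later."""

    def __init__(self) -> None:
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # updates not yet applied to the replica

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self._pending.append((key, value))  # replication is deferred

    def read_from_replica(self, key: str) -> Optional[str]:
        return self.replica.get(key)

    def propagate(self) -> None:
        """Apply all queued updates to the replica, in write order."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


store = ReplicatedStore()
store.write("profile.jpg", "old")
store.propagate()
store.write("profile.jpg", "new")
print(store.read_from_replica("profile.jpg"))  # stale read: old
store.propagate()
print(store.read_from_replica("profile.jpg"))  # converged: new
```

The window between `write` and `propagate` is the propagation delay: the primary already holds the new value while the replica still serves the old one.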

Worked Examples

Scenario 1: Cost-Optimized Backup

Requirement: A company needs to store 10TB of secondary backup data. The data is rarely accessed, but it is critical and cannot be recreated if lost.

  • Solution: S3 Glacier or S3 Standard-IA.
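  • Reasoning: One Zone-IA is inappropriate because the data is "critical and cannot be recreated." Standard-IA provides the required 11 nines of durability across multiple AZs while costing less than S3 Standard. S3 Glacier also stores data across multiple AZs and costs less again, but it suits this scenario only if a retrieval delay of minutes to hours is acceptable.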

Scenario 2: High-Performance Database Volume

Requirement: A database needs 20,000 IOPS on a single EC2 instance. The data is already backed up to S3 daily.

  • Solution: RAID 0 with multiple EBS volumes.
  • Reasoning: RAID 0 provides the necessary performance boost through striping. It also increases the risk of data loss, because the failure of any single volume in the stripe takes down the whole array. The business requirement prioritizes performance, and the risk is mitigated by the existing daily S3 backup.
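
The performance and reliability trade-off in Scenario 2 can be estimated with two small formulas. This is a rough sketch under idealized assumptions: IOPS are taken to add up linearly across the stripe, volume failures are treated as independent, and the per-volume IOPS and failure rate below are hypothetical inputs rather than AWS-published figures.

```python
import math


def volumes_for_target(target_iops: int, iops_per_volume: int) -> int:
    """Volumes to stripe, assuming IOPS scale linearly across the stripe."""
    return math.ceil(target_iops / iops_per_volume)


def array_failure_probability(volume_failure_prob: float, volumes: int) -> float:
    """RAID 0 is lost if ANY member volume fails (independence assumed)."""
    return 1 - (1 - volume_failure_prob) ** volumes


# Hypothetical inputs: 5,000 IOPS per volume, 0.2% annual failure rate each.
n = volumes_for_target(20_000, 5_000)
print(n)  # 4
print(array_failure_probability(0.002, n))  # higher than a single volume's 0.002
```

Each volume added to the stripe raises throughput but also raises the chance that the array as a whole is lost, which is why the daily backup matters in this scenario.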

Checkpoint Questions

  1. What is the default number of Availability Zones that S3 Standard replicates data across?
  2. If a storage class has 99.9% availability, what is the maximum approximate downtime allowed per year?
  3. Why does AWS advise against using RAID 1 on EBS volumes?
  4. Of the S3 storage classes covered in this guide, which one does NOT replicate data across multiple Availability Zones?
Answers
  1. Three (or more).
  2. Approximately 8.76 hours.
  3. It unnecessarily increases EC2-to-EBS bandwidth usage without providing significant benefits over native EBS replication.
  4. S3 One Zone-IA.
