Comprehensive Study Guide: Storage Options on AWS

Learning Objectives

After studying this guide, you should be able to:

  • Distinguish between block-level, file-level, and object-level storage architectures.
  • Select the appropriate Amazon EBS volume type based on IOPS and throughput requirements.
  • Evaluate Amazon S3 storage classes to optimize for cost, durability, and access frequency.
  • Design hybrid storage solutions using AWS Storage Gateway.
  • Implement data migration strategies using AWS DataSync.

Key Terms & Glossary

  • IOPS (Input/Output Operations Per Second): A measure of performance for block storage, representing how many read/write operations can occur in one second.
  • Throughput: The amount of data moved from one place to another in a given time period (e.g., MB/s).
  • Durability: The probability that a stored object will not be lost over a given period of time (S3 is designed for 99.999999999%, or 11 nines, of durability over a given year).
  • WORM (Write Once, Read Many): A storage model in which data, once written, cannot be modified or deleted for a defined retention period (e.g., S3 Object Lock).
  • iSCSI (Internet Small Computer Systems Interface): An IP-based storage networking standard for linking data storage facilities, used by Storage Gateway.
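The relationship between IOPS and throughput, and what "11 nines" means in practice, are easier to see with a little arithmetic. The sketch below is illustrative only: the function names and the sample workloads are assumptions, not part of any AWS API.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput implied by an IOPS rate at a fixed I/O size.

    throughput = IOPS x I/O size, converted from KiB/s to MiB/s.
    """
    return iops * io_size_kib / 1024


def expected_annual_object_loss(object_count: int, durability_nines: int = 11) -> float:
    """Average number of objects expected to be lost per year.

    A durability of N nines corresponds to an annual loss
    probability of 10^-N per object.
    """
    annual_loss_probability = 10.0 ** -durability_nines
    return object_count * annual_loss_probability


if __name__ == "__main__":
    # Many small random operations: high IOPS, modest throughput.
    print(throughput_mib_per_s(3_000, 16))
    # Few large sequential operations: low IOPS, high throughput.
    print(throughput_mib_per_s(500, 1024))
    # Ten million objects at 11 nines of durability.
    print(expected_annual_object_loss(10_000_000))
```

The first two calls show why a workload can be IOPS-bound without being throughput-bound, and vice versa. The last call gives roughly 0.0001 objects per year, that is, about one lost object every 10,000 years for ten million stored objects.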

The "Big Idea"

Selecting storage on AWS is not a "one size fits all" decision. It is a multidimensional optimization problem where you must balance Access Patterns (random vs. sequential), Performance Requirements (latency, IOPS, throughput), and Cost Constraints. Modern cloud architectures often leverage a multi-tiered approach—using block storage for active databases, file storage for shared application data, and object storage for scalable, cost-effective long-term persistence.

Formula / Concept Box

| Pricing Factor | Amazon EBS | Amazon S3 |
| --- | --- | --- |
| Storage | Per GB-month provisioned | Per GB-month consumed |
| Requests | Included (for most types) | Charged per 1,000 requests (PUT, GET, etc.) |
| Data Transfer | Outbound/Cross-Region charges | Outbound/Cross-Region charges |
| Persistence | Snapshots stored in S3 | Native versioning and replication |
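The difference between paying for provisioned capacity (EBS) and consumed capacity plus requests (S3) can be sketched as two small cost functions. The rates below are round placeholder numbers chosen for easy arithmetic; they are not AWS prices, and the function names are hypothetical.

```python
def ebs_monthly_cost(provisioned_gb: float, rate_per_gb_month: float) -> float:
    """EBS bills for the provisioned volume size, used or not."""
    return provisioned_gb * rate_per_gb_month


def s3_monthly_cost(
    stored_gb: float,
    requests: int,
    rate_per_gb_month: float,
    rate_per_1000_requests: float,
) -> float:
    """S3 bills for the data actually stored plus a per-request charge."""
    storage = stored_gb * rate_per_gb_month
    request_charge = (requests / 1000) * rate_per_1000_requests
    return storage + request_charge


if __name__ == "__main__":
    # Placeholder rates for illustration only (NOT real AWS pricing).
    ebs = ebs_monthly_cost(provisioned_gb=500, rate_per_gb_month=0.10)
    s3 = s3_monthly_cost(
        stored_gb=120,
        requests=200_000,
        rate_per_gb_month=0.02,
        rate_per_1000_requests=0.01,
    )
    print(f"EBS (500 GB provisioned, 120 GB used): {ebs:.2f}")
    print(f"S3  (120 GB stored, 200k requests):    {s3:.2f}")
```

Note that the EBS figure does not change with how much of the volume is actually used, while the S3 figure grows with both stored data and request volume. Data transfer charges are left out of this sketch.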

Hierarchical Outline

  1. Block Storage (Amazon EBS)
    • SSD-Backed: Focused on low latency and high IOPS (gp3, io2).
    • HDD-Backed: Focused on large sequential throughput (st1, sc1).
    • Snapshots: Incremental backups stored in Amazon S3.
  2. Object Storage (Amazon S3)
    • Standard: Frequently accessed data.
    • Infrequent Access (IA): Lower cost for older but still online data.
    • Glacier: Archival storage with retrieval times ranging from milliseconds (Instant Retrieval) to minutes or hours (Flexible Retrieval, Deep Archive).
  3. File Storage (EFS & FSx)
    • EFS: Managed NFS for Linux-based workloads.
    • FSx: Managed high-performance file systems for Windows File Server, Lustre, NetApp ONTAP, and OpenZFS.
  4. Hybrid & Migration
    • Storage Gateway: Connects on-premises apps to cloud storage via iSCSI/NFS/SMB.
    • DataSync: High-speed online data transfer between storage systems.

Visual Anchors

Storage Selection Decision Tree

[Figure: storage selection decision tree]
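The selection logic from the Hierarchical Outline can be approximated as a small function. This is a simplified sketch: the `Access` enum, the parameter names, and the order in which the questions are asked are assumptions made for illustration, and a real decision would also weigh cost, latency, and throughput.

```python
from enum import Enum


class Access(Enum):
    BLOCK = "block"    # one instance needs a raw volume (e.g., a database)
    FILE = "file"      # many clients mount a shared file system
    OBJECT = "object"  # objects accessed over an API at large scale


def choose_storage(
    access: Access,
    *,
    windows: bool = False,
    on_premises_app: bool = False,
) -> str:
    """Map a coarse access pattern to a storage service family."""
    # Hybrid case first: an on-premises application that must reach
    # cloud storage over iSCSI/NFS/SMB goes through Storage Gateway.
    if on_premises_app:
        return "AWS Storage Gateway"
    if access is Access.BLOCK:
        return "Amazon EBS"
    if access is Access.FILE:
        # EFS is managed NFS for Linux; FSx covers Windows file shares.
        return "Amazon FSx" if windows else "Amazon EFS"
    return "Amazon S3"


if __name__ == "__main__":
    print(choose_storage(Access.BLOCK))
    print(choose_storage(Access.FILE, windows=True))
    print(choose_storage(Access.OBJECT))
    print(choose_storage(Access.FILE, on_premises_app=True))
```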

EBS Performance Spectrum

\begin{tikzpicture}
  % Axes: cost on x, performance on y
  \draw[thick, ->] (0,0) -- (8,0) node[anchor=north] {Cost (Lower $\rightarrow$ Higher)};
  \draw[thick, ->] (0,0) -- (0,5) node[anchor=east] {Performance (IOPS/Latency)};
  % HDD-backed volume types
  \filldraw[blue] (1,0.5) circle (2pt) node[anchor=south] {sc1 (Cold HDD)};
  \filldraw[blue] (3,1.5) circle (2pt) node[anchor=south] {st1 (Throughput HDD)};
  % SSD-backed volume types
  \filldraw[red] (5,3) circle (2pt) node[anchor=south] {gp3 (General Purpose SSD)};
  \filldraw[red] (7,4.5) circle (2pt) node[anchor=south] {io2 (Provisioned IOPS)};
  % Boundary between HDD and SSD families
  \draw[dashed] (0,2.2) -- (8,2.2) node[anchor=west] {SSD/HDD Divide};
\end{tikzpicture}

Definition-Example Pairs

  • Ephemeral Storage: Temporary storage whose data is lost when the instance is stopped, hibernated, or terminated (it survives a reboot). Example: EC2 Instance Store used for temporary buffers and caches.
  • Cold Data: Data that is rarely accessed but must be retained. Example: Compliance logs from three years ago stored in S3 Glacier Deep Archive.
  • Provisioned IOPS: A feature that lets users provision a defined level of I/O performance independently of volume size. Example: A high-traffic SQL database requiring 50,000 IOPS on an io2 volume.
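Cold data such as the compliance logs in the example above is usually moved to cheaper classes by an S3 lifecycle rule rather than by hand. The sketch below only builds the rule as a Python dictionary; the helper name, the `logs/` prefix, and the day thresholds are illustrative assumptions. A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.

```python
def build_lifecycle_rule(prefix: str, ia_after_days: int, archive_after_days: int) -> dict:
    """Build a lifecycle configuration that tiers objects down over time.

    Objects under `prefix` move to Standard-IA first, then to
    Glacier Deep Archive. Thresholds are days since object creation.
    """
    if archive_after_days <= ia_after_days:
        raise ValueError("archive transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical policy: IA after 30 days, Deep Archive after one year.
    config = build_lifecycle_rule("logs/", ia_after_days=30, archive_after_days=365)
    for transition in config["Rules"][0]["Transitions"]:
        print(transition)
```

Keeping the rule as plain data makes it easy to review and test before it is applied to a bucket.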

Worked Examples

Example 1: Big Data ETL Job

Scenario: You are running a Hadoop-based big data workload that processes large log files sequentially. Cost is a primary concern.

Solution: Use Throughput Optimized HDD (st1).

Reasoning: st1 volumes are designed for throughput-intensive workloads with sequential access patterns at a lower cost than SSDs.

Example 2: Hybrid Backup Solution

Scenario: A company has an on-premises backup application that uses physical tapes. They want to move to the cloud without changing their application.

Solution: Implement AWS Storage Gateway in Tape Gateway mode.

Reasoning: Tape Gateway presents a virtual tape library (VTL) to the existing application via iSCSI, while storing the virtual tapes in Amazon S3 and archiving them to S3 Glacier.

Checkpoint Questions

  1. Which EBS volume type is most cost-effective for a large data warehouse with sequential access? (Answer: st1)
  2. What is the main difference between S3 Standard and S3 Standard-IA? (Answer: IA has a lower storage cost but charges a retrieval fee per GB).
  3. Which service would you use to migrate 50TB of data from an on-premises NAS to Amazon FSx for Lustre over the internet? (Answer: AWS DataSync).
  4. How does a Cached Volume Gateway differ from a Stored Volume Gateway? (Answer: Cached keeps the full dataset in Amazon S3 and only frequently accessed data locally; Stored keeps the full dataset locally and backs it up asynchronously to AWS as snapshots).

Muddy Points & Cross-Refs

  • EBS vs. Instance Store: Remember that EBS is persistent, network-attached storage, while Instance Store is storage physically attached to the host and loses its data if the instance is stopped, hibernated, or terminated.
  • S3 vs. EFS: Use S3 for web-accessible objects and massive scale; use EFS when you need a standard Linux filesystem that multiple EC2 instances can mount simultaneously.
  • Deep Dive: For more on performance, see the Compute and Networking guide regarding Enhanced Networking and its impact on EBS throughput.

Comparison Tables

Amazon EBS Volume Types

| Volume Type | Abbreviation | Primary Use Case | Max Throughput |
| --- | --- | --- | --- |
| General Purpose SSD | gp3 | Most workloads, boot volumes | 1,000 MB/s |
| Provisioned IOPS SSD | io2 | High-perf databases | 4,000 MB/s |
| Throughput Optimized HDD | st1 | Big data, ETL, logs | 500 MB/s |
| Cold HDD | sc1 | Low-cost archival | 250 MB/s |
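The table can be turned into a simple lookup that picks the lowest-cost volume type meeting a throughput target. This is a rough sketch: the throughput limits are the ones listed in this guide, the cost ordering follows the EBS Performance Spectrum figure, and the function name and the `random_io` flag are assumptions. Real sizing would also consider provisioned IOPS, volume size, and instance limits.

```python
from typing import Optional

# (volume type, max throughput in MB/s), ordered from lowest to highest
# relative cost as drawn in this guide's EBS Performance Spectrum.
EBS_TYPES = [
    ("sc1", 250),
    ("st1", 500),
    ("gp3", 1000),
    ("io2", 4000),
]

SSD_TYPES = {"gp3", "io2"}


def cheapest_ebs_type(required_mb_per_s: float, random_io: bool = False) -> Optional[str]:
    """Return the cheapest listed type that meets the throughput target.

    Random-access workloads are restricted to SSD-backed types, since
    the HDD-backed types are aimed at large sequential I/O.
    Returns None when no single volume in the table is fast enough.
    """
    for name, max_throughput in EBS_TYPES:
        if random_io and name not in SSD_TYPES:
            continue
        if max_throughput >= required_mb_per_s:
            return name
    return None


if __name__ == "__main__":
    print(cheapest_ebs_type(400))                  # sequential ETL job
    print(cheapest_ebs_type(400, random_io=True))  # database-style access
    print(cheapest_ebs_type(5000))                 # beyond a single volume
```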

Amazon S3 Storage Classes

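| Class | Min Storage Duration | Availability (designed for) | Retrieval Fee |
| --- | --- | --- | --- |
| Standard | None | 99.99% | No |
| Standard-IA | 30 days | 99.9% | Yes |
| One Zone-IA | 30 days | 99.5% | Yes |
| Glacier Instant Retrieval | 90 days | 99.9% | Yes |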
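The minimum storage duration matters for billing: an object deleted or moved before the minimum is still charged as if it had been stored for the full period. A minimal sketch of that rule follows, using the durations from the table; the dictionary keys and the function name are illustrative choices.

```python
# Minimum storage duration in days, taken from the table above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage charged for an object kept `days_stored` days.

    Early deletion is billed up to the class minimum, so the charge
    is never for fewer days than the minimum storage duration.
    """
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


if __name__ == "__main__":
    # An object deleted after 10 days in each class.
    for storage_class in MIN_STORAGE_DAYS:
        print(storage_class, billable_days(storage_class, 10))
```

For short-lived objects this can make Standard the cheaper choice even though its per-GB rate is higher.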
