Comprehensive Study Guide: Storage Options on AWS
Learning Objectives
After studying this guide, you should be able to:
- Distinguish between block-level, file-level, and object-level storage architectures.
- Select the appropriate Amazon EBS volume type based on IOPS and throughput requirements.
- Evaluate Amazon S3 storage classes to optimize for cost, durability, and access frequency.
- Design hybrid storage solutions using AWS Storage Gateway.
- Implement data migration strategies using AWS DataSync.
Key Terms & Glossary
- IOPS (Input/Output Operations Per Second): A measure of performance for block storage, representing how many read/write operations can occur in one second.
- Throughput: The amount of data moved from one place to another in a given time period (e.g., MB/s).
- Durability: The probability that a stored object will not be lost over a given period, usually one year. S3 is designed for 11 nines (99.999999999%) of annual object durability.
- WORM (Write Once, Read Many): A storage model in which data, once written, cannot be modified or deleted while a retention period or hold is in effect (e.g., S3 Object Lock).
- iSCSI (Internet Small Computer Systems Interface): An IP-based protocol that carries SCSI block-storage commands over a network. Storage Gateway's Volume Gateway and Tape Gateway present their storage to on-premises hosts over iSCSI.
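Several of these terms reduce to simple arithmetic. The sketch below assumes throughput is IOPS multiplied by I/O size, limited by whatever throughput ceiling the volume has, and reads "11 nines" as an annual loss probability of 10^-11 per object. The function names are illustrative and are not part of any AWS SDK.

```python
import math


def throughput_mib_s(iops: int, io_size_kib: int, cap_mib_s: float = math.inf) -> float:
    """Throughput = IOPS x I/O size, limited by the volume's throughput cap (if any)."""
    raw = iops * io_size_kib / 1024  # KiB/s -> MiB/s
    return min(raw, cap_mib_s)


def expected_objects_lost_per_year(num_objects: int, nines: int = 11) -> float:
    """Expected annual object loss when durability is `nines` nines (e.g. 99.999999999%)."""
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


if __name__ == "__main__":
    # 16,000 IOPS of 16 KiB I/Os moves 250 MiB/s.
    print(throughput_mib_s(16_000, 16))
    # The same IOPS at 256 KiB per I/O would exceed a 1,000 MiB/s cap, so the cap binds.
    print(throughput_mib_s(16_000, 256, cap_mib_s=1_000))
    # 10 million objects at 11 nines: roughly one object lost every 10,000 years on average.
    print(1 / expected_objects_lost_per_year(10_000_000))
```

The cap parameter matters because IOPS and throughput are separate limits: a volume with plenty of IOPS can still be throughput-bound when each I/O is large.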
The "Big Idea"
Selecting storage on AWS is not a "one size fits all" decision. It is a multidimensional optimization problem where you must balance Access Patterns (random vs. sequential), Performance Requirements (latency, IOPS, throughput), and Cost Constraints. Modern cloud architectures often leverage a multi-tiered approach—using block storage for active databases, file storage for shared application data, and object storage for scalable, cost-effective long-term persistence.
Formula / Concept Box
| Pricing Factor | Amazon EBS | Amazon S3 |
|---|---|---|
| Storage | Per GB-month provisioned | Per GB-month consumed |
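| Requests | No per-request charge on current-generation volumes; io1/io2 bill for provisioned IOPS, and gp3 bills for IOPS and throughput provisioned above its 3,000 IOPS / 125 MB/s baseline | Charged per 1,000 requests (PUT, GET, etc.) |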
| Data Transfer | Outbound/Cross-Region charges | Outbound/Cross-Region charges |
| Persistence | Snapshots stored in S3 | Native versioning and replication |
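The key contrast in this table is provisioned versus consumed capacity. The sketch below models only the storage and request rows; every rate in it is a HYPOTHETICAL placeholder rather than a real AWS price, and data transfer and snapshot charges are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """HYPOTHETICAL unit prices in USD; substitute current prices for your Region."""
    ebs_per_gb_month: float
    s3_per_gb_month: float
    s3_per_1000_put: float
    s3_per_1000_get: float


def ebs_monthly_cost(provisioned_gb: float, rates: Rates) -> float:
    # EBS bills for the capacity you provision, however much of it you actually fill.
    return provisioned_gb * rates.ebs_per_gb_month


def s3_monthly_cost(stored_gb: float, puts: int, gets: int, rates: Rates) -> float:
    # S3 bills for the bytes actually stored plus a charge per 1,000 requests.
    storage = stored_gb * rates.s3_per_gb_month
    requests = (puts / 1000) * rates.s3_per_1000_put + (gets / 1000) * rates.s3_per_1000_get
    return storage + requests


if __name__ == "__main__":
    rates = Rates(ebs_per_gb_month=0.08, s3_per_gb_month=0.023,
                  s3_per_1000_put=0.005, s3_per_1000_get=0.0004)  # placeholders only
    # A 1,000 GB volume holding 200 GB of data is still billed for 1,000 GB...
    print(ebs_monthly_cost(1_000, rates))
    # ...whereas S3 bills the 200 GB stored plus request charges.
    print(s3_monthly_cost(200, puts=100_000, gets=1_000_000, rates=rates))
```

Under this model, over-provisioning an EBS volume costs money every month, while a request-heavy S3 workload can see request charges rival storage charges.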
Hierarchical Outline
- Block Storage (Amazon EBS)
  - SSD-Backed: Focused on low latency and high IOPS (gp3, io2).
  - HDD-Backed: Focused on large sequential throughput (st1, sc1).
  - Snapshots: Incremental backups stored in Amazon S3.
- Object Storage (Amazon S3)
  - Standard: Frequently accessed data.
  - Infrequent Access (IA): Lower storage cost for less frequently accessed data that must remain immediately available.
  - Glacier: Archival storage with retrieval times ranging from milliseconds (Glacier Instant Retrieval) to hours (Glacier Flexible Retrieval and Glacier Deep Archive).
- File Storage (EFS & FSx)
  - EFS: Managed NFS for Linux-based workloads.
  - FSx: Fully managed file systems for Windows File Server, Lustre, NetApp ONTAP, and OpenZFS.
- Hybrid & Migration
  - Storage Gateway: Connects on-premises apps to cloud storage via iSCSI/NFS/SMB.
  - DataSync: Online data transfer between on-premises storage and AWS storage services, and between AWS storage services.
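The outline above can be condensed into a rough first-pass selector. This is a hedged sketch of the kind of logic a storage selection decision tree encodes, not an official AWS decision procedure; the `Workload` fields and `suggest_service` are hypothetical names, and a real choice also weighs cost and performance as described in the "Big Idea".

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical fields mirroring the questions implied by the outline.
    is_bulk_migration: bool = False        # moving existing data into AWS
    runs_on_premises: bool = False         # app stays in the data center
    needs_shared_filesystem: bool = False  # many instances mount the same file system
    windows_clients: bool = False          # SMB / Windows file semantics
    hpc_scratch: bool = False              # high-throughput parallel file system
    needs_block_device: bool = False       # e.g. a database or boot volume on one instance


def suggest_service(w: Workload) -> str:
    """Return a first-pass AWS storage service for the workload."""
    if w.is_bulk_migration:
        return "AWS DataSync"
    if w.runs_on_premises:
        return "AWS Storage Gateway"
    if w.needs_shared_filesystem:
        if w.hpc_scratch:
            return "Amazon FSx for Lustre"
        if w.windows_clients:
            return "Amazon FSx for Windows File Server"
        return "Amazon EFS"
    if w.needs_block_device:
        return "Amazon EBS"
    # Default: scalable, cost-effective object storage.
    return "Amazon S3"


if __name__ == "__main__":
    print(suggest_service(Workload(needs_block_device=True)))
    print(suggest_service(Workload(needs_shared_filesystem=True)))
```

Ordering the checks from most specific (migration, hybrid) to most general (object storage) keeps S3 as the fallback, which matches the outline's framing of S3 as the default long-term persistence layer.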
Visual Anchors
Storage Selection Decision Tree
EBS Performance Spectrum
\begin{tikzpicture}
  \draw[thick, ->] (0,0) -- (8,0) node[anchor=north] {\mbox{Cost (Lower $\rightarrow$ Higher)}};
  \draw[thick, ->] (0,0) -- (0,5) node[anchor=east] {Performance (IOPS/Latency)};
  \filldraw[blue] (1,0.5) circle (2pt) node[anchor=south] {sc1 (Cold HDD)};
  \filldraw[blue] (3,1.5) circle (2pt) node[anchor=south] {st1 (Throughput HDD)};
  \filldraw[red] (5,3) circle (2pt) node[anchor=south] {gp3 (General Purpose SSD)};
  \filldraw[red] (7,4.5) circle (2pt) node[anchor=south] {io2 (Provisioned IOPS)};
  \draw[dashed] (0,2.2) -- (8,2.2) node[anchor=west] {SSD/HDD Divide};
\end{tikzpicture}
Definition-Example Pairs
- Ephemeral Storage: Temporary storage whose data is lost when the instance is stopped, hibernated, or terminated (it survives a reboot). Example: EC2 Instance Store used for temporary buffers and caches.
- Cold Data: Data that is rarely accessed but must be retained. Example: Compliance logs from three years ago stored in S3 Glacier Deep Archive.
- Provisioned IOPS: A feature that lets you specify a consistent level of I/O performance for a volume. Example: A high-traffic SQL database requiring 50,000 IOPS on an io2 volume.
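Moving cold data into an archive class is usually automated with an S3 lifecycle rule rather than done by hand. The sketch below builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the prefix, the 30/365-day thresholds, the rule ID, and the bucket name are all hypothetical choices for illustration, and the boto3 call is shown only as a comment so the block runs without AWS credentials.

```python
def compliance_log_lifecycle(prefix: str = "compliance-logs/") -> dict:
    """Hypothetical rule: Standard -> Standard-IA after 30 days -> Deep Archive after 365 days."""
    return {
        "Rules": [
            {
                "ID": "archive-compliance-logs",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


# With boto3 installed and credentials configured, the rule could be applied like this:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-compliance-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=compliance_log_lifecycle(),
#   )

if __name__ == "__main__":
    print(compliance_log_lifecycle())
```

The transitions are listed in increasing `Days` order so each object steps down to progressively colder (and cheaper) classes as it ages.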
Worked Examples
Example 1: Big Data ETL Job
Scenario: You are running a Hadoop-based big data workload that processes large log files sequentially. Cost is a primary concern. Solution: Use Throughput Optimized HDD (st1). Reasoning: st1 volumes are designed for throughput-intensive workloads with sequential access patterns at a lower cost than SSDs.
Example 2: Hybrid Backup Solution
Scenario: A company has an on-premises backup application that uses physical tapes. They want to move to the cloud without changing their application. Solution: Implement AWS Storage Gateway - Tape Gateway. Reasoning: Tape Gateway presents a virtual tape library (VTL) to the existing application via iSCSI, while backing up data to S3 and Glacier.
Checkpoint Questions
- Which EBS volume type is most cost-effective for a large data warehouse with sequential access? (Answer: st1)
- What is the main difference between S3 Standard and S3 Standard-IA? (Answer: Standard-IA has a lower per-GB storage cost but charges a per-GB retrieval fee and has a 30-day minimum storage duration).
- Which service would you use to migrate 50 TB of data from an on-premises NAS to Amazon FSx for Lustre over the internet? (Answer: AWS DataSync).
- How does a Cached Volume Gateway differ from a Stored Volume Gateway? (Answer: A Cached volume stores the full dataset in Amazon S3 and keeps only frequently accessed data in a local cache; a Stored volume keeps the full dataset on premises and asynchronously backs it up to S3 as point-in-time snapshots).
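The cached-versus-stored distinction can be made concrete with a toy model. In the sketch below, a dict stands in for the full dataset held in S3 and a bounded `OrderedDict` stands in for the gateway's local cache disk. The class name is hypothetical, and the least-recently-used eviction policy and synchronous cloud writes are simplifying assumptions, not a description of how Storage Gateway actually manages its cache.

```python
from collections import OrderedDict


class CachedVolumeSketch:
    """Toy cached volume: full data in 'cloud', only recent blocks kept locally."""

    def __init__(self, cache_blocks: int) -> None:
        self.cloud: dict[int, bytes] = {}                 # stands in for the full dataset in S3
        self.cache: OrderedDict[int, bytes] = OrderedDict()  # local cache, most recent last
        self.capacity = cache_blocks
        self.cloud_reads = 0

    def write(self, block_id: int, data: bytes) -> None:
        # Simplification: written straight to the cloud copy (real uploads are asynchronous).
        self.cloud[block_id] = data
        self._put_in_cache(block_id, data)

    def read(self, block_id: int) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # cache hit: served locally
            return self.cache[block_id]
        self.cloud_reads += 1                 # cache miss: fetched from the cloud copy
        data = self.cloud[block_id]
        self._put_in_cache(block_id, data)
        return data

    def _put_in_cache(self, block_id: int, data: bytes) -> None:
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block


if __name__ == "__main__":
    vol = CachedVolumeSketch(cache_blocks=2)
    for i, payload in enumerate([b"a", b"b", b"c"], start=1):
        vol.write(i, payload)
    print(list(vol.cache), vol.read(1), vol.cloud_reads)
```

A Stored volume corresponds to making the local cache large enough to hold every block, so reads never go to the cloud copy and S3 serves purely as a backup target.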
Muddy Points & Cross-Refs
- EBS vs. Instance Store: Remember that EBS is persistent network storage, while Instance Store is local physical storage whose data is lost when the instance is stopped, hibernated, or terminated (it survives a reboot).
- S3 vs. EFS: Use S3 for web-accessible objects and massive scale; use EFS when you need a standard Linux filesystem that multiple EC2 instances can mount simultaneously.
- Deep Dive: For more on performance, see the Compute and Networking guide regarding Enhanced Networking and its impact on EBS throughput.
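The EBS versus Instance Store distinction can be written as a small truth table. The sketch below encodes it; `data_survives` and the string storage labels are hypothetical names. The `delete_on_termination` default of `True` reflects the usual default for a root volume; EBS volumes attached after launch default to `False`, so pass the flag explicitly for those.

```python
from enum import Enum


class Event(Enum):
    REBOOT = "reboot"
    STOP = "stop"
    HIBERNATE = "hibernate"
    TERMINATE = "terminate"


def data_survives(storage: str, event: Event, delete_on_termination: bool = True) -> bool:
    """Does data on `storage` ("instance_store" or "ebs") survive `event`?"""
    if storage == "instance_store":
        # Local physical disks: data persists through a reboot only.
        return event is Event.REBOOT
    if storage == "ebs":
        # Network-attached volumes outlive reboots, stops, and hibernation; on termination
        # the volume is deleted only when its DeleteOnTermination attribute is true.
        if event is Event.TERMINATE:
            return not delete_on_termination
        return True
    raise ValueError(f"unknown storage type: {storage!r}")


if __name__ == "__main__":
    for event in Event:
        print(event.value, data_survives("instance_store", event), data_survives("ebs", event))
```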
Comparison Tables
Amazon EBS Volume Types
| Volume Type | Abbreviation | Primary Use Case | Max Throughput |
|---|---|---|---|
| General Purpose SSD | gp3 | Most workloads, boot volumes | 1,000 MB/s |
| Provisioned IOPS SSD | io2 | High-perf databases | 4,000 MB/s |
| Throughput Optimized HDD | st1 | Big data, ETL, logs | 500 MB/s |
| Cold HDD | sc1 | Low-cost archival | 250 MB/s |
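The throughput ceilings in this table, together with the access-pattern guidance in Worked Example 1, can be turned into a rough heuristic. The sketch below is an assumption-laden simplification: `suggest_ebs_type` is a hypothetical name, it considers only per-volume throughput and access pattern (not IOPS limits, volume size, or instance bandwidth), and the ceilings are copied from the table, so check current AWS documentation before relying on them.

```python
# Per-volume max throughput in MB/s, copied from the table above.
MAX_THROUGHPUT_MB_S = {"gp3": 1_000, "io2": 4_000, "st1": 500, "sc1": 250}


def suggest_ebs_type(random_io: bool, needed_mb_s: float,
                     frequently_accessed: bool = True,
                     needs_provisioned_iops: bool = False) -> str:
    """Rough heuristic: cheapest type in the table whose ceiling covers the workload."""
    if needs_provisioned_iops or needed_mb_s > MAX_THROUGHPUT_MB_S["gp3"]:
        # Sustained high IOPS, or throughput beyond gp3's ceiling.
        return "io2"
    if random_io:
        # Small random I/O (databases, boot volumes) belongs on SSD.
        return "gp3"
    # Large sequential I/O: HDD is cheaper when its throughput ceiling suffices.
    if not frequently_accessed and needed_mb_s <= MAX_THROUGHPUT_MB_S["sc1"]:
        return "sc1"
    if needed_mb_s <= MAX_THROUGHPUT_MB_S["st1"]:
        return "st1"
    # Sequential, but faster than either HDD type allows.
    return "gp3"


if __name__ == "__main__":
    # Worked Example 1: sequential ETL on a budget.
    print(suggest_ebs_type(random_io=False, needed_mb_s=400))
```

The checks run from most demanding to least, so the function only falls through to HDD types once SSD-only requirements have been ruled out.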
Amazon S3 Storage Classes
| Class | Min Storage Duration | Designed Availability | Retrieval Fee |
|---|---|---|---|
| Standard | None | 99.99% | No |
| Standard-IA | 30 days | 99.9% | Yes |
| One Zone-IA | 30 days | 99.5% | Yes |
| Glacier Instant Retrieval | 90 days | 99.9% | Yes |
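The minimum storage duration and retrieval fee columns interact: an object deleted early is still billed for the full minimum, and every retrieval from an IA or Glacier class adds a per-GB charge. The sketch below models both, assuming a 30-day month. Function names are hypothetical, and the prices in the usage lines are placeholders, not real AWS rates.

```python
# Minimum storage durations (days) from the table above.
MIN_STORAGE_DAYS = {
    "Standard": 0,
    "Standard-IA": 30,
    "One Zone-IA": 30,
    "Glacier Instant Retrieval": 90,
}


def billable_gb_months(size_gb: float, days_stored: int, storage_class: str) -> float:
    # Deleting before the minimum still incurs a charge for the full minimum duration.
    billable_days = max(days_stored, MIN_STORAGE_DAYS[storage_class])
    return size_gb * billable_days / 30  # assumes a 30-day month


def total_cost(size_gb: float, days_stored: int, gb_retrieved: float, storage_class: str,
               price_per_gb_month: float, retrieval_per_gb: float = 0.0) -> float:
    storage = billable_gb_months(size_gb, days_stored, storage_class) * price_per_gb_month
    return storage + gb_retrieved * retrieval_per_gb


if __name__ == "__main__":
    # HYPOTHETICAL prices: 100 GB kept 60 days and read back once in full.
    print(total_cost(100, 60, 100, "Standard", price_per_gb_month=0.023))
    print(total_cost(100, 60, 100, "Standard-IA", price_per_gb_month=0.0125,
                     retrieval_per_gb=0.01))
```

Re-running the comparison with more retrievals shows the trade-off from the Checkpoint Questions: Standard-IA wins for rarely read data, but frequent reads can make its retrieval fees outweigh the storage savings.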