AWS Storage Services: S3, EBS, EFS, and FSx Study Guide
Storage services with appropriate use cases (for example, Amazon S3, Amazon EFS, Amazon EBS)
This guide covers the core storage options provided by AWS, focusing on their unique characteristics, performance metrics, and appropriate use cases for the SAA-C03 exam.
Learning Objectives
- Differentiate between Object, Block, and File storage types.
- Select the appropriate storage service based on access patterns (single vs. multi-instance).
- Evaluate cost-optimization strategies using lifecycle policies and storage tiering.
- Identify hybrid storage solutions for connecting on-premises data to the AWS Cloud.
Key Terms & Glossary
- Object Storage: Data stored as distinct units (objects) with metadata and a unique identifier; highly scalable and accessed via API/HTTP.
- Block Storage: Data broken into fixed-size blocks; acts like a physical hard drive directly attached to a server (EC2).
- File Storage (NFS/SMB): Hierarchical data storage accessible by multiple clients simultaneously over a network.
- IOPS (Input/Output Operations Per Second): A performance metric for block storage measuring how many read/write operations can occur per second.
- Throughput: The amount of data transferred per second (e.g., MiB/s), critical for large, sequential data transfers.
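IOPS and throughput are linked through the size of each I/O operation. The sketch below shows that relationship as simple arithmetic. The function name and the sample I/O sizes are illustrative assumptions, and the calculation ignores any per-volume throughput caps that a real volume type enforces.

```python
def throughput_mib_per_s(iops: int, io_size_kib: float) -> float:
    """Approximate throughput in MiB/s for a given IOPS rate and I/O size in KiB."""
    return iops * io_size_kib / 1024


# Many small random operations: high IOPS, modest throughput.
small_random = throughput_mib_per_s(3000, 16)

# Fewer large sequential operations: low IOPS, high throughput.
large_sequential = throughput_mib_per_s(500, 1024)

print(f"3,000 IOPS at 16 KiB  = {small_random} MiB/s")
print(f"500 IOPS at 1,024 KiB = {large_sequential} MiB/s")
```

This is why a workload with large sequential transfers is usually judged by throughput, while a database with small random reads is usually judged by IOPS.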
The "Big Idea"
In AWS, storage selection is a balancing act between Access Pattern and Performance Requirements. If you need shared access across many Linux servers, you use EFS. If you need high-performance, low-latency access for a single database server, you use EBS. If you need to store virtually unlimited images or web assets for a global audience, you use S3. Choosing wrong leads to either performance bottlenecks or unnecessary costs.
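The selection rules in the paragraph above can be written as a small decision function. This is only a study aid under simplifying assumptions: the function name and its parameters are hypothetical, and real designs also weigh cost, latency, and throughput requirements.

```python
def choose_storage(object_access: bool, shared_access: bool, client_os: str = "linux") -> str:
    """Map a coarse access pattern to the service this guide associates with it."""
    if object_access:
        # Virtually unlimited objects served over HTTP/HTTPS.
        return "Amazon S3"
    if shared_access:
        # Shared file systems: SMB for Windows clients, NFS for Linux clients.
        if client_os.lower() == "windows":
            return "Amazon FSx for Windows File Server"
        return "Amazon EFS"
    # Low-latency block storage attached to a single instance.
    return "Amazon EBS"


print(choose_storage(object_access=True, shared_access=False))
print(choose_storage(object_access=False, shared_access=True))
print(choose_storage(object_access=False, shared_access=True, client_os="windows"))
print(choose_storage(object_access=False, shared_access=False))
```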
Formula / Concept Box
| Feature | Amazon S3 | Amazon EBS | Amazon EFS |
|---|---|---|---|
| Storage Type | Object | Block | File (NFS) |
| Best Used For | Web assets, backups, data lakes | OS drives, DB volumes | Shared home dirs, CMS |
| Access | Web (HTTP/HTTPS) | Attached to 1 EC2 Instance* | Shared (1000s of EC2s) |
| Scope | Regional | Availability Zone | Regional |
| Durability | 99.999999999% (11 9s) | 99.8% - 99.9% | 99.999999999% (11 9s) |
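* Note: EBS Multi-Attach is available only for Provisioned IOPS (io1/io2) volumes, and only for instances in the same Availability Zone. All other EBS volume types attach to a single instance at a time, and every EBS volume is confined to its AZ.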
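To make the durability row concrete, the percentages can be turned into expected yearly losses. The sketch below is plain arithmetic on the figures in the table. The function name, the 10 million object count, and the 1,000 volume fleet are assumptions chosen for illustration, and the EBS figures are read as an annual failure rate of 0.1% to 0.2%.

```python
def expected_annual_losses(item_count: int, annual_loss_rate: float) -> float:
    """Expected number of items lost per year at a given annual loss rate."""
    return item_count * annual_loss_rate


# S3 and EFS: 99.999999999% durability is a loss rate of 1e-11 per object per year.
s3_losses = expected_annual_losses(10_000_000, 1e-11)
years_per_lost_object = 1 / s3_losses

# EBS: 99.8% to 99.9% durability is a failure rate of 0.2% to 0.1% per volume per year.
ebs_losses_low = expected_annual_losses(1_000, 0.001)
ebs_losses_high = expected_annual_losses(1_000, 0.002)

print(f"S3, 10 million objects: about one object lost every {years_per_lost_object:,.0f} years")
print(f"EBS, 1,000 volumes: about {ebs_losses_low:.0f} to {ebs_losses_high:.0f} volume failures per year")
```

The gap between the two results is the reason EBS volumes are normally paired with snapshots.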
Hierarchical Outline
- Object Storage (Amazon S3)
  - Buckets & Objects: Global namespace for buckets; objects up to 5TB.
  - Storage Tiers: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier (Instant, Flexible, Deep Archive).
  - Lifecycle Management: Automated transitions between tiers to save costs.
- Block Storage (Amazon EBS)
  - SSD-backed: gp2/gp3 (General Purpose), io1/io2 (Provisioned IOPS).
  - HDD-backed: st1 (Throughput Optimized), sc1 (Cold HDD).
  - Snapshots: Point-in-time backups stored in S3.
- File Storage (EFS & FSx)
  - Amazon EFS: Managed NFS for Linux; scales automatically.
  - Amazon FSx for Windows: Fully managed native Windows file system (SMB).
  - Amazon FSx for Lustre: High-performance for HPC and machine learning.
- Hybrid & Migration
  - AWS Storage Gateway: Bridges on-prem to cloud (File, Volume, and Tape gateways).
  - AWS DataSync: High-speed data transfer service.
Visual Anchors
Storage Decision Logic
Block vs. File Connectivity
\begin{tikzpicture}[scale=0.8, every node/.style={transform shape}]
  % Draw Instances
  \draw[fill=blue!10] (0,0) rectangle (2,1.5) node[midway] {EC2 A};
  \draw[fill=blue!10] (4,0) rectangle (6,1.5) node[midway] {EC2 B};
  % EBS Volume (Attached to one)
  \draw[fill=gray!20] (0,-2) ellipse (1 and 0.5) node {EBS Vol};
  \draw[thick, ->] (1,0) -- (1,-1.5);
  \node at (1.5,-1) {\tiny 1:1};
  % EFS (Shared)
  \draw[fill=green!20] (3,-3) rectangle (5,-2) node[midway] {EFS};
  \draw[thick, ->] (1,0) -- (3,-2.5);
  \draw[thick, ->] (5,0) -- (5,-2.5);
  \node at (3.5,-1.2) {\tiny Shared};
\end{tikzpicture}
Definition-Example Pairs
- Term: Amazon S3 Lifecycle Policy
  - Definition: A set of rules that automatically transitions objects to less expensive storage classes or deletes them after a set period.
  - Example: A company stores raw logs in S3 Standard for 30 days, moves them to S3 Glacier for 7 years for compliance, and then automatically deletes them.
- Term: EBS Snapshot
  - Definition: An incremental backup of an EBS volume, capturing only the blocks that have changed since the last snapshot.
  - Example: Before performing a risky OS upgrade on an EC2 instance, an admin takes a snapshot so they can revert the disk to its exact previous state if the upgrade fails.
- Term: Amazon FSx for Lustre
  - Definition: A high-performance file system optimized for fast processing of workloads like machine learning and high-performance computing (HPC).
  - Example: A research lab uses FSx for Lustre to feed thousands of images per second from S3 into a GPU-based training cluster for autonomous vehicle AI.
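The lifecycle example for raw logs above can be traced as a function of object age. This is a rough model rather than how S3 evaluates rules internally: the function name is hypothetical, the state labels are informal, and it assumes the 7 years in Glacier start after the first 30 days and that a year is 365 days.

```python
STANDARD_DAYS = 30          # days kept in S3 Standard
GLACIER_DAYS = 7 * 365      # assumption: 7 years counted as 365-day years
EXPIRATION_AGE = STANDARD_DAYS + GLACIER_DAYS


def log_storage_state(age_days: int) -> str:
    """Return where a log object sits in the lifecycle at a given age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < STANDARD_DAYS:
        return "STANDARD"
    if age_days < EXPIRATION_AGE:
        return "GLACIER"
    return "DELETED"


for age in (0, 29, 30, 365, EXPIRATION_AGE):
    print(f"day {age}: {log_storage_state(age)}")
```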
Worked Examples
Example 1: Cost-Optimizing a Large Image Repository
Scenario: A social media app stores 50PB of user photos. Photos are accessed frequently for the first 30 days, then rarely accessed, but must be available instantly if requested.
- Step 1: Store new uploads in S3 Standard.
- Step 2: Configure a Lifecycle Policy to transition objects to S3 Standard-Infrequent Access (IA) after 30 days.
- Step 3: (Alternative) Enable S3 Intelligent-Tiering to allow AWS to automatically move objects between tiers based on changing access patterns without manual management.
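Step 2 could be expressed as a lifecycle configuration like the one sketched below. The dictionary follows the shape accepted by the boto3 call `put_bucket_lifecycle_configuration`. The rule ID and the bucket name in the comment are hypothetical, and the call itself is left as a comment so the snippet runs without AWS credentials.

```python
import json

# One rule that applies to every object (empty prefix) and moves it to
# Standard-IA once it is 30 days old.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "photos-to-standard-ia",      # hypothetical rule name
            "Filter": {"Prefix": ""},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
            ],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# With credentials configured, the rule could be applied along these lines
# (the bucket name is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-photo-bucket",
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

Standard-IA keeps millisecond access, which matches the requirement that older photos remain available instantly.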
Example 2: Selecting Block Storage for a Database
Scenario: You are migrating a high-traffic MySQL database to EC2. The database requires a consistent 15,000 IOPS and sub-millisecond latency.
- Analysis:
  - gp2 provides 3 IOPS per GB, so the volume would need to be roughly 5,000 GB to reach 15,000 IOPS.
  - gp3 provides a baseline of 3,000 IOPS and can be provisioned higher, but a 15,000 IOPS target may run close to its limits.
  - io2 (Provisioned IOPS) is designed for this workload.
- Solution: Use an EBS io2 volume provisioned with 15,000 IOPS, so that performance is set explicitly rather than derived from volume size (within the volume type's maximum IOPS-to-size ratio).
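The gp2 part of the analysis is simple arithmetic, sketched below. The 3 IOPS per GiB figure comes from the example. The 100 IOPS floor and the 16,000 IOPS ceiling are the gp2 limits as commonly documented and should be treated as assumptions to verify against current AWS documentation. The function names are hypothetical.

```python
import math

GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100        # assumption: documented gp2 floor
GP2_MAX_IOPS = 16_000     # assumption: documented gp2 ceiling


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume of the given size."""
    return min(max(size_gib * GP2_IOPS_PER_GIB, GP2_MIN_IOPS), GP2_MAX_IOPS)


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 size in GiB whose baseline reaches the target IOPS."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("target exceeds the assumed gp2 maximum")
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


required_gib = gp2_size_for_iops(15_000)
print(f"gp2 needs {required_gib} GiB to sustain 15,000 IOPS")
print(f"a 500 GiB gp2 volume gets {gp2_baseline_iops(500)} IOPS")
```

Paying for about 5,000 GiB just to obtain IOPS is the cost problem that provisioned-IOPS volumes avoid when the database itself is much smaller.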
Checkpoint Questions
- Which storage service would you use to provide a shared file system for a fleet of Windows-based web servers?
- True or False: Amazon S3 automatically encrypts new objects at rest.
- Which EBS volume type is most cost-effective for large, sequential logging workloads that do not require high IOPS?
- What is the difference between a File Gateway and a Volume Gateway in AWS Storage Gateway?
- How can you ensure that data in an EBS volume is highly available across multiple Availability Zones?
Answers
- Amazon FSx for Windows File Server (EFS is Linux-only).
- True. Since January 2023, S3 encrypts all new objects at rest by default using server-side encryption with S3 managed keys (SSE-S3).
- st1 (Throughput Optimized HDD).
- File Gateway provides an NFS/SMB interface to S3; Volume Gateway provides iSCSI block storage backed by S3.
- EBS is AZ-specific. To achieve multi-AZ availability, you must take Snapshots and restore them in a different AZ, or use an application-level replication strategy.