AWS Storage Service Selection: Comprehensive Study Guide (SAP-C02)
Selecting the appropriate storage service
Choosing the right storage service is a core concern of the AWS Well-Architected Framework, especially its Performance Efficiency and Cost Optimization pillars. In the SAP-C02 exam, you must distinguish between block, file, and object storage based on access protocols, latency requirements, and scaling capabilities.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Block-level, File-level, and Object-level storage.
- Identify the specific use cases for Amazon EBS volume types (gp3, io2, st1, sc1).
- Select the appropriate Amazon FSx flavor based on protocol and file system (SMB, NFS, Lustre, OpenZFS).
- Determine when to use Amazon EFS over EBS for shared Linux workloads.
- Analyze storage requirements based on throughput, IOPS, and regional availability.
Key Terms & Glossary
- IOPS (Input/Output Operations Per Second): A measure of how many small read/write operations a drive can handle. Crucial for databases.
- Throughput: The amount of data transferred per second (MB/s). Crucial for streaming and big data.
- Block Storage: Data is stored in fixed-size chunks (blocks); accessed by an OS as a raw disk volume (e.g., EBS).
- File Storage: Data is organized in a hierarchy of files and folders; accessed via protocols like NFS or SMB (e.g., EFS, FSx).
- Object Storage: Data is stored as objects with metadata and a unique ID; accessed via APIs/HTTP (e.g., S3).
- NFS (Network File System): A protocol primarily for Linux-based shared file access.
- SMB (Server Message Block): A protocol primarily for Windows-based shared file access.
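IOPS and throughput are linked by the size of each I/O: throughput is roughly IOPS multiplied by I/O size. The sketch below is a simplification that ignores latency, queue depth, and how EBS merges or splits I/Os. It shows why small random I/O (databases) tends to hit an IOPS limit first, while large sequential I/O (streaming, logs) tends to hit a throughput limit first. The function names are illustrative, not part of any AWS API.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Approximate throughput in MiB/s as IOPS x I/O size (1 MiB = 1024 KiB)."""
    return iops * io_size_kib / 1024


def iops_for_throughput(throughput_mib: float, io_size_kib: float) -> float:
    """IOPS needed to sustain a target throughput at a fixed I/O size."""
    return throughput_mib * 1024 / io_size_kib


if __name__ == "__main__":
    # 16 KiB random I/O, typical of OLTP databases: 16,000 IOPS moves only 250 MiB/s.
    print(throughput_mib_s(16_000, 16))    # 250.0
    # 64 KiB I/O at the same IOPS reaches 1,000 MiB/s.
    print(throughput_mib_s(16_000, 64))    # 1000.0
    # 1 MiB sequential I/O: 500 MiB/s needs only 500 IOPS.
    print(iops_for_throughput(500, 1024))  # 500.0
```

The last case is consistent with the st1 row of the EBS comparison table (500 IOPS, 500 MB/s), since HDD-backed volumes are designed around large (up to 1 MiB) sequential I/Os.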
The "Big Idea"
Selecting storage is a trade-off between latency, cost, and access patterns. You don't just pick "a disk"; you pick a service that matches the application's DNA. If the app needs a local hard drive for a database, use EBS. If multiple Linux servers need to share a config file, use EFS. If you are migrating a specialized on-premises file workload, such as a Windows file share, a NetApp or ZFS file server, or a high-performance computing cluster, look at the matching FSx flavor. If the data is accessed through HTTP APIs (media, backups, data lakes), use S3.
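These rules of thumb can be encoded as a small decision function. This is a minimal sketch: the `Workload` fields and the protocol labels (`"nfs"`, `"smb"`, `"multi"`, `"lustre"`, `"zfs"`) are hypothetical, and the function deliberately ignores cost, latency, and durability, which a real design would also weigh.

```python
from dataclasses import dataclass

VALID_ACCESS = {"block", "file", "object"}

# Hypothetical protocol labels mapped to the managed file service that serves them.
FILE_SERVICES = {
    "nfs": "Amazon EFS",                          # shared Linux folders
    "smb": "Amazon FSx for Windows File Server",  # Windows shares, Active Directory
    "multi": "Amazon FSx for NetApp ONTAP",       # SMB + NFS + iSCSI
    "lustre": "Amazon FSx for Lustre",            # HPC/ML, S3-linked
    "zfs": "Amazon FSx for OpenZFS",              # migrating ZFS file servers
}


@dataclass
class Workload:
    access: str            # "block", "file", or "object"
    protocol: str = "nfs"  # only consulted for file access


def choose_storage(w: Workload) -> str:
    """Map a workload's access pattern to an AWS storage service (rule of thumb)."""
    if w.access not in VALID_ACCESS:
        raise ValueError(f"unknown access type: {w.access!r}")
    if w.access == "object":
        return "Amazon S3"   # API/HTTP access
    if w.access == "block":
        return "Amazon EBS"  # raw disk for a single instance (boot volumes, DBs)
    # File access: the protocol decides the service.
    try:
        return FILE_SERVICES[w.protocol]
    except KeyError:
        raise ValueError(f"unknown file protocol: {w.protocol!r}") from None


if __name__ == "__main__":
    print(choose_storage(Workload("block")))                 # Amazon EBS
    print(choose_storage(Workload("file")))                  # Amazon EFS
    print(choose_storage(Workload("file", protocol="smb")))  # Amazon FSx for Windows File Server
```

Keying the file branch on protocol mirrors how the exam questions are usually worded: the protocol (or a named technology such as Lustre or ONTAP) is the strongest clue.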
Formula / Concept Box
| Feature | EBS | EFS | S3 | FSx (ONTAP/Windows) |
|---|---|---|---|---|
| Access Type | Block | File (NFS) | Object (API) | File (SMB/NFS/iSCSI) |
| Scope | Availability Zone | Regional (One Zone option available) | Regional (bucket names are globally unique) | Single-AZ or Multi-AZ |
| Best For | Boot volumes, DBs | Linux Shared Folders | Media, Backups, Data Lakes | Windows/Specialized Apps |
| Max Throughput | Up to 1,000 MB/s per volume (gp3) | 10 GB/s+ | Request rate, not bandwidth: 3,500 writes / 5,500 reads per second per prefix | 10 GB/s+ |
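For S3, the per-prefix figure limits requests per second rather than bytes per second: S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, and higher aggregate rates come from spreading keys across more prefixes. The helper below is an illustrative sketch that assumes requests are spread evenly across prefixes; in practice S3 scales partitions gradually and may return 503 Slow Down responses while it does.

```python
import math

# Baseline S3 request rates per prefix (requests per second).
WRITE_RPS_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5_500   # GET/HEAD


def prefixes_needed(read_rps: float, write_rps: float) -> int:
    """Minimum number of prefixes for a request load, assuming an even spread."""
    by_reads = math.ceil(read_rps / READ_RPS_PER_PREFIX)
    by_writes = math.ceil(write_rps / WRITE_RPS_PER_PREFIX)
    return max(1, by_reads, by_writes)


if __name__ == "__main__":
    # 20,000 GET/s needs 4 prefixes; 5,000 PUT/s needs 2; the larger requirement wins.
    print(prefixes_needed(read_rps=20_000, write_rps=5_000))  # 4
```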
Hierarchical Outline
- Block Storage (Amazon EBS)
  - SSD-backed:
    - gp2/gp3: General purpose; balanced price/perf.
    - io1/io2: Provisioned IOPS; for sub-millisecond latency and mission-critical DBs.
  - HDD-backed:
    - st1: Throughput optimized; for big data/log processing.
    - sc1: Cold storage; for infrequent access at lowest cost.
- File Storage (EFS & FSx)
  - EFS: Managed NFS for Linux. Scales automatically.
  - FSx for Windows: Fully managed Windows File Server (SMB).
  - FSx for Lustre: High-performance computing (HPC) and machine learning.
  - FSx for NetApp ONTAP: Easy migration for existing NetApp users; supports multi-protocol.
  - FSx for OpenZFS: High throughput (12 GB/s+) for analytics.
- Object Storage (Amazon S3)
  - Scalable, durable storage for any data type.
Visual Anchors
Storage Selection Flowchart
Conceptual Architecture of Storage Layers
\begin{tikzpicture}[node distance=2cm]
% Define styles
\tikzset{box/.style={rectangle, draw, minimum width=2.5cm, minimum height=1cm, align=center}}
% Draw EBS (Block)
\draw[fill=blue!10] (0,0) rectangle (2.5,1) node[pos=.5] {EBS Block};
\draw[fill=blue!10] (0,1.1) rectangle (2.5,2.1) node[pos=.5] {EBS Block};
\node at (1.25, -0.5) {Block Storage (Disk)};
% Draw EFS (File)
\draw[fill=green!10] (4,0) -- (6,0) -- (6,2) -- (4,2) -- (4,0);
\draw (4,1) -- (6,1);
\draw (5,0) -- (5,2);
\node at (5, -0.5) {File (Hierarchy)};
% Draw S3 (Object)
\draw[fill=orange!10] (8,1) circle (1cm);
\node[align=center] at (8,1) {Metadata\\+ Data};
\node at (8, -0.5) {Object (Flat)};
% Connections
\node (EC2) at (4,4) [draw, fill=gray!20, minimum width=4cm] {Compute (EC2/Lambda/Containers)};
\draw[->, thick] (EC2) -- (1.25, 2.2);
\draw[->, thick] (EC2) -- (5, 2.1);
\draw[->, thick] (EC2) -- (8, 2.1);
\end{tikzpicture}
Definition-Example Pairs
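- EBS Multi-Attach: Allows a single Provisioned IOPS SSD (io1/io2) volume to be attached to up to 16 Nitro-based EC2 instances in the same AZ. The application or a cluster-aware file system must coordinate concurrent writes.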
- Example: A clustered application like Oracle RAC on Linux that requires shared block access.
- Throughput-Optimized HDD (st1): A low-cost volume designed for high throughput rather than high IOPS.
- Example: Storing large MapReduce datasets or Kafka logs where I/O is predominantly sequential.
- FSx for Lustre: A file system optimized for fast processing of large data sets.
- Example: Training a machine learning model using SageMaker where the data resides in S3 but needs to be "lazy-loaded" into a high-speed file system for compute.
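The Multi-Attach definition carries placement rules that exam distractors often violate (wrong volume type, instances in another AZ). The checker below encodes those rules as a sketch: io1/io2 only, at most 16 Nitro-based instances, all in the volume's AZ. The instance IDs and AZ names are made-up placeholders, and the check says nothing about whether the application coordinates its writes.

```python
from dataclasses import dataclass

MULTI_ATTACH_TYPES = {"io1", "io2"}
MAX_ATTACHED_INSTANCES = 16


@dataclass
class Instance:
    instance_id: str   # hypothetical ID for illustration
    az: str
    nitro: bool = True


def multi_attach_problems(volume_type: str, volume_az: str,
                          instances: list[Instance]) -> list[str]:
    """Return the reasons a Multi-Attach plan is invalid (empty list if it looks OK)."""
    problems = []
    if volume_type not in MULTI_ATTACH_TYPES:
        problems.append(f"{volume_type} does not support Multi-Attach (io1/io2 only)")
    if len(instances) > MAX_ATTACHED_INSTANCES:
        problems.append(
            f"{len(instances)} instances exceeds the limit of {MAX_ATTACHED_INSTANCES}")
    for inst in instances:
        if inst.az != volume_az:
            problems.append(f"{inst.instance_id} is in {inst.az}, not {volume_az}")
        if not inst.nitro:
            problems.append(f"{inst.instance_id} is not Nitro-based")
    return problems


if __name__ == "__main__":
    nodes = [Instance("i-node1", "us-east-1a"), Instance("i-node2", "us-east-1b")]
    for problem in multi_attach_problems("gp3", "us-east-1a", nodes):
        print(problem)
```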
Worked Examples
Scenario: Migrating a Windows-based Content Management System (CMS)
Problem: A company has a legacy CMS running on-premises that uses a Windows Shared Drive (SMB) to store images. They want to move to AWS with minimal code changes.
Solution:
- Analyze Protocol: The application uses SMB. This eliminates Amazon EFS (Linux/NFS only).
- Evaluate Managed Options: Amazon FSx for Windows File Server provides a fully managed SMB share compatible with Active Directory.
- Architecture: Deploy EC2 instances across two AZs for high availability. Use Amazon FSx for Windows File Server with a Multi-AZ deployment. The instances map the SMB share by its DNS name, just as they did on-premises.
Scenario: High-Throughput Log Analytics
Problem: A Big Data application processes 500 MB/s of log data. Cost is a major concern, and data is accessed sequentially.
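Solution: Use EBS st1 (Throughput Optimized HDD). gp3 could also handle this by provisioning throughput above its 125 MiB/s baseline, but st1 is significantly cheaper per GB for sequential workloads that don't need the low latency of SSDs. Size the volume for the sustained rate: st1 throughput scales with capacity (a baseline of 40 MiB/s per TiB and a burst of 250 MiB/s per TiB, capped at 500 MiB/s per volume).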
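Because st1 throughput depends on volume size, "use st1" is only half the answer; the volume also has to be big enough. The sketch below applies the st1 scaling rule (40 MiB/s per TiB baseline, 250 MiB/s per TiB burst, 500 MiB/s cap per volume) and treats the scenario's 500 MB/s as roughly 500 MiB/s for simplicity. It ignores burst-credit depletion timing and striping across multiple volumes; the function names are illustrative.

```python
import math

# st1 throughput model in MiB/s: scales with size in TiB, capped per volume.
ST1_BASELINE_PER_TIB = 40
ST1_BURST_PER_TIB = 250
ST1_MAX_MIB_S = 500


def st1_throughput(size_tib: float) -> tuple[float, float]:
    """Return (baseline, burst) throughput in MiB/s for an st1 volume."""
    baseline = min(ST1_BASELINE_PER_TIB * size_tib, ST1_MAX_MIB_S)
    burst = min(ST1_BURST_PER_TIB * size_tib, ST1_MAX_MIB_S)
    return baseline, burst


def st1_size_for_baseline(target_mib_s: float) -> float:
    """Smallest st1 size (TiB) whose baseline sustains the target; inf if one volume can't."""
    if target_mib_s > ST1_MAX_MIB_S:
        return math.inf
    return target_mib_s / ST1_BASELINE_PER_TIB


if __name__ == "__main__":
    # A 2 TiB volume can burst to 500 MiB/s but only sustains 80 MiB/s.
    print(st1_throughput(2))           # (80, 500)
    # Sustaining ~500 MiB/s from baseline alone needs about 12.5 TiB.
    print(st1_size_for_baseline(500))  # 12.5
```

For a continuous 500 MB/s stream, a small st1 volume would exhaust its burst credits and fall back to its baseline, so sizing by baseline is the safer reading of the scenario.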
Checkpoint Questions
- Which storage service is the only one that supports the SMB, NFS, and iSCSI protocols simultaneously?
- You need to share a file system between a Linux EC2 instance in AZ-a and a Fargate container in AZ-b. Which service do you choose?
- What is the main advantage of EBS gp3 over gp2?
- True or False: Amazon FSx for Lustre is a persistent-only storage solution.
Answers
- Amazon FSx for NetApp ONTAP.
- Amazon EFS (it is a regional service, unlike EBS).
- gp3 allows you to scale IOPS and Throughput independently of storage capacity.
- False. FSx for Lustre offers both scratch (temporary, for short-lived processing) and persistent deployment types, and either can be linked to an S3 bucket as a data repository.
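Answer 3 is easiest to see with numbers. The sketch below uses the gp2 and gp3 IOPS rules as assumed here (gp2: 3 IOPS per GiB, minimum 100, maximum 16,000; gp3: 3,000 IOPS baseline at any size, provisionable up to 16,000 at no more than 500 IOPS per GiB). It ignores gp2 burst credits and throughput settings, so treat it as an illustration and check current EBS limits before relying on it.

```python
import math

GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS, GP2_MAX_IOPS = 100, 16_000
GP3_BASELINE_IOPS, GP3_MAX_IOPS = 3_000, 16_000
GP3_MAX_IOPS_PER_GIB = 500


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 IOPS are tied to size: 3 per GiB, floor 100, cap 16,000."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_IOPS))


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 size (GiB) whose baseline reaches target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("beyond gp2's per-volume maximum")
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


def gp3_max_iops(size_gib: int) -> int:
    """Highest IOPS a gp3 volume of this size can be provisioned with."""
    return min(GP3_MAX_IOPS, max(GP3_BASELINE_IOPS, GP3_MAX_IOPS_PER_GIB * size_gib))


if __name__ == "__main__":
    # A 100 GiB database that needs 10,000 IOPS:
    print(gp2_baseline_iops(100))     # 300
    print(gp2_size_for_iops(10_000))  # 3334 (GiB you would pay for on gp2)
    print(gp3_max_iops(100))          # 16000 (so 10,000 IOPS fits on a 100 GiB gp3)
```

On gp2 you buy capacity to get IOPS; on gp3 you provision IOPS directly, which is the independence the answer refers to.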
Muddy Points & Cross-Refs
- EBS vs. EFS for performance: Students often confuse these. Remember: EBS is for single-instance, low-latency performance (like a local SSD). EFS is for distributed, shared access (like a network drive).
- FSx Flavors:
  - Windows: Think "Active Directory / SMB."
  - Lustre: Think "HPC / S3 Integration / Fast Linux."
  - NetApp: Think "On-premises migration / ONTAP features."
  - OpenZFS: Think "High-throughput migration from ZFS servers."
Comparison Tables
EBS Volume Type Comparison
| Volume Type | Technology | Use Case | Max IOPS (per volume) | Max Throughput (per volume) |
|---|---|---|---|---|
| gp3 | SSD | General Purpose (Virtual Desktops, Dev/Test) | 16,000 | 1,000 MB/s |
| io2 | SSD | Critical DBs (SAP HANA, Oracle) | 256,000 | 4,000 MB/s |
| st1 | HDD | MapReduce, Log Processing | 500 | 500 MB/s |
| sc1 | HDD | Cold data, lowest cost | 250 | 250 MB/s |
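The table can be read as a filter: discard types whose per-volume limits are too low or whose media doesn't fit the access pattern, then take the cheapest survivor. The sketch below does that using the table's limits. The price ordering (sc1 < st1 < gp3 < io2 per GiB) and the pattern labels are assumptions for illustration, and the function name is hypothetical.

```python
from typing import Optional

# (type, suitable access patterns, max IOPS, max MiB/s), cheapest per GiB first.
EBS_TYPES = [
    ("sc1", {"cold"}, 250, 250),
    ("st1", {"sequential", "cold"}, 500, 500),
    ("gp3", {"random", "sequential", "cold"}, 16_000, 1_000),
    ("io2", {"random", "sequential", "cold"}, 256_000, 4_000),
]
HDD_TYPES = {"sc1", "st1"}


def pick_ebs_type(iops: int, throughput_mib_s: int, pattern: str,
                  boot: bool = False) -> Optional[str]:
    """Return the cheapest volume type meeting the requirement, or None."""
    for name, patterns, max_iops, max_tp in EBS_TYPES:
        if pattern not in patterns:
            continue
        if boot and name in HDD_TYPES:
            continue  # st1/sc1 cannot be used as boot volumes
        if iops <= max_iops and throughput_mib_s <= max_tp:
            return name
    return None  # beyond one volume: consider striping or another service


if __name__ == "__main__":
    print(pick_ebs_type(400, 450, "sequential"))   # st1
    print(pick_ebs_type(12_000, 300, "random"))    # gp3
    print(pick_ebs_type(50_000, 1_500, "random"))  # io2
```

This checks limits only. Exam answers sometimes prefer io2 even when gp3's limits suffice, because of io2's higher durability and more consistent sub-millisecond latency for mission-critical databases.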
[!IMPORTANT] For the exam, if a question mentions "HPC" or a high-performance file system integrated with S3, look for "FSx for Lustre." If the same data must be shared over both SMB and NFS (Windows and Linux clients), look for "FSx for NetApp ONTAP."