Cost-Effective AWS Storage Selection Study Guide
Selecting the most cost-effective storage service for a workload
This guide focuses on the critical skill of selecting the most economical storage solution in AWS while meeting performance and durability requirements, as required for the SAA-C03 exam.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between Object, Block, and File storage use cases and costs.
- Select the appropriate Amazon S3 Tier based on access frequency and retrieval speed.
- Choose the right EBS Volume Type (SSD vs. HDD) for specific performance needs.
- Identify tools such as AWS Cost Explorer and S3 Lifecycle policies that help you find and automate savings.
- Compare hybrid storage options (e.g., Storage Gateway, DataSync) for data migration.
Key Terms & Glossary
- Durability: The probability that a data object will not be lost (S3 offers 11 nines: 99.999999999%).
- IOPS (Input/Output Operations Per Second): A measure of performance for block storage (EBS).
- Throughput: The amount of data transferred per second, critical for large-file workloads.
- Cold Storage: Data that is rarely accessed but must be retained for long periods (e.g., S3 Glacier).
- Lifecycle Policy: A set of rules that automatically transitions data to cheaper storage tiers or deletes it after a set time.
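To make the Lifecycle Policy entry concrete, here is a minimal sketch of a lifecycle configuration written as a plain Python dict in the shape the S3 API accepts. The rule ID, the `logs/` prefix, the bucket name, and every day threshold are illustrative assumptions, not recommendations.

```python
# Sketch of an S3 Lifecycle configuration as a plain Python dict.
# Rule ID, prefix, and day thresholds are illustrative assumptions.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",        # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},   # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER_IR"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 2555},    # roughly 7 years, then delete
        }
    ]
}


def transition_days(config):
    """Return (days, storage_class) pairs in the order objects would move."""
    rule = config["Rules"][0]
    return sorted((t["Days"], t["StorageClass"]) for t in rule["Transitions"])


# With boto3 installed and credentials configured, the dict could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle_configuration)
```

Each transition moves objects to a cheaper class as they age, and the expiration rule deletes them once the assumed retention period ends.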
The "Big Idea"
[!IMPORTANT] The "Big Idea" of cost-optimized storage is matching the Data Access Pattern to the Storage Class. You should never pay for millisecond access for data you only need once a year, nor should you use expensive SSDs for data that is only accessed sequentially in large blocks.
Formula / Concept Box
| Storage Category | Primary Service | Cost Driver | Best For |
|---|---|---|---|
| Object | Amazon S3 | GB/month + Requests + Data Transfer Out | Photos, Videos, Backups, Static Web |
| Block | Amazon EBS | GB provisioned/month + provisioned IOPS (io2; gp3 above its included baseline) | EC2 boot volumes, Databases (low latency) |
| File | Amazon EFS / FSx | EFS: GB used/month (elastic); FSx: GB provisioned/month | Shared folders, Content Management |
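The cost drivers in the table can be turned into a rough estimator. The sketch below is only an illustration: the class name, function names, and any prices you pass in are assumptions, and real pricing varies by Region, storage class, and usage tier.

```python
from dataclasses import dataclass


@dataclass
class ObjectStoragePrices:
    """Placeholder unit prices; real values vary by Region and storage class."""
    per_gb_month: float
    per_1k_requests: float
    per_gb_transfer_out: float


def monthly_object_cost(prices, stored_gb, requests, transfer_out_gb):
    """Object storage: GB/month + requests + data transfer out."""
    storage = stored_gb * prices.per_gb_month
    request_cost = (requests / 1000) * prices.per_1k_requests
    transfer = transfer_out_gb * prices.per_gb_transfer_out
    return storage + request_cost + transfer


def monthly_block_cost(provisioned_gb, per_gb_month):
    """Block storage bills on provisioned size, not on the bytes actually written."""
    return provisioned_gb * per_gb_month
```

For example, `monthly_object_cost(ObjectStoragePrices(0.02, 0.005, 0.09), 1000, 100_000, 50)` adds up storage, request, and transfer charges for a hypothetical 1,000 GB bucket using made-up prices.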
Hierarchical Outline
- Object Storage (Amazon S3)
- S3 Standard: Frequent access; high cost/GB.
- S3 Standard-IA: Infrequent access; lower storage cost, but adds Retrieval Fees.
- S3 One Zone-IA: Same as Standard-IA but stored in a single AZ (about 20% cheaper, less resilient).
- S3 Glacier Instant Retrieval: Millisecond access for quarterly data.
- S3 Glacier Deep Archive: Lowest cost (approx. $0.00099/GB per month); standard retrieval within 12 hours.
- Block Storage (Amazon EBS)
- SSD-based (gp3, io2): High IOPS, best for transactional databases.
- HDD-based (st1, sc1): Throughput-optimized (st1) or Cold (sc1); much cheaper for large, sequential data.
- File Storage (Shared Access)
- EFS: Serverless, grows automatically, supports Linux (NFS).
- FSx for Windows: High-performance Windows-native file shares (SMB).
- FSx for Lustre: High-performance computing (HPC).
Visual Anchors
Storage Selection Decision Tree
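One way to express the selection logic from the outline is as a set of small decision functions. This is a sketch under stated assumptions: the function names, parameters, and the access-frequency thresholds (more than 12 reads a year for Standard, more than 4 for the IA classes) are illustrative choices, not AWS rules.

```python
def choose_s3_class(accesses_per_year, needs_millisecond_access=True,
                    easily_recreated=False, pattern_known=True):
    """Pick an S3 storage class; thresholds are illustrative assumptions."""
    if not pattern_known:
        return "S3 Intelligent-Tiering"
    if not needs_millisecond_access:
        return "S3 Glacier Deep Archive"
    if accesses_per_year > 12:          # more than about monthly
        return "S3 Standard"
    if accesses_per_year > 4:           # between monthly and quarterly
        return "S3 One Zone-IA" if easily_recreated else "S3 Standard-IA"
    return "S3 Glacier Instant Retrieval"


def choose_ebs_type(sequential_large_io, frequently_accessed=True,
                    needs_max_iops=False):
    """HDD types for large sequential I/O, SSD types otherwise."""
    if sequential_large_io:
        return "st1" if frequently_accessed else "sc1"
    return "io2" if needs_max_iops else "gp3"


def choose_file_service(client_os, hpc=False):
    """Shared file storage: Lustre for HPC, SMB for Windows, NFS for Linux."""
    if hpc:
        return "FSx for Lustre"
    return "FSx for Windows" if client_os == "windows" else "EFS"
```

Calling `choose_s3_class(4)` follows the quarterly-access branch, while `choose_ebs_type(True, frequently_accessed=False)` follows the cold HDD branch.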
Cost vs. Retrieval Speed Trade-off
\begin{tikzpicture}[scale=0.8]
  \draw[->] (0,0) -- (6,0) node[right] {Retrieval Time};
  \draw[->] (0,0) -- (0,6) node[above] {Cost per GB};
  \filldraw[red] (0.5,5) circle (2pt) node[right] {S3 Standard (Fast/Pricey)};
  \filldraw[orange] (2,3) circle (2pt) node[right] {S3 Standard-IA};
  \filldraw[blue] (4,1.5) circle (2pt) node[right] {Glacier Instant};
  \filldraw[green] (5.5,0.5) circle (2pt) node[below] {Glacier Deep Archive (Slow/Cheap)};
  \draw[dashed] (0.5,5) -- (5.5,0.5);
\end{tikzpicture}
Definition-Example Pairs
- Throughput-Optimized HDD (st1): Low-cost HDD designed for frequently accessed, throughput-intensive workloads.
- Example: Storing large MapReduce datasets or log processing streams where data is read in big chunks.
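- Intelligent-Tiering: An S3 storage class that automatically moves objects between access tiers (Frequent, Infrequent, and Archive Instant Access) as access patterns change, in exchange for a small per-object monitoring fee.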
- Example: A dataset where you don't know the access pattern, or it changes unpredictably over time.
- S3 Transfer Acceleration: Uses CloudFront's edge locations to speed up long-distance uploads to S3.
- Example: A global team in London and Tokyo uploading large media files to an S3 bucket located in us-east-1.
Worked Examples
Scenario 1: The Compliance Requirement
Problem: A law firm must store client records for 7 years. The records are almost never accessed but must be retrievable within 24 hours if audited.
Solution: S3 Glacier Deep Archive. It is the lowest-cost S3 storage class, and its standard retrieval completes within 12 hours, which fits the 24-hour audit requirement. Avoid the cheaper bulk retrieval option here, since it can take up to 48 hours.
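The trade-off in Scenario 1 can be put in rough numbers. The sketch below uses the Deep Archive price quoted in this guide and an assumed S3 Standard price of $0.023/GB per month. The 10 TB archive size is hypothetical, and the calculation ignores request and retrieval fees. It also checks retrieval times against the 24-hour deadline: standard Deep Archive retrievals complete within 12 hours, bulk retrievals within 48 hours.

```python
DEEP_ARCHIVE_PER_GB_MONTH = 0.00099  # figure quoted in this guide
STANDARD_PER_GB_MONTH = 0.023        # assumed list price; check current pricing


def retention_cost(gb, per_gb_month, years):
    """Storage-only cost of keeping `gb` for `years` (no request or retrieval fees)."""
    return gb * per_gb_month * 12 * years


def meets_deadline(retrieval_hours, deadline_hours=24):
    """True if the worst-case retrieval time fits inside the audit deadline."""
    return retrieval_hours <= deadline_hours


archive_gb = 10_000  # hypothetical 10 TB of client records
deep_archive_total = retention_cost(archive_gb, DEEP_ARCHIVE_PER_GB_MONTH, 7)
standard_total = retention_cost(archive_gb, STANDARD_PER_GB_MONTH, 7)

print(f"Deep Archive, 7 years: ${deep_archive_total:,.2f}")
print(f"S3 Standard, 7 years:  ${standard_total:,.2f}")
print("Standard retrieval (12 h) meets deadline:", meets_deadline(12))
print("Bulk retrieval (48 h) meets deadline:", meets_deadline(48))
```

Under these assumptions the storage bill differs by more than an order of magnitude, and only the standard retrieval option satisfies the audit window.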
Scenario 2: High-Performance Database
Problem: A production MySQL database on EC2 is experiencing latency during peak hours. Budget is a concern, but performance is the priority.
Solution: Migrate from gp2 to gp3 EBS volumes. gp3 includes a baseline of 3,000 IOPS at no extra charge regardless of volume size and lets you scale IOPS and throughput independently of storage capacity, at a price per GB that is up to 20% lower than gp2.
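Why gp3 helps in Scenario 2 becomes clearer once IOPS is separated from volume size. On gp2, baseline IOPS scales at 3 IOPS per GiB (minimum 100, maximum 16,000), so a small volume has a low baseline. The sketch below ignores gp2 burst credits, and the per-GB prices are assumed values for illustration only.

```python
GP3_BASELINE_IOPS = 3_000   # included with every gp3 volume
GP2_PER_GB_MONTH = 0.10     # assumed price; check current pricing
GP3_PER_GB_MONTH = 0.08     # assumed price; check current pricing


def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return min(max(100, 3 * size_gib), 16_000)


def gib_needed_on_gp2(target_iops):
    """Smallest gp2 volume whose baseline reaches target_iops."""
    if target_iops > 16_000:
        raise ValueError("gp2 baseline cannot exceed 16,000 IOPS")
    return -(-target_iops // 3)  # ceiling division


db_size_gib = 200  # hypothetical database volume
print("gp2 baseline at 200 GiB:", gp2_baseline_iops(db_size_gib), "IOPS")
print("gp3 baseline at 200 GiB:", GP3_BASELINE_IOPS, "IOPS")
print("gp2 size needed for 3,000 IOPS:", gib_needed_on_gp2(3_000), "GiB")
```

With these assumed prices, reaching a 3,000 IOPS baseline on gp2 means paying for 1,000 GiB, while a 200 GiB gp3 volume already includes it.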
Checkpoint Questions
- Which S3 storage class is the lowest-cost fit for infrequently accessed data that can be easily recreated?
- What is the main cost difference between Amazon EFS and Amazon EBS?
- True or False: S3 Standard-IA has a minimum storage duration of 30 days.
- Which tool would you use to find EBS volumes that are underutilized to save money?
Answers
- S3 One Zone-IA (The One Zone aspect reduces cost, and IA fits the infrequent access).
- EFS is elastic and you pay only for what you use; EBS requires you to provision (and pay for) a specific size regardless of how much data is in it.
- True. You are billed for at least 30 days even if you delete the data sooner.
- AWS Compute Optimizer (or Trusted Advisor).
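The minimum storage duration from the third answer can be sketched as a small billing rule: deleting or transitioning an object early still incurs the charge for the remaining minimum days. The dictionary and function names are illustrative, and the keys reuse the S3 API storage class identifiers.

```python
# Minimum storage duration, in days, per S3 storage class.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,
    "DEEP_ARCHIVE": 180,
}


def billed_days(storage_class, days_stored):
    """Days of storage charged, accounting for the class minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billed_days("STANDARD_IA", 10))    # object deleted after 10 days
print(billed_days("DEEP_ARCHIVE", 200))  # object kept past the minimum
```

This is why short-lived objects are usually cheaper in S3 Standard, even though its per-GB price is higher.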