Domain 4.1: Designing Cost-Optimized Storage Solutions
This guide focuses on selecting the most cost-effective AWS storage services based on access patterns, performance requirements, and data durability needs. In the AWS Certified Solutions Architect - Associate (SAA-C03) exam, Domain 4 (Design Cost-Optimized Architectures) accounts for 20% of the scored content.
Learning Objectives
After studying this guide, you should be able to:
- Identify the most cost-effective Amazon S3 storage class for specific access patterns.
- Compare costs between Amazon EBS volume types (SSD vs. HDD).
- Implement S3 Lifecycle policies to automate data tiering.
- Determine when to use Amazon EFS Infrequent Access (IA) versus Standard storage.
- Select appropriate data transfer tools (DataSync, Snowball) to minimize migration costs.
Key Terms & Glossary
- Object Storage (S3): Highly scalable storage for unstructured data accessed via a flat address space.
- Block Storage (EBS): High-performance storage volumes used with Amazon EC2; behaves like a physical hard drive.
- File Storage (EFS/FSx): Shared storage accessible by multiple instances simultaneously via standard file protocols (NFS/SMB).
- Lifecycle Policy: A set of rules that automatically transitions objects to less expensive storage classes or deletes them after a specified period.
- Cold Storage: Data that is rarely accessed but must be retained for long periods, typically stored in Amazon S3 Glacier.
The "Big Idea"
Cost optimization in AWS storage is not simply about choosing the cheapest service; it is about matching the storage tier to the data's lifecycle. As data ages, its value and access frequency typically decrease. A cost-optimized architecture uses automation (such as S3 Lifecycle policies or EFS Lifecycle Management) to move data to cheaper, "colder" tiers without human intervention, so that you pay only for the performance you actually need.
Formula / Concept Box
| Feature | Amazon S3 | Amazon EBS | Amazon EFS |
|---|---|---|---|
| Storage Type | Object | Block | File (Network) |
| Cost Model | GB/Month + Requests | GB/Month (Provisioned) | GB/Month (Used) |
| Best For | Static assets, Backups | OS boot volumes, DBs | Shared dev tools, CMS |
| Cost-Saving Key | Storage Classes | Volume Types (gp3 vs. io2) | Lifecycle Management |
[!TIP] Watch for gp3 in exam questions involving EBS. It is priced up to 20% lower per GB than gp2 and lets you provision IOPS and throughput independently of volume size.
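The gp2 versus gp3 comparison in the tip can be checked with simple arithmetic. The sketch below is illustrative only: the per-GB and per-IOPS prices are assumed us-east-1 list prices that may have changed, and the function names are hypothetical helpers, not part of any AWS SDK.

```python
# Assumed us-east-1 list prices in USD; verify against current AWS pricing.
GP2_PER_GB_MONTH = 0.10
GP3_PER_GB_MONTH = 0.08
GP3_FREE_IOPS = 3000              # baseline IOPS included with a gp3 volume
GP3_PER_EXTRA_IOPS_MONTH = 0.005  # assumed price per IOPS above the baseline


def gp2_monthly_cost(size_gb: int) -> float:
    """gp2 is billed on size only; its IOPS scale with volume size."""
    return size_gb * GP2_PER_GB_MONTH


def gp3_monthly_cost(size_gb: int, iops: int = GP3_FREE_IOPS) -> float:
    """gp3 is billed on size plus any IOPS provisioned above the baseline."""
    extra_iops = max(0, iops - GP3_FREE_IOPS)
    return size_gb * GP3_PER_GB_MONTH + extra_iops * GP3_PER_EXTRA_IOPS_MONTH


if __name__ == "__main__":
    size = 500
    gp2 = gp2_monthly_cost(size)
    gp3 = gp3_monthly_cost(size)
    print(f"gp2: ${gp2:.2f}  gp3: ${gp3:.2f}  saving: {1 - gp3 / gp2:.0%}")
```

Under these assumed prices, a volume that stays at the gp3 baseline saves 20% on storage, and extra IOPS can be added without growing the volume.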
Hierarchical Outline
- I. Amazon S3 Cost Optimization
  - Storage Classes:
    - S3 Standard: Frequent access; highest storage cost.
    - S3 Standard-IA: Lower storage cost, but retrieval fees apply. Best for backups.
    - S3 Glacier Deep Archive: Lowest storage cost (about $0.00099 per GB-month); retrieval completes within 12 hours (standard) or 48 hours (bulk).
  - Lifecycle Management: Transitioning data from Standard to Glacier based on age.
- II. Amazon EBS (Elastic Block Store) Optimization
  - SSD-backed (gp3, io2): For transactional workloads.
  - HDD-backed (st1, sc1): For large streaming workloads (st1) or cold data (sc1).
  - Right-sizing: Changing volume types using Elastic Volumes without downtime.
- III. Amazon EFS (Elastic File System)
  - Storage Classes: Standard vs. Infrequent Access (IA).
  - EFS Lifecycle Management: Automatically moves files that have not been accessed for a configured period (for example, 30 days) to the IA tier.
- IV. Data Transfer and Ingestion
  - AWS DataSync: Automates moving data to S3, EFS, or FSx.
  - AWS Snowball Edge: Physical device for large-scale data migration (up to petabyte scale) when transferring over a limited network link would be too slow or too costly.
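To judge when an offline device such as Snowball Edge is worth considering over a network transfer, it helps to estimate how long the upload would take. The following is a rough sketch: the `transfer_days` helper and the 80% default link utilization are assumptions for illustration, and real throughput depends on many factors not modeled here.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over a link_mbps line.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second).
    utilization is the assumed fraction of the link available to the transfer.
    """
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * utilization)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for mbps in (100, 1_000, 10_000):
        days = transfer_days(500, mbps)
        print(f"500 TB over {mbps} Mbps: about {days:.0f} days")
```

If the estimate runs to months, as it does for 500 TB over a 100 Mbps link, shipping the data on a physical device is usually worth evaluating.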
Visual Anchors
S3 Lifecycle Flow
Cost vs. Performance Mapping
\begin{tikzpicture} \draw[->] (0,0) -- (6,0) node[right] {Access Frequency (Performance)}; \draw[->] (0,0) -- (0,5) node[above] {Cost per GB};
% S3 Glacier
\filldraw[blue] (0.5,0.5) circle (2pt) node[anchor=west] {S3 Glacier};
% S3 Standard-IA
\filldraw[green] (3,2) circle (2pt) node[anchor=west] {S3 Standard-IA};
% S3 Standard
\filldraw[red] (5,4.5) circle (2pt) node[anchor=west] {S3 Standard};
\draw[dashed] (0.5,0.5) -- (5,4.5);\end{tikzpicture}
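The cost axis of the chart can be made concrete with a small calculation. This is a sketch under stated assumptions: the Deep Archive price comes from the outline above, while the S3 Standard and Standard-IA prices are assumed us-east-1 list prices. Request charges, retrieval fees, and minimum storage durations are deliberately left out.

```python
# USD per GB-month. Only the Deep Archive figure appears in this guide;
# the other two are assumed list prices and should be verified.
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "S3 Glacier Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb: float) -> dict:
    """Return the storage-only monthly cost per class for size_gb gigabytes."""
    return {
        name: round(size_gb * price, 2)
        for name, price in PRICE_PER_GB_MONTH.items()
    }


if __name__ == "__main__":
    for name, cost in monthly_storage_cost(1000).items():
        print(f"{name:<26} ${cost:>7.2f} per month for 1,000 GB")
```

The gap between tiers is why moving data that is no longer read frequently matters more than shaving the amount of data stored.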
Definition-Example Pairs
- Requester Pays: A bucket configuration where the person downloading the data pays the data transfer cost rather than the bucket owner.
- Example: A scientific research group sharing a massive dataset of genomic data with the public.
- Intelligent-Tiering: An S3 storage class that automatically moves data between frequent and infrequent access tiers based on usage patterns.
- Example: A dynamic website where some images are viral (frequent) and others are forgotten (infrequent), but patterns change weekly.
- Elastic Volumes: An EBS feature that allows you to increase volume size or change volume types while the volume is in use.
- Example: Upgrading an EBS volume from an HDD (st1) to an SSD (gp3) during peak shopping season to handle higher disk I/O.
Worked Examples
Example 1: Archiving Medical Records
Scenario: A hospital needs to store patient X-rays for 10 years to meet regulatory requirements. The images are accessed frequently for the first 30 days, rarely for the next year, and almost never after that.
Step-by-Step Breakdown:
- Identify Tiers: Start in S3 Standard for the first 30 days to ensure high performance during the active treatment phase.
- Transition 1: Create a Lifecycle rule to move data to S3 Standard-IA after 30 days. This lowers the storage cost while keeping the data available in milliseconds.
- Transition 2: Move data to S3 Glacier Deep Archive after 1 year. The storage cost drops to the minimum level.
- Final Action: Set an expiration rule to delete the objects after 3,650 days (10 years).
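The four steps above can be sketched as a single lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the empty prefix, and the bucket name in the commented-out call are hypothetical placeholders; the sketch only builds and prints the configuration and does not contact AWS.

```python
import json


def xray_lifecycle_configuration(prefix: str = "") -> dict:
    """Build a lifecycle configuration for the X-ray retention scenario."""
    return {
        "Rules": [
            {
                "ID": "xray-retention",        # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches all objects
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 3650},  # delete after 10 years
            }
        ]
    }


if __name__ == "__main__":
    config = xray_lifecycle_configuration()
    print(json.dumps(config, indent=2))
    # Applying it requires boto3 and AWS credentials, for example:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-xray-bucket",  # hypothetical bucket name
    #     LifecycleConfiguration=config,
    # )
```

Keeping all transitions and the expiration in one rule means every object follows the same timeline, measured in days since the object was created.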
Example 2: High-Performance Database Volume
Scenario: A company is running a SQL database on EC2. The database currently uses an io1 volume with 10,000 IOPS, but the cost is exceeding the budget. Analysis shows the database only hits 10,000 IOPS during a 2-hour nightly batch job.
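Optimization: Switch the volume type from io1 to gp3 using Elastic Volumes. gp3 includes a baseline of 3,000 IOPS and charges a much lower rate than io1 for additional provisioned IOPS. There are two options: provision the full 10,000 IOPS on gp3 around the clock, which is the simplest approach, or provision only the daytime baseline and raise IOPS just before the batch job starts using a Lambda function that calls the ModifyVolume API. With the second option, note that a volume can be modified only once every 6 hours, so the scale-down must be scheduled accordingly.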
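A rough cost comparison for this scenario might look like the sketch below. The scenario does not state the volume size, so the 1,000 GB figure is an assumption, as are the us-east-1 list prices in the table; throughput charges are omitted.

```python
# Assumed us-east-1 list prices (USD per month); verify before relying on them.
PRICES = {
    "io1": {"per_gb": 0.125, "per_iops": 0.065, "free_iops": 0},
    "gp3": {"per_gb": 0.08, "per_iops": 0.005, "free_iops": 3000},
}


def ebs_monthly_cost(volume_type: str, size_gb: int, iops: int) -> float:
    """Storage cost plus the cost of IOPS beyond any included baseline."""
    price = PRICES[volume_type]
    billable_iops = max(0, iops - price["free_iops"])
    return size_gb * price["per_gb"] + billable_iops * price["per_iops"]


if __name__ == "__main__":
    size_gb = 1000  # assumed volume size; not given in the scenario
    current = ebs_monthly_cost("io1", size_gb, 10_000)
    gp3_peak = ebs_monthly_cost("gp3", size_gb, 10_000)
    gp3_base = ebs_monthly_cost("gp3", size_gb, 3_000)
    print(f"io1 at 10,000 IOPS: ${current:.2f}")
    print(f"gp3 at 10,000 IOPS: ${gp3_peak:.2f}")
    print(f"gp3 at baseline:    ${gp3_base:.2f}")
```

Under these assumptions, even a gp3 volume held at 10,000 IOPS all day costs a fraction of the io1 volume, so scheduled scaling adds only a small further saving.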
Checkpoint Questions
- Which S3 storage class has no retrieval fee but a higher storage cost than Standard-IA?
- What is the most cost-effective EBS volume type for a large, sequential log processing workload?
- True or False: S3 One Zone-IA is appropriate for critical, non-reproducible data.
- How long must an object stay in S3 Standard-IA before you avoid a pro-rated minimum storage charge?
- Which tool should you use to migrate 500 TB of data from an on-premises data center to S3 if you have limited internet bandwidth?
Answers
- S3 Standard.
- Throughput Optimized HDD (st1).
- False (it only stores data in one AZ; use it for reproducible data like thumbnails).
- 30 days.
- AWS Snowball Edge.