
Mastering Storage Tiering in AWS: Performance vs. Cost Optimization


Storage tiering is the strategic process of moving data between different storage categories based on performance requirements, access frequency, and cost objectives. In the AWS ecosystem, this involves selecting the right volume types for EBS, storage classes for S3, and data pools for FSx.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between block, file, and object storage access patterns.
  • Select appropriate EBS volume types (SSD vs. HDD) based on IOPS and throughput needs.
  • Automate data lifecycle management in Amazon S3 to minimize costs.
  • Configure multi-tier storage in FSx for NetApp ONTAP using primary and capacity pools.
  • Identify the trade-offs between storage latency and retrieval costs.

Key Terms & Glossary

  • IOPS (Input/Output Operations Per Second): A measure of how many read/write operations a storage device can perform per second. Crucial for databases.
  • Throughput: The amount of data transferred over a specific period (e.g., MiB/s). Crucial for big data and log processing.
  • WORM (Write Once, Read Many): A storage model in which data, once written, cannot be modified or deleted (e.g., S3 Object Lock).
  • Hot Data: Frequently accessed data requiring sub-millisecond latency.
  • Cold Data: Infrequently accessed data where cost savings are prioritized over immediate retrieval speed.
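
The IOPS and throughput terms above are linked by I/O size: throughput ≈ IOPS × I/O size. A minimal sketch of this relationship (the workload figures are illustrative, not official AWS limits):

```python
def throughput_mib_s(iops: int, io_size_kib: int) -> float:
    """Approximate throughput achieved at a given IOPS and I/O size."""
    return iops * io_size_kib / 1024

# A database doing 16 KiB random reads at 16,000 IOPS:
print(throughput_mib_s(16_000, 16))   # 250.0 MiB/s -> IOPS-bound workload

# A log processor streaming 1 MiB blocks at only 500 IOPS:
print(throughput_mib_s(500, 1024))    # 500.0 MiB/s -> throughput-bound workload
```

This is why databases (small random I/O) care about IOPS while big-data pipelines (large sequential I/O) care about throughput.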

The "Big Idea"

[!IMPORTANT] The goal of storage tiering is Economic Efficiency without Performance Degradation.

Think of storage like a home: you keep your daily-use items on the kitchen counter (Hot Storage/SSD), seasonal decorations in the attic (Warm Storage/Infrequent Access), and old tax records in a remote storage unit (Cold Storage/Archival). AWS provides the automation tools to move these items between "rooms" as they age, ensuring you never pay for high-performance "counter space" for items you only need once a year.

Formula / Concept Box

| Component | Calculation Factor |
| --- | --- |
| Total Cost of Ownership (TCO) | (Storage GB × Price) + Data Transfer Out + Retrieval Fees + Request Fees |
| EBS Pricing | Provisioned Capacity (GB) + IOPS/Throughput (if provisioned) + Snapshots |
| S3 Pricing | Storage Class Rate + Requests (PUT/GET) + Data Retrieval + Transfer Out |
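
The TCO formula can be turned into a small calculator. A minimal sketch, with placeholder per-GB rates (always check current AWS pricing, which varies by region):

```python
def monthly_storage_tco(storage_gb: float, storage_price: float,
                        transfer_out_gb: float = 0.0, transfer_price: float = 0.0,
                        retrieval_gb: float = 0.0, retrieval_price: float = 0.0,
                        requests_thousands: float = 0.0, request_price: float = 0.0) -> float:
    """TCO = storage + data transfer out + retrieval fees + request fees."""
    return (storage_gb * storage_price
            + transfer_out_gb * transfer_price
            + retrieval_gb * retrieval_price
            + requests_thousands * request_price)

# Illustrative rates: Standard ~$0.023/GB, Standard-IA ~$0.0125/GB + $0.01/GB retrieval
standard = monthly_storage_tco(10_000, 0.023)
ia = monthly_storage_tco(10_000, 0.0125, retrieval_gb=500, retrieval_price=0.01)
print(f"Standard: ${standard:,.2f}   Standard-IA: ${ia:,.2f}")  # roughly $230 vs $130
```

Note how retrieval fees narrow the gap: the cheaper tier only wins if the data is actually accessed infrequently.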

Hierarchical Outline

  1. Block Storage (Amazon EBS)
    • SSD-Backed: gp3 (General Purpose), io2 (Provisioned IOPS for databases).
    • HDD-Backed: st1 (Throughput Optimized), sc1 (Cold HDD for legacy/archival).
  2. Object Storage (Amazon S3)
    • Tiers: Standard → Standard-IA → One Zone-IA → Glacier Instant Retrieval → Glacier Deep Archive.
    • Automation: S3 Lifecycle Policies for transition and expiration.
  3. File Storage (Amazon EFS & FSx)
    • EFS: Elastic throughput; IA storage class for cost savings.
    • FSx for NetApp ONTAP: Primary (SSD) vs. Capacity Pool (HDD/Object) auto-tiering.
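
The S3 lifecycle automation in the outline is expressed as a lifecycle configuration document. A minimal sketch (the rule ID, prefix, and day thresholds are illustrative); with boto3 this dict would be passed to `put_bucket_lifecycle_configuration`:

```python
import json

# Lifecycle rule: Standard -> Standard-IA at 30 days, -> Glacier Instant
# Retrieval at 90 days, delete after ~10 years (all thresholds are examples).
lifecycle_config = {
    "Rules": [{
        "ID": "tier-and-expire",          # illustrative rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},         # empty prefix = whole bucket
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER_IR"},
        ],
        "Expiration": {"Days": 3650},
    }]
}

# With boto3:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
print(json.dumps(lifecycle_config, indent=2))
```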

Visual Anchors

S3 Lifecycle Transition Flow

Standard → Standard-IA → One Zone-IA → Glacier Instant Retrieval → Glacier Deep Archive → Expiration

Performance vs. Cost Trade-off

```latex
\begin{tikzpicture}[scale=1.0]
  % Axes
  \draw[->] (0,0) -- (6,0) node[right] {Latency (ms)};
  \draw[->] (0,0) -- (0,5) node[above] {Cost per GB};
  % Data points
  \filldraw[red]    (0.5,4.5) circle (2pt) node[anchor=south west] {SSD (io2)};
  \filldraw[orange] (1.5,3.0) circle (2pt) node[anchor=south west] {SSD (gp3)};
  \filldraw[blue]   (4.0,1.5) circle (2pt) node[anchor=south west] {HDD (st1)};
  \filldraw[green]  (5.5,0.5) circle (2pt) node[anchor=south west] {Glacier};
  % Trendline
  \draw[dashed, gray] (0.5,4.5) .. controls (1,2) and (3,1) .. (5.5,0.5);
\end{tikzpicture}
```

Definition-Example Pairs

  • Provisioned IOPS (io2): Storage where you specify exactly how many I/O operations per second the volume provides.
    • Example: A high-traffic Oracle Database requiring 16,000 consistent IOPS to prevent application lag.
  • Throughput Optimized HDD (st1): Low-cost magnetic storage designed for large, sequential data sets.
    • Example: A Big Data/Hadoop cluster performing MapReduce jobs on terabytes of log files.
  • S3 Glacier Deep Archive: The lowest-cost storage class in AWS for long-term retention.
    • Example: Storing Compliance Records (e.g., medical images) that must be kept for 10 years but are rarely accessed.

Worked Examples

Scenario: Optimizing an Image Sharing App

Problem: A social app stores millions of images. 90% are never viewed after the first 30 days. Current cost is $5,000/month on S3 Standard.

Step-by-Step Solution:

  1. Analyze Access Patterns: Identify that "hot" period is 30 days.
  2. Create Lifecycle Policy:
    • Transition objects to S3 Standard-IA after 30 days (saves ~40% on storage).
    • Transition objects to S3 Glacier after 1 year for long-term backup.
  3. Implement S3 Intelligent-Tiering: If access patterns are unpredictable, enable this to let AWS move data automatically between frequent and infrequent tiers without manual lifecycle rules.
  4. Result: Storage costs drop to ~$2,200/month while maintaining availability.
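
The cost estimate in the worked example can be reproduced with a back-of-the-envelope calculation. A sketch using illustrative rates and an assumed split of the cold data between Standard-IA and Glacier Instant Retrieval (the split and rates are assumptions, not from the scenario):

```python
STANDARD   = 0.023    # $/GB-month, illustrative rates -- check current pricing
IA         = 0.0125
GLACIER_IR = 0.004

current_bill = 5000.0                 # all data on S3 Standard
gb = current_bill / STANDARD          # implied data volume under these rates

hot  = 0.10 * gb                      # still viewed after 30 days -> Standard
warm = 0.45 * gb                      # assumption: half the cold set in Standard-IA
cold = 0.45 * gb                      # assumption: the rest aged into Glacier IR

tiered = hot * STANDARD + warm * IA + cold * GLACIER_IR
print(round(tiered))                  # roughly $2,100/month, near the ~$2,200 estimate
```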

Checkpoint Questions

  1. Which EBS volume type is most cost-effective for a streaming workload requiring high throughput but not low-latency random access?
  2. In FSx for NetApp ONTAP, what is the typical latency for the "Capacity Pool" tier?
  3. True/False: S3 Standard-IA has a minimum storage duration of 30 days for billing purposes.
  4. What is the main difference between gp3 and gp2 EBS volumes regarding throughput provisioning?
Answers:
  1. st1 (Throughput Optimized HDD).
  2. Tens of milliseconds (compared to sub-millisecond for Primary SSD).
  3. True.
  4. gp3 allows you to provision throughput and IOPS independently of storage size; gp2 scales performance based on the size of the volume.
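
The gp2 scaling behavior from answer 4 is easy to make concrete: gp2 baseline performance is 3 IOPS per provisioned GB, with a 100 IOPS floor and a 16,000 IOPS cap, whereas gp3 starts at 3,000 IOPS and 125 MiB/s regardless of size. A minimal sketch of the gp2 rule:

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 baseline IOPS scale with size: 3 IOPS/GB, floor 100, cap 16,000."""
    return max(100, min(3 * size_gb, 16_000))

print(gp2_baseline_iops(20))     # 100   (floor applies to small volumes)
print(gp2_baseline_iops(1_000))  # 3000
print(gp2_baseline_iops(6_000))  # 16000 (cap reached)
```

This is why gp2 users historically over-provisioned capacity just to buy IOPS; gp3 removes that coupling.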

Muddy Points & Cross-Refs

  • S3 IA vs. Glacier Instant Retrieval: Both offer millisecond access. The difference is the cost structure—Glacier Instant Retrieval is cheaper for storage but more expensive for data access/retrieval. Use Glacier IR for data accessed once or twice a year.
  • st1 vs. sc1: Both are HDDs. Use st1 for active workloads (Big Data). Use sc1 for "Cold" data that just needs to be online (File servers for old projects).
  • Cross-Ref: See the "Networking" chapter for Data Transfer Costs, as moving data between regions during tiering can incur significant fees.
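
The Standard-IA vs. Glacier Instant Retrieval trade-off above comes down to how much of the data you retrieve each month. A sketch with illustrative rates (Standard-IA ~$0.0125 storage / $0.01 retrieval per GB; Glacier IR ~$0.004 / $0.03 — verify against current pricing):

```python
def monthly_cost(storage_price: float, retrieval_price: float,
                 retrieved_fraction: float) -> float:
    """Cost per GB-month: storage plus retrieval of a fraction of the data."""
    return storage_price + retrieval_price * retrieved_fraction

for f in (0.05, 0.25, 0.50):
    ia  = monthly_cost(0.0125, 0.01, f)
    gir = monthly_cost(0.004, 0.03, f)
    print(f"retrieve {f:.0%}/month: IA=${ia:.4f}/GB, Glacier IR=${gir:.4f}/GB")
```

Under these rates Glacier IR wins at low retrieval fractions and Standard-IA wins as retrieval approaches half the data set per month, which matches the "once or twice a year" guidance.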

Comparison Tables

EBS Volume Type Comparison

| Volume Type | Use Case | Latency | Max Throughput |
| --- | --- | --- | --- |
| io2 Block Express | Critical Databases (SAP HANA) | Sub-ms | 4,000 MiB/s |
| gp3 | Virtual Desktops, Boot Volumes | Single-digit ms | 1,000 MiB/s |
| st1 | Data Warehousing, Log Processing | Single-digit ms | 500 MiB/s |
| sc1 | Cold Data, Archival | Tens of ms | 250 MiB/s |

S3 Storage Class Comparison

| Class | Retrieval Time | Durability | Min Storage Duration |
| --- | --- | --- | --- |
| Standard | Milliseconds | 99.999999999% | N/A |
| Standard-IA | Milliseconds | 99.999999999% | 30 Days |
| Glacier Flexible | 1 min – 12 hours | 99.999999999% | 90 Days |
| Glacier Deep Archive | 12 – 48 hours | 99.999999999% | 180 Days |
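
The "Min Storage Duration" column has a billing consequence worth making explicit: objects deleted or transitioned before the minimum are still billed for the full minimum. A minimal sketch:

```python
def billed_days(actual_days: int, min_duration_days: int) -> int:
    """Objects removed before the minimum duration are billed for the full minimum."""
    return max(actual_days, min_duration_days)

print(billed_days(10, 30))    # 30  -> Standard-IA bills 30 days even if deleted at day 10
print(billed_days(200, 180))  # 200 -> Deep Archive past its minimum bills actual days
```

This is why lifecycle rules should not move data into IA or Glacier classes before its "hot" period has clearly ended.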
