Mastering AWS Storage Sizing: Capacity and Performance Engineering
Determining the correct storage size for a workload
This guide focuses on the critical task of determining the correct storage size for AWS workloads, emphasizing that "size" in the cloud refers to both capacity (GB/TB) and performance (IOPS/Throughput).
Learning Objectives
After studying this guide, you will be able to:
- Distinguish between capacity-oriented and performance-oriented sizing requirements.
- Calculate the baseline IOPS for General Purpose SSD (gp2) volumes.
- Determine the required IOPS for a specific throughput target based on database page sizes.
- Identify the limitations of burstable performance and when to scale volume size for performance reasons.
- Understand the relationship between EC2 instance limits and EBS volume limits.
Key Terms & Glossary
- IOPS (Input/Output Operations Per Second): A measure of the number of reads and writes performed every second. It is the primary metric for block storage performance.
- Throughput: The amount of data (usually measured in MiB/s) that can be transferred to or from a volume in a given time.
- Page Size: The fixed-size block of data that a database engine uses to manage data. For example, MySQL uses 16 KB pages.
- gp2 (General Purpose SSD): A balanced EBS volume type where performance is directly tied to the size of the volume.
- Burst Credits: A mechanism for volumes smaller than 1,000 GB to temporarily exceed their baseline performance (up to 3,000 IOPS).
The "Big Idea"
In traditional environments, you buy a disk for its capacity. In AWS, Performance is a function of Size. To get more speed (IOPS), you often have to provision more space (GB), even if you don't need the storage. Sizing a workload correctly requires balancing the cost of capacity against the technical necessity of performance throughput.
Formula / Concept Box
| Concept | Formula / Rule | Notes |
|---|---|---|
| gp2 Baseline IOPS | $3 \times \text{Volume Size (GB)}$ | Min: 100 IOPS; Max: 16,000 IOPS |
| Required IOPS | $\frac{\text{Throughput (KB/s)}}{\text{Page Size (KB)}}$ | Larger page sizes require fewer IOPS for the same throughput |
| Throughput (MB/s) | $\text{IOPS} \times \text{I/O Size}$ | Must stay within both Volume and Instance limits |
| RDS gp2 Max | $5,334 \text{ GB}$ | The size at which you hit the 16,000 IOPS ceiling |
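The rules in the box above can be sketched as small helper functions. This is a minimal illustration; the function names are my own, not an AWS API:

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 baseline: 3 IOPS per GB, floored at 100 and capped at 16,000."""
    return max(100, min(3 * size_gb, 16_000))

def required_iops(throughput_kb_s: float, page_size_kb: float) -> float:
    """IOPS needed to sustain a throughput target at a given I/O (page) size."""
    return throughput_kb_s / page_size_kb

def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Resulting throughput in MB/s for a given IOPS rate and I/O size."""
    return iops * io_size_kb / 1024

print(gp2_baseline_iops(400))      # 400 GB volume -> 1200
print(required_iops(204_800, 8))   # 200 MiB/s on 8 KB pages -> 25600.0
```

Note that the floor and cap make the relationship piecewise linear: every gp2 volume below 34 GB still gets 100 IOPS, and growth stops at 16,000.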
Hierarchical Outline
- I. Understanding Performance Metrics
- A. IOPS vs. Throughput: IOPS measures the "frequency" of actions; Throughput measures the "volume" of data.
- B. Page Size Impact: Database engines determine I/O size (e.g., Oracle/SQL Server = 8 KB; MySQL/MariaDB = 16 KB).
- II. Sizing for Amazon EBS and RDS
- A. Baseline Performance: Volumes earn 3 IOPS per GB.
- B. Burstable Performance: Volumes < 1 TB can burst to 3,000 IOPS using a credit balance of 5.4 million credits.
- C. Throughput Limits: Maximum throughput for gp2 is 250 MB/s, but requires a large enough volume and a Nitro-based instance.
- III. Right-Sizing Strategies
- A. Monitoring: Using CloudWatch to measure actual usage over days/weeks.
- B. Elasticity: Resizing EBS volumes or changing types (gp2 to gp3 or io2) without downtime.
- C. Vertical Scaling: Resizing the EC2 instance to ensure its network/disk bandwidth doesn't bottleneck the storage.
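The burst mechanics in II.B follow a token-bucket model: credits accrue at the baseline rate and are spent at the actual I/O rate. A sketch of that arithmetic (the function name and net-drain simplification are mine):

```python
BURST_BUCKET = 5_400_000  # full gp2 burst credit balance (I/O credits)
BURST_IOPS = 3_000        # burst ceiling for sub-1 TB gp2 volumes

def full_burst_seconds(size_gb: int) -> float:
    """Seconds a gp2 volume can sustain 3,000 IOPS from a full bucket,
    net of credits earned back at the baseline rate while bursting."""
    baseline = max(100, 3 * size_gb)
    if baseline >= BURST_IOPS:
        return float("inf")  # at 1,000 GB or more, baseline covers the burst rate
    return BURST_BUCKET / (BURST_IOPS - baseline)

print(full_burst_seconds(100) / 60)  # 100 GB volume: ~33 minutes at full burst
```

This is why burst-dependent workloads fall off a cliff once the bucket empties: a 100 GB volume drops from 3,000 IOPS to its 300 IOPS baseline.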
Visual Anchors
Storage Selection Logic
Linear Performance of gp2
\begin{center}
\begin{tikzpicture}[scale=0.8]
  % Axes
  \draw[->] (0,0) -- (6,0) node[right] {Volume Size (GB)};
  \draw[->] (0,0) -- (0,5) node[above] {Baseline IOPS};
  % Max performance line
  \draw[dashed, gray] (0,4) -- (5,4) node[right, black] {16,000 IOPS (Max)};
  \draw[dashed, gray] (5,0) -- (5,4) node[below, black, pos=0] {5,334 GB};
  % Linear growth line (3 IOPS per GB)
  \draw[thick, blue] (0,0.1) -- (5,4);
  \draw[thick, blue] (5,4) -- (5.8,4);
  % Burst area indication
  \draw[red, thick] (0,1.5) -- (1,1.5);
  \node[red, scale=0.7] at (1.5,1.8) {Burst to 3,000};
\end{tikzpicture}
\end{center}
Definition-Example Pairs
- Term: IOPS/Throughput Inversion
  - Definition: The principle that increasing the size of each I/O operation (page size) decreases the number of operations needed to move a specific amount of data.
  - Example: To move 100 MB of data, a MySQL database (16 KB pages) needs 6,400 I/O operations, whereas a PostgreSQL database (8 KB pages) needs 12,800.
- Term: Storage Elasticity
  - Definition: The ability to modify volume size, performance, and type dynamically without detaching the volume.
  - Example: A company notices a performance bottleneck during Black Friday and increases an EBS gp2 volume from 500 GB to 2,000 GB via the AWS Console, quadrupling the baseline IOPS (1,500 to 6,000) without downtime.
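The Storage Elasticity example can be checked with arithmetic. The resize itself would go through the EBS volume-modification API; this sketch only verifies the performance effect of growing a gp2 volume:

```python
def gp2_baseline_iops(size_gb: int) -> int:
    # 3 IOPS per provisioned GB, minimum 100, maximum 16,000
    return max(100, min(3 * size_gb, 16_000))

before = gp2_baseline_iops(500)    # 1500 IOPS
after = gp2_baseline_iops(2000)    # 6000 IOPS
print(after / before)              # 4.0 -> baseline quadruples
```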
Worked Examples
Example 1: Calculating Baseline Performance
Scenario: You provision a 400 GB gp2 volume for an RDS instance running MariaDB.
- Question: What is the baseline IOPS?
- Calculation: $400 \text{ GB} \times 3 \text{ IOPS/GB} = 1{,}200 \text{ IOPS}$.
- Result: The volume has a steady-state performance of 1,200 IOPS but can burst to 3,000 IOPS as long as credits are available.
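Example 1 can be verified in a few lines (a sketch of the 3 IOPS/GB rule; the variable names are illustrative):

```python
size_gb = 400
baseline = max(100, min(3 * size_gb, 16_000))  # gp2 baseline formula
can_burst = baseline < 3_000  # only sub-1 TB volumes have headroom to burst
print(baseline, can_burst)    # 1200 True
```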
Example 2: Determining Size for Throughput
Scenario: A workload requires 200 MiB/s of disk throughput on an Oracle database (8 KB page size).
- Convert Throughput to KB: $200 \times 1024 = 204{,}800 \text{ KB/s}$.
- Calculate Required IOPS: $204{,}800 \text{ KB/s} \div 8 \text{ KB/page} = 25{,}600 \text{ IOPS}$.
- Selection: Since the max gp2 IOPS is 16,000, gp2 is insufficient. gp3 decouples IOPS from size but shares the same 16,000 IOPS ceiling, so you must use Provisioned IOPS (io2), which supports up to 64,000 IOPS.
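The selection step boils down to comparing the required IOPS against each volume type's ceiling. A sketch, using the commonly documented IOPS ceilings (verify current limits against AWS documentation before relying on them):

```python
# Per-type IOPS ceilings used for illustration; check current AWS limits.
MAX_IOPS = {"gp2": 16_000, "gp3": 16_000, "io2": 64_000}

target_mib_s = 200
page_kb = 8
required = target_mib_s * 1024 / page_kb  # 25600.0 IOPS
viable = [t for t, cap in MAX_IOPS.items() if cap >= required]
print(required, viable)  # 25600.0 ['io2']
```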
Checkpoint Questions
- What is the minimum number of IOPS any EBS gp2 volume starts with?
- If you have a 2,000 GB gp2 volume, can it burst to 3,000 IOPS? (Hint: Check the 1 TB rule).
- You have plenty of storage space but are seeing high latency. Why might increasing the volume size solve this?
- Why does a Nitro-based instance matter when designing for 250 MB/s throughput?
[!TIP] If a question asks for the "most cost-effective" way to increase IOPS on a gp2 volume larger than 1 TB, simply increasing the volume size is usually the answer. Always check whether switching to gp3 is offered, though, since gp3 lets you provision IOPS separately from storage capacity.
[!WARNING] Always ensure your EC2 instance family (e.g., m5, r5) supports the maximum throughput of your volume. Even a 64 TB volume cannot exceed the hardware throughput limit of the instance it is attached to.
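The warning above amounts to taking the minimum of two limits. A one-function sketch (the instance bandwidth figure below is a placeholder for illustration, not a real m5 specification):

```python
def effective_throughput_mb_s(volume_limit: float, instance_limit: float) -> float:
    # The attachment can never move data faster than the slower side.
    return min(volume_limit, instance_limit)

# A gp2 volume capable of 250 MB/s attached to an instance whose EBS
# bandwidth tops out at 150 MB/s (placeholder value): the instance wins.
print(effective_throughput_mb_s(250, 150))  # 150
```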