Storage Access Patterns and Architectures (AWS SAA-C03)

[!NOTE] Understanding the difference between block, file, and object storage is fundamental to passing the AWS Solutions Architect - Associate exam. Each has distinct access patterns that dictate its performance and cost-effectiveness.

Learning Objectives

After studying this guide, you should be able to:

Differentiate between block, file, and object storage types.
Identify the correct AWS service (EBS, EFS, S3, FSx) for specific application access patterns.
Implement performance optimizations like RAID for block storage and prefixes for object storage.
Design cost-optimized storage strategies using lifecycle policies and storage tiering.

Key Terms & Glossary

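Block Storage: Data is stored in fixed-size blocks; accessed directly by the operating system (e.g., EBS).
File Storage: Data is stored as files in a directory hierarchy and shared over a network file protocol such as NFS or SMB (e.g., EFS, FSx).
Object Storage: Data is stored as objects in a flat namespace, each with a unique key and metadata (e.g., S3).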
IOPS (Input/Output Operations Per Second): A performance metric measuring how many read/write operations a storage device can perform per second.
Throughput: The amount of data transferred to/from the storage device in a given time (measured in MB/s).
Prefix: A string of characters at the beginning of an S3 object key (for example, images/ in images/logo.png) used to organize objects into a folder-like structure.
Delimiter: A character (usually /) used to group S3 keys into a hierarchy.
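
IOPS and throughput are linked by the size of each I/O operation: throughput is roughly IOPS multiplied by I/O size. The sketch below only illustrates that arithmetic. The function name and the sample figures (3,000 IOPS, 16 KiB and 256 KiB operations) are assumptions chosen for illustration, not limits of any particular volume type, and the result is in MiB/s rather than decimal MB/s.

```python
def throughput_mib_per_s(iops: int, io_size_kib: float) -> float:
    """Approximate throughput in MiB/s for a given IOPS rate and I/O size."""
    return iops * io_size_kib / 1024


# Illustrative numbers only: 3,000 operations per second at 16 KiB each.
small_io = throughput_mib_per_s(3000, 16)
# The same IOPS rate with 256 KiB operations moves far more data per second.
large_io = throughput_mib_per_s(3000, 256)

print(f"16 KiB I/O:  {small_io:.1f} MiB/s")
print(f"256 KiB I/O: {large_io:.1f} MiB/s")
```

This is why a small-block database workload is usually sized by IOPS, while a large sequential workload is usually sized by throughput.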

The "Big Idea"

Storage in AWS is not just a "virtual hard drive"; it is a strategic choice. Databases and other workloads that need low, consistent latency at the operating-system level need Block Storage. Applications sharing files across multiple instances need File Storage. For massive scalability, web assets, and data lakes, Object Storage is the "flat surface" that provides virtually unlimited capacity and access from anywhere over HTTP. Choosing the wrong pattern leads to either a performance bottleneck or an unnecessarily high bill.

Formula / Concept Box

Storage Type	AWS Service	Primary Access Pattern	Typical Use Case
Block	Amazon EBS	Low-latency, OS-level I/O	Databases, OS Boot Volumes
File	Amazon EFS / FSx	Distributed File System (NFS/SMB)	Shared Media, Home Directories
Object	Amazon S3	API / HTTP (REST)	Static Web, Data Lakes, Backups
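
The table can be read as a small decision procedure. The sketch below encodes it in Python. The `Workload` class, its field names, and `suggest_storage` are hypothetical names introduced for illustration, and real designs weigh more factors (cost, durability, latency targets) than these three flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of an application's access pattern."""
    needs_os_level_block_device: bool  # e.g. database files, boot volume
    shared_by_many_instances: bool     # same files mounted by several hosts
    accessed_over_http_api: bool       # e.g. static assets, data lake, backups


def suggest_storage(w: Workload) -> str:
    """Map an access pattern to a storage type from the table."""
    if w.needs_os_level_block_device and not w.shared_by_many_instances:
        return "Block (Amazon EBS)"
    if w.shared_by_many_instances:
        return "File (Amazon EFS / FSx)"
    if w.accessed_over_http_api:
        return "Object (Amazon S3)"
    return "Unclear: gather more requirements"


database = Workload(True, False, False)
shared_media = Workload(False, True, False)
data_lake = Workload(False, False, True)

for name, workload in [("database", database),
                       ("shared media", shared_media),
                       ("data lake", data_lake)]:
    print(f"{name}: {suggest_storage(workload)}")
```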

Visual Anchors

[Diagram: Storage Classification Flow]

[Diagram: RAID Conceptualization]

Hierarchical Outline

Block Storage (Amazon EBS)
- Architecture: Attached to a single EC2 instance in the same Availability Zone (unless using Multi-Attach, which is limited to Provisioned IOPS volumes).
- Filesystems: Managed by the guest OS (NTFS, ext4).
- Optimization:
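  - RAID 0: Stripes data across volumes to exceed single-volume I/O limits. Use for I/O-bound databases; it adds no redundancy, so losing one volume loses the array.
  - RAID 1: Mirrors data. Note: AWS recommends avoiding RAID 1 on EBS. EBS volumes are already replicated within their Availability Zone, and mirroring consumes extra EC2-to-EBS bandwidth without improving write performance.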
Object Storage (Amazon S3)
- Architecture: Flat structure, globally unique bucket names.
- Naming Patterns: Use prefixes (images/) and delimiters (/) to simulate folders.
- Lifecycle Management: Transitions objects from S3 Standard to S3 Standard-IA (Infrequent Access) or the S3 Glacier storage classes to optimize cost.
File Storage (EFS & FSx)
- EFS: Managed NFS for Linux; scales automatically.
- FSx for Windows File Server: Native SMB support for Windows environments.
- FSx for Lustre: High-performance storage for compute-heavy workloads (HPC).
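
To make the prefix and delimiter idea concrete, the sketch below mimics locally how a listing with a prefix and delimiter groups flat keys into "folders". It does not call AWS. The function `list_keys` and the sample keys are hypothetical, and the real S3 listing API also handles pagination and other parameters that are omitted here.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix + delimiter listing does.

    Returns (objects, common_prefixes): keys directly "inside" the prefix,
    and the rolled-up "subfolder" prefixes one level below it.
    """
    objects = []
    common_prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if delimiter and delimiter in remainder:
            # Everything up to and including the first delimiter is rolled up.
            folder = remainder.split(delimiter, 1)[0]
            common_prefixes.add(prefix + folder + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Sample keys for illustration; the bucket itself has no real directories.
bucket_keys = [
    "images/2024/cat.jpg",
    "images/2024/dog.jpg",
    "images/logo.png",
    "logs/app.log",
]

objects, folders = list_keys(bucket_keys, prefix="images/")
print("objects:", objects)
print("folders:", folders)
```

The keys never move or nest. The "folder" view is computed at listing time from the key strings alone.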

Definition-Example Pairs

Object Metadata: Name-value pairs describing the object; user-defined metadata is limited to 2 KB per object.
- Example: A photo uploaded to S3 might have metadata tags like Resolution: 4K or Author: JohnDoe used for automated processing.
S3 Access Points: Named network endpoints attached to buckets for managing data access at scale.
- Example: Creating different access points for a "Finance" team (Read-Only) and a "Dev" team (Read-Write) on the same underlying S3 bucket.
Data Lifecycle: The automated transition of data between storage tiers.
- Example: Automatically moving server logs from S3 Standard to S3 Glacier Deep Archive after 90 days, which cuts per-GB storage cost by more than 90% at typical list prices in exchange for retrieval times measured in hours.
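
The log-archiving example can be written as a lifecycle configuration. The sketch below builds the rule as a plain Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix, and the bucket name in the comment are assumptions made for illustration.

```python
import json

# Hypothetical rule: the ID and the "logs/" prefix are illustrative choices.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-server-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                # Move objects 90 days after creation.
                {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

Note that `Days` counts from object creation, not from last access.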

Worked Examples

Example 1: High-Performance Database

Scenario: A company runs a MySQL database on EC2 that is hitting I/O limits during peak hours. They need maximum throughput.

Solution: Create two EBS volumes and configure RAID 0 within the Linux OS. This spreads the I/O load across both volumes, roughly doubling the throughput and IOPS available to the database files, provided the instance's own EBS bandwidth limit is not reached first. Because RAID 0 adds no redundancy, keep regular snapshots or backups.
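
The striping behaviour can be sketched in a few lines. This is a simplified local model, not how the Linux RAID driver is implemented. The function names, the per-volume figure of 250 MB/s, and the instance limits are assumptions chosen for illustration.

```python
def raid0_location(chunk_index: int, volume_count: int) -> tuple:
    """Return (volume number, chunk offset on that volume) for a logical chunk."""
    return chunk_index % volume_count, chunk_index // volume_count


def raid0_throughput(per_volume_mbps: list, instance_limit_mbps: float) -> float:
    """Estimate array throughput in MB/s.

    Striping is paced by the slowest member, and the total can never
    exceed what the instance itself can push to EBS.
    """
    striped = min(per_volume_mbps) * len(per_volume_mbps)
    return min(striped, instance_limit_mbps)


# Consecutive chunks alternate between the two volumes.
layout = [raid0_location(i, 2) for i in range(6)]
print(layout)

# Illustrative figures only: two volumes at 250 MB/s each.
print(raid0_throughput([250, 250], instance_limit_mbps=1000))
print(raid0_throughput([250, 250], instance_limit_mbps=400))
```

The last call shows why "doubling" is a ceiling rather than a guarantee: once the instance limit is below the combined volume throughput, adding volumes no longer helps.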

Example 2: Time-Limited Access to Private Assets

Scenario: An application stores user profile pictures. They want to ensure the pictures are only accessible for 10 minutes after a user logs in.

Solution: Use S3 presigned URLs. At login, the application generates a URL with a 10-minute expiry that grants access to that one object without making the bucket public. Anyone holding the URL can use it until it expires, so keep the lifetime short.
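
A minimal sketch of generating such a URL is shown below. It assumes the caller passes in a boto3 S3 client, whose `generate_presigned_url` method takes the operation name, its parameters, and an expiry in seconds. The helper name `profile_picture_url`, the bucket, and the key are hypothetical.

```python
TEN_MINUTES = 600  # seconds, matching the scenario's access window


def profile_picture_url(s3_client, bucket: str, key: str,
                        ttl_seconds: int = TEN_MINUTES) -> str:
    """Return a time-limited GET URL for a private object."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,
    )


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   url = profile_picture_url(
#       boto3.client("s3"),
#       "example-bucket",           # hypothetical bucket name
#       "profiles/user-123.jpg",    # hypothetical object key
#   )
```

Taking the client as a parameter keeps the helper easy to test with a stub and avoids creating a new client on every login.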

Checkpoint Questions

Which storage type is best described as a "flat surface" for data?
True or False: RAID 1 is the recommended way to increase performance on EBS volumes.
What S3 feature would you use to move data to a cheaper tier 30 days after it was created?
How do S3 and EBS differ in terms of network access (how do you reach the data)?
Which FSx type is specifically designed for high-performance computing (HPC)?

Answers

Object Storage (Amazon S3).
False. RAID 0 increases performance (striping); RAID 1 is for mirroring and is discouraged on EBS.
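S3 Lifecycle policies. Lifecycle rules transition objects based on age; when the trigger must be actual access patterns, S3 Intelligent-Tiering is the matching feature.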
S3 is accessed via API/HTTP (over the internet or VPC endpoints), while EBS is accessed as a block device attached to a specific EC2 instance.
FSx for Lustre.