Storage Access Patterns and Architectures (AWS SAA-C03)
[!NOTE] Understanding the difference between block, file, and object storage is fundamental to passing the AWS Solutions Architect - Associate exam. Each has distinct access patterns that dictate its performance and cost-effectiveness.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between block, file, and object storage types.
- Identify the correct AWS service (EBS, EFS, S3, FSx) for specific application access patterns.
- Implement performance optimizations like RAID for block storage and prefixes for object storage.
- Design cost-optimized storage strategies using lifecycle policies and storage tiering.
Key Terms & Glossary
- Block Storage: Data is stored in fixed-size blocks; accessed directly by the operating system (e.g., EBS).
- Object Storage: Data is stored as objects in a flat hierarchy with unique identifiers and metadata (e.g., S3).
- IOPS (Input/Output Operations Per Second): A performance metric measuring how many read/write operations a storage device can perform per second.
- Throughput: The amount of data transferred to/from the storage device in a given time (measured in MB/s).
- Prefix: A string of characters at the beginning of an S3 object key, used to organize objects into a folder-like structure.
- Delimiter: A character (usually `/`) used to group S3 keys into a hierarchy.
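To make the prefix and delimiter terms concrete, here is a minimal pure-Python sketch of how a listing request groups keys. It only imitates the grouping behavior locally; the function name `list_with_delimiter` and the sample keys are hypothetical, and nothing here calls AWS.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Imitate how an S3 listing groups keys by prefix and delimiter.

    Returns (objects, common_prefixes): the keys that sit directly under
    the prefix, and the folder-like prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # key is outside the requested prefix
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: a direct child
        else:
            # Roll everything up to the next delimiter into one "folder".
            common_prefixes.add(prefix + rest[:cut + len(delimiter)])
    return objects, sorted(common_prefixes)


# Hypothetical keys in a single flat bucket.
keys = [
    "images/cats/tabby.jpg",
    "images/dogs/beagle.jpg",
    "images/logo.png",
    "logs/2024/01/app.log",
]

objects, folders = list_with_delimiter(keys, prefix="images/")
print(objects)  # ['images/logo.png']
print(folders)  # ['images/cats/', 'images/dogs/']
```

The bucket itself stays flat: the "folders" exist only because the listing groups keys that share a prefix up to the next delimiter.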
The "Big Idea"
Storage in AWS is not just a "virtual hard drive." It is a strategic choice. Applications that need low, consistent latency, such as databases, need Block Storage. Applications sharing files across multiple instances need File Storage. For massive scalability, web assets, and data lakes, Object Storage is the "flat surface" that provides virtually unlimited capacity and access over HTTP from anywhere. Choosing the wrong pattern leads to either a performance bottleneck or an unnecessarily high bill.
Formula / Concept Box
| Storage Type | AWS Service | Primary Access Pattern | Typical Use Case |
|---|---|---|---|
| Block | Amazon EBS | Low-latency, OS-level I/O | Databases, OS Boot Volumes |
| File | Amazon EFS / FSx | Distributed File System (NFS/SMB) | Shared Media, Home Directories |
| Object | Amazon S3 | API / HTTP (REST) | Static Web, Data Lakes, Backups |
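The table can also be read as a lookup from access pattern to service. The sketch below encodes that reading; the `Access` enum and `suggest_service` function are hypothetical names used for illustration, and a real decision would also weigh cost, durability, and performance requirements.

```python
from enum import Enum


class Access(Enum):
    """How the application reaches its data (illustrative categories)."""
    OS_BLOCK_DEVICE = "block device attached to an instance"
    SHARED_NFS = "shared file system over NFS"
    SHARED_SMB = "shared file system over SMB"
    HTTP_API = "objects over an HTTP API"


# Mapping taken from the table above.
SERVICE_FOR = {
    Access.OS_BLOCK_DEVICE: "Amazon EBS",
    Access.SHARED_NFS: "Amazon EFS",
    Access.SHARED_SMB: "Amazon FSx for Windows File Server",
    Access.HTTP_API: "Amazon S3",
}


def suggest_service(access: Access) -> str:
    """Return the service the table associates with an access pattern."""
    return SERVICE_FOR[access]


print(suggest_service(Access.SHARED_NFS))  # Amazon EFS
```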
Visual Anchors
[Figure: Storage Classification Flow]
[Figure: RAID Conceptualization]
Hierarchical Outline
- Block Storage (Amazon EBS)
  - Architecture: Attached to a single EC2 instance (unless using Multi-Attach).
  - Filesystems: Managed by the guest OS (NTFS, ext4).
  - Optimization:
    - RAID 0: Stripes data across volumes to exceed single-volume I/O limits. Use for I/O-heavy workloads such as databases.
    - RAID 1: Mirrors data. Note: AWS recommends avoiding RAID 1 on EBS because EBS is already replicated internally, and mirroring consumes extra EC2-to-EBS bandwidth.
- Object Storage (Amazon S3)
  - Architecture: Flat structure, globally unique bucket names.
  - Naming Patterns: Use prefixes (`images/`) and delimiters to simulate folders.
  - Lifecycle Management: Transitions objects from S3 Standard to IA (Infrequent Access) or Glacier to optimize cost.
- File Storage (EFS & FSx)
  - EFS: Managed NFS for Linux; scales automatically.
  - FSx for Windows: Native SMB support for Windows environments.
  - FSx for Lustre: High-performance storage for compute-heavy workloads (HPC).
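The difference between the RAID 0 and RAID 1 entries in the outline can be sketched as a chunk placement rule. This is a simplified model, not how the Linux `md` driver is implemented; `stripe_layout` and `mirror_layout` are hypothetical helper names.

```python
def stripe_layout(num_chunks, num_volumes):
    """RAID 0 model: chunk i goes to volume i % num_volumes."""
    layout = [[] for _ in range(num_volumes)]
    for chunk in range(num_chunks):
        layout[chunk % num_volumes].append(chunk)
    return layout


def mirror_layout(num_chunks, num_volumes):
    """RAID 1 model: every volume holds a full copy of every chunk."""
    return [list(range(num_chunks)) for _ in range(num_volumes)]


print(stripe_layout(6, 2))  # [[0, 2, 4], [1, 3, 5]]
print(mirror_layout(3, 2))  # [[0, 1, 2], [0, 1, 2]]
```

With striping, each volume handles only half of the chunks, so the two volumes can serve I/O in parallel. With mirroring, every write goes to both volumes, which is why it adds bandwidth use without adding write performance.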
Definition-Example Pairs
- Object Metadata: Data describing the object (user-defined metadata is limited to 2 KB).
  - Example: A photo uploaded to S3 might have metadata entries like `Resolution: 4K` or `Author: JohnDoe` used for automated processing.
- S3 Access Points: Named network endpoints attached to buckets for managing data access at scale.
  - Example: Creating different access points for a "Finance" team (Read-Only) and a "Dev" team (Read-Write) on the same underlying S3 bucket.
- Data Lifecycle: The automated transition of data between storage tiers.
  - Example: Automatically moving server logs from S3 Standard to S3 Glacier Deep Archive after 90 days, since Deep Archive storage is priced at a small fraction of S3 Standard per GB.
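The log-archiving example above could be expressed as a lifecycle configuration. The sketch below only builds the configuration as a Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the `logs/` prefix, the rule ID format, the bucket name in the comment, and the helper name `log_archive_rule` are all assumptions made for illustration.

```python
def log_archive_rule(prefix="logs/", days=90):
    """Build a lifecycle configuration that moves objects under `prefix`
    to S3 Glacier Deep Archive once they are `days` old."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}-after-{days}d",
                "Filter": {"Prefix": prefix},  # only keys under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    }


config = log_archive_rule()
print(config["Rules"][0]["ID"])  # archive-logs-after-90d

# With boto3 installed and credentials configured, the rule could be
# applied to a (hypothetical) bucket like this:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-log-bucket", LifecycleConfiguration=config)
```

Note that the `Days` value counts from object creation, so this rule acts on the age of each log file rather than on when it was last read.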
Worked Examples
Example 1: High-Performance Database
Scenario: A company runs a MySQL database on EC2 that is hitting I/O limits during peak hours. They need maximum throughput.
Solution: Create two EBS volumes and configure RAID 0 within the Linux OS. This spreads the I/O load across both volumes, roughly doubling the throughput and IOPS available to the database files, as long as the instance's own EBS bandwidth limit is not reached first.
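The arithmetic behind this example is a simple sum with a cap. The following is a rough estimate only: it assumes identical volumes and a perfectly even I/O spread, and the numbers passed in are illustrative placeholders rather than the limits of any specific volume or instance type. `raid0_aggregate` is a hypothetical helper name.

```python
def raid0_aggregate(per_volume_iops, per_volume_mbps, volumes,
                    instance_iops_limit, instance_mbps_limit):
    """Estimate RAID 0 performance: the per-volume figures add up,
    but the instance's EBS limits cap the total."""
    iops = min(per_volume_iops * volumes, instance_iops_limit)
    mbps = min(per_volume_mbps * volumes, instance_mbps_limit)
    return iops, mbps


# Two volumes at 3,000 IOPS and 125 MB/s each, on an instance that is
# assumed to allow 10,000 IOPS and 200 MB/s to EBS.
print(raid0_aggregate(3000, 125, 2, 10000, 200))  # (6000, 200)
```

In this illustration IOPS doubles, but throughput stops at the assumed instance limit of 200 MB/s rather than reaching 250 MB/s, which is why the instance size matters as much as the number of volumes.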
Example 2: Secure Public Assets
Scenario: An application stores user profile pictures. They want to ensure the pictures are only accessible for 10 minutes after a user logs in.
Solution: Use S3 Pre-signed URLs. The application generates a temporary URL that grants time-limited access to the object without making the bucket public.
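A minimal sketch of the pre-signed URL approach is shown below. It assumes the caller passes in a boto3 S3 client; the function name `profile_picture_url`, the bucket name, and the key in the usage comment are hypothetical. The client is taken as a parameter so the expiry logic can be checked without AWS credentials.

```python
def profile_picture_url(s3_client, bucket, key, minutes=10):
    """Return a URL that allows a GET on one object for a limited time.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=minutes * 60,  # boto3 expects the lifetime in seconds
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   url = profile_picture_url(
#       boto3.client("s3"), "example-profile-pics", "users/42/avatar.jpg")
```

The URL carries the permissions of the credentials that signed it, so the bucket and object can stay private while the link works until it expires.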
Checkpoint Questions
- Which storage type is best described as a "flat surface" for data?
- True or False: RAID 1 is the recommended way to increase performance on EBS volumes.
- What S3 feature would you use to move data to a cheaper tier once it is 30 days old?
- How do S3 and EBS differ in terms of network access (how do you reach the data)?
- Which FSx type is specifically designed for high-performance computing (HPC)?
Answers
- Object Storage (Amazon S3).
- False. RAID 0 increases performance (striping); RAID 1 is for mirroring and is discouraged on EBS.
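- S3 Lifecycle Policies. Lifecycle rules act on object age, not on when an object was last read; S3 Intelligent-Tiering is the option that moves objects based on access patterns.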
- S3 is accessed via API/HTTP (over the internet or VPC endpoints), while EBS is accessed as a block device attached to a specific EC2 instance.
- FSx for Lustre.