AWS Storage Services & Hybrid Integration Study Guide
This guide covers the core AWS storage portfolio (S3, EBS, EFS, FSx) and the hybrid connectivity solutions provided by AWS Storage Gateway and AWS DataSync, as required for the Solutions Architect - Professional (SAP-C02) curriculum.
Learning Objectives
By the end of this module, you should be able to:
- Differentiate between block, file, and object storage services in AWS.
- Select the appropriate storage service based on performance, protocol (NFS, SMB, iSCSI), and access patterns.
- Evaluate the best flavor of AWS Storage Gateway for hybrid cloud architectures.
- Design data migration and backup strategies using AWS DataSync and Tape Gateway.
Key Terms & Glossary
- iSCSI (Internet Small Computer System Interface): An IP-based storage networking standard that carries block-level SCSI commands over a network; used by Volume Gateway and Tape Gateway.
- NFS (Network File System): A distributed file system protocol allowing a user on a client computer to access files over a network, used by EFS and S3 File Gateway.
- SMB (Server Message Block): A network file-sharing protocol that provides shared access to files and printers; used primarily in Windows environments and supported by FSx for Windows File Server, S3 File Gateway, and FSx File Gateway.
- POSIX (Portable Operating System Interface): A family of IEEE standards for maintaining compatibility between operating systems; relevant for EFS, which exposes POSIX file semantics, and for S3 File Gateway, which stores POSIX-style ownership and permissions as S3 object metadata.
- VTL (Virtual Tape Library): A data storage virtualization technology used for backup and recovery, emulated by AWS Tape Gateway.
The "Big Idea"
AWS storage is not a "one size fits all" solution. The architecture shifts from high-performance, low-latency block storage (EBS) for individual instances, to shared, scalable file systems (EFS/FSx) for distributed workloads, to virtually unlimited object storage (S3) for the data lake. Storage Gateway acts as the "bridge," allowing on-premises legacy environments to treat cloud-scale storage as if it were local hardware.
Formula / Concept Box
| Storage Type | AWS Service | Protocol | Best For... |
|---|---|---|---|
| Block | Amazon EBS | Proprietary (EC2) | Databases, Boot Volumes, Low-latency ERP |
| File | Amazon EFS | NFSv4.0 / NFSv4.1 | Linux Web Farms, Content Management |
| File | Amazon FSx | SMB, Lustre, NFS (OpenZFS) | Windows Apps, HPC, High-perf SQL |
| Object | Amazon S3 | REST API (HTTP) | Data Lakes, Static Assets, Backups |
| Hybrid | Storage Gateway | iSCSI, NFS, SMB | Migration, Cloud-bursting, Hybrid Backup |
Hierarchical Outline
- Object Storage: Amazon S3
  - Architecture: Key-value store with virtually unlimited scaling.
  - Usage: Common target for DataSync and the backing store for S3 File Gateway, Volume Gateway, and Tape Gateway.
- File Storage: EFS vs. FSx
  - EFS: Regional, serverless, Linux-native (NFS).
  - FSx for Windows: Fully managed Windows File Server (SMB).
  - FSx for Lustre: High-performance computing (HPC) with S3 integration.
- Hybrid Connectivity: AWS Storage Gateway
  - S3 File Gateway: NFS/SMB interface to S3 objects. One-to-one file-to-object mapping.
  - FSx File Gateway: Low-latency on-premises access to FSx for Windows.
  - Volume Gateway (Stored): Entire dataset local, asynchronously backed up to S3 as EBS snapshots.
  - Volume Gateway (Cached): Frequently accessed data local, primary data in S3.
  - Tape Gateway: Replaces physical tape backups with a VTL backed by S3 and S3 Glacier.
- Data Migration: AWS DataSync
  - Function: Automated, accelerated online data transfer.
  - Scope: Migrates from on-premises storage (NFS/SMB/HDFS) to S3, EFS, or FSx.
Visual Anchors
Storage Selection Logic
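The selection logic can be approximated in code. The following is a minimal sketch that follows the Concept Box above; the `Workload` class, its fields, and the `pick_storage` function are hypothetical names introduced for illustration, and real decisions also weigh cost, throughput, and durability requirements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    access: str                # "block", "file", or "object"
    protocol: str = ""         # for file workloads: "NFS", "SMB", or "Lustre"
    on_premises: bool = False  # True if the client runs outside AWS


def pick_storage(w: Workload) -> str:
    """Return a candidate AWS service for a workload (illustrative rules only)."""
    if w.on_premises:
        # Hybrid access to cloud-backed storage goes through Storage Gateway.
        return "AWS Storage Gateway"
    if w.access == "block":
        return "Amazon EBS"
    if w.access == "object":
        return "Amazon S3"
    if w.access == "file":
        if w.protocol == "NFS":
            return "Amazon EFS"
        if w.protocol in ("SMB", "Lustre"):
            return "Amazon FSx"
    raise ValueError(f"no rule for {w}")


print(pick_storage(Workload(access="file", protocol="NFS")))
```

The on-premises check comes first because a hybrid client needs a gateway no matter which storage type sits behind it.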
Hybrid S3 File Gateway Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, align=center, fill=blue!10}]
% On-Premises Side
\node (app) {On-Prem Application\\ (NFS/SMB)};
\node (gw) [right of=app, xshift=2cm, fill=green!10] {Storage Gateway\\ (Software/Hardware)};
\node (cache) [below of=gw, yshift=0.5cm, fill=orange!10] {Local Cache\\ (SSD)};
% Connection
\draw[->, thick] (app) -- (gw);
\draw[<->, dashed] (gw) -- (cache);
% Cloud Side
\node (s3) [right of=gw, xshift=3cm, fill=yellow!10] {Amazon S3\\ (Object Store)};
\node (net) [above of=s3, yshift=-1cm, draw=none, fill=none] {\textit{VPN / Direct Connect}};
\draw[->, ultra thick] (gw) -- node[midway, above] {SSL/TLS} (s3);
\end{tikzpicture}
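The one-to-one file-to-object mapping that S3 File Gateway performs along this path can be illustrated with a small sketch. It assumes the object key is the file's path relative to the share root, optionally preceded by a bucket prefix; `object_key`, `share_root`, and `prefix` are hypothetical names, and the gateway handles this translation internally rather than exposing such a function.

```python
from pathlib import PurePosixPath


def object_key(share_root: str, file_path: str, prefix: str = "") -> str:
    """Map a file on the share to an S3 object key (assumed mapping rule)."""
    # relative_to raises ValueError if file_path is not under share_root.
    relative = PurePosixPath(file_path).relative_to(share_root)
    return f"{prefix}{relative.as_posix()}"


print(object_key("/mnt/share", "/mnt/share/logs/app.log"))
print(object_key("/mnt/share", "/mnt/share/logs/app.log", prefix="archive/"))
```

Because each file becomes exactly one object, data written through the share can be read by other S3-native tools without a conversion step.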
Definition-Example Pairs
- Cached Volume: A configuration where the primary data is in S3, and only a subset is kept locally.
- Example: A company has 100TB of historical data but only needs 1TB for daily operations; they use Cached Volumes to save on-prem hardware costs.
- Stored Volume: A configuration where the primary data is on-premises, and it is asynchronously backed up as EBS snapshots to S3.
- Example: A low-latency mission-critical app requires the full speed of local DAS but needs a disaster recovery copy in AWS.
- DataSync Task: A scheduled job that synchronizes data between two locations.
  - Example: Running a nightly task to move logs from an on-premises NFS server to an S3 bucket for Athena analysis.
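The hardware difference between the two Volume Gateway modes comes down to simple arithmetic. The sketch below uses the 100 TB / 1 TB figures from the Cached Volume example; `local_capacity_tb` and its parameters are hypothetical, and the optional upload buffer is an assumed sizing input (a real deployment sizes it from the expected write rate).

```python
def local_capacity_tb(total_tb: float, working_set_tb: float, mode: str,
                      upload_buffer_tb: float = 0.0) -> float:
    """Estimate on-premises disk needed for a Volume Gateway (rough sketch)."""
    if mode == "stored":
        # Stored volumes keep the entire dataset on local disk.
        return total_tb + upload_buffer_tb
    if mode == "cached":
        # Cached volumes keep only the frequently accessed subset locally.
        return working_set_tb + upload_buffer_tb
    raise ValueError(f"unknown mode: {mode}")


stored = local_capacity_tb(100, 1, "stored")
cached = local_capacity_tb(100, 1, "cached")
print(f"stored: {stored} TB, cached: {cached} TB, saved: {stored - cached} TB")
```

With these inputs, the cached configuration needs roughly 99 TB less local disk, which is the cost saving the example describes.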
Worked Examples
Scenario: Migrating Windows User Shares
Problem: A law firm wants to migrate 50TB of Windows Home Directories to AWS but keep the users' experience unchanged (they must still see a local Z:\ drive).
Solution Breakdown:
- Service Selection: Use Amazon FSx for Windows File Server to host the data in AWS.
- Hybrid Access: Deploy an Amazon FSx File Gateway on-premises.
- Configuration: Connect the Gateway to the FSx file system over a Site-to-Site VPN.
- Result: Users access the gateway IP on-premises via SMB; frequently used files are cached locally for "local speed," while the master copy resides in FSx.
Checkpoint Questions
- Which Storage Gateway type allows you to see your on-premises files as individual objects in an S3 bucket?
- You need to migrate data from a legacy Hadoop (HDFS) cluster to S3 for a one-time project. Which service is most appropriate?
- What is the primary difference between S3 File Gateway and FSx File Gateway regarding protocols supported?
Answers:
- S3 File Gateway.
- AWS DataSync.
- S3 File Gateway supports both NFS and SMB; FSx File Gateway supports SMB only.
Muddy Points & Cross-Refs
- S3 File Gateway vs. DataSync: This is a common exam trap. Use DataSync to move data: one-time migrations or periodic, scheduled syncs. Use Storage Gateway when on-premises applications need continuous, transparent access to cloud-backed data and keep working on it in place.
- Volume Gateway Snapshots: Remember that Volume Gateway backups are stored as Amazon EBS Snapshots, which can be used to create EBS volumes for EC2 instances in a DR scenario.
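The DataSync versus Storage Gateway distinction can be reduced to a lookup on the access pattern. This is a minimal sketch of the rule stated above; the pattern labels and the `hybrid_service` function are hypothetical names chosen for illustration.

```python
# Pattern labels are illustrative, not AWS terminology.
RULES = {
    "one-time migration": "AWS DataSync",
    "periodic sync": "AWS DataSync",
    "continuous access": "AWS Storage Gateway",
}


def hybrid_service(pattern: str) -> str:
    """Return the service suggested by the guide's rule of thumb."""
    try:
        return RULES[pattern]
    except KeyError:
        raise ValueError(f"unknown pattern: {pattern}") from None


print(hybrid_service("periodic sync"))
```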
Comparison Tables
Storage Gateway Flavors
| Feature | S3 File Gateway | FSx File Gateway | Volume Gateway (Cached) | Tape Gateway |
|---|---|---|---|---|
| Interface | NFS / SMB | SMB | iSCSI | iSCSI-VTL |
| AWS Backend | Amazon S3 | FSx for Windows | Amazon S3 | S3 / S3 Glacier |
| Mapping | 1 File = 1 Object | 1 File = 1 FSx File | Block Storage (LUN) | Virtual Tape |
| Primary Use | Analytics on S3 | Windows Home Dirs | Cloud-backed Disks | Tape Replacement |
> [!IMPORTANT]
> For the SAP-C02 exam, always check whether the question mentions "Low Latency" or "Legacy Protocol." If the application needs to access S3 but only speaks NFS, the answer is almost always S3 File Gateway.
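The Storage Gateway Flavors table can also be read as a lookup: given the protocol an application speaks and, optionally, the AWS backend it must reach, which gateway types qualify? The sketch below encodes only the Interface and AWS Backend rows of that table; the `GATEWAYS` dictionary and `gateways_for` function are hypothetical names.

```python
from typing import List, Optional

# Interface and backend values copied from the comparison table.
GATEWAYS = {
    "S3 File Gateway": {"interfaces": {"NFS", "SMB"}, "backend": "Amazon S3"},
    "FSx File Gateway": {"interfaces": {"SMB"}, "backend": "FSx for Windows"},
    "Volume Gateway (Cached)": {"interfaces": {"iSCSI"}, "backend": "Amazon S3"},
    "Tape Gateway": {"interfaces": {"iSCSI-VTL"}, "backend": "S3 / S3 Glacier"},
}


def gateways_for(interface: str, backend: Optional[str] = None) -> List[str]:
    """List gateway types that expose `interface` and, if given, use `backend`."""
    return [
        name
        for name, spec in GATEWAYS.items()
        if interface in spec["interfaces"]
        and (backend is None or spec["backend"] == backend)
    ]


print(gateways_for("NFS", backend="Amazon S3"))
print(gateways_for("SMB"))
```

An NFS-only application that must reach S3 matches a single entry, which is the exam heuristic in code form; SMB alone is ambiguous until the backend is known.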