
AWS Storage Services & Hybrid Integration Study Guide

AWS storage services (for example, Amazon EBS, Amazon EFS, Amazon FSx, Amazon S3, AWS Storage Gateway Volume Gateway)


This guide covers the core AWS storage portfolio (S3, EBS, EFS, FSx) and the hybrid connectivity solutions provided by AWS Storage Gateway and AWS DataSync, as required for the Solutions Architect - Professional (SAP-C02) curriculum.

Learning Objectives

By the end of this module, you should be able to:

  • Differentiate between block, file, and object storage services in AWS.
  • Select the appropriate storage service based on performance, protocol (NFS, SMB, iSCSI), and access patterns.
  • Evaluate the best flavor of AWS Storage Gateway for hybrid cloud architectures.
  • Design data migration and backup strategies using AWS DataSync and Tape Gateway.

Key Terms & Glossary

  • iSCSI (Internet Small Computer System Interface): An IP-based storage networking standard that carries SCSI block commands over a network; used by Volume Gateway and Tape Gateway.
  • NFS (Network File System): A distributed file system protocol allowing a user on a client computer to access files over a network, used by EFS and S3 File Gateway.
  • SMB (Server Message Block): A network file sharing protocol that provides shared access to files and printers, used primarily by Windows environments, FSx for Windows File Server, and the file gateways.
  • POSIX (Portable Operating System Interface): A family of standards specified by the IEEE for maintaining compatibility between operating systems; relevant for EFS permissions and for the POSIX-style file metadata (owner, permissions, timestamps) that S3 File Gateway stores with each S3 object.
  • VTL (Virtual Tape Library): A data storage virtualization technology used for backup and recovery, emulated by AWS Tape Gateway.

The "Big Idea"

AWS storage is not a "one size fits all" solution. The architecture shifts from High Performance/Low Latency block storage (EBS) for individual instances, to Shared Scalable Filesystems (EFS/FSx) for distributed workloads, to Virtually Unlimited Object Storage (S3) for the data lake. Storage Gateway acts as the "bridge," allowing on-premises legacy environments to treat cloud-scale storage as if it were local hardware.

Formula / Concept Box

| Storage Type | AWS Service | Protocol | Best For |
| --- | --- | --- | --- |
| Block | Amazon EBS | Proprietary (EC2) | Databases, Boot Volumes, Low-latency ERP |
| File | Amazon EFS | NFSv4 | Linux Web Farms, Content Management |
| File | Amazon FSx | SMB, Lustre, NFS (OpenZFS) | Windows Apps, HPC, High-perf SQL |
| Object | Amazon S3 | REST API (HTTP) | Data Lakes, Static Assets, Backups |
| Hybrid | Storage Gateway | iSCSI, NFS, SMB | Migration, Cloud-bursting, Hybrid Backup |
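
The table can be read as a small decision procedure. The sketch below is one hypothetical way to encode it in Python; the function name `pick_storage_service` and its parameters are illustrative assumptions, not an AWS API, and a real selection would also weigh cost, throughput, and durability requirements.

```python
def pick_storage_service(access: str, protocol: str = "") -> str:
    """Map an access pattern (and optional protocol) to a service from the table.

    `access` is one of "block", "file", "object"; `protocol` is only consulted
    for file workloads. Both parameters are illustrative, not an AWS API.
    """
    access = access.lower()
    protocol = protocol.upper()
    if access == "block":
        return "Amazon EBS"
    if access == "object":
        return "Amazon S3"
    if access == "file":
        if protocol == "NFS":
            return "Amazon EFS"
        if protocol == "SMB":
            return "Amazon FSx for Windows File Server"
        if protocol == "LUSTRE":
            return "Amazon FSx for Lustre"
        raise ValueError(f"unsupported file protocol: {protocol!r}")
    raise ValueError(f"unknown access pattern: {access!r}")


print(pick_storage_service("file", "SMB"))
```

Raising on unknown input rather than returning a default keeps the sketch honest: a workload that does not fit one of the rows needs a closer look, not a guess.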

Hierarchical Outline

  1. Object Storage: Amazon S3
    • Architecture: Key-value object store with virtually unlimited capacity.
    • Usage: Primary target for DataSync and for the S3 File, Volume, and Tape Gateway flavors (FSx File Gateway targets FSx instead).
  2. File Storage: EFS vs. FSx
    • EFS: Regional, serverless, Linux-native (NFS).
    • FSx for Windows: Fully managed Windows File Server (SMB).
    • FSx for Lustre: High-performance computing (HPC) with S3 integration.
  3. Hybrid Connectivity: AWS Storage Gateway
    • S3 File Gateway: NFS/SMB interface to S3 objects. One-to-one file-to-object mapping.
    • FSx File Gateway: Low-latency on-prem access to FSx for Windows.
    • Volume Gateway (Stored): Entire dataset local, backed up to S3.
    • Volume Gateway (Cached): Frequently accessed data local, primary data in S3.
    • Tape Gateway: Replaces physical tape backups with S3/Glacier VTL.
  4. Data Migration: AWS DataSync
    • Function: Automated, accelerated online data transfer.
    • Scope: Migrates from on-premises (NFS/SMB/HDFS) to S3, EFS, or FSx.
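
The one-to-one file-to-object mapping of S3 File Gateway noted in the outline can be illustrated with a short Python sketch. It assumes the simplest case, where a file's path relative to the share root becomes the object key, optionally under a bucket prefix; the helper name `object_key_for` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def object_key_for(share_root: str, file_path: str, key_prefix: str = "") -> str:
    """Return the S3 object key that corresponds to a file on the share.

    Assumes the simple one-to-one mapping: the path relative to the share
    root becomes the key, optionally placed under a bucket prefix.
    Raises ValueError if the file is not located under the share root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(share_root))
    return f"{key_prefix}{relative.as_posix()}"


print(object_key_for("/mnt/share", "/mnt/share/reports/2024/q1.csv"))
# reports/2024/q1.csv
```

Because each file is a separate object, services such as Athena can read the data directly from the bucket without going through the gateway.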

Visual Anchors

Storage Selection Logic

[Diagram: Storage Selection Logic]

Hybrid S3 File Gateway Architecture

\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, align=center, fill=blue!10}]
  % On-Premises Side
  \node (app) {On-Prem Application\\(NFS/SMB)};
  \node (gw) [right of=app, xshift=2cm, fill=green!10] {Storage Gateway\\(Software/Hardware)};
  \node (cache) [below of=gw, yshift=0.5cm, fill=orange!10] {Local Cache\\(SSD)};

  % Connection
  \draw[->, thick] (app) -- (gw);
  \draw[<->, dashed] (gw) -- (cache);

  % Cloud Side
  \node (s3) [right of=gw, xshift=3cm, fill=yellow!10] {Amazon S3\\(Object Store)};
  \node (net) [above of=s3, yshift=-1cm, draw=none, fill=none] {\textit{VPN / Direct Connect}};
  \draw[->, ultra thick] (gw) -- node[midway, above] {SSL/TLS} (s3);
\end{tikzpicture}

Definition-Example Pairs

  • Cached Volume: A configuration where the primary data is in S3, and only a subset is kept locally.
    • Example: A company has 100TB of historical data but only needs 1TB for daily operations; they use Cached Volumes to save on-prem hardware costs.
  • Stored Volume: A configuration where the primary data is on-premises, and it is asynchronously backed up as EBS snapshots to S3.
    • Example: A low-latency mission-critical app requires the full speed of local DAS but needs a disaster recovery copy in AWS.
  • DataSync Task: A scheduled job that synchronizes data between two locations.
    • Example: Running a nightly task to move logs from an on-premises NFS server to an S3 bucket for Athena analysis.
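
The hardware trade-off between Cached and Stored Volumes comes down to simple arithmetic, sketched below in Python with the figures from the Cached Volume example (100 TB total, 1 TB in daily use). The function `local_capacity_tb` is a hypothetical helper, and it deliberately ignores the upload buffer and other overhead a real gateway also needs.

```python
def local_capacity_tb(mode: str, total_tb: float, hot_tb: float) -> float:
    """Estimate on-premises disk needed for the volume data itself.

    "stored": the full dataset lives locally.
    "cached": only the frequently accessed subset is kept locally.
    Upload buffer and other overhead are deliberately ignored here.
    """
    if hot_tb > total_tb:
        raise ValueError("hot data cannot exceed the total dataset")
    if mode == "stored":
        return total_tb
    if mode == "cached":
        return hot_tb
    raise ValueError(f"unknown mode: {mode!r}")


cached = local_capacity_tb("cached", total_tb=100, hot_tb=1)
stored = local_capacity_tb("stored", total_tb=100, hot_tb=1)
print(f"cached: {cached} TB local, stored: {stored} TB local")
```

Under these assumptions the cached configuration needs roughly one hundredth of the local disk, at the cost of higher latency for data that is not in the cache.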

Worked Examples

Scenario: Migrating Windows User Shares

Problem: A law firm wants to migrate 50TB of Windows Home Directories to AWS but keep the users' experience unchanged (they must still see a local Z:\ drive).

Solution Breakdown:

  1. Service Selection: Use Amazon FSx for Windows File Server to host the data in AWS.
  2. Hybrid Access: Deploy an Amazon FSx File Gateway on-premises.
  3. Configuration: Connect the Gateway to the FSx file system over a Site-to-Site VPN.
  4. Result: Users access the gateway IP on-premises via SMB; frequently used files are cached locally for "local speed," while the master copy resides in FSx.

Checkpoint Questions

  1. Which Storage Gateway type allows you to see your on-premises files as individual objects in an S3 bucket?
  2. You need to migrate data from a legacy Hadoop (HDFS) cluster to S3 for a one-time project. Which service is most appropriate?
  3. What is the primary difference between S3 File Gateway and FSx File Gateway regarding protocols supported?
Answers:
  1. S3 File Gateway.
  2. AWS DataSync.
  3. S3 File Gateway supports both NFS and SMB; FSx File Gateway supports SMB only.

Muddy Points & Cross-Refs

  • S3 File Gateway vs. DataSync: This is a common exam trap. Use DataSync to move data: one-time migrations or periodic, scheduled syncs. Use Storage Gateway when on-premises applications need continuous, transparent access to data that lives in AWS.
  • Volume Gateway Snapshots: Remember that Volume Gateway backups are stored as Amazon EBS Snapshots, which can be used to create EBS volumes for EC2 instances in a DR scenario.
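
The periodic-sync case in the first point could be set up as a scheduled DataSync task. The Python sketch below only builds the request parameters for the `create_task` operation of the boto3 DataSync client; the location ARNs are placeholders, and the task name and the 02:00 UTC cron schedule are assumptions chosen for illustration. The actual call is left as a comment because it needs credentials and existing DataSync locations.

```python
def nightly_sync_task_params(source_arn: str, dest_arn: str) -> dict:
    """Build keyword arguments for a DataSync CreateTask request.

    The ARNs must refer to DataSync locations that already exist; the
    task name and the 02:00 UTC schedule are illustrative choices.
    """
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": "nightly-log-sync",
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},
    }


params = nightly_sync_task_params(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-SOURCE",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-DEST",
)
print(params["Schedule"]["ScheduleExpression"])

# With credentials configured, the request could then be sent with boto3:
#   import boto3
#   boto3.client("datasync").create_task(**params)
```

A one-time migration would use the same task without the `Schedule` entry and be started manually.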

Comparison Tables

Storage Gateway Flavors

| Feature | S3 File Gateway | FSx File Gateway | Volume Gateway (Cached) | Tape Gateway |
| --- | --- | --- | --- | --- |
| Interface | NFS / SMB | SMB | iSCSI | iSCSI-VTL |
| AWS Backend | Amazon S3 | FSx for Windows | Amazon S3 | S3 / S3 Glacier |
| Mapping | 1 File = 1 Object | 1 File = 1 FSx File | Block Storage (LUN) | Virtual Tape |
| Primary Use | Analytics on S3 | Windows Home Dirs | Cloud-backed Disks | Tape Replacement |
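
As a study aid, the Interface and AWS Backend rows of the table can be turned into a lookup. The Python sketch below is a hypothetical encoding: the `GATEWAYS` dictionary, its key spellings, and the function `pick_gateway` are illustrative and not part of any AWS SDK.

```python
# (client interface, AWS backend) -> gateway flavor, following the table.
GATEWAYS = {
    ("NFS", "S3"): "S3 File Gateway",
    ("SMB", "S3"): "S3 File Gateway",
    ("SMB", "FSX_WINDOWS"): "FSx File Gateway",
    ("ISCSI", "S3"): "Volume Gateway",
    ("ISCSI-VTL", "S3"): "Tape Gateway",
}


def pick_gateway(interface: str, backend: str) -> str:
    """Return the gateway flavor for an on-prem interface and AWS backend."""
    key = (interface.upper(), backend.upper())
    try:
        return GATEWAYS[key]
    except KeyError:
        raise ValueError(f"no gateway flavor matches {key}") from None


print(pick_gateway("nfs", "s3"))
```

The NFS-to-S3 lookup reflects the most common exam pattern: an application that only speaks a legacy file protocol but needs its data in S3.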

[!IMPORTANT] For the SAP-C02 exam, always check if the question mentions "Low Latency" or "Legacy Protocol." If they need to access S3 but the app only speaks NFS, the answer is almost always S3 File Gateway.
