AWS Hybrid Storage Solutions: DataSync, Storage Gateway, and Transfer Family

This study guide focuses on the critical bridge between on-premises infrastructure and the AWS Cloud. For the AWS Certified Solutions Architect - Associate (SAA-C03) exam, understanding when to use specific hybrid storage and data transfer services is vital for designing high-performing and cost-optimized architectures.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between AWS DataSync, AWS Storage Gateway, and AWS Transfer Family.
  • Identify the appropriate AWS Storage Gateway type (File, Volume, or Tape) based on application requirements.
  • Design data migration strategies using AWS DataSync for large-scale transfers.
  • Select the correct protocol (SFTP, FTPS, FTP) for secure file ingestion using AWS Transfer Family.

Key Terms & Glossary

  • NFS (Network File System): A protocol used primarily by Linux systems to access files over a network.
  • SMB (Server Message Block): A protocol used primarily by Windows systems for file sharing.
  • iSCSI (Internet Small Computer Systems Interface): An IP-based storage networking standard for linking data storage facilities, used by Volume Gateways.
  • VTL (Virtual Tape Library): A data storage virtualization technology used for backup and recovery, emulated by Tape Gateway.
  • POSIX: A set of standard operating system interfaces. Amazon EFS exposes a POSIX-compliant file system, and DataSync preserves POSIX metadata (ownership, permissions, timestamps) when it copies files.

The "Big Idea"

Hybrid storage is about extending the data center. Organizations rarely move to the cloud overnight. AWS provides a "spectrum of connectivity" that allows legacy on-premises applications to treat AWS storage (like S3 or EBS) as if it were local hardware, while also providing high-speed pipelines for one-time or recurring data migrations.

Formula / Concept Box

| Feature | AWS DataSync | AWS Storage Gateway | AWS Transfer Family |
|---|---|---|---|
| Primary Goal | Fast, automated data migration/sync | Low-latency local access to cloud storage | Secure file exchange with 3rd parties |
| Interface | Agent-based (Software) | Virtual or Physical Appliance | Managed Endpoint (DNS) |
| Protocols | NFS, SMB, HDFS, S3, FSx | NFS, SMB, iSCSI, iSCSI-VTL | SFTP, FTPS, FTP, AS2 |
| Persistence | Moving data (Transit) | Caching/Mirroring data (Hybrid) | Ingesting/Serving data (Gateway) |
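
The comparison above can be read as a lookup: first match the workload to a primary goal, then confirm the protocol is one the service speaks. The sketch below encodes only the "Primary Goal" and "Protocols" rows; the `Need` enum and `pick_service` function are hypothetical names invented for illustration, not part of any AWS SDK.

```python
from enum import Enum


class Need(Enum):
    """Hypothetical labels for the three primary goals in the table."""
    MIGRATE_OR_SYNC = "fast, automated data migration or sync"
    LOCAL_ACCESS = "low-latency local access to cloud storage"
    PARTNER_EXCHANGE = "secure file exchange with third parties"


# "Primary Goal" row of the comparison table.
SERVICE_FOR_NEED = {
    Need.MIGRATE_OR_SYNC: "AWS DataSync",
    Need.LOCAL_ACCESS: "AWS Storage Gateway",
    Need.PARTNER_EXCHANGE: "AWS Transfer Family",
}

# "Protocols" row of the comparison table.
PROTOCOLS = {
    "AWS DataSync": {"NFS", "SMB", "HDFS", "S3", "FSx"},
    "AWS Storage Gateway": {"NFS", "SMB", "iSCSI", "iSCSI-VTL"},
    "AWS Transfer Family": {"SFTP", "FTPS", "FTP", "AS2"},
}


def pick_service(need: Need, protocol: str) -> str:
    """Return the service for a need, checking the protocol against the table."""
    service = SERVICE_FOR_NEED[need]
    if protocol not in PROTOCOLS[service]:
        raise ValueError(f"{service} is not listed with {protocol} in the table")
    return service


if __name__ == "__main__":
    print(pick_service(Need.PARTNER_EXCHANGE, "SFTP"))
    print(pick_service(Need.LOCAL_ACCESS, "iSCSI"))
```

Note that NFS and SMB appear under both DataSync and Storage Gateway, so the protocol alone does not decide the answer; the goal (moving data versus accessing it in place) does.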

Hierarchical Outline

  • AWS DataSync (Data Migration & Sync)
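    • Architecture: Requires an agent deployed near the source when transferring from on-premises (or other non-AWS) storage; transfers between AWS storage services do not need one.
    • Performance: Optimized for speed; a single task can use up to 10 Gbps of network bandwidth.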
    • Use Case: One-time migrations, recurring backups, or data processing pipelines.
  • AWS Storage Gateway (Hybrid Access)
    • S3 File Gateway: Provides a local NFS/SMB mount; files mapped 1:1 to S3 objects.
    • FSx File Gateway: Low-latency access to Amazon FSx for Windows File Server.
    • Volume Gateway: Block storage via iSCSI. Available in Cached mode (primary data in S3, frequently accessed data cached locally) or Stored mode (full dataset local, asynchronously backed up to AWS as EBS snapshots).
    • Tape Gateway: Replaces physical tape infrastructure with a Virtual Tape Library (VTL).
  • AWS Transfer Family (Managed File Transfer)
    • Infrastructure: Fully managed, highly available, and auto-scaling.
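    • Security: Uses IAM roles to control access to the backing storage, and authenticates users through service-managed identities, AWS Directory Service for Microsoft Active Directory, or a custom identity provider.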

Visual Anchors

DataSync Architecture

[Diagram: DataSync architecture (not rendered)]

The Hybrid Storage Bridge

\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, minimum width=3cm, minimum height=1cm, align=center}]

% Define styles
\tikzstyle{onprem} = [fill=blue!10]
\tikzstyle{aws} = [fill=orange!10]
\tikzstyle{gate} = [fill=green!10]
% Edge labels: undo the box styling inherited from "every node"
\tikzstyle{lbl} = [draw=none, minimum width=0pt, minimum height=0pt]

% Draw nodes
\node (local) [onprem] {On-Premises\\Applications};
\node (gateway) [gate, right of=local, xshift=3cm] {AWS Storage\\Gateway};
\node (s3) [aws, right of=gateway, xshift=3cm] {AWS Storage\\(S3/EBS/FSx)};

% Draw paths
\draw [->, thick] (local) -- node[lbl, above] {NFS / SMB / iSCSI} (gateway);
\draw [->, thick] (gateway) -- node[lbl, above] {HTTPS / TLS} (s3);
\draw [<->, dashed] (local) |- ([yshift=1.5cm]gateway.center) -| (s3) node[lbl, pos=0.5, above] {Hybrid Connectivity};

\end{tikzpicture}

Definition-Example Pairs

  • Cached Volume Gateway: Only frequently accessed data is kept on-premises; the full dataset is stored in S3.
    • Example: A boutique film studio has 100 TB of footage. They keep the 2 TB they are currently editing on a local cache for speed, while the rest sits cost-effectively in S3.
  • Stored Volume Gateway: All data is stored locally, but snapshots are taken and stored in AWS as EBS snapshots.
    • Example: A law firm requires immediate, local-speed access to all documents due to strict latency needs but wants a durable off-site backup in AWS for disaster recovery.
  • AWS Transfer Family: A managed service for file transfers using standard protocols.
    • Example: A retail company needs to receive daily inventory reports from 500 different vendors who can only use SFTP. Instead of managing an SFTP server, the company uses AWS Transfer Family to drop those files directly into an S3 bucket.
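
To make the cached-versus-stored trade-off concrete, the sketch below estimates how much on-premises capacity each Volume Gateway mode implies for a workload. It is a simplification under stated assumptions: `VolumeWorkload`, `choose_mode`, and `local_capacity_tb` are hypothetical names, the law firm's 10 TB figure is invented for illustration, and a real deployment also needs upload-buffer space that is ignored here.

```python
from dataclasses import dataclass


@dataclass
class VolumeWorkload:
    """Hypothetical description of a block-storage workload."""
    total_tb: float              # size of the full dataset
    hot_tb: float                # frequently accessed working set
    needs_full_local_copy: bool  # must every block be served at local speed?


def choose_mode(w: VolumeWorkload) -> str:
    """Stored mode keeps everything local; cached mode keeps only hot data."""
    return "stored" if w.needs_full_local_copy else "cached"


def local_capacity_tb(w: VolumeWorkload) -> float:
    """Rough on-premises capacity implied by the chosen mode (buffers ignored)."""
    return w.total_tb if choose_mode(w) == "stored" else w.hot_tb


if __name__ == "__main__":
    studio = VolumeWorkload(total_tb=100, hot_tb=2, needs_full_local_copy=False)
    # The 10 TB law-firm dataset is an assumed figure, not from the guide.
    law_firm = VolumeWorkload(total_tb=10, hot_tb=10, needs_full_local_copy=True)
    for name, w in (("studio", studio), ("law firm", law_firm)):
        print(name, choose_mode(w), local_capacity_tb(w), "TB local")
```

For the film studio this yields cached mode with about 2 TB on-premises; for the law firm it yields stored mode with the whole dataset held locally.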

Worked Examples

Scenario 1: The One-Time Migration

Problem: A company needs to move 50 TB of data from an on-premises Hadoop cluster to Amazon S3 over a weekend. They have a 10 Gbps Direct Connect.

Solution: Use AWS DataSync.

  1. Install the DataSync agent on a local VM.
  2. Configure the source (HDFS) and destination (S3).
  3. DataSync will automatically handle the transfer, encryption, and data integrity verification at scale.
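
As a rough feasibility check for this scenario, the sketch below estimates how long 50 TB takes over a 10 Gbps link. It assumes decimal units (1 TB = 10^12 bytes) and a hypothetical `utilization` factor for the share of the link the transfer actually gets; the function names and the 48-hour weekend window are illustrative assumptions, not DataSync parameters.

```python
def transfer_hours(data_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and small-file effects."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8                       # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


def fits_in_window(data_tb: float, link_gbps: float,
                   window_hours: float, utilization: float = 1.0) -> bool:
    """True if the ideal transfer time fits inside the available window."""
    return transfer_hours(data_tb, link_gbps, utilization) <= window_hours


if __name__ == "__main__":
    print(f"{transfer_hours(50, 10):.1f} h at full line rate")       # 11.1 h
    print(f"{transfer_hours(50, 10, 0.5):.1f} h at 50% utilization")  # 22.2 h
    print(fits_in_window(50, 10, window_hours=48, utilization=0.5))   # True
```

Even at half the line rate the transfer finishes in under a day, which is why an online transfer is a plausible answer here rather than an offline device.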

Scenario 2: Legacy Tape Replacement

Problem: A hospital is running out of physical space for their magnetic tape backups but must retain records for 10 years for compliance.

Solution: Use AWS Storage Gateway (Tape Gateway).

  1. Deploy the Tape Gateway as a virtual appliance.
  2. Connect the existing backup software to the gateway via iSCSI-VTL.
  3. The software "writes" to the virtual tapes, which the gateway stores in S3; when the backup software ejects a tape, it is archived to S3 Glacier (Flexible Retrieval or Deep Archive) for long-term storage.
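
The virtual tapes themselves have to be provisioned before the backup software can write to them. The sketch below only builds a request dictionary and makes no AWS call. The key names follow the boto3 Storage Gateway `create_tapes` operation as best understood here and should be checked against the current SDK documentation; the helper name, the gateway ARN, and the barcode prefix are hypothetical.

```python
import uuid

GIB = 1024 ** 3


def build_create_tapes_request(gateway_arn: str, tape_size_gib: int, count: int,
                               barcode_prefix: str, pool_id: str = "GLACIER") -> dict:
    """Build keyword arguments for a create_tapes call (assumed parameter names)."""
    if tape_size_gib <= 0 or count <= 0:
        raise ValueError("tape size and count must be positive")
    return {
        "GatewayARN": gateway_arn,
        "TapeSizeInBytes": tape_size_gib * GIB,
        "ClientToken": str(uuid.uuid4()),   # idempotency token for retries
        "NumTapesToCreate": count,
        "TapeBarcodePrefix": barcode_prefix,
        "PoolId": pool_id,                  # archive tier used when a tape is ejected
    }


if __name__ == "__main__":
    # Placeholder ARN and prefix, for illustration only.
    request = build_create_tapes_request(
        "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-EXAMPLE",
        tape_size_gib=100, count=5, barcode_prefix="HOSP", pool_id="DEEP_ARCHIVE",
    )
    print(request["TapeSizeInBytes"], request["PoolId"])
```

With valid credentials, a dictionary like this would presumably be passed as `boto3.client("storagegateway").create_tapes(**request)`. For a 10-year retention requirement, the deep-archive pool is the natural candidate to evaluate, subject to the hospital's retrieval-time needs.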

Checkpoint Questions

  1. Which service is best suited for a high-speed, one-time migration of millions of small files from an on-premises NFS share to Amazon EFS?
  2. What is the main difference between a Cached Volume and a Stored Volume in Storage Gateway?
  3. True or False: AWS Transfer Family requires you to manage the underlying EC2 instances for the SFTP service.
  4. If an application requires low-latency, block-level access to storage, which gateway should be used?

Answers

  1. AWS DataSync (Optimized for high-speed file transfers and supports EFS as a target).
  2. Cached Volumes store only frequently accessed data locally; Stored Volumes store the entire dataset locally and use AWS for backups.
  3. False. It is a fully managed service.
  4. Volume Gateway. (File Gateway is for file-level access; Volume Gateway provides block-level iSCSI).
