AWS Data Transfer and Hybrid Storage Solutions
Data transfer services with appropriate use cases (for example, AWS DataSync, AWS Storage Gateway)
This study guide covers the critical services used to move data into AWS and integrate on-premises environments with the cloud, specifically focusing on AWS DataSync, AWS Storage Gateway, and the AWS Snow Family as defined in the SAA-C03 curriculum.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between online and offline data transfer methods.
- Select the appropriate Storage Gateway type (File, Volume, or Tape) based on application requirements.
- Determine when to use AWS DataSync versus AWS Snowball based on bandwidth and data volume.
- Identify cost-effective migration strategies for petabyte-scale data.
Key Terms & Glossary
- AWS DataSync: An online data transfer service that simplifies and accelerates moving data between on-premises storage and AWS (S3, EFS, FSx).
- AWS Storage Gateway: A hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
- NFS (Network File System): A protocol used by Linux systems for file sharing; supported by File Gateway.
- SMB (Server Message Block): A protocol used by Windows systems for file sharing; supported by File Gateway.
- iSCSI (Internet Small Computer Systems Interface): A protocol used for block-level storage; used by Volume and Tape Gateways.
- Snowball Edge: A physical migration and edge computing device with up to 80 TB of usable capacity.
The "Big Idea"
Organizations often face a "gravity" problem: data is heavy and hard to move. AWS provides a spectrum of tools to overcome this, ranging from online synchronization (DataSync) for ongoing updates, to hybrid bridges (Storage Gateway) for seamless on-premises integration, to physical shipping (Snow Family) for massive migrations where the network link is too slow.
Formula / Concept Box
Transfer Time Calculation
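To decide between online (DataSync) and offline (Snowball), estimate the online transfer time:
Transfer time (seconds) = data size (bits) / available bandwidth (bits per second)
Divide the result by 86,400 to convert seconds to days. If the online estimate is far longer than the roughly one-week turnaround of a Snowball Edge shipment, the offline route is usually the better choice. The table below compares the three services: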
| Feature | AWS DataSync | AWS Snowball Edge | AWS Storage Gateway |
|---|---|---|---|
| Method | Online (Internet/Direct Connect) | Physical Shipping (Offline) | Hybrid (Local Cache) |
| Primary Use | Migration / Recurring Sync | Large One-time Migrations | Low-latency Hybrid Access |
| Protocols | NFS, SMB, HDFS, S3 | S3, NFS | NFS, SMB, iSCSI |
| Encryption | TLS (In-transit) | 256-bit (At-rest) | AES-256 (At-rest) |
Hierarchical Outline
- Online Transfer Services
  - AWS DataSync: Automated, uses a software agent. Best for ongoing sync or migrations with high bandwidth (up to 10 Gbps).
  - AWS Transfer Family: Managed SFTP, FTPS, and FTP access directly to S3 or EFS.
- Hybrid Storage Integration
  - Storage Gateway: Connects local apps to AWS storage via a local cache.
    - S3 File Gateway: NFS/SMB to S3 objects.
    - FSx File Gateway: SMB to Amazon FSx for Windows File Server.
    - Volume Gateway: iSCSI block storage (Stored vs. Cached volumes).
    - Tape Gateway: iSCSI Virtual Tape Library (VTL) for backups.
- Offline Migration (Snow Family)
  - Snowcone: Small, ruggedized, 8 TB (HDD) or 14 TB (SSD) of usable storage, supports edge computing.
  - Snowball Edge: Storage Optimized (80 TB) or Compute Optimized.
  - Snowmobile: 100 PB shipping container for exabyte-scale moves.
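The Storage Gateway branch of the outline can be read as a lookup from "what protocol does the local application speak" and "where should the data live" to a gateway type. The sketch below is only a study aid: the function name `choose_gateway` and the backend labels are made up for illustration and are not part of any AWS API.

```python
# Study-aid lookup: (client protocol, AWS-side backend) -> Storage Gateway type.
# The backend labels are informal strings chosen for this sketch.
GATEWAY_BY_ACCESS = {
    ("NFS", "S3"): "S3 File Gateway",
    ("SMB", "S3"): "S3 File Gateway",
    ("SMB", "FSx for Windows File Server"): "FSx File Gateway",
    ("iSCSI", "block volumes"): "Volume Gateway",
    ("iSCSI", "virtual tapes"): "Tape Gateway",
}


def choose_gateway(protocol: str, backend: str) -> str:
    """Return the gateway type from the outline for a protocol/backend pair."""
    try:
        return GATEWAY_BY_ACCESS[(protocol, backend)]
    except KeyError:
        raise ValueError(
            f"No gateway in the outline matches {protocol!r} with {backend!r}"
        ) from None


if __name__ == "__main__":
    print(choose_gateway("SMB", "S3"))
    print(choose_gateway("iSCSI", "virtual tapes"))
```

Keying on both protocol and backend matters because SMB alone is ambiguous: it maps to S3 File Gateway when the data should land as S3 objects and to FSx File Gateway when it should land in FSx for Windows File Server.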
Visual Anchors
[Figure: Decision Flow: Online vs. Offline]
[Figure: Hybrid Architecture (Storage Gateway)]
Definition-Example Pairs
- Cached Volume (Volume Gateway): Stores the entire dataset in S3 and keeps only frequently accessed data locally.
  - Example: A company has 100 TB of data but uses only 2 TB daily; Cached Volumes let it save on on-premises hardware costs.
- Stored Volume (Volume Gateway): Stores the entire dataset locally while asynchronously backing it up to S3 as EBS snapshots.
  - Example: A low-latency financial app that requires the full dataset locally for millisecond response times but needs offsite disaster recovery.
- DataSync Task: A set of configurations (Source, Destination, Settings) used to execute a sync.
  - Example: A nightly task that syncs an on-premises NAS with an Amazon EFS file system for cloud-native analytics.
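To make the "DataSync Task" definition concrete, here is a minimal sketch of the three parts (source, destination, settings) expressed as the request parameters for the DataSync `CreateTask` operation. The helper name `build_nightly_task_request`, the task name, the 02:00 UTC schedule, and the location ARNs are all hypothetical placeholders; the sketch only builds the parameters and does not call AWS.

```python
def build_nightly_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    name: str = "nightly-nas-to-efs",  # hypothetical task name
    hour_utc: int = 2,                 # assumed run time: 02:00 UTC
) -> dict:
    """Build CreateTask parameters: source, destination, and settings."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        # Source: an existing DataSync location (for example, the on-premises NAS share).
        "SourceLocationArn": source_location_arn,
        # Destination: an existing DataSync location (for example, the EFS file system).
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        # Settings: copy only changed data and verify what was transferred.
        "Options": {
            "TransferMode": "CHANGED",
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
        # Schedule: AWS cron syntax, here once per day at the chosen hour.
        "Schedule": {"ScheduleExpression": f"cron(0 {hour_utc} * * ? *)"},
    }


if __name__ == "__main__":
    # Placeholder ARNs for illustration only; real ones come from your account.
    request = build_nightly_task_request(
        "arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLESOURCE",
        "arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLEDEST",
    )
    print(request)
```

Assuming boto3 is installed and credentials are configured, the resulting dictionary could be passed as `boto3.client("datasync").create_task(**request)`. Both locations must already exist; creating them is a separate step not shown here.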
Worked Examples
Scenario 1: The Petabyte Problem
Problem: A research lab has 500 TB of genomic data. Their internet connection is 100 Mbps. They need to move this to S3 for analysis. How long will it take online, and what is the best solution?
Step-by-Step Breakdown:
- Calculate Online Time:
  - 500 TB = 500 × 8,000 Gb = 4,000,000 gigabits (Gb).
  - 100 Mbps = 0.1 Gbps, so 4,000,000 Gb / 0.1 Gbps = 40,000,000 seconds.
  - 40,000,000 / 86,400 (seconds/day) ≈ 463 days.
- Evaluate Offline: Shipping a Snowball Edge takes roughly 1 week (shipping + data transfer).
- Conclusion: Use multiple Snowball Edge devices. At 80 TB each, 500 TB / 80 TB ≈ 6.25, so 7 devices are needed. Online transfer over this link is impractical.
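The arithmetic in Scenario 1 can be reproduced with a short script. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes), a fully utilized link, and the 80 TB per-device capacity used in this guide; the function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400


def online_transfer_days(data_tb: float, bandwidth_mbps: float) -> float:
    """Days needed to move data_tb terabytes over a bandwidth_mbps link.

    Assumes decimal units and 100% link utilization, so real transfers
    will take longer than this estimate.
    """
    data_bits = data_tb * 1e12 * 8          # terabytes -> bits
    bits_per_second = bandwidth_mbps * 1e6  # megabits/s -> bits/s
    return data_bits / bits_per_second / SECONDS_PER_DAY


def snowball_devices_needed(data_tb: float, device_capacity_tb: float = 80) -> int:
    """Number of devices required, rounding up to whole devices."""
    return math.ceil(data_tb / device_capacity_tb)


if __name__ == "__main__":
    days = online_transfer_days(500, 100)
    devices = snowball_devices_needed(500)
    print(f"Online: about {days:.0f} days; offline: {devices} Snowball Edge devices")
    # prints: Online: about 463 days; offline: 7 Snowball Edge devices
```

Changing the inputs shows where the crossover lies: the same 500 TB over a 10 Gbps link comes out at under 5 days, which is why bandwidth, not just data volume, drives the DataSync versus Snowball decision.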
Scenario 2: Replacing Legacy Tape Backups
Problem: A bank uses physical tapes for backup. They want to move to the cloud without changing their existing iSCSI-based backup software.
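Solution: Deploy AWS Tape Gateway. It presents a Virtual Tape Library (VTL) to the backup software over iSCSI, so the existing backup workflow stays unchanged. Active virtual tapes are stored in S3; when archived, they move to S3 Glacier Flexible Retrieval or, for the lowest cost, S3 Glacier Deep Archive.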
Checkpoint Questions
- Which Storage Gateway type would you use to provide SMB access to files that should be stored as objects in Amazon S3?
- True or False: AWS DataSync requires a physical device to be shipped to your datacenter.
- You need to migrate 10 PB of data. Which Snow Family member is specifically designed for this scale?
- What is the main difference between File Gateway and Volume Gateway?
Answers
- S3 File Gateway. (FSx File Gateway also provides SMB access, but it stores files in Amazon FSx for Windows File Server, not as objects in S3.)
- False. DataSync uses a software agent (VM) installed in your local environment.
- AWS Snowmobile (or multiple Snowball Edges, though Snowmobile is the specific 100 PB tool).
- File Gateway provides file-level access (NFS/SMB) to S3 objects; Volume Gateway provides block-level access (iSCSI) for disk volumes.
Tip: On the exam, if you see "low-latency local access" and "cloud storage," think Storage Gateway. If you see "migrate large data over slow internet," think Snowball.