AWS Data Migration: Online and Offline Strategies
Data migration options and tools (for example, AWS DataSync, AWS Transfer Family, AWS Snow Family, Amazon S3 Transfer Acceleration)
This guide explores the mechanisms provided by AWS to migrate data efficiently, securely, and cost-effectively, covering both online network-based tools and offline physical transport devices.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between online and offline data migration tools.
- Select the appropriate AWS service based on data volume and connectivity constraints.
- Explain how Amazon S3 Transfer Acceleration speeds up long-distance data uploads.
- Identify the encryption methods (in transit and at rest) used by different migration tools.
- Choose between Snowball Edge and Snowmobile for large-scale data movements.
Key Terms & Glossary
- Edge Location: A site that CloudFront uses to cache copies of your content for faster delivery to users at any location. Used by S3 Transfer Acceleration.
- Exabyte-scale: Data volumes on the order of an exabyte (1 EB = 1,000 petabytes (PB)). Handled by AWS Snowmobile.
- Encryption at Rest: Protection of data while it is stored on a disk or device. AWS Snow Family devices encrypt stored data with 256-bit keys managed through AWS Key Management Service (AWS KMS).
- TLS (Transport Layer Security): A cryptographic protocol designed to provide communications security over a computer network.
- ETL (Extract, Transform, Load): A three-phase process where data is extracted, transformed into a proper format, and loaded into a final target.
The "Big Idea"
Data migration is not a "one size fits all" task. It requires balancing three factors: available bandwidth, data volume, and time. When your network connection is too slow to move the data (often petabytes) within an acceptable timeframe, you shift from the "Online" lane to the "Offline" lane (Snow Family). If your data sources are distributed globally, you can use the AWS global network (Transfer Acceleration) to avoid most of the congested public-internet path.
Formula / Concept Box
| Decision Driver | Recommended Approach |
|---|---|
| Data Volume < 10 TB | Online tools (DataSync, S3 TA) are usually more cost-effective. |
| Data Volume > 10 TB | Start considering the AWS Snow Family. |
| Frequent / Ongoing | Use AWS DataSync or AWS Storage Gateway. |
| Streaming / Near-real-time | Use Amazon Kinesis Data Firehose (renamed Amazon Data Firehose in 2024). |
| Legacy Protocols | Use AWS Transfer Family (SFTP, FTPS, FTP, AS2). |
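The volume thresholds above follow from a back-of-the-envelope transfer-time estimate. With data volume V in bytes, link bandwidth B in bits per second, and a sustained utilization factor u (0 < u ≤ 1, a symbol introduced here as an assumption, not an AWS figure), the online transfer time T in seconds is roughly:

```latex
T = \frac{8V}{u\,B}
\qquad \text{e.g.}\quad
V = 10^{13}\,\text{B}\ (10\ \text{TB}),\ B = 10^{8}\,\text{b/s}\ (100\ \text{Mbps}),\ u = 0.8
\;\Rightarrow\;
T = \frac{8 \times 10^{13}}{0.8 \times 10^{8}} = 10^{6}\,\text{s} \approx 11.6\ \text{days}
```

On a link of that size, every additional 10 TB adds roughly another 12 days, which is why larger volumes on modest links start pointing toward the Snow Family.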
Hierarchical Outline
- Online Migration Tools
  - AWS DataSync: High-speed, automated data transfer over the network for one-time or recurring migrations between on-premises storage and AWS.
  - AWS Transfer Family: Managed SFTP, FTPS, FTP, and AS2 endpoints that read and write directly to Amazon S3 or Amazon EFS.
  - S3 Transfer Acceleration: Speeds up long-distance transfers to an S3 bucket by routing them through CloudFront Edge Locations; suited to geographically dispersed users.
  - Kinesis Data Firehose: Managed streaming delivery for IoT, log, and social media data.
- Offline Migration Tools (Snow Family)
  - AWS Snowcone: Small and portable (8 TB usable); for edge computing and small migrations.
  - AWS Snowball Edge: Up to 80 TB usable per device, reaching petabyte scale when multiple devices are used; includes compute capabilities (S3-compatible storage and EC2-compatible instances).
  - AWS Snowmobile: Exabyte-scale (up to 100 PB per container); a 45-foot ruggedized shipping container for massive data center moves.
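To make the Transfer Family entry more concrete, here is a sketch of a hypothetical helper (`build_create_server_request` is a name introduced for illustration) that assembles keyword arguments for boto3's `transfer.create_server` call. It checks the protocol rules as described in the CreateServer API reference: FTP and FTPS need a VPC endpoint and a non-service-managed identity provider, FTPS needs an ACM certificate, and AS2 needs a VPC endpoint with an S3 domain. VPC details (`EndpointDetails`) and identity-provider details are deliberately left out.

```python
"""Hypothetical helper: kwargs for boto3's transfer.create_server() (sketch)."""

from typing import List, Optional

PROTOCOLS = {"SFTP", "FTPS", "FTP", "AS2"}


def build_create_server_request(
    protocols: List[str],
    domain: str = "S3",
    endpoint_type: str = "PUBLIC",
    identity_provider_type: str = "SERVICE_MANAGED",
    certificate_arn: Optional[str] = None,
) -> dict:
    """Validate protocol constraints, then return create_server() kwargs.

    EndpointDetails (for VPC servers) and IdentityProviderDetails are omitted.
    """
    requested = set(protocols)
    if not requested or not requested <= PROTOCOLS:
        raise ValueError(f"protocols must be a non-empty subset of {sorted(PROTOCOLS)}")
    if domain not in {"S3", "EFS"}:
        raise ValueError("domain must be 'S3' or 'EFS'")
    if requested & {"FTP", "FTPS"}:
        # FTP/FTPS servers must be VPC-hosted and cannot use service-managed users.
        if endpoint_type != "VPC" or identity_provider_type == "SERVICE_MANAGED":
            raise ValueError("FTP/FTPS need endpoint_type='VPC' and a custom identity provider")
    if "FTPS" in requested and certificate_arn is None:
        raise ValueError("FTPS needs an ACM certificate ARN")
    if "AS2" in requested and (endpoint_type != "VPC" or domain != "S3"):
        raise ValueError("AS2 needs endpoint_type='VPC' and domain='S3'")

    request = {
        "Protocols": sorted(requested),
        "Domain": domain,
        "EndpointType": endpoint_type,
        "IdentityProviderType": identity_provider_type,
    }
    if certificate_arn is not None:
        request["Certificate"] = certificate_arn
    return request
```

The simplest valid case is an SFTP-only public server backed by S3, `build_create_server_request(["SFTP"])`; with boto3 and credentials configured, the result could be passed as `boto3.client("transfer").create_server(**request)`.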
Visual Anchors
Migration Decision Flow
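The decision flow implied by the Concept Box and the "Big Idea" can be sketched as a small Python function. The names `recommend_tool` and `online_transfer_days` are hypothetical, and the 80% link utilization is an assumption. Rather than a fixed 10 TB cutoff, this sketch applies the bandwidth/volume/time balance directly: special cases (streaming, legacy protocols, recurring sync) come first, then the data stays online if it fits the deadline, otherwise Snowball Edge is chosen below about 10 PB and Snowmobile above it.

```python
"""Sketch of the migration decision flow (rules of thumb, not AWS policy)."""

TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 86_400


def online_transfer_days(volume_bytes: float, bandwidth_bps: float,
                         utilization: float = 0.8) -> float:
    """Days needed to push volume_bytes over a link of bandwidth_bps."""
    return 8 * volume_bytes / (utilization * bandwidth_bps) / SECONDS_PER_DAY


def recommend_tool(volume_bytes: float, bandwidth_bps: float, deadline_days: float,
                   *, streaming: bool = False, legacy_protocol: bool = False,
                   recurring: bool = False) -> str:
    """Return a suggested AWS service for one migration scenario."""
    if streaming:
        return "Amazon Kinesis Data Firehose"
    if legacy_protocol:
        return "AWS Transfer Family"
    if recurring:
        return "AWS DataSync"
    # Online lane: the network can move the data before the deadline.
    if online_transfer_days(volume_bytes, bandwidth_bps) <= deadline_days:
        return "AWS DataSync or S3 Transfer Acceleration"
    # Offline lane: Snowball Edge devices below ~10 PB, Snowmobile above.
    if volume_bytes < 10 * PB:
        return "AWS Snowball Edge"
    return "AWS Snowmobile"
```

For example, 500 TB over a 1 Gbps link needs about 58 days at 80% utilization, so with a 30-day deadline the function returns "AWS Snowball Edge".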
S3 Transfer Acceleration vs. Standard Internet
```latex
\begin{tikzpicture}[node distance=2cm, auto]
  \draw[thick,->] (0,3) -- (8,3) node[midway, above] {\textbf{Standard Internet (High Latency/Jitter)}};
  \draw[thick,blue,->] (0,1) -- (2,1) node[midway, below] {\textbf{User}};
  \draw[fill=blue!20] (2,0.5) rectangle (4,1.5) node[midway] {\textbf{Edge Location}};
  \draw[thick,blue,->] (4,1) -- (8,1) node[midway, below] {\textbf{AWS Backbone (Fast/Private)}};
  \draw[fill=green!20] (8,0.5) rectangle (10,3.5) node[midway, rotate=90] {\textbf{S3 Bucket}};
\end{tikzpicture}
```
Definition-Example Pairs
- AWS DataSync: A service that automates and accelerates moving data between on-premises storage and AWS over the network.
  - Example: A media company synchronizing its local NAS with Amazon S3 every night for backup.
- AWS Snowball Edge: A ruggedized physical device used to transport large amounts of data to AWS.
  - Example: A research station in Antarctica with no internet connectivity shipping 50 TB of climate data back to the US.
- S3 Transfer Acceleration: A bucket-level feature that speeds up transfers between distant clients and an S3 bucket.
  - Example: A mobile app in Tokyo uploading high-res photos to an S3 bucket located in Northern Virginia.
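The nightly NAS backup in the DataSync example could be expressed as a scheduled DataSync task. The sketch below assumes boto3 and pre-created source (NFS) and destination (S3) locations; the ARNs are placeholders, `build_nightly_task_request` and `create_nightly_task` are hypothetical names, and reading the schedule hour as UTC is an assumption worth confirming. It builds keyword arguments for `datasync.create_task` using AWS cron syntax.

```python
"""Sketch: a nightly AWS DataSync task (NAS -> S3) built for boto3."""


def build_nightly_task_request(source_location_arn: str,
                               destination_location_arn: str,
                               hour_utc: int = 2,
                               name: str = "nightly-nas-backup") -> dict:
    """Return kwargs for boto3's datasync.create_task()."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": name,
        # AWS cron fields: minutes hours day-of-month month day-of-week year
        "Schedule": {"ScheduleExpression": f"cron(0 {hour_utc} * * ? *)"},
        # Copy only data that changed, and verify only the files transferred
        "Options": {"TransferMode": "CHANGED", "VerifyMode": "ONLY_FILES_TRANSFERRED"},
    }


def create_nightly_task(request: dict) -> str:
    """Submit the task; requires boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder works without boto3 installed

    response = boto3.client("datasync").create_task(**request)
    return response["TaskArn"]
```

Separating the request builder from the API call keeps the scheduling logic testable without touching AWS.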
Worked Examples
Scenario 1: The Exabyte Migration
Problem: A global financial institution needs to move 150 PB of historical transaction records from an on-premises data center to Amazon S3 Glacier. Its outbound internet capacity is 10 Gbps.
Solution:
- Calculation: At a fully utilized 10 Gbps (the theoretical maximum), 150 PB (1.2 × 10^18 bits) would take about 1.2 × 10^8 seconds, or roughly 3.8 years, to upload online. Real-world utilization makes it even longer.
- Tool Selection: AWS Snowmobile. Each Snowmobile holds up to 100 PB, so two Snowmobile containers would be deployed.
- Process: Connect the Snowmobile to the local network, transfer the data, and have it driven back to an AWS Region, where the data is securely imported.
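The Scenario 1 arithmetic can be reproduced in a few lines of Python. This sketch assumes decimal units (1 PB = 10^15 bytes), a 365-day year, and a link running at 100% of its 10 Gbps capacity, which real networks rarely sustain.

```python
import math

PB = 10**15                     # decimal petabyte, in bytes
SECONDS_PER_YEAR = 365 * 86_400
SNOWMOBILE_CAPACITY = 100 * PB  # per container, as stated in this guide

volume_bytes = 150 * PB
link_bps = 10 * 10**9           # 10 Gbps, assumed 100% utilized

seconds = volume_bytes * 8 / link_bps
years = seconds / SECONDS_PER_YEAR
snowmobiles = math.ceil(volume_bytes / SNOWMOBILE_CAPACITY)

print(f"Online transfer: {seconds:.2e} s, about {years:.1f} years")  # 1.20e+08 s, about 3.8 years
print(f"Snowmobiles needed: {snowmobiles}")                          # 2
```

Rounding the container count up with `math.ceil` reflects that 150 PB does not fit in a single 100 PB Snowmobile.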
Scenario 2: Geographically Dispersed Uploads
Problem: A video production firm has editors in London, Mumbai, and Sydney. All files must be stored in a central S3 bucket in the us-east-1 region.
Solution:
- Tool Selection: Amazon S3 Transfer Acceleration.
- Optimization: Each editor's upload enters the AWS network at the nearest CloudFront Edge Location, then travels over the AWS backbone to the us-east-1 Region, avoiding congested long-haul internet paths.
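For Scenario 2, a Python sketch with boto3 (assumed installed and configured with credentials) shows the two steps involved: enabling acceleration on the bucket with `put_bucket_accelerate_configuration`, then uploading through a client configured with `use_accelerate_endpoint`. The helper names `accelerate_endpoint` and `enable_and_upload` are hypothetical; the name check encodes the rule that accelerated buckets must have DNS-compliant names without periods.

```python
"""Sketch: using S3 Transfer Acceleration from Python (boto3)."""

import re

# Transfer Acceleration needs a DNS-compliant bucket name with no periods.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Return the accelerated endpoint URL for a bucket."""
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"bucket {bucket!r} cannot use Transfer Acceleration")
    host = ("s3-accelerate.dualstack.amazonaws.com" if dualstack
            else "s3-accelerate.amazonaws.com")
    return f"https://{bucket}.{host}"


def enable_and_upload(bucket: str, local_path: str, key: str) -> None:
    """Enable acceleration on the bucket, then upload via the edge network."""
    import boto3  # imported lazily; only needed when actually calling AWS
    from botocore.config import Config

    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket, AccelerateConfiguration={"Status": "Enabled"}
    )
    accelerated = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
    accelerated.upload_file(local_path, bucket, key)
```

Each editor's client would use the same accelerated configuration; the edge location used is chosen by the network, not by the code.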
Checkpoint Questions
- At what data volume does AWS suggest Snowball Edge becomes more cost-effective than online methods?
- Which service is best for continuous ETL of streaming IoT data into S3?
- True/False: AWS DataSync stores and encrypts data at rest within its own service.
- How does S3 Transfer Acceleration achieve higher speeds for distant users?
Muddy Points & Cross-Refs
- Snowball vs. Snowmobile: Snowmobile is meant for exabyte-scale moves (10 PB and above). If you have less than 10 PB, AWS recommends multiple Snowball Edge devices instead of one Snowmobile for better cost-efficiency.
- DataSync vs. Transfer Family: DataSync is for high-speed data movement (syncing file systems). Transfer Family is a protocol interface (SFTP/FTP) for users who need to interact with S3/EFS using legacy tools.
- Security: All Snow devices use 256-bit encryption. Online tools like DataSync and S3 TA use TLS/SSL for in-transit protection.
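The in-transit point can also be enforced on the destination side. As a sketch of an optional hardening step (not part of the migration tools themselves), the following builds a common S3 bucket policy pattern that denies any request where `aws:SecureTransport` is false. `tls_only_policy` and `apply_policy` are hypothetical names, and applying the policy assumes boto3 with credentials.

```python
"""Sketch: an S3 bucket policy that rejects requests not sent over TLS."""

import json


def tls_only_policy(bucket: str) -> str:
    """Deny every S3 action on the bucket when aws:SecureTransport is false."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


def apply_policy(bucket: str) -> None:
    """Attach the policy; requires boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the policy builder has no dependency

    boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=tls_only_policy(bucket))
```

Because the statement is an explicit Deny, it overrides any Allow elsewhere, so plain-HTTP uploads from a misconfigured client fail instead of silently succeeding.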
Comparison Tables
| Feature | AWS DataSync | AWS Snowball Edge | S3 Transfer Acceleration |
|---|---|---|---|
| Transfer Type | Online (Network) | Offline (Physical) | Online (Network) |
| Primary Use Case | Recurring synchronization | One-time massive migration | Fast global uploads to S3 |
| Connectivity | Requires stable bandwidth | No internet required | Public Internet + AWS Backbone |
| Scale | GB to PB (Continuous) | PB (Discrete chunks) | GB to TB (Global) |
| Security | TLS in transit | KMS (256-bit) at rest | SSL/TLS in transit |