Mastering Data Migration to AWS Storage Services
Selecting the appropriate service for data migration to storage services
Moving data into the AWS Cloud is a critical competency for Solutions Architects. The challenge lies in selecting the tool that balances speed, cost, and operational complexity based on the existing infrastructure and the target AWS storage service (S3, EFS, FSx, or EBS).
Learning Objectives
After studying this guide, you will be able to:
- Differentiate between online and offline data migration methods.
- Select the appropriate service based on data volume and available network bandwidth.
- Identify use cases for hybrid storage solutions like AWS Storage Gateway.
- Recommend migration tools for specific data types (Database, Block, or File).
Key Terms & Glossary
- AWS DataSync: An online data transfer service that simplifies and accelerates moving data between on-premises storage and AWS over the network.
- AWS Snowball Edge: A rugged physical storage and compute device used to migrate large amounts of data (tens of terabytes up to petabytes) offline when network bandwidth is limited.
- AWS Transfer Family: A fully managed service for transferring files over SFTP, FTPS, and FTP directly into Amazon S3 or Amazon EFS.
- AWS Storage Gateway: A hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
- AWS DMS (Database Migration Service): A service that helps you migrate databases to AWS quickly and securely, often used to move relational data into S3 for data lakes.
The "Big Idea"
The core philosophy of data migration is the Bandwidth vs. Volume Trade-off. If your network is fast and the data is small, move it online. If the data is massive and the network is slow, move it physically. The goal is always to minimize "Time to Cloud" while maintaining data integrity.
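The trade-off can be made concrete with a back-of-the-envelope estimate of online transfer time. The sketch below is illustrative only: the function name `transfer_days` and the default 80% link utilization are assumptions, not figures from AWS documentation.

```python
def transfer_days(volume_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send volume_tb over a link of bandwidth_mbps.

    utilization is the assumed fraction of the link the transfer can actually use.
    """
    if bandwidth_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("bandwidth must be positive and utilization in (0, 1]")
    bits = volume_tb * 1e12 * 8                      # decimal terabytes to bits
    effective_bps = bandwidth_mbps * 1e6 * utilization
    return bits / effective_bps / 86_400             # seconds to days


print(round(transfer_days(1, 1000), 2))   # 1 TB over 1 Gbps   -> 0.12
print(round(transfer_days(100, 100)))     # 100 TB over 100 Mbps -> 116
```

When the estimate runs to weeks or months, an offline transfer becomes the stronger candidate.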
Formula / Concept Box
Migration Decision Matrix
| Source Data Type | Preferred AWS Target | Migration Service |
|---|---|---|
| On-prem NFS/SMB | S3, EFS, or FSx | AWS DataSync |
| FTP/SFTP Traffic | S3 or EFS | AWS Transfer Family |
| On-prem Databases | RDS, Redshift, or S3 | AWS DMS |
| Physical Servers | EC2 (EBS) | Application Migration Service (MGN) |
| Massive Archives | S3 / S3 Glacier | Snowball Edge / Snowmobile |
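As a study aid, the matrix above can be encoded as a simple lookup. This is a minimal sketch; the dictionary keys (for example `nfs_smb_share`) and the `recommend` helper are hypothetical names chosen for illustration.

```python
# Each entry mirrors one row of the decision matrix: (preferred target, migration service).
MIGRATION_MATRIX = {
    "nfs_smb_share": ("S3, EFS, or FSx", "AWS DataSync"),
    "ftp_sftp_traffic": ("S3 or EFS", "AWS Transfer Family"),
    "database": ("RDS, Redshift, or S3", "AWS DMS"),
    "physical_server": ("EC2 (EBS)", "Application Migration Service (MGN)"),
    "massive_archive": ("S3 / S3 Glacier", "Snowball Edge / Snowmobile"),
}


def recommend(source_type: str) -> str:
    """Return 'service -> target' for a source type listed in the matrix."""
    try:
        target, service = MIGRATION_MATRIX[source_type]
    except KeyError:
        raise ValueError(f"unknown source type: {source_type!r}") from None
    return f"{service} -> {target}"


print(recommend("database"))  # AWS DMS -> RDS, Redshift, or S3
```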
Hierarchical Outline
- Online Migration Services (Network-based)
- AWS DataSync: High-speed, automated. Best for recurring syncs or large migrations with good bandwidth.
- AWS Transfer Family: Legacy protocol support (SFTP/FTP) for seamless user transitions.
- AWS DMS: Focused on data structures; can migrate to S3 for "Data Lake" architectures.
- Offline Migration Services (Physical-based)
- Snowcone: Small, portable (8TB). Good for edge computing and small migrations.
- Snowball Edge: Large scale (80TB+). Includes compute capabilities for pre-processing data.
- Snowmobile: Exabyte-scale (up to 100PB per truck). For massive data center evacuations.
- Hybrid Storage Solutions (Persistent Link)
- File Gateway: Local cache for S3 (NFS/SMB).
- Volume Gateway: Block storage (iSCSI) backed by S3.
- Tape Gateway: Replaces physical tape libraries with a virtual tape library; virtual tapes are stored in S3 and archived to S3 Glacier.
Visual Anchors
Migration Selection Logic
Transfer Time vs. Data Size Concept
\begin{tikzpicture}[scale=0.8]
  \draw[->] (0,0) -- (6,0) node[right] {Data Volume};
  \draw[->] (0,0) -- (0,5) node[above] {Time to Complete};
  \draw[blue, thick] (0,0.5) -- (5,4.5) node[right] {Online (DataSync)};
  \draw[red, thick] (0,3) -- (5,3.2) node[right] {Offline (Snowball)};
  \node at (2.5, 2) [align=center, font=\scriptsize] {Network\\Bottleneck};
  \draw[dashed] (3.1,0) -- (3.1,3);
  \node at (3.1,-0.4) [font=\scriptsize] {Break-even Point};
\end{tikzpicture}
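The break-even point in the chart can be approximated numerically. The sketch below simplifies the picture by treating the offline turnaround (ordering, loading, shipping, and import) as a fixed number of days, whereas the chart shows it rising slightly with volume. The function name `break_even_tb`, the 10-day turnaround, and the 80% utilization are all assumptions for illustration.

```python
def break_even_tb(bandwidth_mbps: float, offline_days: float, utilization: float = 0.8) -> float:
    """Volume (decimal TB) the link can move in the time an offline transfer takes.

    Below this volume the online transfer finishes first; above it, offline wins.
    """
    bytes_per_day = bandwidth_mbps * 1e6 * utilization / 8 * 86_400
    return bytes_per_day * offline_days / 1e12


# Hypothetical inputs: 100 Mbps link, 10-day end-to-end offline turnaround.
print(round(break_even_tb(100, 10), 2))  # 8.64
```

With these assumed inputs, anything much larger than about 9 TB would reach AWS sooner on a device than over the wire.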
Definition-Example Pairs
- Service: AWS DataSync
- Definition: An online transfer service that automates moving data between on-premises storage and AWS storage services.
- Example: A media company needs to move 50TB of video files from an on-premises NAS to Amazon EFS. They use DataSync to schedule nightly transfers to keep the cloud copy updated.
- Service: AWS Snowball Edge Storage Optimized
- Definition: A rugged device with 80TB of usable capacity used for one-time high-volume data transfers.
- Example: A research facility in a remote location with a 10Mbps internet connection needs to move 100TB of climate data to S3. They request two Snowball devices, load the data locally, and ship them back to AWS.
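The arithmetic behind the Snowball example can be checked with a few lines. This is a rough sketch: the helper name `snowball_devices_needed` is hypothetical, the 80TB usable capacity comes from the definition above, and the online estimate assumes the full 10Mbps is available around the clock.

```python
import math


def snowball_devices_needed(volume_tb: float, usable_tb_per_device: float = 80) -> int:
    """Number of devices needed to hold volume_tb, rounding up to whole devices."""
    return math.ceil(volume_tb / usable_tb_per_device)


volume_tb = 100
link_mbps = 10

# Online alternative: total bits divided by link speed, converted to days.
online_days = volume_tb * 1e12 * 8 / (link_mbps * 1e6) / 86_400

print(snowball_devices_needed(volume_tb))  # 2
print(round(online_days))                  # 926
```

At roughly two and a half years of continuous transfer, the online path is clearly impractical here, which is why two devices are requested.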
Worked Examples
Scenario 1: The Legacy Server Move
Problem: A company wants to migrate 10 physical Windows servers to Amazon EC2 without reconfiguring the OS or applications.
Solution: Use AWS Application Migration Service (MGN).
- Install the replication agent on the source servers.
- MGN replicates data at the block level into an EBS staging area.
- Perform a "cutover" to launch the EC2 instances.
Scenario 2: The Hybrid Backup
Problem: A hospital needs to keep its medical images on-premises for fast local retrieval but wants to back them up to Amazon S3 for long-term durability and cost savings.
Solution: Implement AWS Storage Gateway (File Gateway).
- Deploy the File Gateway VM in the local data center.
- Map the gateway as an NFS/SMB share to local applications.
- Frequently accessed data stays in the local cache; all data is asynchronously backed up to S3.
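The caching behavior in the last step can be pictured with a toy model. This is not how Storage Gateway is implemented; it is a minimal sketch of a write-back cache in which a dictionary stands in for the S3 bucket and an explicit `flush()` call stands in for the asynchronous upload. The class name `ToyFileGateway` and all of its members are hypothetical.

```python
from collections import OrderedDict


class ToyFileGateway:
    """Toy write-back cache: local reads and writes, deferred upload to a bucket."""

    def __init__(self, cache_slots: int):
        self.cache_slots = cache_slots
        self.cache = OrderedDict()   # most recently used entries sit at the end
        self.pending = set()         # keys written locally but not yet uploaded
        self.bucket = {}             # stand-in for the S3 bucket

    def write(self, key: str, data: bytes) -> None:
        self.cache[key] = data
        self.cache.move_to_end(key)
        self.pending.add(key)
        self._evict()

    def read(self, key: str) -> bytes:
        if key not in self.cache:            # cache miss: fetch from the bucket
            self.cache[key] = self.bucket[key]
        self.cache.move_to_end(key)
        data = self.cache[key]
        self._evict()
        return data

    def flush(self) -> None:
        """Upload every pending write; stands in for the asynchronous upload."""
        for key in list(self.pending):
            self.bucket[key] = self.cache[key]
        self.pending.clear()
        self._evict()

    def _evict(self) -> None:
        # Drop least recently used entries, but never one that is still pending.
        for key in list(self.cache):
            if len(self.cache) <= self.cache_slots:
                break
            if key not in self.pending:
                del self.cache[key]


gw = ToyFileGateway(cache_slots=2)
for name in ("scan_a", "scan_b", "scan_c"):
    gw.write(name, name.encode())
gw.flush()                      # all three objects now exist in the bucket
print(sorted(gw.bucket))        # ['scan_a', 'scan_b', 'scan_c']
print(list(gw.cache))           # ['scan_b', 'scan_c']
```

The point of the model is that the bucket holds every object, while the local cache keeps only the most recently used ones.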
Checkpoint Questions
- Which service would you choose to migrate a 200TB database from on-premises to AWS when you only have a 50Mbps connection?
- Answer: AWS Snowball Edge (to move the initial data) and potentially AWS DMS for ongoing changes once the bulk is moved.
- What is the main difference between AWS DataSync and AWS Storage Gateway?
- Answer: DataSync is for migrating/syncing existing data (one-time or periodic), while Storage Gateway is a hybrid solution for ongoing local access to cloud storage.
- If you need to provide SFTP access to external vendors to upload files directly into an S3 bucket, which service is best?
- Answer: AWS Transfer Family.
- True or False: AWS DMS can migrate data from an Oracle database directly into an Amazon S3 bucket.
- Answer: True. S3 is a supported target for DMS to create data lakes.
[!TIP] In the SAA-C03 exam, if the question mentions "limited bandwidth" and "large data volumes," the answer is almost always an AWS Snow Family service.