Study Guide: Selecting Database Transfer Mechanisms
Selecting the Appropriate Database Transfer Mechanism
This guide covers the strategies and tools used to migrate database workloads to AWS, focusing on selecting the right mechanism based on data volume, connectivity, and downtime requirements.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between homogeneous and heterogeneous database migrations.
- Evaluate when to use AWS Database Migration Service (DMS) versus the Schema Conversion Tool (SCT).
- Determine the most cost-effective data transfer service based on data volume (TB vs. PB vs. EB).
- Identify the appropriate use cases for online versus offline migration tools.
Key Terms & Glossary
- DMS (Database Migration Service): A service that helps migrate databases to AWS quickly and securely while the source database remains functional.
- SCT (Schema Conversion Tool): A tool used to convert existing database schemas from one engine to another (e.g., Oracle to PostgreSQL).
- CDC (Change Data Capture): A process that monitors and captures changes in a source database to keep the target database synchronized.
- Homogeneous Migration: A migration where the source and target database engines are the same or compatible (e.g., MySQL to Aurora MySQL).
- Heterogeneous Migration: A migration where the source and target engines differ (e.g., Microsoft SQL Server to Amazon Aurora).
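The homogeneous/heterogeneous distinction above can be sketched as a small lookup. This is a minimal illustration, not an official AWS compatibility list: the `ENGINE_FAMILY` table and the `classify_migration` helper are hypothetical names introduced here, and the engine-to-family entries are assumptions drawn from the glossary examples.

```python
# Hypothetical mapping from engine name to a compatibility family.
# The entries are illustrative assumptions, not an official AWS list.
ENGINE_FAMILY = {
    "mysql": "mysql",
    "aurora-mysql": "mysql",
    "postgresql": "postgresql",
    "aurora-postgresql": "postgresql",
    "oracle": "oracle",
    "sqlserver": "sqlserver",
}


def classify_migration(source_engine: str, target_engine: str) -> str:
    """Return 'homogeneous' if both engines share a family, else 'heterogeneous'."""
    source_family = ENGINE_FAMILY[source_engine.lower()]
    target_family = ENGINE_FAMILY[target_engine.lower()]
    return "homogeneous" if source_family == target_family else "heterogeneous"


print(classify_migration("mysql", "aurora-mysql"))           # homogeneous
print(classify_migration("sqlserver", "aurora-postgresql"))  # heterogeneous
```

An engine missing from the table raises `KeyError`, which is deliberate in this sketch: an unknown engine should be assessed by hand rather than guessed.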
The "Big Idea"
Database migration is not a "one size fits all" task. It requires a balance between Data Volume, Available Bandwidth, and Maximum Allowable Downtime. The goal is to move from a legacy environment to a cloud-native or cloud-hosted environment with minimal disruption to the business logic, often involving a transformation of the data structure itself.
Formula / Concept Box
| Migration Variable | Key Consideration |
|---|---|
| Volume < 10 TB | Prefer Online tools (DataSync, DMS) or Snowcone for small edge cases. |
| 10 TB < Volume < 10 PB | Use AWS Snowball Edge clusters. |
| Volume > 10 PB | Use AWS Snowmobile. |
| Heterogeneous Engine | Must use SCT + DMS. |
| Homogeneous Engine | Can use DMS alone or native engine tools (e.g., mysqldump). |
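The table above reads naturally as two small decision functions. The sketch below encodes the guide's thresholds as stated. The function names are hypothetical, and because the table leaves the exact 10 TB and 10 PB boundaries undefined, assigning both boundary values to the Snowball Edge tier is an assumption.

```python
TB = 1            # work in terabytes
PB = 1000 * TB    # decimal units, as storage capacities are usually quoted


def pick_transfer_method(volume_tb: float) -> str:
    """Map a data volume in TB to the transfer category suggested by the table."""
    if volume_tb < 0:
        raise ValueError("volume must be non-negative")
    if volume_tb < 10 * TB:
        return "online (DataSync, DMS) or Snowcone"
    if volume_tb <= 10 * PB:  # assumption: boundary values fall in this tier
        return "Snowball Edge"
    return "Snowmobile"


def pick_database_tools(heterogeneous: bool) -> list:
    """Return the tool set the table pairs with each migration type."""
    if heterogeneous:
        return ["AWS SCT", "AWS DMS"]
    return ["AWS DMS or native engine tools"]


print(pick_transfer_method(5))        # online (DataSync, DMS) or Snowcone
print(pick_transfer_method(100))      # Snowball Edge
print(pick_transfer_method(50_000))   # Snowmobile
print(pick_database_tools(True))      # ['AWS SCT', 'AWS DMS']
```

Volume is only the first filter. Available bandwidth and the allowable downtime window can still push a borderline case to the other side of a threshold.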
Hierarchical Outline
- I. Migration Phases
- Assess: Inventory workloads and identify dependencies.
- Mobilize: Build the foundation and address gaps.
- Migrate & Modernize: Execute the move and optimize for the cloud.
- II. Database Specific Tools
- AWS DMS: Handles the movement of data; supports one-time and continuous replication.
- AWS SCT: Essential for engine changes; maps proprietary features to AWS equivalents.
- III. Data Transfer Categories
- Online: AWS Transfer Family, S3 Transfer Acceleration, DataSync.
- Offline: AWS Snow Family (Snowcone, Snowball, Snowmobile).
Visual Anchors
Migration Strategy Decision Tree
Data Volume vs. Transfer Method
\begin{tikzpicture}
  % Draw axes
  \draw [->] (0,0) -- (6,0) node[right] {Data Volume};
  \draw [->] (0,0) -- (0,4) node[above] {Efficiency};
  % Draw regions
  \draw [dashed] (1.5,0) -- (1.5,3.5) node[above] {10 TB};
  \draw [dashed] (4,0) -- (4,3.5) node[above] {10 PB};
  % Labels
  \node at (0.75, 1) {Online};
  \node at (2.75, 1) {Snowball};
  \node at (5, 1) {Snowmobile};
  % Efficiency curve (notional)
  \draw[thick, blue] (0.2, 3) to[out=-20, in=160] (5.8, 0.5);
  \node [blue] at (5, 2.5) {Time to Migrate};
\end{tikzpicture}
Definition-Example Pairs
- One-time Transfer: Moving a static dataset that does not change during the migration process.
- Example: Moving a 5 TB archive of historical financial records to S3 via Snowball.
- Continuous Streaming: Real-time ingestion of data as it is generated.
- Example: Using Kinesis Data Firehose to stream website clickstream data into a Redshift database.
- Offline Transfer: Physical shipment of hardware to move data without using internet bandwidth.
- Example: An organization whose upload link is limited to 100 Mbps ships a 100 TB database via Snowball Edge.
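The offline example above is easier to justify with a quick estimate of how long the same transfer would take online. The helper below is a rough sketch: `transfer_days` and its `utilization` parameter are hypothetical names, units are decimal (1 TB = 10^12 bytes), and protocol overhead is ignored.

```python
def transfer_days(volume_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Estimate days needed to push volume_tb over a link of link_mbps.

    utilization is the fraction of the link actually available (assumed 1.0).
    """
    bits = volume_tb * 1e12 * 8                    # terabytes -> bits
    effective_bps = link_mbps * 1e6 * utilization  # megabits/s -> bits/s
    return bits / effective_bps / 86_400           # seconds -> days


days = transfer_days(100, 100)
print(f"{days:.1f} days")  # 92.6 days
```

At roughly three months of saturating the link, shipping a device is the more practical option for this organization. Lower the `utilization` value if the link is shared with production traffic.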
Worked Examples
Case 1: Migrating an On-Premises Oracle DB to Amazon Aurora PostgreSQL
- Analyze: Determine that the engines are different (Heterogeneous).
- Schema Conversion: Run AWS SCT to convert the Oracle schema, stored procedures, and triggers into PostgreSQL format.
- Data Movement: Set up an AWS DMS replication instance.
- Initial Load: Perform a "Full Load" to move existing data.
- Synchronization: Enable CDC (Change Data Capture) in DMS to replicate any new transactions while the application is still pointing at Oracle.
- Cutover: Once lag is near zero, point the application to Aurora.
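The "Full Load" and CDC steps above are typically combined into a single DMS task. The sketch below only assembles the request as a plain dictionary so it can be inspected without AWS credentials. The helper name `build_replication_task_request`, the task identifier, and the ARNs are placeholders invented for illustration. The keys are intended to match the parameters of the boto3 DMS `create_replication_task` call and should be checked against the current API reference before use.

```python
import json


def build_replication_task_request(
    task_id: str,
    source_endpoint_arn: str,
    target_endpoint_arn: str,
    replication_instance_arn: str,
    schema_name: str = "%",
) -> dict:
    """Assemble keyword arguments for a full-load-plus-CDC DMS task."""
    # Selection rule: include every table in the chosen schema ("%" = all).
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_endpoint_arn,
        "TargetEndpointArn": target_endpoint_arn,
        "ReplicationInstanceArn": replication_instance_arn,
        # Initial full load followed by ongoing change data capture.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),
    }


# Placeholder identifiers, not real resources.
request = build_replication_task_request(
    task_id="oracle-to-aurora-pg",
    source_endpoint_arn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    target_endpoint_arn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    replication_instance_arn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    schema_name="HR",
)
print(request["MigrationType"])  # full-load-and-cdc
```

With credentials configured, the dictionary could be passed as `boto3.client("dms").create_replication_task(**request)`. The schema converted by SCT must already exist on the target before the task starts.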
Case 2: 50 PB Datacenter Exit
- Challenge: The volume is too large for Snowball Edge clusters (which are recommended only up to 10 PB).
- Solution: Request an AWS Snowmobile (a 45-foot shipping container hauled by a semi-trailer truck).
- Execution: Connect the Snowmobile directly to the local network via high-speed fiber and transfer the data.
Checkpoint Questions
- At what data volume does AWS recommend switching from Snowball Edge to Snowmobile?
- Which tool must be used before DMS when migrating from SQL Server to Amazon Aurora PostgreSQL?
- True or False: AWS DataSync is primarily used for offline migrations.
- What is the main benefit of using CDC during a database migration?
Answers:
- 10 PB.
- AWS Schema Conversion Tool (SCT).
- False (DataSync is an online transfer service).
- It allows for minimal downtime by keeping the target synchronized with the source until the final cutover.
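The last answer implies a practical question: when is the target synchronized "enough" to cut over? One way to frame it is as a gate on recent replication-lag samples. The function below is a sketch under stated assumptions: `ready_for_cutover`, the 5-second threshold, and the three-sample window are all hypothetical choices, and the sample values are made up. In practice the samples might come from a lag metric such as the `CDCLatencyTarget` value that DMS reports to CloudWatch.

```python
def ready_for_cutover(latency_samples_s, threshold_s=5.0, required_consecutive=3):
    """Return True when the most recent samples are all at or below the threshold.

    latency_samples_s: replication lag readings in seconds, oldest first.
    """
    if len(latency_samples_s) < required_consecutive:
        return False  # not enough evidence yet
    recent = latency_samples_s[-required_consecutive:]
    return all(sample <= threshold_s for sample in recent)


print(ready_for_cutover([120, 40, 4, 2, 1]))  # True
print(ready_for_cutover([2, 1, 30]))          # False
```

Requiring several consecutive low readings, rather than a single one, guards against cutting over during a momentary dip in lag.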
Muddy Points & Cross-Refs
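- DMS vs. SCT: Students often think DMS converts schemas. It does not. DMS moves data and creates only the basic target tables and primary keys it needs. SCT converts the "skeleton": the full schema, including views, stored procedures, and triggers.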
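- Network Bandwidth vs. Offline: Even with 1 PB of data, a dedicated 10 Gbps Direct Connect link can make an online transfer faster than shipping Snowball devices. Always do the math: $\text{transfer time} = \text{data volume} / \text{effective bandwidth}$, then compare the result with the end-to-end turnaround of a Snow Family job.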
- Cross-Ref: See Chapter 19: Determining New Architectures for picking the target database type (Relational vs. NoSQL).
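As a worked instance of the bandwidth-versus-offline point above, take the 1 PB over 10 Gbps case. The symbols $T$ (transfer time), $V$ (data volume), and $B$ (effective bandwidth) are introduced here for illustration. The calculation assumes decimal units and a fully utilized link, so it is a best case.

```latex
\begin{aligned}
T &= \frac{V}{B} \\
V &= 1\ \text{PB} = 10^{15}\ \text{bytes} \times 8 = 8 \times 10^{15}\ \text{bits} \\
B &= 10\ \text{Gbps} = 10^{10}\ \text{bits/s} \\
T &= \frac{8 \times 10^{15}}{10^{10}} = 8 \times 10^{5}\ \text{s} \approx 9.3\ \text{days}
\end{aligned}
```

If the link sustains only part of its rated speed, $T$ grows in proportion: at 50% utilization the same transfer takes about 18.5 days.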
Comparison Tables
Online vs. Offline Data Transfer
| Feature | Online (DataSync/DMS) | Offline (Snow Family) |
|---|---|---|
| Primary Requirement | High Bandwidth / Low Volume | Low Bandwidth / High Volume |
| Complexity | Configuration of agents/endpoints | Physical logistics & shipping |
| Downtime | Can be near-zero with CDC | Usually requires a larger cutover window |
| Cost | Service usage fees (e.g., per-GB for DataSync, replication instance hours for DMS); inbound transfer to AWS is free | Job fee + shipping fee |
Homogeneous vs. Heterogeneous
| Metric | Homogeneous | Heterogeneous |
|---|---|---|
| Target DB | Same engine as source | Different engine |
| Primary Tool | AWS DMS or Native Tools | AWS SCT + AWS DMS |
| Difficulty | Low (compatible types) | High (requires code refactoring) |