Study Guide: Selecting Database Transfer Mechanisms

This guide covers the strategies and tools used to migrate database workloads to AWS, focusing on selecting the right mechanism based on data volume, connectivity, and downtime requirements.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between homogeneous and heterogeneous database migrations.
  • Evaluate when to use AWS Database Migration Service (DMS) versus the Schema Conversion Tool (SCT).
  • Determine the most cost-effective data transfer service based on data volume (TB vs. PB vs. EB).
  • Identify the appropriate use cases for online versus offline migration tools.

Key Terms & Glossary

  • DMS (Database Migration Service): A service that helps migrate databases to AWS quickly and securely while the source database remains functional.
  • SCT (Schema Conversion Tool): A tool used to convert existing database schemas from one engine to another (e.g., Oracle to PostgreSQL).
  • CDC (Change Data Capture): A process that monitors and captures changes in a source database to keep the target database synchronized.
  • Homogeneous Migration: A migration where the source and target database engines are the same or compatible (e.g., MySQL to Aurora MySQL).
  • Heterogeneous Migration: A migration where the source and target engines differ (e.g., Microsoft SQL Server to Amazon Aurora).
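The homogeneous/heterogeneous distinction above can be sketched as a small lookup. This is only an illustration: the `ENGINE_FAMILY` map and the engine name strings are hypothetical and do not come from any AWS API.

```python
# Hypothetical map from engine name to compatibility family (illustrative only).
ENGINE_FAMILY = {
    "mysql": "mysql",
    "aurora-mysql": "mysql",
    "postgresql": "postgresql",
    "aurora-postgresql": "postgresql",
    "oracle": "oracle",
    "sqlserver": "sqlserver",
}


def classify_migration(source: str, target: str) -> str:
    """Return 'homogeneous' if both engines share a family, else 'heterogeneous'."""
    src_family = ENGINE_FAMILY[source.lower()]
    tgt_family = ENGINE_FAMILY[target.lower()]
    return "homogeneous" if src_family == tgt_family else "heterogeneous"


print(classify_migration("mysql", "aurora-mysql"))       # same family
print(classify_migration("sqlserver", "aurora-mysql"))   # different families
```

An unknown engine name raises `KeyError` here, which is a deliberate choice for a sketch: guessing a family for an unlisted engine would hide the real question of compatibility.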

The "Big Idea"

Database migration is not a "one size fits all" task. It requires a balance between Data Volume, Available Bandwidth, and Maximum Allowable Downtime. The goal is to move from a legacy environment to a cloud-native or cloud-hosted environment with minimal disruption to the business logic, often involving a transformation of the data structure itself.

Formula / Concept Box

| Migration Variable | Key Consideration |
| --- | --- |
| Volume < 10 TB | Prefer online tools (DataSync, DMS), or Snowcone for small edge cases. |
| 10 TB < Volume < 10 PB | Use AWS Snowball Edge clusters. |
| Volume > 10 PB | Use AWS Snowmobile. |
| Heterogeneous Engine | Must use SCT + DMS. |
| Homogeneous Engine | Can use DMS alone or native engine tools (e.g., mysqldump). |
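The volume thresholds in this concept box can be expressed as a short decision function. This is a minimal sketch, assuming decimal units (1 PB = 1,000 TB) and assuming that a volume exactly on a boundary falls into the Snowball Edge tier, which the guide does not specify. The function names are hypothetical.

```python
TB_PER_PB = 1000  # decimal units assumed


def suggest_transfer(volume_tb: float) -> str:
    """Map a data volume in TB to the transfer category from the concept box."""
    if volume_tb < 10:
        return "online (DataSync / DMS)"
    if volume_tb <= 10 * TB_PER_PB:  # boundary handling is an assumption
        return "Snowball Edge"
    return "Snowmobile"


def suggest_db_tools(migration_type: str) -> str:
    """Map 'homogeneous' / 'heterogeneous' to the tools named in the concept box."""
    if migration_type == "heterogeneous":
        return "SCT + DMS"
    return "DMS or native engine tools"


print(suggest_transfer(5))        # small volume
print(suggest_transfer(100))      # mid-range volume
print(suggest_transfer(50_000))   # 50 PB
```

Volume alone is not the whole decision: available bandwidth and the allowable downtime window can shift a workload from one tier to another.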

Hierarchical Outline

  • I. Migration Phases
    • Assess: Inventory workloads and identify dependencies.
    • Mobilize: Build the foundation and address gaps.
    • Migrate & Modernize: Execute the move and optimize for the cloud.
  • II. Database Specific Tools
    • AWS DMS: Handles the movement of data; supports one-time and continuous replication.
    • AWS SCT: Essential for engine changes; maps proprietary features to AWS equivalents.
  • III. Data Transfer Categories
    • Online: AWS Transfer Family, S3 Transfer Acceleration, DataSync.
    • Offline: AWS Snow Family (Snowcone, Snowball, Snowmobile).

Visual Anchors

Migration Strategy Decision Tree

[Diagram not available: migration strategy decision tree]

Data Volume vs. Transfer Method

\begin{tikzpicture}
  % Draw axes
  \draw [->] (0,0) -- (6,0) node[right] {Data Volume};
  \draw [->] (0,0) -- (0,4) node[above] {Efficiency};

  % Draw regions
  \draw [dashed] (1.5,0) -- (1.5,3.5) node[above] {10 TB};
  \draw [dashed] (4,0) -- (4,3.5) node[above] {10 PB};

  % Labels
  \node at (0.75, 1) {Online};
  \node at (2.75, 1) {Snowball};
  \node at (5, 1) {Snowmobile};

  % Efficiency curve (notional)
  \draw[thick, blue] (0.2, 3) to[out=-20, in=160] (5.8, 0.5);
  \node [blue] at (5, 2.5) {Time to Migrate};
\end{tikzpicture}

Definition-Example Pairs

  • One-time Transfer: Moving a static dataset that does not change during the migration process.
    • Example: Moving a 5 TB archive of historical financial records to S3 via Snowball.
  • Continuous Streaming: Real-time ingestion of data as it is generated.
    • Example: Using Kinesis Data Firehose to stream website clickstream data into a Redshift database.
  • Offline Transfer: Physical shipment of hardware to move data without using internet bandwidth.
    • Example: An organization with only 100 Mbps of upload bandwidth shipping a 100 TB database via Snowball Edge.

Worked Examples

Case 1: Migrating an On-Premises Oracle DB to Amazon Aurora PostgreSQL

  1. Analyze: Determine that the engines are different (Heterogeneous).
  2. Schema Conversion: Run AWS SCT to convert the Oracle schema, stored procedures, and triggers into PostgreSQL format.
  3. Data Movement: Set up an AWS DMS replication instance.
  4. Initial Load: Perform a "Full Load" to move existing data.
  5. Synchronization: Enable CDC (Change Data Capture) in DMS to replicate any new transactions while the application is still pointing at Oracle.
  6. Cutover: Once lag is near zero, point the application to Aurora.
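Steps 3 through 6 can be sketched as the parameters for a single DMS task plus a cutover check. This is a hedged sketch, not a complete migration script: the ARNs, the task identifier, the schema name, and the 5-second lag threshold are all placeholders chosen for illustration. The dictionary keys follow the keyword arguments of boto3's DMS `create_replication_task` call, but no AWS call is made here.

```python
import json


def build_task_params(schema_name, source_arn, target_arn, instance_arn):
    """Build keyword arguments for a full-load-plus-CDC replication task."""
    # Selection rule: include every table in the given schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": "oracle-to-aurora-pg",  # placeholder name
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",  # initial load, then ongoing changes
        "TableMappings": json.dumps(table_mappings),
    }


def ready_for_cutover(latency_source_s, latency_target_s, threshold_s=5.0):
    """Treat the task as caught up when both CDC latencies are under the threshold."""
    return max(latency_source_s, latency_target_s) <= threshold_s


params = build_task_params("HR", "arn:source", "arn:target", "arn:instance")
print(params["MigrationType"])
print(ready_for_cutover(1.0, 2.0))
```

In practice the two latency values would come from the task's CloudWatch metrics (`CDCLatencySource` and `CDCLatencyTarget`), and the acceptable threshold depends on how long the application can be paused during cutover.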

Case 2: 50 PB Datacenter Exit

  1. Challenge: The volume is too large for Snowball Edge clusters (which are recommended only up to 10 PB).
  2. Solution: Request an AWS Snowmobile, a 45-foot ruggedized shipping container that holds up to 100 PB, so a single unit covers the full 50 PB.
  3. Execution: Connect the Snowmobile directly to the local network via high-speed fiber and transfer the data.
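A quick device count shows why 50 PB is impractical with Snowball Edge. This is a rough sketch: the 80 TB usable capacity per device is an assumption for illustration (check the current device specifications), and decimal units are assumed.

```python
import math


def devices_needed(volume_tb: float, usable_tb_per_device: float = 80.0) -> int:
    """Number of devices required to hold the volume (capacity is an assumption)."""
    return math.ceil(volume_tb / usable_tb_per_device)


print(devices_needed(50_000))  # 50 PB expressed in TB
print(devices_needed(10_000))  # 10 PB, the recommended upper bound
```

Under that assumption, 50 PB works out to hundreds of devices to order, load, ship, and track, which is the logistics burden a single Snowmobile avoids.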

Checkpoint Questions

  1. At what data volume does AWS recommend switching from Snowball Edge to Snowmobile?
  2. Which tool must be used before DMS when migrating from SQL Server to Amazon Aurora PostgreSQL?
  3. True or False: AWS DataSync is primarily used for offline migrations.
  4. What is the main benefit of using CDC during a database migration?
Answers:
  1. 10 PB.
  2. AWS Schema Conversion Tool (SCT).
  3. False (DataSync is an online transfer service).
  4. It allows for minimal downtime by keeping the target synchronized with the source until the final cutover.

Muddy Points & Cross-Refs

  • DMS vs. SCT: Students often think DMS converts schemas. It does not. DMS moves data. SCT converts the "skeleton" (schema).
  • Network Bandwidth vs. Offline: Even with 1 PB of data, a dedicated 10 Gbps Direct Connect link can move it in roughly 9 days at full line rate, so an online transfer might be faster than ordering, loading, and shipping Snowball devices. Always do the math: $\text{Time} = \frac{\text{Size}}{\text{Bandwidth}}$.
  • Cross-Ref: See Chapter 19: Determining New Architectures for picking the target database type (Relational vs. NoSQL).
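The transfer-time formula from the bandwidth point above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and a hypothetical `utilization` parameter for the fraction of the link actually available to the transfer.

```python
TB = 10**12  # bytes, decimal units assumed
PB = 10**15


def transfer_days(size_bytes: float, bandwidth_bps: float, utilization: float = 1.0) -> float:
    """Time = Size / Bandwidth, converted from seconds to days."""
    seconds = (size_bytes * 8) / (bandwidth_bps * utilization)
    return seconds / 86_400


# 100 TB over a 100 Mbps uplink: about 92.6 days
print(round(transfer_days(100 * TB, 100e6), 1))
# 1 PB over a 10 Gbps Direct Connect: about 9.3 days
print(round(transfer_days(1 * PB, 10e9), 1))
```

Real links rarely sustain full line rate, so passing something like `utilization=0.8` gives a more conservative estimate to compare against the turnaround time of a shipped device.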

Comparison Tables

Online vs. Offline Data Transfer

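| Feature | Online (DataSync/DMS) | Offline (Snow Family) |
| --- | --- | --- |
| Primary Requirement | High Bandwidth / Low Volume | Low Bandwidth / High Volume |
| Complexity | Configuration of agents/endpoints | Physical logistics & shipping |
| Downtime | Can be near-zero with CDC | Usually requires a larger cutover window |
| Cost | Service and network charges (e.g., DataSync per-GB fees, DMS replication instance hours); data transfer into AWS is not charged | Job fee + shipping fee |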

Homogeneous vs. Heterogeneous

| Metric | Homogeneous | Heterogeneous |
| --- | --- | --- |
| Target DB | Same engine as source | Different engine |
| Primary Tool | AWS DMS or Native Tools | AWS SCT + AWS DMS |
| Difficulty | Low (compatible types) | High (requires code refactoring) |
