Mastering Database Migration: Strategies for Homogeneous and Heterogeneous Environments
Migrating database schemas and data to different locations and/or different database engines
Mastering Database Migration: Strategies for Homogeneous and Heterogeneous Environments
Database migration is the process of moving data and schemas from a source location to a target environment, often involving changes in the underlying database engine. This guide explores the tools and methodologies for performing these transitions with minimal downtime and high reliability.
Learning Objectives
- Identify the core components and architecture of AWS Database Migration Service (DMS).
- Differentiate between homogeneous and heterogeneous migrations.
- Select appropriate tools (SCT vs. DMS) based on the migration scenario.
- Determine when network limitations necessitate physical transfer methods like Snowball Edge.
- Assess connectivity requirements including VPN and Direct Connect for migration traffic.
Key Terms & Glossary
- AWS DMS (Database Migration Service): A managed service used to migrate databases to AWS quickly and securely while the source database remains operational.
- Replication Instance: An EC2-based engine that performs the heavy lifting of a DMS task, connecting to the source and target.
- Endpoints: The connection information (host, port, credentials) for the source and target databases in a DMS task.
- Homogeneous Migration: Moving data between the same database engines (e.g., MySQL to MySQL).
- Heterogeneous Migration: Moving data between different database engines (e.g., Oracle to Amazon Aurora).
- SCT (Schema Conversion Tool): A standalone tool used to convert source database schemas (tables, views, stored procedures) to a format compatible with a different target engine.
The "Big Idea"
Think of database migration as moving a beating heart. You cannot simply stop the heart (the database) for a long time while the rest of the body (the application) is alive. AWS DMS acts as a "bypass machine" that synchronizes data in real-time, allowing you to switch the "blood flow" to the new database with only a few seconds or minutes of downtime.
Formula / Concept Box
| Migration Feature | Homogeneous | Heterogeneous |
|---|---|---|
| Engine Change | No (Same Engine) | Yes (Different Engine) |
| Primary Tool | AWS DMS | AWS DMS + SCT |
| Schema Change | Minimal/None | Significant Transformation |
| Complexity | Lower | Higher |
| Example | PostgreSQL to Amazon RDS PostgreSQL | Microsoft SQL Server to Amazon Aurora |
Hierarchical Outline
- Phase 1: Discovery and Planning
- AWS Application Discovery Service: Mapping dependencies and server utilization.
- AWS Migration Hub: Centralized tracking of the migration progress.
- Phase 2: Schema Conversion (If Heterogeneous)
- SCT Application: Used to convert code, stored procedures, and complex schemas.
- Manual Refactoring: Addressing incompatible database features.
- Phase 3: Data Transfer (DMS)
- Full Load: Moving the initial bulk of data.
- Change Data Capture (CDC): Replicating ongoing changes from the source to keep the target current.
- Phase 4: Network and Logistics
- Connectivity: Choosing between Site-to-Site VPN (1.25 Gbps) or Direct Connect (1-100 Gbps).
- Large Scale Moves: Utilizing Snowball Edge for multi-terabyte datasets where network bandwidth is insufficient.
Visual Anchors
DMS Migration Workflow
Migration Connectivity Decision
Definition-Example Pairs
- Change Data Capture (CDC): The process of tracking and replicating changes (inserts, updates, deletes) in real-time.
- Example: A customer updates their profile on the legacy database; CDC ensures that change is reflected in the new AWS database immediately before the final cutover.
- Source/Target Endpoints: The specific connection parameters used by DMS to communicate with the databases.
- Example: Setting a source endpoint to an on-premises Oracle SID and a target endpoint to an Amazon Aurora cluster endpoint.
- Lift and Shift (Rehosting): Moving an application/database with zero or minimal changes.
- Example: Moving a SQL Server database running on a local virtual machine directly to SQL Server on Amazon EC2.
Worked Examples
Scenario: Migrating Oracle on-prem to Amazon Aurora PostgreSQL
- Analyze Compatibility: Run the AWS SCT against the Oracle source to see which stored procedures and functions can be automatically converted.
- Schema Conversion: Use SCT to create the equivalent PostgreSQL schema in the Amazon Aurora instance.
- DMS Configuration:
- Provision a Replication Instance in the VPC where Aurora resides.
- Configure the Oracle DB as the Source Endpoint.
- Configure Aurora as the Target Endpoint.
- Task Execution: Start a DMS task for "Full load plus CDC". This migrates the data and stays in sync.
- Cutover: Once the "Latency" metric in DMS hits zero, stop the application, perform a final check, and point the application to the Aurora endpoint.
Checkpoint Questions
- Which tool is strictly required when migrating from a relational Oracle database to a non-relational DynamoDB table?
- Answer: AWS SCT (for schema/structure mapping) and AWS DMS (for data movement).
- If you need to query data stored in S3 files using SQL without actually importing that data into a Redshift cluster, what service should you use?
- Answer: Redshift Spectrum.
- True or False: The AWS Replication Agent must be installed on the source database server for AWS DMS to work.
- Answer: False. DMS only requires network access and endpoints; Replication Agents are for Application Migration Service (server-level migrations).
- What is the recommended solution if you need to migrate a 50TB database but only have a 100Mbps upload speed?
- Answer: AWS Snowball Edge (Physical transfer device).
[!IMPORTANT] Ensure the Replication Instance has enough CPU and Memory to handle the data volume, especially if performing transformations during a heterogeneous migration.
[!TIP] Always verify compliance and encryption requirements (at-rest and in-transit) via AWS Artifact before starting a database migration involving sensitive PII data.