Study Guide: Integrating Migration Tools into Data Processing Systems

Integrate migration tools into data processing systems (for example, AWS Transfer Family)

This guide explores the mechanisms for moving data from on-premises environments or other clouds into the AWS ecosystem, with a specific focus on AWS Transfer Family and related services like AWS DataSync and AWS DMS.

Learning Objectives

By the end of this module, you will be able to:

  • Distinguish between the various AWS migration services based on data type (file vs. database) and transfer method (online vs. offline).
  • Configure AWS Transfer Family to provide secure SFTP/FTPS access to Amazon S3 and Amazon EFS.
  • Integrate AWS DataSync for high-performance, automated file transfers.
  • Determine the appropriate use cases for the AWS Snow Family in petabyte-scale migrations.
  • Align migration tools with downstream processing services like AWS Glue and Amazon Redshift.

Key Terms & Glossary

  • AWS Transfer Family: A fully managed service for transferring files over SFTP, AS2, FTPS, and FTP directly into Amazon S3 or Amazon EFS.
  • AWS DataSync: An online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage and AWS over the network.
  • AWS DMS (Database Migration Service): A service that helps migrate databases to AWS quickly and securely, supporting both one-time migrations and continuous replication.
  • SCT (Schema Conversion Tool): A tool used with DMS to convert the source database schema to a format compatible with the target AWS database.
  • Snowball Edge: A ruggedized device with on-board storage and compute power used for moving large volumes of data (terabytes to petabytes) offline.

The "Big Idea"

Data migration is the "front door" of the data engineering pipeline. The goal is to eliminate "undifferentiated heavy lifting"—the manual task of managing servers, patching SFTP software, or writing custom scripts to handle network retries. By integrating managed migration tools, data engineers ensure that data arrives in the cloud securely, intact, and ready for immediate processing by services like AWS Glue or Amazon Redshift.

Formula / Concept Box

| Feature | Decision Criteria |
|---|---|
| Connectivity | Use Transfer Family for legacy protocol support (SFTP/FTP); use DataSync for high-speed agent-based transfer. |
| Data Volume | Use Snow Family if the data volume exceeds network capacity (typically > 100 TB and taking > 1 week over the wire). |
| Data Type | Use DMS for structured database records; use Transfer Family/DataSync for files and objects. |
| Velocity | Use DMS Change Data Capture (CDC) for near-real-time continuous replication. |
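
The criteria above can be read as a short decision procedure. The following is a minimal sketch of that reading, not an official AWS selection algorithm: the `Workload` fields and the order in which the checks run are assumptions made for illustration, and the 100 TB / 7 day thresholds are the rule of thumb from the table.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a migration workload (illustrative only)."""
    data_type: str                          # "database" or "file"
    volume_tb: float                        # total data volume in terabytes
    est_network_days: float                 # estimated days to move it over the network
    needs_protocol_endpoint: bool = False   # external clients must connect via SFTP/FTPS/FTP/AS2
    continuous: bool = False                # ongoing replication is required


def choose_migration_tool(w: Workload) -> str:
    """Apply the decision criteria from the table, in an assumed priority order."""
    # Data Type: structured database records go to DMS.
    if w.data_type == "database":
        # Velocity: CDC keeps the target in sync after the initial load.
        return "AWS DMS (full load + CDC)" if w.continuous else "AWS DMS (full load)"
    # Connectivity: clients that must keep speaking a file protocol need a server endpoint.
    if w.needs_protocol_endpoint:
        return "AWS Transfer Family"
    # Data Volume: too much data for the available network window.
    if w.volume_tb > 100 and w.est_network_days > 7:
        return "AWS Snow Family"
    # Default for online file and object transfers.
    return "AWS DataSync"


print(choose_migration_tool(Workload("file", volume_tb=500, est_network_days=463)))
print(choose_migration_tool(Workload("database", 2, 1, continuous=True)))
```

Real decisions usually weigh more factors (cost, security requirements, recurring versus one-time transfers), so treat the function as a memory aid for the table rather than a complete rule set.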

Hierarchical Outline

  • I. File-Based Migration (Unstructured/Semi-structured)
    • AWS Transfer Family
      • Supported Protocols: SFTP, FTPS, FTP, AS2.
      • Storage Targets: Amazon S3 (Object) and Amazon EFS (File).
      • Use Case: Migrating existing workflows that rely on SFTP without changing client-side code.
    • AWS DataSync
      • Performance: AWS cites transfers up to 10x faster than open-source tools.
      • Capabilities: Handles millions of small files; automatic encryption and data validation.
  • II. Database Migration (Structured)
    • AWS Database Migration Service (DMS)
      • Supports relational (e.g., Amazon RDS, Aurora) and NoSQL (e.g., DynamoDB, MongoDB) endpoints.
      • Modes: Full Load (one-time) vs. CDC (continuous).
    • AWS Schema Conversion Tool (SCT)
      • Used for heterogeneous migrations (e.g., Oracle to PostgreSQL).
  • III. Physical Migration (Large Scale)
    • AWS Snow Family
      • Snowcone: Compact, 8TB, for edge locations.
      • Snowball Edge: Storage-optimized (80TB) or Compute-optimized.

Visual Anchors

Migration Tool Decision Tree

[Figure: Migration Tool Decision Tree]

AWS Transfer Family Architecture

\begin{tikzpicture}[node distance=2cm, every node/.style={draw, fill=blue!10, text centered, minimum height=1em, rounded corners}]
  \node (client) [fill=gray!20] {External Clients (SFTP)};
  \node (transfer) [right of=client, xshift=2cm] {AWS Transfer Family};
  \node (s3) [above right of=transfer, xshift=2cm] {Amazon S3};
  \node (efs) [below right of=transfer, xshift=2cm] {Amazon EFS};
  \node (iam) [below of=transfer] {IAM Roles / Auth};

  \draw[->, thick] (client) -- node[above] {Protocol} (transfer);
  \draw[->, thick] (transfer) -- (s3);
  \draw[->, thick] (transfer) -- (efs);
  \draw[dashed] (iam) -- (transfer);
\end{tikzpicture}

Definition-Example Pairs

  • Heterogeneous Migration: Migrating between different database engines.
    • Example: Moving an on-premises Oracle database to an Amazon Aurora (PostgreSQL) instance using SCT and DMS.
  • Agent-based Transfer: Using a lightweight software component to facilitate data movement.
    • Example: Installing an AWS DataSync Agent on a local VMware cluster to scan an NFS mount and push files to S3.
  • Offline Data Transfer: Shipping physical hardware to move data.
    • Example: Requesting a Snowball Edge to migrate a 500TB video archive where the local internet upload speed is only 100 Mbps.
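
To see why the 500 TB archive in the last example calls for an offline transfer, it helps to work the arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a hypothetical `utilization` factor for how much of the link is actually available; the 80 TB device capacity is the Storage-optimized figure from the outline above.

```python
import math


def transfer_days(volume_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to send volume_tb over a link_mbps link (decimal units)."""
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    total_bits = volume_tb * 1e12 * 8                    # terabytes -> bits
    seconds = total_bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400                              # seconds per day


def devices_needed(volume_tb: float, device_capacity_tb: float = 80.0) -> int:
    """Number of Snow devices required, assuming each holds device_capacity_tb."""
    return math.ceil(volume_tb / device_capacity_tb)


print(round(transfer_days(500, 100)))        # 463 days at full line rate
print(round(transfer_days(500, 100, 0.8)))   # 579 days at 80% utilization
print(devices_needed(500))                   # 7 devices at 80 TB each
```

At well over a year of continuous upload, the network path is far beyond the one-week rule of thumb, which is what makes shipping devices the practical choice here.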

Worked Examples

Example 1: Modernizing a Legacy File Exchange

Scenario: A financial firm uses SFTP to receive daily CSV files from partners. They want to process these in AWS Glue without requiring partners to change their software.

Step-by-Step Solution:

  1. Create an AWS Transfer Family Server: Select the SFTP protocol.
  2. Map Users: Create IAM roles that allow the "Partner" user to write specifically to s3://firm-inbound/partner-id/.
  3. Endpoint: Provide the server's DNS endpoint to the partner.
  4. Integration: Configure an S3 Event Notification to trigger an AWS Lambda function or AWS Glue job whenever a new file is uploaded.
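
Steps 2 and 4 can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the firm's actual configuration: `partner_policy` and `uploaded_objects` are hypothetical helper names, the policy grants only list and write access to one prefix, and the handler logic assumes the standard S3 event notification payload (a `Records` list with URL-encoded object keys).

```python
import json
from urllib.parse import unquote_plus


def partner_policy(bucket: str, partner_id: str) -> dict:
    """Build an IAM policy document limiting a partner to its own prefix."""
    prefix = f"{partner_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Let the partner list only objects under its own prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {   # Let the partner upload objects under its own prefix.
                "Sid": "WriteOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


def uploaded_objects(event: dict) -> list:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])  # keys arrive URL-encoded
        results.append((s3["bucket"]["name"], key))
    return results


# Example: the policy for the bucket and prefix used in step 2.
print(json.dumps(partner_policy("firm-inbound", "partner-id"), indent=2))

# Example: a trimmed-down event of the kind a Lambda function would receive.
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "firm-inbound"},
    "object": {"key": "partner-id/daily+report.csv"},
}}]}
print(uploaded_objects(sample_event))
```

Inside a Lambda function, the pairs returned by `uploaded_objects` would typically be passed on to whatever starts the downstream Glue job; that wiring is left out here because it depends on the job's name and arguments.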

Example 2: Migrating a Production SQL Server

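Scenario: A retail site needs to move its SQL Server database to Amazon RDS with minimal downtime.

Step-by-Step Solution:

  1. SCT: If the target engine differs from the source (for example, RDS for PostgreSQL), run the Schema Conversion Tool to create the target schema. For a like-for-like move to RDS for SQL Server, SCT is not required.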
  2. DMS Replication Instance: Launch a DMS instance to handle the compute of the migration.
  3. Task Configuration: Set the migration type to "Full load and ongoing replication (CDC)".
  4. Cutover: Once the target is in sync, point the application to the new RDS endpoint.
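
The task configuration in step 3 can be sketched as the parameters one might pass to the DMS API. This is a minimal sketch, assuming the task selects every table in a `dbo` schema; the task identifier and the ARNs are placeholders, and `table_mappings` and `replication_task_params` are hypothetical helper names. The block only builds the parameters and does not call AWS.

```python
import json


def table_mappings(schema: str = "dbo") -> str:
    """Return a DMS table-mapping document that includes every table in `schema`."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all-tables",
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)  # DMS expects the mappings as a JSON string


def replication_task_params(source_arn: str, target_arn: str, instance_arn: str) -> dict:
    """Assemble parameters for a full-load-plus-CDC replication task."""
    return {
        "ReplicationTaskIdentifier": "sqlserver-to-rds",   # placeholder name
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",               # full load, then ongoing replication
        "TableMappings": table_mappings(),
    }


params = replication_task_params(
    "arn:aws:dms:placeholder:source",
    "arn:aws:dms:placeholder:target",
    "arn:aws:dms:placeholder:instance",
)
print(json.dumps(params, indent=2))
```

With boto3 installed and credentials configured, these parameters would typically be passed as `boto3.client("dms").create_replication_task(**params)`, using real endpoint and replication instance ARNs in place of the placeholders.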

Checkpoint Questions

  1. Which protocol(s) does the AWS Transfer Family support for file transfers?
  2. When should a data engineer choose AWS DataSync over AWS Transfer Family?
  3. What is the primary role of the AWS Schema Conversion Tool (SCT)?
  4. True or False: AWS DataSync can transfer data directly to Amazon EFS.
  5. How does the Snow Family handle security during physical transit?

Answers

  1. SFTP, AS2, FTPS, and FTP.
  2. Choose DataSync for automated, high-speed bulk migrations between storage systems; choose Transfer Family for protocol compatibility (allowing external clients to connect via SFTP).
  3. To convert database schemas between different engines (heterogeneous migrations).
  4. True.
  5. Snow devices use tamper-resistant enclosures and 256-bit encryption, with keys managed through AWS KMS and never stored on the device, so data remains protected even if the device is intercepted.

Comparison Tables

| Service | Primary Goal | Data Source | Latency Focus |
|---|---|---|---|
| Transfer Family | Interface Compatibility | External Clients (SFTP) | User access latency |
| DataSync | Bulk Online Migration | On-prem Storage (NFS/SMB) | Throughput/Speed |
| Snowball | Bulk Offline Migration | On-prem Storage | Total Time (Transfer Days) |
| DMS | Database Consistency | SQL/NoSQL Databases | Replication Lag |

Muddy Points & Cross-Refs

  • DataSync vs. Transfer Family: This is a frequent point of confusion. Remember: DataSync is a "Mover" (it initiates the pull/push), while Transfer Family is a "Server" (it waits for others to connect to it).
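  • SCT Usage: You only need SCT if the source and target database engines are different. If you are moving SQL Server to RDS for SQL Server, DMS can create the basic target tables and primary keys itself; secondary objects such as indexes, views, and stored procedures are usually moved with the engine's native tools.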
  • Next Steps: To learn how to handle data after it is migrated, see the study guides on AWS Glue ETL and Amazon Redshift Ingestion (COPY command).
