
Mastering AWS Data Transfer Solutions: SAA-C03 Study Guide

Designing data transfer solutions


Learning Objectives

After studying this guide, you should be able to:

  • Select the appropriate AWS Snow Family device based on data volume and compute requirements.
  • Differentiate between AWS DataSync and AWS Transfer Family for online data migrations.
  • Design data streaming architectures using Amazon Kinesis Data Streams and Data Firehose.
  • Evaluate the cost-effectiveness of physical migration versus over-the-wire transfer.
  • Understand the role of AWS Glue in transforming data during the ingestion process.

Key Terms & Glossary

  • ETL (Extract, Transform, Load): The process of gathering data from various sources, changing its format, and loading it into a destination (on AWS, typically performed with AWS Glue).
  • Hydration: The process of initially loading a large amount of data into a storage system or data lake.
  • Point-in-Time Migration: A one-time movement of data, typically using physical devices like Snowball.
  • Streaming Data: Data that is generated continuously by thousands of data sources, which typically send in the data records simultaneously (e.g., application logs).
  • VPC Endpoint: A private connection between your VPC and supported AWS services without requiring an internet gateway.

The "Big Idea"

Data transfer in AWS is not a "one-size-fits-all" task. It is a balancing act between Volume, Velocity, and Cost. When moving hundreds of terabytes or petabytes, network bandwidth limits often make shipping physical hardware faster than transferring the data over the internet. Conversely, for continuous, smaller-scale updates, automated managed services provide the efficiency needed for high-performing architectures.

Formula / Concept Box

| Concept | Metric / Formula / Rule | Use Case |
|---|---|---|
| Transfer Time | $\text{Time} = \frac{\text{Data Size}}{\text{Available Bandwidth}}$ | Deciding between Snowball and Direct Connect |
| Snowball Edge Storage Optimized | 80 TB usable | Large-scale data migration |
| Snowcone Storage | 8 TB usable (14 TB on Snowcone SSD) | Small or edge-location migration |
| Kinesis Data Streams | Real-time (~70 ms latency) | High-performance analytics |
| Kinesis Data Firehose | Near real-time (60 s+ latency) | Loading data into S3/Redshift |
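The transfer-time formula in the box above can be expressed as a small Python helper. This is a minimal sketch, not an AWS tool: the function name and the `utilization` parameter (the fraction of the link you can actually dedicate to the migration) are hypothetical additions for illustration. With full utilization it reproduces the roughly 101.8-day figure from the 100 TB / 100 Mbps worked example in this guide.

```python
# Minimal sketch of Time = Data Size / Available Bandwidth.
# Assumption: "utilization" is a hypothetical knob for the share of the
# link dedicated to the transfer (1.0 = the whole link, all the time).

SECONDS_PER_DAY = 86_400


def transfer_time_days(size_bytes: float, bandwidth_bps: float,
                       utilization: float = 1.0) -> float:
    """Return the days needed to push size_bytes over a bandwidth_bps link."""
    if bandwidth_bps <= 0 or not 0 < utilization <= 1:
        raise ValueError("bandwidth must be positive and 0 < utilization <= 1")
    size_bits = size_bytes * 8                   # bytes -> bits
    effective_bps = bandwidth_bps * utilization  # usable share of the link
    return size_bits / effective_bps / SECONDS_PER_DAY


if __name__ == "__main__":
    one_tb = 1024 ** 4  # binary TB, as in the worked example
    print(round(transfer_time_days(100 * one_tb, 100_000_000), 1))       # 101.8
    print(round(transfer_time_days(100 * one_tb, 100_000_000, 0.8), 1))  # 127.3
```

Lowering `utilization` shows how quickly a shared link stretches the timeline, which strengthens the case for a physical Snow Family device.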

Hierarchical Outline

  • I. Physical Migration (AWS Snow Family)
    • Snowcone: Ultra-portable, 8 TB HDD storage (14 TB on the SSD model), 2 vCPUs.
    • Snowball Edge:
      • Storage Optimized: 80TB storage, 40 vCPUs.
      • Compute Optimized: 42TB storage, 52 vCPUs (for Edge ML/Processing).
    • Snowmobile: 45ft shipping container, up to 100PB per truck.
  • II. Online Data Transfer
    • AWS DataSync: Automates moving data between on-premises storage and AWS (S3, EFS, FSx).
    • AWS Transfer Family: Managed SFTP, FTPS, FTP, and AS2 endpoints for transferring files into and out of Amazon S3 or Amazon EFS.
    • Amazon S3 Transfer Acceleration: Uses CloudFront’s edge locations to speed up long-distance uploads.
  • III. Data Ingestion & Transformation
    • Amazon Kinesis: Handles streaming data (Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose).
    • AWS Glue: Serverless ETL for transforming data (e.g., CSV to Parquet).
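To connect the outline to the first learning objective, here is a hypothetical device picker. It uses the usable capacities from the outline (taking 8 TB, the HDD figure, for Snowcone) and treats "needs heavy edge compute" as a simple yes/no flag. The 10 PB cut-over to Snowmobile is an assumed rule of thumb, and the function is an illustrative sketch, not an AWS API.

```python
import math
from typing import Tuple

# Usable capacities in TB, taken from the outline above.
SNOWCONE_TB = 8
STORAGE_OPTIMIZED_TB = 80
COMPUTE_OPTIMIZED_TB = 42
SNOWMOBILE_TB = 100_000  # 100 PB per truck

# Assumption: rule-of-thumb switch to Snowmobile at 10 PB and above.
SNOWMOBILE_THRESHOLD_TB = 10_000


def pick_snow_device(data_tb: float, needs_edge_compute: bool = False) -> Tuple[str, int]:
    """Hypothetical helper: return (device name, device count) for a physical migration."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    if needs_edge_compute:
        # Heavy on-device processing (e.g., edge ML) favors the 52-vCPU model.
        return "Snowball Edge Compute Optimized", math.ceil(data_tb / COMPUTE_OPTIMIZED_TB)
    if data_tb <= SNOWCONE_TB:
        return "Snowcone", 1
    if data_tb < SNOWMOBILE_THRESHOLD_TB:
        return "Snowball Edge Storage Optimized", math.ceil(data_tb / STORAGE_OPTIMIZED_TB)
    return "Snowmobile", math.ceil(data_tb / SNOWMOBILE_TB)


if __name__ == "__main__":
    print(pick_snow_device(100))  # ('Snowball Edge Storage Optimized', 2)
```

The device count is a plain ceiling division of data size by usable capacity, which is the same reasoning behind ordering two Storage Optimized devices for 100 TB.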

Visual Anchors

Migration Decision Logic (diagram not reproduced)
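The decision logic can be sketched as a comparison of online transfer time against the project deadline. The function name, the three return labels, and the 14-day Snow turnaround default are assumptions made for this illustration (14 days is the upper end of the 1–2 week estimate in the worked example); real planning would also weigh cost and whether the link is shared with production traffic.

```python
SECONDS_PER_DAY = 86_400


def choose_migration_path(data_bytes: float, bandwidth_bps: float,
                          deadline_days: float,
                          snow_turnaround_days: float = 14) -> str:
    """Hypothetical helper: return "online", "snow", or "neither".

    snow_turnaround_days is an assumed end-to-end estimate for ordering,
    shipping, copying, and importing data with a Snow Family device.
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    # Time = Data Size / Available Bandwidth, converted to days.
    online_days = data_bytes * 8 / bandwidth_bps / SECONDS_PER_DAY
    if online_days <= deadline_days:
        return "online"   # e.g., AWS DataSync or S3 Transfer Acceleration
    if snow_turnaround_days <= deadline_days:
        return "snow"     # ship an AWS Snow Family device
    return "neither"      # needs more bandwidth, a later deadline, or both


if __name__ == "__main__":
    one_tb = 1024 ** 4
    print(choose_migration_path(100 * one_tb, 100_000_000, deadline_days=30))  # snow
    print(choose_migration_path(one_tb, 100_000_000, deadline_days=30))        # online
```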

DataSync Architecture

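\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  % On-premises storage system (two-line label needs \\ inside tabular)
  \draw[thick] (0,0) rectangle (2.5,1.5) node[midway] {\begin{tabular}{c} On-Prem \\ Storage \end{tabular}};
  % DataSync agent sends data from local storage toward AWS
  \draw[->, thick] (2.5,0.75) -- (4,0.75) node[midway, above] {Agent};
  % Dashed AWS Cloud boundary, labelled inside its top-left corner
  \draw[thick, dashed] (4,-0.5) rectangle (8,2.5);
  \node[below right] at (4,2.5) {AWS Cloud};
  % DataSync service
  \draw[thick] (4.5,0.75) circle (0.5cm) node {Sync};
  % Destination storage service
  \draw[->, thick] (5,0.75) -- (6,0.75);
  \node[draw, right] at (6,0.75) {S3 / EFS};
\end{tikzpicture}
\end{document}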

Definition-Example Pairs

  • AWS DataSync
    • Definition: An online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services.
    • Example: A hospital needs to sync 500 GB of new medical imaging from its local NAS to an Amazon S3 bucket every night.
  • Kinesis Data Firehose
    • Definition: An extract-transform-load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics tools.
    • Example: A gaming company streams player clickstream data directly into an S3 bucket, where Amazon Athena analyzes it later.
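Firehose buffers incoming records and delivers a batch when either a size threshold or a time interval is reached, whichever comes first; the interval is why the table lists its latency as near real-time rather than real-time. The simulation below is a simplified, hypothetical model of that rule, not Firehose's implementation: the function name, the `(arrival_seconds, size_bytes)` record shape, and the thresholds passed in are all assumptions for illustration.

```python
from typing import List, Tuple

Record = Tuple[float, int]  # (arrival time in seconds, size in bytes)


def simulate_buffering(records: List[Record], max_bytes: int,
                       max_seconds: float) -> List[Tuple[float, int]]:
    """Return (delivery_time, record_count) for each simulated batch.

    A batch is delivered when its total size reaches max_bytes, or when
    max_seconds have passed since its first record, whichever comes first.
    """
    deliveries = []
    batch_start = None
    batch_bytes = 0
    batch_count = 0
    for arrival, size in sorted(records):
        # Time-based flush: the interval expired before this record arrived.
        if batch_count and arrival - batch_start >= max_seconds:
            deliveries.append((batch_start + max_seconds, batch_count))
            batch_start, batch_bytes, batch_count = None, 0, 0
        if batch_count == 0:
            batch_start = arrival  # first record opens a new batch
        batch_bytes += size
        batch_count += 1
        # Size-based flush: deliver as soon as the size threshold is reached.
        if batch_bytes >= max_bytes:
            deliveries.append((arrival, batch_count))
            batch_start, batch_bytes, batch_count = None, 0, 0
    # Any remaining records are delivered when their interval expires.
    if batch_count:
        deliveries.append((batch_start + max_seconds, batch_count))
    return deliveries


if __name__ == "__main__":
    clicks = [(0, 1000), (10, 1000), (70, 1000), (71, 2500), (200, 100)]
    print(simulate_buffering(clicks, max_bytes=3000, max_seconds=60))
    # [(60, 2), (71, 2), (260, 1)]
```

In the usage run, the first batch waits the full 60 s interval, the second is flushed early by size, and the lone late record waits out its own interval.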

Worked Examples

Scenario: The Bandwidth Trap

Problem: A company has 100 TB of data to move to AWS. They have a dedicated 100 Mbps internet connection available for this task. Should they use the internet or a Snowball Edge?

Calculation:

  1. Total data in bits (treating 1 TB as $1024^4$ bytes): $100\text{ TB} \times 1024 \times 1024 \times 1024 \times 1024 \times 8 = 879{,}609{,}302{,}220{,}800\text{ bits}$.
  2. Speed in bits per second: $100\text{ Mbps} = 100{,}000{,}000\text{ bps}$.
  3. Total seconds: $\frac{879{,}609{,}302{,}220{,}800}{100{,}000{,}000} \approx 8{,}796{,}093\text{ seconds}$.
  4. Total days: $\frac{8{,}796{,}093}{86{,}400} \approx 101.8\text{ days}$.

Solution: Roughly 102 days is likely unacceptable for a business migration, and that figure already assumes the full 100 Mbps is available the entire time. The company should instead order two Snowball Edge Storage Optimized devices (80 TB usable each, 160 TB total) to complete the transfer in approximately 1–2 weeks (including shipping time).

Checkpoint Questions

  1. Which Snow Family device is specifically designed for compute-heavy workloads at the edge?
  2. What is the primary difference between Kinesis Data Streams and Kinesis Data Firehose regarding data retention?
  3. True or False: AWS Transfer Family supports the SMB protocol for file transfers.
  4. Why would a company choose AWS Glue during a data migration?
Answers
  1. AWS Snowball Edge Compute Optimized.
  2. Kinesis Data Streams retains records (24 hours by default, extendable up to 365 days) so multiple consumers can read and replay them; Firehose delivers data to a destination and does not store it for replay.
  3. False. It supports SFTP, FTPS, FTP, and AS2, but not SMB.
  4. To transform data formats (e.g., CSV to Parquet) or clean data before it reaches the data lake.
