AWS Data Transfer Modeling and Cost Optimization

Performing data transfer modeling and selecting services to reduce data transfer costs

This guide focuses on the critical architectural skill of modeling data transfer costs and selecting the most cost-effective AWS services for moving data into, out of, and within the AWS ecosystem. Understanding the "hidden" costs of data egress and regional transfers is essential for passing the SAP-C02 exam.

Learning Objectives

After studying this guide, you should be able to:

  • Model data transfer costs based on volume, frequency, and source/destination types.
  • Differentiate between online and offline data transfer mechanisms (Snow Family vs. DataSync).
  • Select appropriate services (e.g., S3 Transfer Acceleration, Direct Connect) to optimize for both performance and cost.
  • Implement architectural patterns that minimize data egress charges, such as using CloudFront or VPC Endpoints.

Key Terms & Glossary

  • Data Egress: Data leaving the AWS network to the internet or on-premises environments. This is the primary driver of data transfer costs.
  • Data Ingress: Data entering the AWS network. Generally, ingress is free.
  • VPC Endpoint: A private connection between your VPC and supported AWS services (for example, S3 and DynamoDB) that keeps traffic off the public internet.
  • Edge Location: A site used by CloudFront (CDN) to cache content closer to users, often reducing egress costs through specific pricing models.
  • Hydration: The process of initially loading a large dataset into a cloud storage service like Amazon S3.

The "Big Idea"

In AWS, compute is cheap, but moving data is expensive. Architectural efficiency is often defined by how little data needs to move across network boundaries. Data transfer modeling isn't just about picking a tool; it's about understanding that the path the data takes (Public Internet vs. Private Peering vs. Internal Backbone) dictates the final bill. The goal is to keep data "local" to the region or the AWS backbone as long as possible.

Formula / Concept Box

| Factor | Pricing Logic |
| --- | --- |
| Inbound (Ingress) | $0.00 per GB (free) |
| Inter-AZ Transfer | Charged per GB in both directions; typically $0.01/GB |
| Inter-Region Transfer | Charged at standard data transfer rates (varies by region) |
| Outbound (Egress) | Tiered pricing (starts at ~$0.09/GB for the first 10TB) |
| Direct Connect | Reduced egress rates compared to internet-based transfer |
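
The rates in the table can be turned into a rough monthly estimate. The following is a minimal sketch, not a pricing calculator: it uses the approximate figures quoted above, assumes a $0.02/GB inter-Region rate purely for illustration (the real rate varies by Region pair), and applies only the first egress tier. The names `RATES_PER_GB` and `monthly_transfer_cost` are hypothetical.

```python
# Approximate per-GB rates; illustrative assumptions, not a price list.
RATES_PER_GB = {
    "ingress": 0.00,
    "inter_az": 0.01,         # charged in each direction
    "inter_region": 0.02,     # assumed value; varies by Region pair
    "internet_egress": 0.09,  # first-tier rate only
}


def monthly_transfer_cost(gb_by_path):
    """Return (total, breakdown) for a dict of {path: GB moved per month}."""
    breakdown = {}
    for path, gb in gb_by_path.items():
        rate = RATES_PER_GB[path]
        # Inter-AZ traffic is billed on both the sending and receiving side.
        multiplier = 2 if path == "inter_az" else 1
        breakdown[path] = round(gb * rate * multiplier, 2)
    return round(sum(breakdown.values()), 2), breakdown


total, breakdown = monthly_transfer_cost({
    "ingress": 5_000,
    "inter_az": 2_000,
    "inter_region": 1_000,
    "internet_egress": 3_000,
})
print(total)  # 330.0
```

In this made-up workload, internet egress accounts for $270 of the $330 total, which is why the egress path is usually the first thing to model.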

Hierarchical Outline

  • I. Cost Modeling Fundamentals
    • Distance/Boundaries: Data crossing an AZ boundary, Region boundary, or AWS Network boundary.
    • Volume vs. Speed: Large one-time migrations (Snowball) vs. continuous small streams (Kinesis).
  • II. Online Transfer Services
    • AWS DataSync: Automated, high-speed transfer over the internet or Direct Connect.
    • AWS Transfer Family: Managed SFTP, FTPS, and FTP directly into S3 or EFS.
    • S3 Transfer Acceleration: Uses CloudFront edge locations for faster uploads over long distances.
  • III. Offline Transfer Services (Snow Family)
    • Snowcone: 8TB usable; small, portable; for edge computing and light migration.
    • Snowball Edge: 80TB+ usable; optimized for large-scale data collection and migration.
    • Snowmobile: Up to 100PB per truck; intended for exabyte-scale data center evacuations.
  • IV. Cost Reduction Strategies
    • VPC Endpoints (Interface & Gateway): Keeps traffic off the public internet.
    • CloudFront: Reduces costs for high-request content compared to direct S3 egress.
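
To see why a Gateway VPC Endpoint matters for cost, compare it with routing the same S3 traffic through a NAT Gateway. The sketch below assumes a NAT Gateway data-processing rate of $0.045/GB (an illustrative figure that is not from this guide; check current pricing for your Region) and relies on gateway endpoints carrying no per-GB charge. The function name `s3_access_savings` is hypothetical.

```python
NAT_PROCESSING_PER_GB = 0.045   # assumed NAT Gateway data-processing rate
GATEWAY_ENDPOINT_PER_GB = 0.0   # gateway endpoints carry no per-GB charge


def s3_access_savings(gb_per_month):
    """Monthly saving from sending S3 traffic via a gateway endpoint instead of NAT."""
    nat_cost = gb_per_month * NAT_PROCESSING_PER_GB
    endpoint_cost = gb_per_month * GATEWAY_ENDPOINT_PER_GB
    return round(nat_cost - endpoint_cost, 2)


print(s3_access_savings(10_000))  # 450.0
```

The sketch ignores the NAT Gateway's hourly charge, which is incurred regardless if other traffic still needs the NAT.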

Visual Anchors

Migration Decision Flow

[Figure: migration decision flow; diagram not available in the source]

Data Transfer Cost Boundaries

\begin{tikzpicture}[node distance=2cm, every node/.style={draw, thick, rectangle, rounded corners, inner sep=5pt}]
  \node (User) [fill=blue!10] {Internet / User};
  \node (VPC1) [right=of User, fill=green!10] {VPC (Region A)};
  \node (VPC2) [below=of VPC1, fill=green!10] {VPC (Region B)};
  \draw [<->, thick] (User) -- node[above, draw=none] {\tiny \$\$\$ Egress} (VPC1);
  \draw [<->, thick] (VPC1) -- node[right, draw=none] {\tiny \$\$ Inter-Region} (VPC2);
  \node (AZ1) [right=0.5cm of VPC1, draw=none] {\tiny AZ1};
  \node (AZ2) [right=2.5cm of VPC1, draw=none] {\tiny AZ2};
  \draw [<->, dashed] (2.5, 0.5) -- (5.5, 0.5) node[midway, above, draw=none] {\tiny \$ Inter-AZ};
\end{tikzpicture}

Definition-Example Pairs

  • AWS DataSync: A service used to automate and accelerate moving data between on-premises storage and AWS.
    • Example: Syncing a 50TB on-premises NAS to Amazon EFS every night during a migration window.
  • S3 Transfer Acceleration: A bucket-level feature that enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
    • Example: A centralized bucket in US-East-1 receiving 5GB video uploads from field reporters in Tokyo and London.
  • AWS Storage Gateway: A hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage.
    • Example: Using a Tape Gateway to replace a physical tape backup library with S3 Glacier without changing the backup software.

Worked Examples

Problem: Choosing a Migration Path

Scenario: A company needs to migrate 80 TB of data from their on-premises data center to Amazon S3. They have a 100 Mbps dedicated internet connection available for migration.

Calculation:

  1. Time via Internet: $80{,}000{,}000\ \text{MB} \div (100\ \text{Mbps} / 8) = 80{,}000{,}000\ \text{MB} \div 12.5\ \text{MB/s} = 6{,}400{,}000\ \text{s} \approx 74\ \text{days}$.
  2. Constraints: The migration must be completed in 10 days.
  3. Cost Check: Snowball Edge (80TB) costs a flat fee plus shipping. Uploading over the internet incurs no AWS data transfer charge (ingress is free), but the roughly 74-day transfer time far exceeds the 10-day window.

Solution: Use AWS Snowball Edge. It can be shipped, loaded, and returned within approximately 7-10 days, meeting the deadline and likely costing less than keeping a dedicated line saturated for two months.
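
The arithmetic in this example generalizes to any volume and link speed. A minimal sketch, assuming decimal units (1 TB = 1,000,000 MB) and a hypothetical `utilization` parameter for links that cannot be fully saturated:

```python
def transfer_days(data_tb, link_mbps, utilization=1.0):
    """Days needed to move data_tb terabytes over a link_mbps link."""
    data_mb = data_tb * 1_000_000                 # decimal units: 1 TB = 1,000,000 MB
    mb_per_second = link_mbps / 8 * utilization   # megabits/s -> megabytes/s
    seconds = data_mb / mb_per_second
    return seconds / 86_400                       # seconds per day


days = transfer_days(80, 100)
print(f"{days:.1f} days")  # 74.1 days
fits_window = days <= 10   # False: the online path misses the 10-day deadline
```

Lowering `utilization` (for example to 0.8 when the link is shared) only lengthens the estimate, which strengthens the case for an offline transfer here.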

Checkpoint Questions

  1. True or False: Data transferred from an EC2 instance to an S3 bucket in the same region using a Gateway VPC Endpoint is free.
  2. Which service is best for continuous, real-time streaming of IoT data into a Data Lake?
  3. At what data volume threshold does the AWS documentation suggest Snowball becomes more cost-effective than online transfer?
  4. What is the main cost benefit of using AWS Direct Connect for data egress?
Answers:
  1. True. Traffic stays within the AWS network and avoids both NAT Gateway and Egress charges.
  2. Amazon Kinesis Data Firehose.
  3. 10 TB. (Below this, the overhead of shipping/ordering Snowball usually outweighs the bandwidth cost).
  4. Reduced data transfer rates. Egress via Direct Connect is significantly cheaper (often $0.02/GB) compared to the public internet (~$0.09/GB).

Muddy Points & Cross-Refs

  • S3 Transfer Acceleration vs. CloudFront: Users often confuse these. CloudFront is for downloading (content delivery) to many users. Transfer Acceleration is for uploading to an S3 bucket from a distance. Both use edge locations.
  • Direct Connect vs. VPN: While both provide connectivity, Direct Connect provides a consistent, lower-cost egress rate. A Site-to-Site VPN still travels over the internet and is subject to standard internet egress pricing.
  • Storage Gateway Types: Remember that S3 File Gateway is for file-to-object mapping, while Volume Gateway is for block storage (iSCSI).
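
The Direct Connect vs. VPN trade-off can be framed as a break-even volume: the lower per-GB rate only pays off once it covers the connection's fixed monthly cost. A rough sketch using the per-GB figures quoted in this guide; the $700 fixed cost (port-hours plus circuit fees) is a hypothetical input, not an AWS price, and `break_even_gb` is an illustrative name.

```python
INTERNET_EGRESS_PER_GB = 0.09        # approximate first-tier internet egress rate
DIRECT_CONNECT_EGRESS_PER_GB = 0.02  # approximate Direct Connect egress rate


def break_even_gb(monthly_fixed_cost):
    """Monthly egress volume (GB) at which Direct Connect's fixed cost is recovered."""
    saving_per_gb = INTERNET_EGRESS_PER_GB - DIRECT_CONNECT_EGRESS_PER_GB
    return monthly_fixed_cost / saving_per_gb


# Hypothetical fixed cost: port-hours plus carrier circuit fees.
print(round(break_even_gb(700)))  # 10000
```

Under these assumptions, roughly 10 TB of monthly egress is needed before Direct Connect wins on cost alone; below that, its value is consistency rather than savings.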

Comparison Tables

Migration Service Comparison

| Service | Mode | Best For | Latency/Speed |
| --- | --- | --- | --- |
| DataSync | Online | Ongoing sync / 10TB-50TB | Network Dependent |
| Snowball Edge | Offline | Massive one-time migrations | Physical Shipping |
| Transfer Family | Online | Client-facing SFTP workflows | Network Dependent |
| Kinesis Firehose | Online | Real-time streaming/ETL | Near Real-time |
| S3 Transfer Accel | Online | Global uploads to one bucket | High (Accelerated) |
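
The "Best For" column can be read as a simple decision procedure. The sketch below is one possible encoding under simplifying assumptions, not an official AWS decision tree: the function name, its parameters, and the ordering of the checks are all hypothetical, and real selections also weigh cost, security, and operational constraints.

```python
def pick_transfer_service(data_tb, link_mbps, deadline_days,
                          streaming=False, sftp_clients=False,
                          distant_uploaders=False):
    """Pick a service from the comparison table using coarse rules of thumb."""
    if streaming:
        return "Kinesis Data Firehose"      # real-time streaming/ETL
    if sftp_clients:
        return "Transfer Family"            # client-facing SFTP workflows
    if distant_uploaders:
        return "S3 Transfer Acceleration"   # global uploads to one bucket
    # Estimated days to move the data online (1 TB = 1,000,000 MB).
    network_days = (data_tb * 1_000_000) / (link_mbps / 8) / 86_400
    if network_days > deadline_days:
        return "Snowball Edge"              # network cannot meet the deadline
    return "DataSync"                       # online transfer fits the window


print(pick_transfer_service(80, 100, 10))   # Snowball Edge
print(pick_transfer_service(20, 1000, 10))  # DataSync
```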
