Selecting Appropriate Ingestion Configurations

This guide covers the critical decision-making process for ingesting data into AWS. It focuses on selecting the right service based on data type (streaming vs. batch), scale, and processing requirements, aligned with the AWS Certified Solutions Architect - Associate (SAA-C03) exam.

Learning Objectives

  • Distinguish between Kinesis Data Streams and Kinesis Data Firehose for various use cases.
  • Evaluate the trade-offs between Amazon SQS and Amazon Kinesis for data ingestion.
  • Identify appropriate tools for on-premises to cloud data migration (DataSync vs. Storage Gateway).
  • Understand the role of AWS Glue and Lake Formation in creating structured data lakes.

Key Terms & Glossary

  • Ingestion: The process of collecting and moving data from various sources into a storage or processing system (e.g., S3 or Redshift).
  • Shard: The unit of throughput capacity in Kinesis Data Streams. Each shard provides a fixed amount of write and read capacity, so a stream's total throughput scales with its shard count.
  • Fan-out: The ability for multiple consumers to read from the same data stream concurrently.
  • Producer: An application or device that sends data to an ingestion service.
  • Consumer: A service or application that processes data delivered by an ingestion service.
  • ETL (Extract, Transform, Load): A three-step process where data is taken from a source, changed into a suitable format, and placed in a destination.

The "Big Idea"

Data ingestion is the "front door" of any data architecture. Choosing the wrong configuration leads to bottlenecks, data loss, or excessive costs. The fundamental choice revolves around Latency vs. Management: Do you need sub-second real-time processing with your own consumers (Kinesis Data Streams), or do you want a fully managed delivery service that can transform data before it lands in its destination (Kinesis Data Firehose)?

Formula / Concept Box

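| Feature | Kinesis Data Streams (KDS) | Kinesis Data Firehose (KDF) |
| --- | --- | --- |
| Management | Provisioned (manual sharding) or on-demand capacity mode | Fully managed (automatic scaling) |
| Latency | Real-time (typically around 200 ms) | Near real-time (buffered delivery; 60 s is the traditional minimum buffer interval) |
| Data Retention | 24 hours (default) up to 365 days | No retention (transient) |
| Consumers | Multiple (fan-out) | Single destination per delivery stream |
| Transformation | Requires custom consumer code | Integrated via AWS Lambda |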
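The "near real-time" latency of Firehose comes from buffering: records are collected and delivered as a batch once a size threshold or a time threshold is reached, whichever happens first. The following is a toy model of that behavior, not the Firehose API; the class name `BufferedDelivery` and its parameters are hypothetical and exist only for illustration.

```python
from typing import List, Optional


class BufferedDelivery:
    """Toy model of size-or-interval buffering (hypothetical, not an AWS API)."""

    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.delivered: List[List[bytes]] = []  # batches "written" to the destination
        self._batch: List[bytes] = []
        self._batch_bytes = 0
        self._opened_at: Optional[float] = None  # time the current batch started

    def put(self, record: bytes, now: float) -> None:
        """Accept one record at time `now` (seconds)."""
        self.tick(now)  # deliver an aged-out batch before accepting new data
        if self._opened_at is None:
            self._opened_at = now
        self._batch.append(record)
        self._batch_bytes += len(record)
        if self._batch_bytes >= self.max_bytes:  # size threshold reached
            self._flush()

    def tick(self, now: float) -> None:
        """Flush the open batch if it has reached the interval threshold."""
        if self._opened_at is not None and now - self._opened_at >= self.max_seconds:
            self._flush()

    def _flush(self) -> None:
        self.delivered.append(list(self._batch))
        self._batch.clear()
        self._batch_bytes = 0
        self._opened_at = None


buf = BufferedDelivery(max_bytes=10, max_seconds=60)
buf.put(b"aaaa", now=0)
buf.put(b"bbbbbb", now=1)   # 10 bytes buffered: size threshold triggers delivery
buf.put(b"cc", now=5)
buf.tick(now=65)            # 60 s after the batch opened: interval triggers delivery
```

A low-volume source therefore waits for the interval to expire before anything is delivered, which is why Firehose is a poor fit when a consumer must react within milliseconds.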

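[!IMPORTANT] Shard Limits:

  • Write: up to 1,000 records/sec and up to 1 MB/sec per shard; whichever limit is reached first applies.
  • Read: up to 5 read transactions/sec and up to 2 MB/sec per shard, shared by all standard consumers. With enhanced fan-out, each registered consumer gets its own 2 MB/sec per shard.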
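The per-shard limits above translate directly into a sizing calculation for a provisioned stream. The sketch below is a rough estimate only: the function name `estimate_shards` is hypothetical, 1 MB is treated as 1,000,000 bytes as a conservative assumption, and all consumers are assumed to be standard (shared-throughput) consumers that each read the full stream.

```python
import math

# Per-shard limits quoted in this guide (provisioned mode).
WRITE_RECORDS_PER_SEC = 1_000
WRITE_BYTES_PER_SEC = 1_000_000   # 1 MB/sec, assumed here to mean 10^6 bytes
READ_BYTES_PER_SEC = 2_000_000    # 2 MB/sec, shared by standard consumers


def estimate_shards(records_per_sec: float, avg_record_bytes: float,
                    standard_consumers: int = 1) -> int:
    """Return the smallest shard count that satisfies every per-shard limit."""
    ingest_bytes = records_per_sec * avg_record_bytes
    by_record_count = records_per_sec / WRITE_RECORDS_PER_SEC
    by_write_bytes = ingest_bytes / WRITE_BYTES_PER_SEC
    # Each standard consumer re-reads the whole stream from the shared 2 MB/sec.
    by_read_bytes = ingest_bytes * standard_consumers / READ_BYTES_PER_SEC
    return max(1, math.ceil(max(by_record_count, by_write_bytes, by_read_bytes)))


# Many small records: the record-count limit dominates.
small_records = estimate_shards(records_per_sec=5_000, avg_record_bytes=500)
# Fewer large records: the write-bandwidth limit dominates.
large_records = estimate_shards(records_per_sec=500, avg_record_bytes=4_000)
# Three standard consumers: the shared read limit dominates.
many_readers = estimate_shards(1_000, 1_000, standard_consumers=3)
```

The binding constraint changes with the workload shape, so checking all three limits matters more than memorizing any single number.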

Hierarchical Outline

  1. Streaming Data Ingestion
    • Amazon Kinesis Data Streams: Used for custom real-time applications; requires manual shard management in provisioned mode.
    • Amazon Kinesis Data Firehose (renamed Amazon Data Firehose in 2024): Simple loading of streaming data into S3, Redshift, OpenSearch, or Splunk.
    • Amazon Kinesis Video Streams: Specifically for binary/video data ingestion for ML/analytics.
  2. Hybrid and Bulk Ingestion
    • AWS DataSync: Fast, agent-based data transfer for large-scale migrations from on-premises storage to S3, EFS, or FSx.
    • AWS Storage Gateway: Hybrid storage that allows on-premises apps to use AWS storage via standard protocols (iSCSI, NFS, SMB).
  3. Managed Data Lakes
    • AWS Lake Formation: Simplifies the setup of a secure data lake; orchestrates AWS Glue for ingestion and ETL.
    • AWS Glue: Serverless ETL service. Crawlers discover data and record its schema in the Glue Data Catalog; ETL jobs then clean and transform it.
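The outline above can be condensed into a small decision helper. This is a deliberately simplified sketch of the guidance in this guide, not an official AWS decision tree; the function name, the `source` categories, and the flag names are all hypothetical.

```python
def choose_ingestion_service(source: str, *,
                             needs_subsecond_latency: bool = False,
                             multiple_consumers: bool = False,
                             is_video: bool = False,
                             ongoing_hybrid_access: bool = False) -> str:
    """Map a coarse workload description to the service named in the outline."""
    if source == "streaming":
        if is_video:
            return "Amazon Kinesis Video Streams"
        # Custom real-time processing or fan-out calls for Data Streams.
        if needs_subsecond_latency or multiple_consumers:
            return "Amazon Kinesis Data Streams"
        # Otherwise prefer the fully managed delivery path.
        return "Amazon Kinesis Data Firehose"
    if source == "on_premises":
        # Ongoing access through standard protocols vs. a bulk transfer.
        return "AWS Storage Gateway" if ongoing_hybrid_access else "AWS DataSync"
    if source == "data_lake":
        return "AWS Lake Formation (with AWS Glue)"
    raise ValueError(f"unknown source type: {source!r}")


clickstream_to_s3 = choose_ingestion_service("streaming")
fraud_detection = choose_ingestion_service("streaming", needs_subsecond_latency=True)
nas_migration = choose_ingestion_service("on_premises")
```

Real exam scenarios usually add cost and operational constraints on top of these questions, so treat the helper as a memory aid rather than a complete rule set.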

Visual Anchors

Ingestion Decision Flow (diagram not available)

Shard Architecture Visualization (diagram not available)
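As a stand-in for the shard architecture, the sketch below shows how a record could be routed to a shard. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit value and delivers the record to the shard whose hash key range contains that value. The function name `shard_for_key` is hypothetical, and the code assumes evenly split hash ranges and a big-endian reading of the digest; a stream that has been resharded can have uneven ranges.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would own this partition key.

    Assumes the hash key space is split evenly across `shard_count` shards.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs any remainder left by integer division.
    return min(hash_value // range_size, shard_count - 1)


# The same key always maps to the same shard, which preserves per-key ordering.
first = shard_for_key("device-42", shard_count=4)
second = shard_for_key("device-42", shard_count=4)
```

Because routing depends only on the key, a single very active partition key can saturate one shard while others sit idle, which is why high-cardinality keys are generally preferred.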

Definition-Example Pairs

  • Deduplication (FindMatches ML): Using the AWS Glue FindMatches machine learning transform to identify duplicate records in a data lake that lack a common unique key.
    • Example: A company merges customer databases where