AWS Streaming Data Services: Amazon Kinesis Study Guide

Streaming data services with appropriate use cases (for example, Amazon Kinesis)

This guide covers the core streaming data services in AWS, focusing on the Amazon Kinesis ecosystem. Understanding when to use each service and how they handle data flow is critical for the AWS Certified Solutions Architect Associate exam.

Learning Objectives

By the end of this guide, you will be able to:

  • Differentiate between Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Video Streams.
  • Identify appropriate use cases for streaming data vs. message queuing (SQS).
  • Select the correct ingestion method (Kinesis Agent vs. KPL) based on effort and technical requirements.
  • Map specific AWS destinations (S3, Redshift, OpenSearch) to the correct streaming service.

Key Terms & Glossary

  • Producer: The data source that sends records into a Kinesis stream (e.g., EC2 instances emitting log data, IoT sensors).
  • Consumer: The application or service that retrieves and processes data from the stream (e.g., Lambda, EC2 applications, Kinesis Data Analytics).
  • Shard: A uniquely identified sequence of data records in a stream; the base unit of throughput and scaling for Kinesis Data Streams.
  • Partition Key: Used to group data by shard within a stream. Data with the same partition key is sent to the same shard to ensure ordering.
  • Sequence Number: A unique identifier assigned by Kinesis to each data record when it is ingested.
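
The shard routing described above can be sketched without any AWS calls. This is an illustrative model, not SDK code: Kinesis Data Streams MD5-hashes the partition key into a 128-bit integer and delivers the record to the shard whose hash-key range contains that value. The shard counts below are arbitrary examples.

```python
import hashlib

MD5_SPACE = 2 ** 128  # an MD5 digest is a 128-bit value

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Index of the shard that would receive this partition key, assuming
    shards evenly split the hash space (the default CreateStream layout)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Clamp for shard counts that do not divide the space exactly.
    return min(hash_value // (MD5_SPACE // num_shards), num_shards - 1)

# The same key always maps to the same shard, which is what preserves
# per-key ordering across a stream.
same_shard = shard_for_key("user-42", 4) == shard_for_key("user-42", 4)
```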

The "Big Idea"

[!IMPORTANT] The fundamental shift here is moving from batch processing (handling data at rest) to stream processing (handling data in motion). Kinesis allows you to analyze and respond to data as it arrives, rather than waiting for it to be stored in a database or file system.

Formula / Concept Box

Comparison Table: Kinesis vs. SQS

| Feature | SQS (Standard) | Kinesis Data Streams | Kinesis Data Firehose |
| --- | --- | --- | --- |
| Model | Producer-Consumer (Pull) | Producer-Consumer (Push/Pull) | Source-Destination (Delivery) |
| Max Retention | 14 days | 24 hours by default; up to 365 days with extended retention | No long-term storage; buffers data briefly and retries failed deliveries for up to 24 hours |
| Ordering | Best-effort (unless FIFO) | Guaranteed per shard | No guaranteed order |
| Consumers | Single consumer (message deleted after processing) | Multiple consumers per stream | Fixed destinations (S3, Redshift, etc.) |
| Data Transformation | No | No (requires an external consumer) | Yes (via AWS Lambda) |
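
A toy in-memory model (no AWS involved) makes the consumer rows above concrete: an SQS-style queue hands each message to one consumer and deletes it, while a Kinesis-style stream retains records and lets every consumer keep its own read position. The class names here are invented for illustration.

```python
from collections import deque

class ToyQueue:
    """SQS-style: a received message is removed for everyone."""
    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # One consumer takes the message; no one else will see it.
        return self._messages.popleft() if self._messages else None

class ToyStream:
    """Kinesis-style: records are retained; each consumer checkpoints."""
    def __init__(self):
        self._records = []       # retained, never deleted on read
        self._checkpoints = {}   # consumer id -> next position to read

    def put(self, record):
        self._records.append(record)

    def read(self, consumer_id):
        position = self._checkpoints.get(consumer_id, 0)
        batch = self._records[position:]
        self._checkpoints[consumer_id] = len(self._records)
        return batch
```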

Hierarchical Outline

  1. Amazon Kinesis Overview
    • Real-time ingestion and processing of streaming data such as logs, clickstreams, audio, and video.
    • Scalable to handle gigabytes of data per second from thousands of sources.
  2. Kinesis Data Streams (KDS)
    • Architecture: Built on Shards. You manage scaling by adding/removing shards.
    • Use Case: Custom real-time applications where multiple consumers need the same data.
    • Ingestion: Amazon Kinesis Agent (Linux logs) or Kinesis Producer Library (KPL).
  3. Kinesis Data Firehose (KDF)
    • Architecture: Fully managed; no shards to manage. It is a Delivery Stream.
    • Destinations: S3, Redshift, OpenSearch, Splunk, HTTP Endpoints.
    • Transformation: Can call a Lambda function to transform data (e.g., CSV to Parquet) before delivery.
  4. Kinesis Video Streams (KVS)
    • Architecture: Time-indexed data storage.
    • Protocols: Supports HLS, DASH, and WebRTC for peer-to-peer.
    • Use Case: Security feeds, facial recognition with Rekognition, baby monitors.

Visual Anchors

Typical Kinesis Architecture

(Diagram unavailable.)

Data Flow Logic

(Diagram unavailable.)

Definition-Example Pairs

  • Kinesis Data Firehose: A managed service that delivers streaming data to destinations.
    • Example: A web store streams clickstream data directly into an Amazon S3 bucket for long-term storage without writing any consumer code.
  • Amazon Kinesis Agent: A standalone Java application that collects and sends data to Kinesis.
    • Example: Installing the agent on 500 Linux web servers to automatically monitor log files and push them into Kinesis Data Streams.
  • WebRTC: A protocol for real-time communication.
    • Example: Using Kinesis Video Streams to power a two-way video intercom system for a smart doorbell.
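
The Kinesis Agent behavior described above is driven by a JSON configuration file (on Amazon Linux, /etc/aws-kinesis/agent.json). A minimal sketch, with a log path and stream name invented for illustration:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/app/*.log",
      "kinesisStream": "web-server-logs"
    }
  ]
}
```

Each entry in `flows` pairs a file pattern to tail with a destination stream; a flow targeting Kinesis Data Firehose uses the `deliveryStream` key instead.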

Worked Examples

Scenario 1: Real-time Log Analytics for Redshift

Problem: A company wants to capture logs from thousands of EC2 instances and store them in Amazon Redshift for SQL analysis. The data must be converted from CSV to JSON during the process.

Solution:

  1. Install Amazon Kinesis Agent on the EC2 instances.
  2. Stream the logs to Kinesis Data Firehose.
  3. Configure KDF to trigger an AWS Lambda function for the CSV-to-JSON transformation.
  4. Set the destination of the KDF delivery stream to Amazon Redshift.
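
Step 3 above can be sketched as a short transformation Lambda. Firehose invokes the function with base64-encoded records and expects each one back with a `recordId`, a `result`, and re-encoded `data`; the CSV column names below are assumptions for illustration.

```python
import base64
import json

CSV_FIELDS = ["timestamp", "level", "message"]  # assumed log layout

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Firehose delivers each record payload base64-encoded.
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        values = line.split(",")
        if len(values) != len(CSV_FIELDS):
            # Malformed rows are flagged so Firehose can route them aside.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        payload = (json.dumps(dict(zip(CSV_FIELDS, values))) + "\n").encode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": base64.b64encode(payload).decode("ascii")})
    return {"records": output}
```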

Scenario 2: High-Retention Stream

Problem: You are streaming financial data and need to allow consumers to "re-play" the data from up to 30 days ago.

Solution:

  • By default, Kinesis Data Streams retains records for 24 hours.
  • Retention is configurable up to 365 days (extended data retention, billed separately). To allow 30-day replay, increase the stream's retention period to at least 720 hours; consumers can then re-read records directly from the stream, with no external archive required.
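
Because Kinesis Data Streams supports extended retention up to 365 days, a single CLI call raises the stream's retention window (30 days = 720 hours); the stream name here is hypothetical.

```shell
aws kinesis increase-stream-retention-period \
    --stream-name financial-trades \
    --retention-period-hours 720
```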

Checkpoint Questions

  1. Which Kinesis service is best suited for streaming data to a custom-built application running on EC2?
  2. True or False: Kinesis Data Firehose requires you to manage shards for scaling.
  3. Which protocol does Kinesis Video Streams use for peer-to-peer videoconferencing?
  4. If you need to transform data format (e.g., CSV to Parquet) while it is in transit to S3, which service should you use?
  5. What is the main difference between KDS and SQS regarding multiple consumers?

Answers
  1. Kinesis Data Streams (KDS).
  2. False. Firehose is a fully managed "Source-Destination" service; Data Streams uses shards.
  3. WebRTC.
  4. Kinesis Data Firehose (integrated with Lambda).
  5. In SQS, a message is usually processed by one consumer and deleted. In KDS, multiple consumers can read the same record simultaneously without deleting it.
