Hands-On Lab: Build a High-Performing Data Ingestion Pipeline with Kinesis Data Firehose

Determine high-performing data ingestion and transformation solutions

Welcome to this hands-on lab! Determining high-performing data ingestion and transformation solutions is a core skill in the AWS Certified Solutions Architect - Associate (SAA-C03) exam objectives. In this lab, you will build a scalable, serverless ingestion pipeline that uses Amazon Kinesis Data Firehose to collect streaming data and deliver it securely to an Amazon S3 data lake.


Prerequisites

Before starting this lab, ensure you have the following ready:

  • AWS Account: Administrator access to an AWS account.
  • Command Line Tools: The AWS CLI (aws) installed and configured with your credentials.
  • IAM Permissions: Ability to create S3 buckets, Kinesis delivery streams, and IAM roles.
  • Knowledge: Basic understanding of JSON and streaming data concepts.

Learning Objectives

By completing this lab, you will be able to:

  1. Provision a centralized Amazon S3 bucket to act as the foundation of a data lake.
  2. Configure Amazon Kinesis Data Firehose to securely ingest streaming data.
  3. Establish IAM trust policies allowing AWS services to interact securely.
  4. Manually ingest mock telemetry data and verify its automated delivery.

Architecture Overview

This diagram illustrates the ingestion pipeline you will build. Data produced by the CLI is sent to Kinesis Data Firehose, which automatically buffers the stream and delivers it to S3.

[Diagram: AWS CLI (producer) -> Kinesis Data Firehose (buffering) -> Amazon S3 bucket]

Step-by-Step Instructions

Step 1: Create the Target S3 Data Lake Bucket

First, we need a highly durable storage destination for our ingested data.

bash
aws s3 mb s3://brainybee-data-lake-<YOUR_ACCOUNT_ID> --region <YOUR_REGION>

📸 Screenshot: Terminal output showing make_bucket: brainybee-data-lake-<YOUR_ACCOUNT_ID>

▶ Console alternative
  1. Navigate to the Amazon S3 console.
  2. Click Create bucket.
  3. Enter the bucket name brainybee-data-lake-<YOUR_ACCOUNT_ID>.
  4. Select your preferred region.
  5. Leave all other settings as default and click Create bucket.

Step 2: Create an IAM Role for Kinesis Data Firehose

Kinesis Data Firehose needs permission to write data into your newly created S3 bucket. We will create an IAM role that Firehose can assume and attach an S3 access policy to it.

First, create the trust policy document:

bash
echo '{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "firehose.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }' > trust-policy.json

Create the role using the trust policy:

bash
aws iam create-role \
  --role-name brainybee-firehose-s3-role \
  --assume-role-policy-document file://trust-policy.json

Attach the necessary S3 access policy (Note: In production, use least-privilege scoping instead of Full Access):

bash
aws iam attach-role-policy \
  --role-name brainybee-firehose-s3-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

▶ Console alternative
  1. Navigate to the IAM console > Roles.
  2. Click Create role.
  3. Select AWS service as the trusted entity and choose Kinesis (then Kinesis Firehose).
  4. In permissions, attach AmazonS3FullAccess.
  5. Name the role brainybee-firehose-s3-role and click Create role.
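The note in this step recommends least-privilege scoping for production. As a minimal sketch of what that could look like, the snippet below writes a policy restricted to the lab bucket and to the S3 actions Firehose uses for delivery. The file name firehose-s3-policy.json, the policy name brainybee-firehose-s3-scoped, and the account ID 123456789012 are illustrative placeholders, not part of the original lab.

```shell
# Placeholder account ID (assumption): replace with your own 12-digit ID.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
BUCKET="brainybee-data-lake-${ACCOUNT_ID}"

# Write a policy limited to the lab bucket and the S3 actions Firehose uses.
cat > firehose-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET}",
        "arn:aws:s3:::${BUCKET}/*"
      ]
    }
  ]
}
EOF

# To apply it as an inline policy instead of AmazonS3FullAccess (not run here):
#   aws iam put-role-policy \
#     --role-name brainybee-firehose-s3-role \
#     --policy-name brainybee-firehose-s3-scoped \
#     --policy-document file://firehose-s3-policy.json
```

If you take this route, remove the inline policy with aws iam delete-role-policy during teardown, because a role cannot be deleted while inline policies are still attached to it.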

Step 3: Create the Kinesis Data Firehose Delivery Stream

Now we provision the ingestion stream, instructing it to send data to our S3 bucket.

bash
aws firehose create-delivery-stream \
  --delivery-stream-name brainybee-ingestion-stream \
  --s3-destination-configuration RoleARN=arn:aws:iam::<YOUR_ACCOUNT_ID>:role/brainybee-firehose-s3-role,BucketARN=arn:aws:s3:::brainybee-data-lake-<YOUR_ACCOUNT_ID>

[!TIP] By default, Kinesis Data Firehose buffers data for 5 minutes or 5 MB (whichever comes first) before delivering it to S3. This batching improves performance and reduces S3 API costs.
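The buffer thresholds in the tip above can be overridden with the BufferingHints field of the S3 destination configuration. The sketch below uses a hypothetical helper, firehose_s3_config, to assemble the shorthand string with explicit hints. The 1 MB / 60 second values and the account ID 123456789012 are illustrative only.

```shell
# Hypothetical helper: prints a shorthand S3 destination config with
# explicit buffering hints.
# Arguments: account ID, buffer size in MB, buffer interval in seconds.
firehose_s3_config() {
  printf 'RoleARN=arn:aws:iam::%s:role/brainybee-firehose-s3-role,' "$1"
  printf 'BucketARN=arn:aws:s3:::brainybee-data-lake-%s,' "$1"
  printf 'BufferingHints={SizeInMBs=%s,IntervalInSeconds=%s}\n' "$2" "$3"
}

# Example: flush after 1 MB or 60 seconds, whichever comes first.
# 123456789012 is a placeholder account ID.
CONFIG="$(firehose_s3_config 123456789012 1 60)"
echo "$CONFIG"

# Pass it, quoted, when creating the stream (not run here):
#   aws firehose create-delivery-stream \
#     --delivery-stream-name brainybee-ingestion-stream \
#     --s3-destination-configuration "$CONFIG"
```

Quoting the value matters because the braces in BufferingHints would otherwise be subject to brace expansion in bash. Smaller buffers deliver data sooner but create more, smaller S3 objects.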

▶ Console alternative
  1. Navigate to the Amazon Kinesis console.
  2. Select Data Firehose and click Create delivery stream.
  3. Source: Direct PUT / Destination: Amazon S3.
  4. Stream name: brainybee-ingestion-stream.
  5. Select your S3 bucket brainybee-data-lake-<YOUR_ACCOUNT_ID>.
  6. Under Advanced settings, ensure the newly created IAM role is selected.
  7. Click Create delivery stream.

Step 4: Ingest Streaming Data via CLI

Let's simulate a clickstream or IoT device sending telemetry data into our ingestion pipeline.

bash
aws firehose put-record \
  --delivery-stream-name brainybee-ingestion-stream \
  --record '{"Data":"eyJ1c2VySWQiOiAiMTIzNDUiLCAiYWN0aW9uIjogImxvZ2luIiwgInRpbWVzdGFtcCI6ICIyMDIzLTEwLTAxVDEyOjAwOjAwWiJ9Cg=="}'

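[!NOTE] With the default settings of AWS CLI v2, the Data payload must be Base64 encoded (AWS CLI v1 treats the value as raw text and encodes it for you). The string above decodes to {"userId": "12345", "action": "login", "timestamp": "2023-10-01T12:00:00Z"} followed by a newline.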
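To see where the encoded string in the command comes from, the following sketch reproduces it locally and decodes it again. It assumes a base64 utility that accepts --decode, which is the case for both GNU coreutils and macOS.

```shell
# The JSON record from the note above.
PAYLOAD='{"userId": "12345", "action": "login", "timestamp": "2023-10-01T12:00:00Z"}'

# printf '%s\n' appends a trailing newline to the record;
# tr strips any line wrapping that base64 may add to its output.
ENCODED="$(printf '%s\n' "$PAYLOAD" | base64 | tr -d '\n')"
echo "$ENCODED"

# Round trip: decode the string back to the original JSON.
DECODED="$(printf '%s' "$ENCODED" | base64 --decode)"
echo "$DECODED"
```

The trailing newline is deliberate: Firehose concatenates records as they arrive without adding a delimiter, so ending each record with a newline keeps the delivered S3 object readable as one JSON document per line.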

Execute the command 3-5 times to simulate multiple incoming records.

📸 Screenshot: Terminal showing the RecordId confirmation from AWS.
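Instead of repeating put-record by hand, several records can be sent in one call with put-record-batch. The sketch below builds a records.json file containing five Base64-encoded records. The file name and the user IDs 1001 to 1005 are made-up sample values, and the payload is a shortened variant of the lab's login event.

```shell
# Build a JSON array of five Base64-encoded records for put-record-batch.
# The user IDs 1001-1005 are made-up sample values.
{
  printf '[\n'
  i=1
  while [ "$i" -le 5 ]; do
    json="{\"userId\": \"100${i}\", \"action\": \"login\"}"
    # Newline-terminate each record, then strip base64 line wrapping.
    data="$(printf '%s\n' "$json" | base64 | tr -d '\n')"
    # Every entry except the last needs a trailing comma.
    if [ "$i" -lt 5 ]; then sep=','; else sep=''; fi
    printf '  {"Data": "%s"}%s\n' "$data" "$sep"
    i=$((i + 1))
  done
  printf ']\n'
} > records.json

cat records.json

# Send all five in one call (not run here):
#   aws firehose put-record-batch \
#     --delivery-stream-name brainybee-ingestion-stream \
#     --records file://records.json
```

The response to put-record-batch includes a FailedPutCount field, which is worth checking because individual records in a batch can fail even when the call itself succeeds.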

▶ Console alternative
  1. The Firehose console cannot send custom records. The stream's detail page does offer a "Test with demo data" option, but it sends sample stock-ticker records rather than the payload used in this lab. To send the lab's own record, use the CLI method above, an AWS SDK (such as boto3 for Python), or an agent (such as Kinesis Agent).

Step 5: Verify Data Delivery in S3

Wait approximately 5 minutes for the buffer interval to complete, then check your S3 bucket for the delivered files.

bash
aws s3 ls s3://brainybee-data-lake-<YOUR_ACCOUNT_ID>/ --recursive

▶ Console alternative
  1. Navigate to the Amazon S3 console.
  2. Open brainybee-data-lake-<YOUR_ACCOUNT_ID>.
  3. Navigate through the automatically generated YYYY/MM/DD/HH folder structure.
  4. Download the object and open it in a text editor to verify the ingested JSON data.
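The YYYY/MM/DD/HH folder structure mentioned above is based on UTC, not your local time zone. As a small sketch, the snippet below computes the prefix for the current UTC hour so you can list just that folder. The account ID 123456789012 is a placeholder.

```shell
# Placeholder account ID (assumption): replace with your own.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"

# Firehose's default S3 prefix is a UTC timestamp: YYYY/MM/DD/HH.
PREFIX="$(date -u +%Y/%m/%d/%H)"
echo "s3://brainybee-data-lake-${ACCOUNT_ID}/${PREFIX}/"

# List only that hour's objects (not run here):
#   aws s3 ls "s3://brainybee-data-lake-${ACCOUNT_ID}/${PREFIX}/"
```

Records sent shortly before the top of the hour may be delivered under the previous hour's prefix, so fall back to the --recursive listing if the current folder is empty.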

Checkpoints

Verify your progress by running these commands:

  • Checkpoint 1 (After Step 3): Run aws firehose describe-delivery-stream --delivery-stream-name brainybee-ingestion-stream. Ensure DeliveryStreamStatus shows as ACTIVE.
  • Checkpoint 2 (After Step 5): Run aws s3 ls s3://brainybee-data-lake-<YOUR_ACCOUNT_ID>/ --recursive. You should see at least one file path resembling 2023/10/01/12/brainybee-ingestion-stream-1-2023-10-01-12....
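Checkpoint 1 can also be automated. The sketch below polls the stream status until it reports ACTIVE. The helper names wait_for_active and stream_status are hypothetical, and the retry count and delay are arbitrary choices.

```shell
# Hypothetical helper: retry a status command until it prints ACTIVE.
# Usage: wait_for_active MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
wait_for_active() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    # A failing status command is treated as "unknown", not as an error.
    status="$("$@" 2>/dev/null || true)"
    echo "attempt ${n}: ${status:-unknown}"
    if [ "$status" = "ACTIVE" ]; then
      return 0
    fi
    n=$((n + 1))
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Prints the stream status as plain text, for example CREATING or ACTIVE.
stream_status() {
  aws firehose describe-delivery-stream \
    --delivery-stream-name brainybee-ingestion-stream \
    --query 'DeliveryStreamDescription.DeliveryStreamStatus' \
    --output text
}

# Example (not run here): check every 10 seconds, up to 12 times.
#   wait_for_active 12 10 stream_status
```

Taking the status command as an argument keeps the retry loop independent of the AWS CLI, so it can be tried out with a stand-in command before pointing it at a real stream.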

Troubleshooting

| Issue | Probable Cause | Solution |
|---|---|---|
| ResourceNotFoundException during put-record | Stream is still in the CREATING state. | Wait 1-2 minutes and check stream status before retrying. |
| AccessDenied when creating stream | IAM Role missing S3 permissions or trust relationship is incorrect. | Verify trust-policy.json contains firehose.amazonaws.com as the principal. |
| No data appearing in S3 | Buffer time hasn't elapsed. | Wait a full 5 minutes for the Firehose buffer to flush to the S3 bucket. |

Clean-Up / Teardown

[!WARNING] Remember to run the teardown commands to avoid ongoing charges. While S3 storage is cheap, leaving idle resources is bad practice.

Execute the following commands to destroy all provisioned resources:

  1. Delete the Firehose Delivery Stream:

    bash
    aws firehose delete-delivery-stream --delivery-stream-name brainybee-ingestion-stream
  2. Empty and Delete the S3 Bucket:

    bash
    aws s3 rm s3://brainybee-data-lake-<YOUR_ACCOUNT_ID> --recursive
    aws s3 rb s3://brainybee-data-lake-<YOUR_ACCOUNT_ID>
  3. Detach Policy and Delete IAM Role:

    bash
    aws iam detach-role-policy --role-name brainybee-firehose-s3-role --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
    aws iam delete-role --role-name brainybee-firehose-s3-role

Lab complete! You have successfully implemented a serverless, highly scalable data ingestion pipeline using AWS purpose-built services.
