Lab: Building a Serverless Data Processor with AWS Lambda and Python

This lab focuses on Domain 1.4 (Apply Programming Concepts) of the AWS Certified Data Engineer - Associate (DEA-C01) exam. You will build a serverless data pipeline that automates the transformation of raw CSV data into a structured JSON format using Python, S3 event triggers, and CloudWatch monitoring.

Prerequisites

To complete this lab, you need:

  • An AWS Account with administrative privileges.
  • AWS CLI installed and configured (aws configure).
  • A text editor (VS Code, Sublime, or similar).
  • Python 3.9+ installed locally for packaging.
  • Basic familiarity with Python and the boto3 library.

Learning Objectives

  • Deploy a Lambda Function using the AWS CLI and verify its performance configuration.
  • Implement Data Transformation logic to convert CSV to JSON within a memory-constrained environment.
  • Configure Event Triggers to automate processing upon S3 object uploads.
  • Apply Best Practices for logging and monitoring using Amazon CloudWatch Logs.

Architecture Overview

Data flows from ingestion to monitoring: a CSV upload to the S3 bucket fires an ObjectCreated event, the event invokes the Lambda function, the function writes the transformed JSON back to S3 under processed/, and execution details stream to CloudWatch Logs.

Step-by-Step Instructions

Step 1: Create the Data Lake Storage

We need a bucket to host our incoming data and the processed results.

```bash
# Replace <YOUR_UNIQUE_SUFFIX> with your name or a random number
aws s3 mb s3://brainybee-lab-data-<YOUR_UNIQUE_SUFFIX>
```
Console alternative
  1. Log in to the S3 Console.
  2. Click Create bucket.
  3. Name: brainybee-lab-data-<YOUR_UNIQUE_SUFFIX>.
  4. Keep default settings and click Create bucket.
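Because bucket names are globally unique, the `mb` command fails if your chosen name is taken or malformed. A quick local sanity check of the common naming rules (a sketch; the regex covers lowercase letters, digits, dots, and hyphens, not every S3 edge case):

```python
import re

def is_valid_bucket_name(name: str) -> bool:
    """Rough check of common S3 bucket naming rules:
    3-63 characters, lowercase letters/digits/dots/hyphens,
    starting and ending with a letter or digit."""
    return bool(re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name))

print(is_valid_bucket_name("brainybee-lab-data-4721"))  # True
print(is_valid_bucket_name("BrainyBee_Lab"))            # False: uppercase and underscore
```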

Step 2: Create the IAM Execution Role

Lambda requires permission to read from S3 and write logs to CloudWatch.

  1. Save the following as trust-policy.json:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

  2. Create the role and attach managed policies for S3 access and logging:

```bash
aws iam create-role --role-name BrainyBeeLambdaRole \
  --assume-role-policy-document file://trust-policy.json

# Attach managed policies for S3 access and logs
aws iam attach-role-policy --role-name BrainyBeeLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name BrainyBeeLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```
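AmazonS3FullAccess keeps the lab simple, but production roles should be scoped down. A hedged sketch of a least-privilege policy document you could attach instead (the bucket name is the lab's placeholder; read anywhere in the bucket, write only under processed/):

```python
import json

# Hypothetical least-privilege alternative to AmazonS3FullAccess.
BUCKET = "brainybee-lab-data-<YOUR_UNIQUE_SUFFIX>"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Lambda needs to read the uploaded CSVs...
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            # ...but only ever writes results under processed/.
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/processed/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

You could attach this as an inline policy with `aws iam put-role-policy` in place of the AmazonS3FullAccess attachment.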

Step 3: Write and Package the Lambda Code

Create a file named lambda_function.py. This script applies programming concepts like optimizing runtime and logging. Note that S3 URL-encodes object keys in event payloads, so the handler decodes them before calling get_object.

```python
import json
import boto3
import csv
import io
import time
import urllib.parse

s3 = boto3.client('s3')

def lambda_handler(event, context):
    start_time = time.time()

    # Get bucket and file name from the event
    # (S3 URL-encodes object keys in event payloads, so decode them)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    print(f"Processing file: {key} from bucket: {bucket}")

    # Read CSV
    response = s3.get_object(Bucket=bucket, Key=key)
    content = response['Body'].read().decode('utf-8')

    # Transform CSV to JSON
    reader = csv.DictReader(io.StringIO(content))
    json_data = json.dumps([row for row in reader])

    # Save to S3 (in the processed/ folder)
    output_key = f"processed/{key.replace('.csv', '.json')}"
    s3.put_object(Bucket=bucket, Key=output_key, Body=json_data)

    duration = time.time() - start_time
    print(f"Transformation complete in {duration:.2f} seconds.")
    return {'statusCode': 200, 'body': json.dumps('Success!')}
```
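You can sanity-check the transformation logic locally before deploying. This sketch exercises the same csv.DictReader/json.dumps path as the handler, without touching S3:

```python
import csv
import io
import json

def transform_csv(content: str) -> str:
    """Same core transformation as lambda_handler: CSV text in, JSON array out."""
    reader = csv.DictReader(io.StringIO(content))
    return json.dumps([row for row in reader])

sample = "id,name\n1,alice\n2,bob\n"
result = json.loads(transform_csv(sample))
print(result)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]
```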

Package the function:

```bash
zip function.zip lambda_function.py
```

Step 4: Deploy the Lambda Function

We will configure the function with 128 MB of memory, the Lambda minimum and a sensible default for lightweight ETL scripts.

```bash
aws lambda create-function --function-name DataProcessor \
  --zip-file fileb://function.zip \
  --handler lambda_function.lambda_handler \
  --runtime python3.9 \
  --memory-size 128 \
  --role arn:aws:iam::<YOUR_ACCOUNT_ID>:role/BrainyBeeLambdaRole
```

[!TIP] Use aws sts get-caller-identity --query Account --output text to find your Account ID.

Step 5: Configure S3 Trigger

  1. Grant S3 permission to invoke the Lambda function:

```bash
aws lambda add-permission --function-name DataProcessor --statement-id s3-trigger \
  --action "lambda:InvokeFunction" --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::brainybee-lab-data-<YOUR_UNIQUE_SUFFIX>
```

  2. Configure the bucket notification. Save the following as notification.json:

```json
{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:function:DataProcessor",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [{ "Name": "suffix", "Value": ".csv" }]
        }
      }
    }
  ]
}
```

```bash
aws s3api put-bucket-notification-configuration --bucket brainybee-lab-data-<YOUR_UNIQUE_SUFFIX> \
  --notification-configuration file://notification.json
```
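Before uploading real data, you can dry-run the handler's event parsing with a minimal synthetic S3 event. This sketch extracts the same fields the handler reads (bucket and key values are placeholders); note that S3 URL-encodes keys in event payloads:

```python
import urllib.parse

# Minimal shape of an S3 ObjectCreated event (only the fields the handler uses).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "brainybee-lab-data-4721"},
                "object": {"key": "sales+report.csv"},
            }
        }
    ]
}

record = sample_event["Records"][0]["s3"]
bucket = record["bucket"]["name"]
# S3 URL-encodes keys in event payloads; decode before calling get_object.
key = urllib.parse.unquote_plus(record["object"]["key"])
print(bucket, key)  # brainybee-lab-data-4721 sales report.csv
```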

Checkpoints

| Verification Step | Command / Action | Expected Result |
|---|---|---|
| Function Status | aws lambda get-function --function-name DataProcessor | State should be Active. |
| Data Ingestion | Upload a sample CSV to S3. | File appears in bucket root. |
| Execution | Check the processed/ folder in S3. | A .json version of your file exists. |
| Logging | View CloudWatch Log Groups for /aws/lambda/DataProcessor. | Logs show "Transformation complete..." with timing. |

Teardown

[!WARNING] Remember to run these commands to avoid ongoing charges for storage and logging.

```bash
# 1. Delete the Lambda function
aws lambda delete-function --function-name DataProcessor

# 2. Empty and delete the S3 bucket
aws s3 rm s3://brainybee-lab-data-<YOUR_UNIQUE_SUFFIX> --recursive
aws s3 rb s3://brainybee-lab-data-<YOUR_UNIQUE_SUFFIX>

# 3. Detach policies and delete the IAM role
aws iam detach-role-policy --role-name BrainyBeeLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam detach-role-policy --role-name BrainyBeeLambdaRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam delete-role --role-name BrainyBeeLambdaRole
```

Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| AccessDenied | IAM role lacks S3 permissions. | Re-run the attach-role-policy commands in Step 2. |
| ModuleNotFoundError | Code structure in ZIP is incorrect. | Ensure lambda_function.py is at the root of the ZIP. |
| Out of memory (function killed) | CSV file is too large for 128 MB. | Increase memory with aws lambda update-function-configuration --memory-size. |

Cost Estimate

  • AWS Lambda: First 1 million requests per month are free (Free Tier).
  • Amazon S3: ~$0.023 per GB-month (Standard). In this lab, cost will be < $0.01.
  • CloudWatch: Free up to 5GB of log data.
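Lambda compute is billed in GB-seconds. A back-of-the-envelope sketch for this lab's configuration (the $0.0000166667 per GB-second and $0.20 per million requests figures are the commonly published us-east-1 prices and may differ by region; Free Tier credits are ignored here):

```python
def lambda_compute_cost(invocations, duration_s, memory_mb,
                        price_per_gb_s=0.0000166667,
                        price_per_million_req=0.20):
    """Estimate Lambda cost: GB-seconds of compute plus per-request charges."""
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_req
    return gb_seconds * price_per_gb_s + request_cost

# 1,000 CSV uploads at ~0.5 s each on 128 MB: well under a cent before Free Tier.
cost = lambda_compute_cost(1000, 0.5, 128)
print(f"${cost:.6f}")
```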

Stretch Challenge

Task: Optimize the code for Concurrency. Modify the Lambda function configuration to set a Reserved Concurrency of 5. This prevents a sudden burst of S3 uploads from consuming all available Lambda execution slots in your account, which is a key DEA-C01 skill (Skill 1.4.2).

```bash
aws lambda put-function-concurrency --function-name DataProcessor \
  --reserved-concurrent-executions 5
```
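A toy model of what the cap does: requests beyond the reserved limit are throttled rather than run. (In real Lambda, synchronous callers see a TooManyRequestsException; asynchronous S3-triggered events are retried by the service rather than dropped.)

```python
# Toy model of reserved concurrency, not the real Lambda scheduler.
RESERVED_CONCURRENCY = 5

def dispatch(concurrent_requests: int):
    """Return (running, throttled) for a burst of simultaneous requests."""
    running = min(concurrent_requests, RESERVED_CONCURRENCY)
    throttled = concurrent_requests - running
    return running, throttled

print(dispatch(3))   # (3, 0): all run
print(dispatch(12))  # (5, 7): 5 run, 7 throttled
```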

Concept Review

In this lab, we applied several core programming concepts required for the DEA-C01 exam:

Lambda Execution Lifecycle

This TikZ diagram visualizes the phases of a Lambda execution environment.

```latex
% Requires \usetikzlibrary{positioning}
\begin{tikzpicture}[node distance=1.5cm,
    every node/.style={draw, rectangle, rounded corners, fill=blue!10,
                       text width=2.5cm, align=center}]
  \node (init) {\textbf{Init Phase}\\ Extensions, Runtime, Function code};
  \node (invoke) [right=of init] {\textbf{Invoke Phase}\\ Handler execution, Event processing};
  \node (shutdown) [right=of invoke] {\textbf{Shutdown}\\ Runtime cleanup};

  \draw[->, thick] (init) -- (invoke);
  \draw[->, thick] (invoke) -- (shutdown);

  \node[draw=none, fill=none, below=0.1cm of init] (cold) {\textit{Cold Start}};
  \node[draw=none, fill=none, below=0.1cm of invoke] (warm) {\textit{Warm Start}};
\end{tikzpicture}
```
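The cold/warm distinction is easy to demonstrate in plain Python: module-level code corresponds to the Init phase and runs once per execution environment, while the handler runs on every invocation. A local sketch (no AWS needed):

```python
# Module scope corresponds to the Init phase: it runs once per execution
# environment (a cold start), then is reused across warm invocations.
INIT_COUNT = 0
INIT_COUNT += 1  # incremented only when the module is (re)loaded

def lambda_handler(event, context):
    # Invoke phase: runs on every invocation, reusing module-level state.
    return {"init_count": INIT_COUNT}

# Two back-to-back invocations in the same "environment":
print(lambda_handler({}, None))  # {'init_count': 1}
print(lambda_handler({}, None))  # {'init_count': 1}  <- warm start, no re-init
```

This is also why the lab's script creates the boto3 client at module scope: the client survives across warm invocations instead of being rebuilt on every call.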

Key Terms

  • Distributed Computing: The use of multiple compute resources (like Lambda instances) to process data in parallel.
  • Event-Driven Architecture: A system where actions (Lambda) are triggered by events (S3 upload) rather than polling.
  • IaC (Infrastructure as Code): While we used CLI, these steps are typically automated via AWS SAM or CDK in production to ensure repeatability.
