
Lab: Monitoring and Auditing AWS Data Pipelines

This hands-on lab guides you through implementing a robust monitoring and alerting solution for a serverless data pipeline. You will learn to capture logs, create metric filters, and automate notifications when failures occur.

Prerequisites

  • An active AWS Account.
  • AWS CLI installed and configured with Administrator access.
  • Basic knowledge of Python and SQL.
  • Familiarity with the AWS Management Console.

[!IMPORTANT] Ensure your CLI is configured for a specific region (e.g., us-east-1) and use that region consistently throughout the lab.

Learning Objectives

  • Configure Amazon CloudWatch Logs to centralize pipeline execution data.
  • Create a CloudWatch Metric Filter to detect specific error patterns (e.g., "ERROR").
  • Set up Amazon SNS for automated real-time alerts.
  • Utilize CloudWatch Logs Insights to perform log analysis for auditing.

Architecture Overview

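Data flow: Lambda function (pipeline-worker) → CloudWatch Logs → Metric Filter (ErrorCount) → CloudWatch Alarm (PipelineErrorAlarm) → SNS topic (brainybee-pipeline-alerts) → email subscriber.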

Step-by-Step Instructions

Step 1: Create an SNS Topic and Subscription

You need a notification channel to receive alerts when your pipeline fails.

bash
# Create the SNS topic
aws sns create-topic --name brainybee-pipeline-alerts

# Subscribe your email (replace <YOUR_EMAIL>)
aws sns subscribe \
  --topic-arn arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts \
  --protocol email \
  --notification-endpoint <YOUR_EMAIL>

Console alternative
  1. Navigate to SNS > Topics > Create topic.
  2. Name it brainybee-pipeline-alerts.
  3. In the topic view, click Create subscription.
  4. Select Email and enter your address.

[!NOTE] Check your inbox and click Confirm Subscription in the email from AWS.
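
The topic ARN recurs in later commands, so it can help to assemble it once rather than retype it. The following is a minimal sketch: `build_topic_arn` is a hypothetical helper (not part of any AWS SDK), and it assumes the standard `aws` partition and a 12-digit account ID.

```python
def build_topic_arn(region: str, account_id: str, topic_name: str) -> str:
    """Assemble an SNS topic ARN (assumes the standard "aws" partition)."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    return f"arn:aws:sns:{region}:{account_id}:{topic_name}"


# Example with a placeholder account ID (replace with your own)
print(build_topic_arn("us-east-1", "123456789012", "brainybee-pipeline-alerts"))
```

The printed value can be pasted wherever the lab shows arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts.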

Step 2: Create a CloudWatch Log Group

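Create a central log group for pipeline logs. Note that the Lambda function in Step 3 writes to its own log group, /aws/lambda/pipeline-worker, which Lambda creates automatically on first invocation; the metric filter in Step 4 targets that group.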

bash
aws logs create-log-group --log-group-name /aws/vendedlogs/pipeline-monitor

Step 3: Simulate a Pipeline Failure

We will use a Lambda function to simulate a data processing task that logs an error on every invocation.

bash
# Create an execution role for Lambda
# (Simplified for lab purposes)
aws iam create-role --role-name lambda-monitor-role \
  --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"Service": "lambda.amazonaws.com"},"Action": "sts:AssumeRole"}]}'

# Attach logging permissions
aws iam attach-role-policy --role-name lambda-monitor-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Create a file named lambda_function.py:

python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    print("START: Data processing task")
    # Simulate a failure
    logger.error("ERROR: Data validation failed for record ID 9921")
    return {"status": "complete"}

bash
# Zip and deploy (python3.12 used here; any currently supported Python runtime works)
zip function.zip lambda_function.py
aws lambda create-function --function-name pipeline-worker \
  --zip-file fileb://function.zip --handler lambda_function.lambda_handler \
  --runtime python3.12 --role arn:aws:iam::<YOUR_ACCOUNT_ID>:role/lambda-monitor-role

# Invoke to generate logs
aws lambda invoke --function-name pipeline-worker out.txt
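
The handler can also be exercised locally to see which messages it emits, without touching AWS. This is a sketch that assumes the handler body from lambda_function.py; the in-memory log capture, the empty event, and the `None` context are illustration only, and Lambda itself adds its own prefix (timestamp, request ID) to each log record.

```python
import io
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    print("START: Data processing task")
    # Simulate a failure
    logger.error("ERROR: Data validation failed for record ID 9921")
    return {"status": "complete"}


# Capture log records in memory instead of CloudWatch (local illustration only)
buffer = io.StringIO()
capture = logging.StreamHandler(buffer)
logger.addHandler(capture)

result = lambda_handler({}, None)  # empty event, no context object
logger.removeHandler(capture)

captured = buffer.getvalue()
print(result)
print(captured.strip())
```

If the captured text contains the word ERROR, the metric filter pattern used in the next step should have something to match once the function runs in Lambda.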

Step 4: Create a Metric Filter and Alarm

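This step automates the detection of the word "ERROR" in your logs. Metric filters only evaluate log events ingested after the filter is created, so invoke the Lambda function again once the filter and alarm are in place.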

bash
# Create Metric Filter
aws logs put-metric-filter \
  --log-group-name /aws/lambda/pipeline-worker \
  --filter-name ErrorFilter \
  --filter-pattern "ERROR" \
  --metric-transformations metricName=ErrorCount,metricNamespace=PipelineMonitor,metricValue=1

# Create Alarm
aws cloudwatch put-metric-alarm \
  --alarm-name PipelineErrorAlarm \
  --metric-name ErrorCount \
  --namespace PipelineMonitor \
  --statistic Sum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts
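
To build intuition for how the filter and alarm interact, the logic can be approximated locally. This is a simplified sketch, not CloudWatch's actual implementation: the function names are hypothetical, the term match is approximated with a case-sensitive substring test, and the alarm check looks at a single 60-second period with missing data treated as INSUFFICIENT_DATA.

```python
from collections import Counter


def error_counts_per_period(events, pattern="ERROR", period=60):
    """Count matching events per period.

    events: iterable of (epoch_seconds, message) tuples.
    Returns {period_start_seconds: count}; periods with no match are absent,
    mirroring a filter that only publishes a value when a log event matches.
    """
    counts = Counter()
    for timestamp, message in events:
        if pattern in message:  # approximation of the filter's term match
            counts[timestamp - timestamp % period] += 1  # metricValue=1
    return dict(counts)


def alarm_state(counts, period_start, threshold=1):
    """Evaluate one period: Sum >= threshold, one evaluation period."""
    if period_start not in counts:
        return "INSUFFICIENT_DATA"  # no data point was published
    return "ALARM" if counts[period_start] >= threshold else "OK"


events = [
    (0, "START: Data processing task"),
    (5, "ERROR: Data validation failed for record ID 9921"),
    (65, "INFO: batch loaded"),
    (70, "ERROR: Data validation failed for record ID 9921"),
    (75, "ERROR: Data validation failed for record ID 9921"),
]
counts = error_counts_per_period(events)
print(counts)
print(alarm_state(counts, 0), alarm_state(counts, 120))
```

The sketch also illustrates why an alarm on this metric can sit in INSUFFICIENT_DATA: when nothing matches, no data point exists for the period at all.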

Checkpoints

  1. SNS Confirmation: Do you have a green "Confirmed" status in the SNS Console?
  2. Log Discovery: Navigate to CloudWatch Logs > Log Groups > /aws/lambda/pipeline-worker. Can you see the "ERROR" message?
  3. Alarm State: In CloudWatch Alarms, is PipelineErrorAlarm in the OK or ALARM state? (Invoke the Lambda again if it stays in OK).
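
For the auditing objective, CloudWatch Logs Insights can list recent error events. The sketch below only builds the request; `build_insights_request` and its defaults are hypothetical, and it assumes you would pass the result to the boto3 `logs` client's `start_query` call, which requires boto3 and configured credentials and is therefore left as a comment.

```python
import time

# Logs Insights query: newest 20 events whose message contains "ERROR"
ERROR_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def build_insights_request(log_group, lookback_seconds=3600, now=None):
    """Build keyword arguments for a Logs Insights query over a recent window."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - lookback_seconds,  # epoch seconds
        "endTime": end,
        "queryString": ERROR_QUERY,
    }


request = build_insights_request("/aws/lambda/pipeline-worker", now=1_700_000_000)
print(request)

# With boto3 installed and credentials configured, the request could be run as:
#   client = boto3.client("logs")
#   query_id = client.start_query(**request)["queryId"]
#   results = client.get_query_results(queryId=query_id)
```

The same query string can be pasted directly into the Logs Insights console with the /aws/lambda/pipeline-worker log group selected.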

Troubleshooting

| Issue | Possible Cause | Fix |
| --- | --- | --- |
| No email received | SNS subscription not confirmed | Check spam folder and click the confirm link. |
| Alarm stays in 'INSUFFICIENT_DATA' | No logs matched the filter | Invoke the Lambda function 2-3 times to trigger the pattern. |
| Lambda fails to create | Role not yet propagated | Wait 10 seconds and retry the create-function command. |
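
The "wait and retry" advice for role propagation can be expressed as a small helper. This is a generic sketch under stated assumptions: `retry` is a hypothetical function, the 10-second delay mirrors the table above, and `flaky_create` merely stands in for whatever call is failing (it does not contact AWS).

```python
import time


def retry(operation, attempts=3, delay_seconds=10, sleep=time.sleep):
    """Call operation() until it succeeds or attempts are exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # wait before the next try
    raise last_error


# Stand-in for a call that fails twice, then succeeds
calls = {"count": 0}


def flaky_create():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("role not yet propagated")
    return "created"


# Skip real sleeping in the demo by passing a no-op sleep function
print(retry(flaky_create, sleep=lambda seconds: None))
```

In real use, narrowing the caught exception to the specific error you expect is safer than retrying on every failure.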

Clean-Up / Teardown

[!WARNING] Always delete lab resources to avoid unexpected AWS charges.

bash
# Delete Alarm
aws cloudwatch delete-alarms --alarm-names PipelineErrorAlarm

# Delete Log Groups (the Lambda log group and the one created in Step 2)
aws logs delete-log-group --log-group-name /aws/lambda/pipeline-worker
aws logs delete-log-group --log-group-name /aws/vendedlogs/pipeline-monitor

# Delete Lambda
aws lambda delete-function --function-name pipeline-worker

# Delete SNS Topic
aws sns delete-topic --topic-arn arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts

# Delete IAM Role (detach policy first)
aws iam detach-role-policy --role-name lambda-monitor-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam delete-role --role-name lambda-monitor-role

Cost Estimate

  • CloudWatch: The first 5 GB of log data and 10 alarms are free-tier eligible. Expect less than $0.10 for this lab.
  • Lambda: The first 1 million requests per month are free. Expect $0.00 for this lab.
  • SNS: The first 1,000 email notifications per month are free. Expect $0.00 for this lab.

Stretch Challenge

Modify the Metric Filter to use a JSON filter pattern. If your Lambda logs in JSON format (e.g., {"status": "ERROR", "code": 500}), create a filter that only triggers an alarm if the code is 500.
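
As a starting point for the challenge, the sketch below emits a JSON log line and checks it locally against the condition that a JSON filter pattern such as `{ $.code = 500 }` expresses. The helper names are hypothetical, and `matches_code_500` is only a local stand-in for the filter, not CloudWatch's matcher. It assumes the function prints the JSON with `print` so that the log event is a bare JSON object.

```python
import json


def make_log_line(status: str, code: int) -> str:
    """Serialize a log record as valid JSON (double-quoted keys and strings)."""
    return json.dumps({"status": status, "code": code})


def matches_code_500(line: str) -> bool:
    """Local stand-in for the filter pattern { $.code = 500 }."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False  # non-JSON lines never match a JSON pattern
    return isinstance(record, dict) and record.get("code") == 500


line = make_log_line("ERROR", 500)
print(line)  # inside a Lambda handler, this print is what reaches the log group
print(matches_code_500(line))
print(matches_code_500(make_log_line("ERROR", 404)))
```

To use the pattern for real, pass it as the `--filter-pattern` value of `aws logs put-metric-filter` in place of "ERROR".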

Concept Review

Monitoring vs. Auditing

| Tool | Primary Use Case |
| --- | --- |
| CloudWatch Logs | Storing and searching application-level logs. |
| CloudWatch Alarms | Triggering actions based on metric thresholds. |
| AWS CloudTrail | Auditing API calls made by users or services. |
| Redshift System Tables | Troubleshooting data load errors (e.g., STL_LOAD_ERRORS). |
