# Lab: Monitoring and Auditing AWS Data Pipelines

*Maintaining and Monitoring Data Pipelines*
This hands-on lab guides you through implementing a robust monitoring and alerting solution for a serverless data pipeline. You will learn to capture logs, create metric filters, and automate notifications when failures occur.
## Prerequisites
- An active AWS Account.
- AWS CLI installed and configured with Administrator access.
- Basic knowledge of Python and SQL.
- Familiarity with the AWS Management Console.
> [!IMPORTANT]
> Ensure your CLI is configured for a specific region (e.g., `us-east-1`) and use that region consistently throughout the lab.
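If you are unsure which region your CLI will target, you can check the configured default (assumes the AWS CLI and a default profile):

```bash
# Print the region configured for the default profile
aws configure get region
```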
## Learning Objectives
- Configure Amazon CloudWatch Logs to centralize pipeline execution data.
- Create a CloudWatch Metric Filter to detect specific error patterns (e.g., "ERROR").
- Set up Amazon SNS for automated real-time alerts.
- Utilize CloudWatch Logs Insights to perform log analysis for auditing.
## Architecture Overview

The pipeline under test is intentionally simple: a Lambda function (`pipeline-worker`) writes execution logs to CloudWatch Logs, a metric filter counts log events containing "ERROR", a CloudWatch alarm watches that count, and the alarm publishes to an SNS topic that emails you.

## Step-by-Step Instructions
### Step 1: Create an SNS Topic and Subscription
You need a notification channel to receive alerts when your pipeline fails.
```bash
# Create the SNS Topic
aws sns create-topic --name brainybee-pipeline-alerts

# Subscribe your email (replace <YOUR_REGION>, <YOUR_ACCOUNT_ID>, and <YOUR_EMAIL>)
aws sns subscribe \
  --topic-arn arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts \
  --protocol email \
  --notification-endpoint <YOUR_EMAIL>
```

**Console alternative:**

- Navigate to SNS > Topics > Create topic.
- Name it `brainybee-pipeline-alerts`.
- In the topic view, click Create subscription.
- Select Email and enter your address.
> [!NOTE]
> Check your inbox and click Confirm Subscription in the email from AWS.
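You can also verify from the CLI: an unconfirmed subscription reports `PendingConfirmation` in place of a subscription ARN (replace the placeholders as above):

```bash
# List subscriptions on the topic; a confirmed one shows a full ARN
aws sns list-subscriptions-by-topic \
  --topic-arn arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts
```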
### Step 2: Create a CloudWatch Log Group
Centralize your pipeline logs for monitoring.
```bash
# Pre-create the log group the Lambda function will write to
# (Lambda logs to /aws/lambda/<function-name> by default)
aws logs create-log-group --log-group-name /aws/lambda/pipeline-worker
```

### Step 3: Simulate a Pipeline Failure
We will use a Lambda function to simulate a data processing task that intermittently logs errors.
```bash
# Create an execution role for Lambda
# (Simplified for lab purposes)
aws iam create-role \
  --role-name lambda-monitor-role \
  --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"Service": "lambda.amazonaws.com"},"Action": "sts:AssumeRole"}]}'

# Attach logging permissions
aws iam attach-role-policy \
  --role-name lambda-monitor-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```

Create a file named `lambda_function.py`:
```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    print("START: Data processing task")
    # Simulate a failure
    logger.error("ERROR: Data validation failed for record ID 9921")
    return {"status": "complete"}
```

```bash
# Zip and deploy
zip function.zip lambda_function.py
aws lambda create-function --function-name pipeline-worker \
  --zip-file fileb://function.zip --handler lambda_function.lambda_handler \
  --runtime python3.12 --role arn:aws:iam::<YOUR_ACCOUNT_ID>:role/lambda-monitor-role

# Invoke to generate logs
aws lambda invoke --function-name pipeline-worker out.txt
```

### Step 4: Create a Metric Filter and Alarm
This step automates the detection of the word "ERROR" in your logs.
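To preview locally what such a filter will match, here is a quick sketch that reproduces the Step 3 handler and captures its log output in memory (nothing AWS-specific; the `preview` logger name exists only for this demo):

```python
import logging
from io import StringIO

# Capture log output in memory instead of sending it to CloudWatch Logs
logger = logging.getLogger("preview")
stream = StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    print("START: Data processing task")
    logger.error("ERROR: Data validation failed for record ID 9921")
    return {"status": "complete"}

result = lambda_handler({}, None)
log_output = stream.getvalue()

# "ERROR" is the exact substring the metric filter below matches on
assert result == {"status": "complete"}
assert "ERROR" in log_output
```

A plain-text filter pattern like `"ERROR"` matches any log event containing that term, so ordinary Python `logging` output is enough to trigger it.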
```bash
# Create Metric Filter
aws logs put-metric-filter \
  --log-group-name /aws/lambda/pipeline-worker \
  --filter-name ErrorFilter \
  --filter-pattern "ERROR" \
  --metric-transformations metricName=ErrorCount,metricNamespace=PipelineMonitor,metricValue=1

# Create Alarm
aws cloudwatch put-metric-alarm \
  --alarm-name PipelineErrorAlarm \
  --metric-name ErrorCount \
  --namespace PipelineMonitor \
  --statistic Sum \
  --period 60 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts
```

## Checkpoints
- SNS Confirmation: Do you have a green "Confirmed" status in the SNS Console?
- Log Discovery: Navigate to CloudWatch Logs > Log Groups > `/aws/lambda/pipeline-worker`. Can you see the "ERROR" message?
- Alarm State: In CloudWatch Alarms, is `PipelineErrorAlarm` in the `OK` or `ALARM` state? (Invoke the Lambda again if it stays in `OK`.)
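To cover the auditing objective with CloudWatch Logs Insights, run a query like the following against `/aws/lambda/pipeline-worker` in the Logs Insights console (a sketch of a typical error-audit query):

```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
```

This returns the most recent error events with timestamps, which is useful when reconciling alarm history against the underlying log records.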
## Troubleshooting
| Issue | Possible Cause | Fix |
|---|---|---|
| No email received | SNS Subscription not confirmed | Check spam folder and click confirm link. |
| Alarm stays in `INSUFFICIENT_DATA` | No logs matched the filter | Invoke the Lambda function 2-3 times to trigger the pattern. |
| Lambda fails to create | Role not yet propagated | Wait 10 seconds and retry the create-function command. |
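When diagnosing alarm issues, it can help to query the alarm's state directly from the CLI (assumes the alarm name from Step 4):

```bash
# Print just the alarm's current state (OK, ALARM, or INSUFFICIENT_DATA)
aws cloudwatch describe-alarms \
  --alarm-names PipelineErrorAlarm \
  --query 'MetricAlarms[0].StateValue' \
  --output text
```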
## Clean-Up / Teardown
> [!WARNING]
> Always delete lab resources to avoid unexpected AWS charges.
```bash
# Delete Alarm
aws cloudwatch delete-alarms --alarm-names PipelineErrorAlarm

# Delete Log Group
aws logs delete-log-group --log-group-name /aws/lambda/pipeline-worker

# Delete Lambda
aws lambda delete-function --function-name pipeline-worker

# Delete SNS Topic
aws sns delete-topic --topic-arn arn:aws:sns:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:brainybee-pipeline-alerts

# Delete IAM Role (detach policy first)
aws iam detach-role-policy --role-name lambda-monitor-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam delete-role --role-name lambda-monitor-role
```

## Cost Estimate
- CloudWatch: First 5GB of logs and 10 alarms are free tier eligible. < $0.10 for this lab.
- Lambda: First 1 million requests per month are free. $0.00 for this lab.
- SNS: First 1,000 emails per month are free. $0.00 for this lab.
## Stretch Challenge
Modify the Metric Filter to use a JSON filter pattern. If your Lambda logs in JSON format (e.g., `{"status": "ERROR", "code": 500}`), create a filter that only triggers an alarm if the code is 500.
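For a JSON filter pattern to match, the function must emit JSON-formatted log lines. A minimal sketch of the logging side (the `log_event` helper is hypothetical, not part of the lab code):

```python
import json

def log_event(status, code):
    # Emit one structured JSON log line; in Lambda, anything printed
    # to stdout becomes a CloudWatch Logs event
    line = json.dumps({"status": status, "code": code})
    print(line)
    return line

emitted = log_event("ERROR", 500)
assert json.loads(emitted) == {"status": "ERROR", "code": 500}
```

On the CloudWatch side, a JSON filter pattern of the form `{ ($.status = "ERROR") && ($.code = 500) }` would then match only those events; see the CloudWatch Logs filter-pattern syntax for details.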
## Concept Review

### Monitoring vs. Auditing
| Tool | Primary Use Case |
|---|---|
| CloudWatch Logs | Storing and searching application-level logs. |
| CloudWatch Alarms | Triggering actions based on metric thresholds. |
| AWS CloudTrail | Auditing API calls made by users or services. |
| Redshift System Tables | Troubleshooting data load errors (e.g., `STL_LOAD_ERRORS`). |