Hands-On Lab

Root Cause Analysis Mastery: Debugging Serverless Applications on AWS

Assist in a root cause analysis

This lab focuses on the critical DVA-C02 skill of Assisting in a Root Cause Analysis (RCA). You will act as a developer troubleshooting a failing serverless data pipeline. You'll move from identifying a failure in logs to tracing the execution path in X-Ray and eventually fixing the defect.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges to your AWS account.

Prerequisites

Before starting, ensure you have:

  • An AWS Account with administrative access.
  • AWS CLI configured with credentials (aws configure).
  • Basic familiarity with Python and JSON.
  • IAM permissions to create Lambda, S3, DynamoDB, and CloudWatch resources.

Learning Objectives

By the end of this lab, you will be able to:

  1. Query CloudWatch Logs using Log Insights to find specific application errors.
  2. Analyze X-Ray Traces to identify service integration bottlenecks and failures.
  3. Implement Custom Metrics using the CloudWatch Embedded Metric Format (EMF).
  4. Perform an RCA by correlating logs, traces, and metrics to identify a code defect.
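Objective 3 refers to the CloudWatch Embedded Metric Format (EMF): a JSON shape that, when printed to a Lambda function's log stream, is automatically extracted into a CloudWatch metric with no `PutMetricData` API call. Below is a minimal stdlib-only sketch; the `ThumbnailApp` namespace is illustrative, not a name from the lab stack.

```python
# Minimal sketch of a CloudWatch Embedded Metric Format (EMF) log line.
# Printing this JSON from a Lambda function makes CloudWatch extract
# "ProcessingErrors" as a metric automatically -- no PutMetricData call.
# The namespace here is illustrative, not taken from the lab's stack.
import json
import time

def emf_record(namespace, metric, value, unit="Count"):
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [[]],  # no dimensions for this simple case
                "Metrics": [{"Name": metric, "Unit": unit}],
            }],
        },
        metric: value,  # the metric value lives at the JSON root
    }

# Inside a handler you would simply print the serialized record:
print(json.dumps(emf_record("ThumbnailApp", "ProcessingErrors", 1)))
```

Because EMF rides on ordinary log output, it adds no extra API latency to the function and the raw values remain queryable in Logs Insights.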

Architecture Overview

We are troubleshooting a "Thumbnail Processor" application. An image is uploaded to S3, triggering a Lambda function that logs metadata to DynamoDB. Currently, the metadata step is failing.

(Architecture diagram: S3 upload → Lambda function → DynamoDB metadata write.)
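The lab's actual handler code ships inside the CloudFormation template, but the pipeline above can be sketched roughly as follows. Every name here (the `TABLE_NAME` environment variable, the item attribute keys) is an assumption for illustration, not the template's real code.

```python
# Hypothetical sketch of the Thumbnail Processor handler. The lab's real
# code comes from the CloudFormation template; TABLE_NAME and the item
# attribute names below are assumptions for illustration only.
import os

def parse_s3_record(event):
    """Pull (bucket, key) out of the first record of an S3 put event."""
    s3 = event["Records"][0]["s3"]
    return s3["bucket"]["name"], s3["object"]["key"]

def handler(event, context):
    import boto3  # imported inside the handler so the parser is testable offline
    bucket, key = parse_s3_record(event)
    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    table.put_item(Item={"ObjectKey": key, "SourceBucket": bucket})
    return {"recorded": key}
```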

Step-by-Step Instructions

Step 1: Deploy the Faulty Infrastructure

We will deploy a CloudFormation stack that intentionally contains a bug in the Lambda code's integration with DynamoDB.

```bash
# Download the template, then deploy the stack.
# Note: `aws cloudformation deploy` takes --template-file (a local path),
# not --template-body, so fetch the template first.
curl -sO https://raw.githubusercontent.com/aws-samples/aws-serverless-workshops/master/Observability/template.yaml

aws cloudformation deploy \
  --stack-name brainybee-rca-lab \
  --template-file template.yaml \
  --capabilities CAPABILITY_IAM
```
Console alternative

Navigate to CloudFormation > Create stack > With new resources. Choose Upload a template file, select the template.yaml you downloaded, then follow the wizard and name the stack brainybee-rca-lab.

Step 2: Trigger the Failure

Upload a test file to the newly created S3 bucket to trigger the Lambda function.

```bash
# Get the bucket name from the stack outputs
BUCKET_NAME=$(aws cloudformation describe-stacks \
  --stack-name brainybee-rca-lab \
  --query "Stacks[0].Outputs[?OutputKey=='BucketName'].OutputValue" \
  --output text)

# Upload a test object to trigger the Lambda function
echo "test data" > test-image.jpg
aws s3 cp test-image.jpg s3://$BUCKET_NAME/test-image.jpg
```

Step 3: Query Logs with CloudWatch Insights

The Lambda failed silently. We need to find the specific error message.

```bash
# Start a query over the last 5 minutes
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/lambda/ThumbnailFunction \
  --start-time $(date -d "5 minutes ago" +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @message | filter @message like /Error/ | sort @timestamp desc' \
  --query 'queryId' --output text)

# Give the query a few seconds to complete, then fetch the results
sleep 5
aws logs get-query-results --query-id $QUERY_ID
```

[!TIP] In the CloudWatch Console, go to Logs Insights, select the log group, and run: fields @timestamp, @message | filter @message like /Error/

Step 4: Analyze the Trace in X-Ray

Logs show a ResourceNotFoundException. We need to see which service integration is actually failing.

```bash
# Get trace summaries for the last minute
aws xray get-trace-summaries \
  --start-time $(date -d "1 minute ago" +%s) \
  --end-time $(date +%s)
```
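The CLI returns every sampled trace, successful or not. A small helper like the one below (an illustrative sketch, not part of the lab) can sift the JSON for failing traces; in X-Ray's response, `HasError` flags 4xx client errors and `HasFault` flags 5xx faults.

```python
# Sift `get-trace-summaries` output for traces that errored or faulted.
# HasError covers 4xx client errors; HasFault covers 5xx faults.
def failing_trace_ids(response):
    return [
        s["Id"]
        for s in response.get("TraceSummaries", [])
        if s.get("HasError") or s.get("HasFault")
    ]
```

The returned IDs can then be inspected individually with `aws xray batch-get-traces --trace-ids <id>`.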
Console alternative

Navigate to CloudWatch > X-Ray traces > Service map. You will see a red circle on the connection between Lambda and DynamoDB, indicating a 400-series error.
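A `ResourceNotFoundException` on the DynamoDB edge usually means the code is calling a table name that does not exist in the account. One common cause, sketched below with entirely hypothetical names, is hard-coding a guessed name instead of reading the physical name CloudFormation generated (typically injected through the function's environment variables).

```python
# Illustration of the bug class behind this ResourceNotFoundException:
# the handler guesses a table name instead of reading the one
# CloudFormation actually generated. All names below are hypothetical.
HARDCODED_GUESS = "MetadataTable"  # buggy: the template's logical ID, not the real name

def table_name(env):
    # Fix: read the physical table name that the stack injects into the
    # function's environment (e.g. via the Lambda Environment block).
    return env["TABLE_NAME"]

deployed = {"TABLE_NAME": "brainybee-rca-lab-MetadataTable-1A2B3C"}
print(table_name(deployed) == HARDCODED_GUESS)  # prints False: the drift behind the 400
```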

Checkpoints

  • S3 Upload: Is the file visible in aws s3 ls s3://$BUCKET_NAME?
  • Logs: Did the Log Insights query return an error string like Requested resource not found (Table: MetadataTable)?
  • Traces: Does the X-Ray Service Map show a failed node for DynamoDB?

Teardown

To avoid costs, delete the resources created during this lab.

```bash
# Empty the bucket first
aws s3 rm s3://$BUCKET_NAME --recursive

# Delete the stack
aws cloudformation delete-stack --stack-name brainybee-rca-lab
```

Troubleshooting

| Issue | Possible Cause | Fix |
| --- | --- | --- |
| AccessDenied on S3 upload | IAM permissions missing | Ensure your CLI user has s3:PutObject for the bucket. |
| Query returns 0 results | Lambda hasn't logged yet | Wait 30 seconds for CloudWatch to ingest the logs and retry. |
| Stack deletion hangs | S3 bucket not empty | Manually delete files in the S3 bucket before deleting the stack. |

Challenge

Goal: Implement an automated monitor.

  1. Create a CloudWatch Metric Filter that looks for the word "Error" in the /aws/lambda/ThumbnailFunction log group.
  2. Assign this filter to a custom metric named ProcessingErrors.
  3. Create a CloudWatch Alarm that sends an SNS notification if ProcessingErrors > 0 for a 1-minute period.
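One possible solution sketch, using boto3. The log group, metric name, and alarm threshold come from the challenge text; the namespace is an assumption, and the SNS topic ARN is a placeholder you must create first (e.g. `aws sns create-topic`).

```python
# Sketch of the challenge with boto3. The log group, metric name, and
# threshold come from the challenge; the namespace is assumed, and the
# SNS topic ARN is a placeholder you must create yourself.
LOG_GROUP = "/aws/lambda/ThumbnailFunction"
NAMESPACE = "ThumbnailApp"  # assumed namespace
METRIC = "ProcessingErrors"

def filter_params():
    return {
        "logGroupName": LOG_GROUP,
        "filterName": "ErrorFilter",
        "filterPattern": "Error",
        "metricTransformations": [{
            "metricName": METRIC,
            "metricNamespace": NAMESPACE,
            "metricValue": "1",  # each matching log line counts as 1
        }],
    }

def alarm_params(topic_arn):
    return {
        "AlarmName": "ProcessingErrorsAlarm",
        "Namespace": NAMESPACE,
        "MetricName": METRIC,
        "Statistic": "Sum",
        "Period": 60,                 # 1-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # i.e. > 0 errors
        "AlarmActions": [topic_arn],
        "TreatMissingData": "notBreaching",  # no logs means no alarm
    }

def apply(topic_arn):
    import boto3  # imported lazily; calling this requires AWS credentials
    boto3.client("logs").put_metric_filter(**filter_params())
    boto3.client("cloudwatch").put_metric_alarm(**alarm_params(topic_arn))
```

`TreatMissingData` is set to `notBreaching` so quiet periods (no invocations, hence no metric datapoints) do not trigger the alarm.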

Cost Estimate

  • Lambda: Free tier (first 1M requests/mo).
  • S3: $0.023 per GB (negligible for this lab).
  • CloudWatch Logs: $0.50 per GB ingested.
  • X-Ray: Free tier (first 100,000 traces/mo).
  • Total Estimated Cost: < $0.05 (well within AWS Free Tier).

Concept Review

In this lab, we performed a standard Root Cause Analysis (RCA) flow. This flow typically narrows down from a broad symptom to a specific code or configuration defect.


Key Comparisons

| Tool | Primary Purpose | Best Used For... |
| --- | --- | --- |
| CloudWatch Logs | Discrete event recording | Finding stack traces and specific error messages. |
| CloudWatch Metrics | Numerical aggregation | Detecting trends and triggering automated alarms. |
| AWS X-Ray | Distributed request tracing | Identifying high latency or failures in multi-service calls. |
| Log Insights | Log querying | Sifting through thousands of log lines using a SQL-like syntax. |
