AWS DVA-C02 Study Guide: Debugging and Identifying Defects
Debug code to identify defects
This guide covers the essential skills for Unit 4 of the AWS Certified Developer - Associate (DVA-C02) curriculum, specifically focusing on identifying defects, performing root cause analysis, and utilizing observability tools in the AWS ecosystem.
Learning Objectives
After studying this guide, you should be able to:
- Identify and isolate defects within distributed application code.
- Interpret and correlate logs, metrics, and traces to perform root cause analysis.
- Query CloudWatch Logs effectively to find relevant error data.
- Troubleshoot integration issues between AWS services (e.g., Lambda to DynamoDB).
- Utilize Amazon Q Developer and AWS SAM for local and automated testing.
Key Terms & Glossary
- Observability: The ability to measure the internal state of a system by examining its external outputs (logs, metrics, and traces).
- Instrumentation: The process of adding code to an application to collect data for monitoring purposes (e.g., AWS X-Ray SDK).
- Structured Logging: Logging data in a machine-readable format (like JSON) to make querying and analysis easier.
- Root Cause Analysis (RCA): A systematic process for identifying the "root" cause of problems or events and an approach for responding to them.
- Annotation (X-Ray): Key-value pairs indexed for use with filter expressions in AWS X-Ray.
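To make the Structured Logging entry concrete, here is a minimal sketch using only the Python standard library. The `JsonFormatter` class, the `build_logger` helper, and the `context` convention for extra fields are illustrative assumptions, not part of any AWS SDK. The point is that every log line becomes one JSON object whose fields can later be queried individually.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}}
        # and those keys are merged into the top level of the JSON object.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str = "orders") -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.error("payment failed", extra={"context": {"order_id": "12345"}})
```

Because the output is JSON, a field such as `order_id` can be filtered on directly instead of being matched with a regular expression over free text.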
The "Big Idea"
In a local environment, debugging often involves stepping through code with a breakpoint. In the Cloud, debugging shifts from execution control to observability. Because applications are distributed across many services (Lambda, API Gateway, SQS), identifying a defect requires correlating data points across service boundaries. You don't just find the bug; you reconstruct the story of the failure using logs, metrics, and traces.
Formula / Concept Box
| Tool | Primary Purpose | Key Feature for Debugging |
|---|---|---|
| Amazon CloudWatch Logs | Storage and monitoring of log files | Logs Insights: Fast, interactive queries of log data. |
| Amazon CloudWatch Metrics | Numerical data over time | Alarms: Notify when thresholds (like 5XX errors) are met. |
| AWS X-Ray | End-to-end request tracing | Service Map: Visualizes bottlenecks and error origins. |
| CloudWatch EMF | High-cardinality custom metrics | Embedded Metric Format: Standard for injecting metrics into logs. |
Hierarchical Outline
- I. Root Cause Analysis (RCA) Fundamentals
  - Identifying Defects: Using service output logs to catch syntax or runtime errors.
  - Interpreting Traces: Using X-Ray to see where a request hangs or fails in a multi-service chain.
- II. Log Management & Querying
  - CloudWatch Logs Insights: Using `filter`, `sort`, and `stats` to isolate failed requests.
  - Structured Logging: Implementing JSON-based logs for better searchability.
- III. Service Integration Debugging
  - Permissions: Identifying AccessDenied errors in logs (IAM issues).
  - Networking: Debugging Lambda VPC connectivity issues (Security Groups/NACLs).
  - Throttling: Identifying `ProvisionedThroughputExceeded` or `RateExceeded` errors.
- IV. Testing & Optimization
  - Local Testing: Using AWS SAM to simulate Lambda and API Gateway locally.
  - Automated Testing: Using Amazon Q Developer for test generation.
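The error strings named in part III of the outline lend themselves to a first-pass triage of log lines. The sketch below is one hedged way to do that. The marker strings come from this guide, while the category labels and the `classify_log_line` helper are assumptions made for illustration.

```python
from typing import Optional

# Marker substrings are taken from this guide; the category names on the
# right are this sketch's own labels, not AWS terminology.
ERROR_CATEGORIES = {
    "AccessDenied": "permissions",
    "ProvisionedThroughputExceeded": "throttling",
    "RateExceeded": "throttling",
    "Task timed out": "timeout-or-networking",
}


def classify_log_line(line: str) -> Optional[str]:
    """Return a coarse defect category for a log line, or None if no marker matches."""
    for marker, category in ERROR_CATEGORIES.items():
        if marker in line:
            return category
    return None


if __name__ == "__main__":
    sample = "An error occurred (AccessDeniedException) when calling the PutItem operation"
    print(classify_log_line(sample))
```

Substring matching is deliberately loose here, so `AccessDenied` also catches variants such as `AccessDeniedException`.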
Visual Anchors
Debugging Workflow
X-Ray Trace Visualization
\begin{tikzpicture}[node distance=2.5cm, every node/.style={draw, rectangle, rounded corners, inner sep=5pt, text centered, minimum width=2.5cm}]
\node (User) {User/Client};
\node (APIG) [right of=User] {API Gateway};
\node (Lambda) [right of=APIG] {Lambda};
\node (DB) [right of=Lambda] {DynamoDB};
\draw[->, thick] (User) -- node[above] {HTTP GET} (APIG);
\draw[->, thick] (APIG) -- node[above, color=red] {403 Error} (Lambda);
\draw[->, dashed] (Lambda) -- node[above] {IAM Fail} (DB);
\draw[red, thick] (APIG.south west) -- (APIG.north east);
\draw[red, thick] (APIG.north west) -- (APIG.south east);
\end{tikzpicture}
Definition-Example Pairs
- Defect: A flaw in the code that causes it to behave unexpectedly.
  - Example: A Lambda function that fails to parse a JSON payload because a required field is missing.
- Annotation: An indexed metadata field in a trace used for filtering.
  - Example: Adding `segment.addAnnotation("OrderID", "12345")` (X-Ray SDK for Node.js) so you can search X-Ray for all traces related to that specific order.
- Custom Metric: A metric published by your own application rather than emitted automatically by an AWS service.
  - Example: Measuring the time it takes for a specific third-party API call to return using CloudWatch EMF.
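The custom metric example can be sketched as a log line in CloudWatch Embedded Metric Format. The code below builds such a record by hand with the standard library. The `emf_record` helper, the `MyApp` namespace, and the `ThirdPartyLatency` metric name are hypothetical, and in practice a library may generate this structure for you. Treat it as an illustration of the record's shape rather than a complete client.

```python
import json
import time


def emf_record(namespace, metric_name, value, dimensions, unit="Milliseconds"):
    """Build one EMF log line (a JSON string) carrying a single metric value.

    `dimensions` is a dict such as {"Service": "checkout"}; its keys are
    declared as one dimension set and its values are stored as top-level fields.
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        metric_name: value,
    }
    record.update(dimensions)
    return json.dumps(record)


if __name__ == "__main__":
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for the third-party API call
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Printing to stdout is how a Lambda function would hand this to CloudWatch Logs.
    print(emf_record("MyApp", "ThirdPartyLatency", elapsed_ms, {"Service": "checkout"}))
```

The metric value lives in the same JSON object as the rest of the log context, which is why EMF suits high-cardinality data: the details stay queryable in the log even though only the declared dimensions become metric dimensions.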
Worked Examples
Example 1: Debugging a Lambda Integration Failure
Scenario: An API Gateway endpoint returns a 502 Bad Gateway. The developer needs to determine whether the problem lies in the integration or in the function code.
- Check CloudWatch Metrics: Look at the `5XX` error count for API Gateway.
- Check Lambda Logs: If the Lambda didn't execute (no log entries for the request), the issue is likely IAM permissions or a trigger configuration.
- Inspect X-Ray: The trace shows the Lambda service was reached, but the "Initialization" phase timed out.
- Root Cause: The Lambda is in a VPC and is trying to reach the internet without a NAT Gateway.
- Solution: Attach the Lambda to private subnets whose route table sends internet-bound traffic to a NAT Gateway in a public subnet. If the function only needs to call AWS services, VPC endpoints remove the need for internet access.
Example 2: Finding a Needle in the Haystack with Logs Insights
Scenario: A specific user reports an error, but the logs are generating millions of lines.
Query:
fields @timestamp, @message, @logStream
| filter @message like /User12345/
| filter @message like /Error/
| sort @timestamp desc
| limit 20
This query filters the log group for only entries containing the specific UserID and the string "Error".
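The same query can be assembled programmatically, which helps when the user ID comes from a support ticket. The sketch below only builds the query string. The `build_user_error_query` helper is hypothetical, and it restricts the ID to a conservative character set so that nothing unexpected ends up inside the regular expression.

```python
import re


def build_user_error_query(user_id: str, limit: int = 20) -> str:
    """Return a Logs Insights query for error lines that mention one user."""
    # Assumption: user IDs are alphanumeric (plus "_" and "-"); anything else
    # is rejected rather than escaped.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", user_id):
        raise ValueError(f"unexpected characters in user id: {user_id!r}")
    return "\n".join(
        [
            "fields @timestamp, @message, @logStream",
            f"| filter @message like /{user_id}/",
            "| filter @message like /Error/",
            "| sort @timestamp desc",
            f"| limit {limit}",
        ]
    )


if __name__ == "__main__":
    query = build_user_error_query("User12345")
    print(query)
    # With boto3 installed and credentials configured, the string could be passed to
    # boto3.client("logs").start_query(logGroupName=..., startTime=..., endTime=...,
    # queryString=query) and the results read with get_query_results(queryId=...).
    # The log group name and time range are left open here on purpose.
```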
Checkpoint Questions
- What is the main difference between an X-Ray annotation and X-Ray metadata?
- Which service would you use to visualize the dependencies and health of all components in a distributed application?
- How does the CloudWatch Embedded Metric Format (EMF) benefit application performance compared to standard `PutMetricData` calls?
- If a Lambda function works locally but fails in the cloud with a "Task timed out" error, what are the first three things you should check?
Answers
- Annotations are indexed, so they can be used in filter expressions to search traces. Metadata is not indexed and only stores additional data with the trace.
- AWS X-Ray Service Map.
- EMF is asynchronous; it logs metrics to stdout, which CloudWatch processes in the background, reducing the latency overhead of network calls to the CloudWatch API.
- (1) Lambda Timeout setting, (2) VPC/Network routing/NAT Gateway availability, (3) Downstream service latency.
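For the timeout question, one hedged way to gather evidence from inside the function is to check the remaining execution time before a slow downstream call. The Lambda Python runtime's context object provides `get_remaining_time_in_millis()`. The `has_time_for` helper, the safety margin, and the `FakeContext` stand-in for local runs are assumptions of this sketch.

```python
def has_time_for(context, needed_ms: int, safety_margin_ms: int = 500) -> bool:
    """Return True if the invocation has enough time left for a call needing `needed_ms`."""
    return context.get_remaining_time_in_millis() >= needed_ms + safety_margin_ms


class FakeContext:
    """Local stand-in for the Lambda context object (hypothetical, for testing only)."""

    def __init__(self, remaining_ms: int):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self._remaining_ms


if __name__ == "__main__":
    ctx = FakeContext(remaining_ms=1200)
    if not has_time_for(ctx, needed_ms=1000):
        # Logging this before the call leaves a trace of why the function
        # was about to time out, instead of a bare "Task timed out".
        print("not enough time left for downstream call")
```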