Mastering AWS Log Querying: Insights and Troubleshooting for DVA-C02
Query logs to find relevant data
This study guide focuses on the critical skill of querying logs to find relevant data, a key component of Domain 4 (Troubleshooting and Optimization) for the AWS Certified Developer - Associate (DVA-C02) exam.
Learning Objectives
- Differentiate between CloudWatch Log Groups, Streams, and Events.
- Identify specific traffic patterns using VPC Flow Logs and filter patterns.
- Construct complex queries using CloudWatch Logs Insights syntax.
- Extract actionable data from structured and unstructured log formats to perform Root Cause Analysis (RCA).
Key Terms & Glossary
- Log Event: The smallest unit of data, containing a timestamp and the raw message.
- Log Stream: A sequence of log events that share the same source (e.g., a specific EC2 instance or Lambda execution environment).
- Log Group: A logical container for log streams that share the same retention, monitoring, and access control settings.
- CloudWatch Logs Insights: A fully managed query engine that uses a specialized syntax to search and analyze log data.
- VPC Flow Logs: A feature that enables you to capture information about the IP traffic to and from network interfaces in your VPC.
- Filter Pattern: A specific syntax used to create metric filters or to search log data in real-time.
The "Big Idea"
Modern cloud debugging has shifted from Logging (simply recording events) to Observability (understanding internal state from external outputs). Querying logs is the bridge between the two. Instead of manually scrolling through thousands of lines of text, developers use structured queries to isolate the "needle in the haystack", such as a single requestId that failed across multiple services.
Formula / Concept Box
CloudWatch Logs Insights Query Syntax
| Command | Purpose | Example |
|---|---|---|
| `fields` | Selects specific fields to display | `fields @timestamp, @message, status` |
| `filter` | Applies boolean conditions to narrow results | `filter status = "500" and @message like /Error/` |
| `stats` | Performs aggregations (count, sum, avg) | `stats count(*) by bin(1h)` |
| `sort` | Orders the results | `sort @timestamp desc` |
| `limit` | Controls the number of returned rows | `limit 20` |
| `parse` | Extracts ephemeral fields from raw text | `parse @message "user * id *" as user, id` |
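Commands in the table are chained with the pipe character, each one operating on the output of the previous one. The sketch below only assembles a query string from individual commands; `build_insights_query` is a hypothetical helper name, not part of any AWS SDK.

```python
def build_insights_query(*commands: str) -> str:
    """Join Logs Insights commands into one pipe-separated query string."""
    # Drop empty entries so optional commands can be passed as "".
    cleaned = [c.strip() for c in commands if c.strip()]
    return "\n| ".join(cleaned)


query = build_insights_query(
    "fields @timestamp, @message, status",
    'filter status = "500" and @message like /Error/',
    "sort @timestamp desc",
    "limit 20",
)
print(query)
```

Keeping each command as a separate string makes it easy to add or remove a `filter` while iterating on a query.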
Hierarchical Outline
- Log Infrastructure Hierarchy
  - CloudWatch Logs as the central repository.
  - Retention Policies: Setting expiration dates to manage costs.
- Querying Mechanics
  - Filter Patterns: Used for Metric Filters and simple searches (e.g., `[w1, w2="*Error*", w3]`).
  - CloudWatch Logs Insights: Powerful interactive querying for logs in JSON or plain text.
- VPC Flow Logs Analysis
  - Fields: source address (`srcaddr`), destination address (`dstaddr`), `protocol`, `action` (ACCEPT/REJECT).
  - Use Cases: Troubleshooting Security Groups and Network ACLs.
- Application Observability
  - Structured Logging: Using JSON format to make logs automatically searchable.
  - EMF (Embedded Metric Format): Turning logs into CloudWatch Metrics automatically.
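To make the EMF item in the outline concrete, here is a minimal sketch of a log line in Embedded Metric Format. The `_aws` envelope follows the published EMF structure; the function name `emf_log_line`, the namespace `MyApp`, the metric `Latency` and the dimension `Service` are illustrative assumptions.

```python
import json
import time


def emf_log_line(namespace, metric_name, value, unit="Milliseconds",
                 dimensions=None, now_ms=None):
    """Build one JSON log line in Embedded Metric Format (EMF)."""
    dimensions = dimensions or {}
    record = {
        "_aws": {
            # EMF timestamps are milliseconds since the Unix epoch.
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set that lists every dimension key.
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Metric values and dimension values live at the root of the document.
        metric_name: value,
        **dimensions,
    }
    return json.dumps(record)


# Hypothetical example values.
print(emf_log_line("MyApp", "Latency", 42, dimensions={"Service": "checkout"}))
```

Because the line is ordinary JSON, the same event stays searchable as a structured log while also producing a metric.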
Visual Anchors
Log Data Flow
Log Hierarchy Model
Definition-Example Pairs
- Term: Filter Pattern
- Definition: A symbolic language used to identify terms in log events.
- Example: `{($.statusCode = 404) || ($.statusCode = 500)}`, used to catch client and server errors in JSON logs.
- Term: VPC Flow Record
- Definition: A line of text representing a specific network flow.
- Example: `2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 REJECT OK`. This shows an SSH (port 22) connection being rejected.
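The flow record above is easier to read once each position is given a name. The sketch below assumes the default (version 2) flow log format with its 14 space-separated fields; custom formats order fields differently, and the Python key names use underscores in place of the hyphens in the AWS field names.

```python
from typing import Dict

# Field order of the default VPC Flow Logs format (version 2).
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_record(line: str) -> Dict[str, str]:
    """Split a default-format flow record into named fields."""
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(DEFAULT_FIELDS, parts))


record = parse_flow_record(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 REJECT OK"
)
# Protocol 6 is TCP; destination port 22 is SSH.
print(record["srcaddr"], "->", record["dstaddr"], record["dstport"], record["action"])
```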
Worked Examples
Example 1: Finding Slow Lambda Invocations
You need to find the top 10 slowest Lambda executions in the last hour to optimize performance.
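Query:

    filter @type = "REPORT"
    | fields @requestId, @duration, @maxMemoryUsed
    | sort @duration desc
    | limit 10

Explanation: Lambda writes a REPORT line at the end of every invocation, and Logs Insights automatically discovers fields such as @duration and @maxMemoryUsed from it. We filter for those lines, select the fields of interest, and sort by the highest duration. The one-hour window comes from the time range selected for the query, not from the query text.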
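The same query can be run programmatically. The sketch below assumes a client object that exposes the CloudWatch Logs `start_query` and `get_query_results` operations, such as the one returned by `boto3.client("logs")`. The helper name `run_insights_query`, the polling limits and the log group name in the usage note are illustrative assumptions.

```python
import time

LAMBDA_SLOWEST_QUERY = """
filter @type = "REPORT"
| fields @requestId, @duration, @maxMemoryUsed
| sort @duration desc
| limit 10
"""


def run_insights_query(logs_client, log_group, query,
                       window_seconds=3600, poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes."""
    end = int(time.time())          # start_query expects epoch seconds
    start = end - window_seconds    # default window: the last hour
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete in time")
```

With boto3 installed and credentials configured, a call might look like `run_insights_query(boto3.client("logs"), "/aws/lambda/my-function", LAMBDA_SLOWEST_QUERY)`, where the log group name is a placeholder.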
Example 2: Analyzing Security Group Drops
A developer reports that their app cannot connect to a database. You check VPC Flow Logs.
Query:

    filter action = "REJECT"
    | stats count(*) as rejectCount by srcAddr, dstAddr, dstPort
    | sort rejectCount desc

Explanation: This identifies which source IP, destination IP, and destination port combinations are rejected most often, which points to the Security Group or Network ACL rule that is likely misconfigured.
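To show what the `stats count(*) ... by` aggregation computes, the sketch below reproduces it locally with a `Counter`. The records are made-up sample data, not real flow logs, and `count_rejects` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical, hand-written records using the Logs Insights field names.
records = [
    {"srcAddr": "10.0.1.15", "dstAddr": "10.0.2.40", "dstPort": "5432", "action": "REJECT"},
    {"srcAddr": "10.0.1.15", "dstAddr": "10.0.2.40", "dstPort": "5432", "action": "REJECT"},
    {"srcAddr": "10.0.1.99", "dstAddr": "10.0.2.40", "dstPort": "22", "action": "REJECT"},
    {"srcAddr": "10.0.1.15", "dstAddr": "10.0.2.41", "dstPort": "443", "action": "ACCEPT"},
]


def count_rejects(flow_records):
    """Count REJECT records per (srcAddr, dstAddr, dstPort), highest first."""
    counts = Counter(
        (r["srcAddr"], r["dstAddr"], r["dstPort"])
        for r in flow_records
        if r["action"] == "REJECT"     # filter action = "REJECT"
    )
    return counts.most_common()        # sort rejectCount desc


for (src, dst, port), reject_count in count_rejects(records):
    print(src, dst, port, reject_count)
```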
Checkpoint Questions
- What is the difference between a Log Group and a Log Stream?
- Answer: A Log Group is a collection of streams that share settings; a Log Stream is a specific sequence of events from a single source instance.
- Which Logs Insights command is used to calculate the average response time per hour?
- Answer: `stats`, for example `stats avg(duration) by bin(1h)`.
- True or False: VPC Flow Logs capture the actual content (payload) of the data packets.
- Answer: False. They only capture metadata (IPs, ports, protocols, byte counts).
- In a CloudWatch Filter Pattern, what does the symbol `?` represent?
- Answer: It acts as a logical OR when searching for multiple terms (e.g., `?"Error" ?"Fail"`).
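As a companion to the `stats avg(duration) by bin(1h)` answer, the sketch below shows the underlying arithmetic: each timestamp is rounded down to the start of its hour, then durations are averaged per bucket. The input tuples are made-up sample values, and aligning buckets to UTC hour boundaries is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import datetime, timezone


def avg_by_hour(events):
    """Average durations per one-hour bucket.

    `events` is an iterable of (epoch_seconds, duration) pairs.
    """
    buckets = defaultdict(list)
    for ts, duration in events:
        bucket_start = ts - (ts % 3600)   # round down to the hour
        buckets[bucket_start].append(duration)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc).isoformat():
            sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Hypothetical sample: two events in the first hour, one in the second.
sample = [(0, 100.0), (1800, 300.0), (3600, 50.0)]
hourly = avg_by_hour(sample)
print(hourly)
```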