Mastering AWS Network Monitoring and Logging
Network monitoring and logging services that are available in AWS (for example, CloudWatch, AWS CloudTrail, VPC Traffic Mirroring, VPC Flow Logs, Transit Gateway Network Manager)
This study guide covers the essential services and strategies for maintaining visibility, security, and performance across AWS and hybrid network architectures, specifically tailored for the AWS Certified Advanced Networking - Specialty (ANS-C01) exam.
Learning Objectives
After studying this chapter, you should be able to:
- Differentiate between metadata-level logging (VPC Flow Logs) and packet-level capture (Traffic Mirroring).
- Identify the appropriate tool for auditing API activity vs. monitoring resource performance.
- Design a log delivery and analysis pipeline using Kinesis, Athena, and OpenSearch.
- Utilize Transit Gateway Network Manager and Reachability Analyzer to troubleshoot global and local connectivity issues.
Key Terms & Glossary
- VPC Flow Logs: A feature that enables you to capture information about the IP traffic moving to and from network interfaces in your VPC.
- CloudTrail: An AWS service that records API calls made by or on behalf of an AWS account for auditing and compliance.
- CloudWatch Logs: A centralized service for storing, monitoring, and accessing log files from various AWS resources.
- Traffic Mirroring: A feature that allows you to copy network traffic from an ENI and send it to out-of-band security and monitoring appliances.
- Reachability Analyzer: A configuration analysis tool that enables you to perform static connectivity testing between source and destination resources.
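To make "information about the IP traffic" concrete, here is a minimal sketch that parses one record in the default (version 2) flow log format into named fields. The sample record is hypothetical (the account ID, ENI ID, and addresses are made up), and the helper name `parse_flow_log_line` is an assumption for illustration only.

```python
# Field order of the default (version 2) VPC Flow Logs format.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_log_line(line: str) -> dict:
    """Split one space-separated default-format record into a dict."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}"
        )
    record = dict(zip(DEFAULT_FIELDS, values))
    for name in INT_FIELDS:
        # NODATA / SKIPDATA records use "-" for missing values; keep those as None.
        record[name] = None if record[name] == "-" else int(record[name])
    return record


# Hypothetical sample record: an SSH attempt (TCP, protocol 6) that was rejected.
SAMPLE = (
    "2 123456789012 eni-0123456789abcdef0 203.0.113.12 10.0.1.5 "
    "49152 22 6 20 4249 1418530010 1418530070 REJECT OK"
)

record = parse_flow_log_line(SAMPLE)
print(record["srcaddr"], record["dstport"], record["action"])
```

Note that the record carries only metadata about the flow; no payload bytes appear anywhere in it.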
The "Big Idea"
In the AWS ecosystem, observability is not a single tool but a multi-layered strategy. You cannot manage what you cannot measure. Network monitoring moves from high-level API auditing (CloudTrail) to resource-level metrics (CloudWatch), down to the flow metadata (VPC Flow Logs), and finally into the deep inspection of raw packets (Traffic Mirroring). Mastering these layers allows you to transition from reactive troubleshooting to proactive security and performance optimization.
Formula / Concept Box
| Concept | Primary Goal | Data Granularity |
|---|---|---|
| CloudTrail | Auditing "Who did what?" | Management/Control Plane |
| CloudWatch | Performance metrics & health | Service/Resource Level |
| VPC Flow Logs | Traffic metadata (IP/Port/Proto) | Network Layer (L3/L4) |
| Traffic Mirroring | Deep Packet Inspection (DPI) | Full packets, headers and payload (up to L7) |
| Reachability Analyzer | Connectivity Path Validation | Logical Configuration |
Hierarchical Outline
- Auditing and Governance
- AWS CloudTrail: Captures API history. Crucial for identifying unauthorized configuration changes or compromised credentials.
- Resource Performance & Health
- Amazon CloudWatch: Collects Metrics (numbers) and Logs (text). Includes Alarms that send notifications (e.g., through SNS) or trigger automated actions.
- Network Traffic Visibility
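- VPC Flow Logs: Captures metadata. Use Base Fields (the default format: addresses, ports, protocol, packet/byte counts, start/end times, action) or Extended Fields (custom-format additions such as TCP flags and VPC/subnet IDs) for deeper analysis.
- VPC Traffic Mirroring: Duplicates raw traffic from a source ENI to a target (an ENI, a Network Load Balancer, or a Gateway Load Balancer endpoint). Used for IDS/IPS and forensic analysis.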
- Specialized Network Tools
- Transit Gateway Network Manager: Provides a central dashboard for global networks across AWS and on-premises sites.
- VPC Reachability Analyzer: Tests if a path exists between two points without sending actual traffic packets.
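The idea of testing a path "without sending actual traffic packets" can be illustrated with a toy model: evaluate the configuration itself (longest-prefix route selection plus security group egress rules) and report the first component that blocks the path. This is only a sketch of the concept, not how Reachability Analyzer is implemented; the route table, rules, and function names below are all hypothetical.

```python
import ipaddress

# Hypothetical configuration snapshot; all targets and CIDRs are illustrative.
ROUTE_TABLE = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "nat-gateway"},
]
EGRESS_RULES = [
    {"protocol": "tcp", "port": 443, "cidr": "0.0.0.0/0"},
]


def pick_route(routes, dst_ip):
    """Return the most specific (longest-prefix) route matching dst_ip, or None."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [r for r in routes if dst in ipaddress.ip_network(r["destination"])]
    return max(
        matches,
        key=lambda r: ipaddress.ip_network(r["destination"]).prefixlen,
        default=None,
    )


def egress_allowed(rules, protocol, port, dst_ip):
    """True if any egress rule permits this protocol/port/destination."""
    dst = ipaddress.ip_address(dst_ip)
    return any(
        r["protocol"] == protocol
        and r["port"] == port
        and dst in ipaddress.ip_network(r["cidr"])
        for r in rules
    )


def analyze_path(dst_ip, protocol, port):
    """Walk the configuration and name the first blocking component, if any."""
    route = pick_route(ROUTE_TABLE, dst_ip)
    if route is None:
        return "unreachable: no matching route"
    if not egress_allowed(EGRESS_RULES, protocol, port, dst_ip):
        return "unreachable: blocked by security group egress rules"
    return f"reachable via {route['target']}"


print(analyze_path("203.0.113.10", "tcp", 443))
print(analyze_path("203.0.113.10", "tcp", 5432))
```

No packet leaves the machine in this sketch: the verdict comes entirely from reading the configuration, which is why such analysis can flag a misconfiguration even when the endpoints are stopped.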
Visual Anchors
Log Data Flow & Analysis Pipeline
Traffic Mirroring Architecture
Definition-Example Pairs
- CloudWatch Alarm
- Definition: A mechanism that watches a single metric over a specified time period and performs one or more actions based on the value of the metric relative to a threshold.
- Example: Automatically sending a notification to an administrator if the `NetworkIn` metric for an EC2 instance exceeds 1 Gbps for more than 5 minutes (expressed as a byte count per period, since `NetworkIn` is reported in bytes).
- VPC Flow Log (REJECT)
- Definition: A log entry indicating that traffic was blocked by either a Security Group (stateful) or a Network ACL (stateless).
- Example: Seeing a `REJECT` entry in logs for port 22 (SSH) from an unknown IP address, indicating the Security Group successfully blocked a potential brute-force attack.
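The CloudWatch Alarm example involves a small unit conversion, because `NetworkIn` is reported in bytes rather than bits per second. The sketch below works out the threshold for a 5-minute period and assembles a parameter dictionary of the kind accepted by boto3's `put_metric_alarm`. The alarm name, instance ID, and SNS topic ARN are hypothetical, and using a single 5-minute `Sum` period is one possible reading of "more than 5 minutes".

```python
def gbps_to_bytes_per_period(gbps: float, period_seconds: int) -> int:
    """Convert a rate in gigabits per second to total bytes per period."""
    bits_per_second = gbps * 1_000_000_000
    return int(bits_per_second / 8 * period_seconds)


PERIOD_SECONDS = 300  # 5 minutes

alarm_params = {
    "AlarmName": "high-network-in",  # hypothetical name
    "Namespace": "AWS/EC2",
    "MetricName": "NetworkIn",
    # Hypothetical instance ID.
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "Statistic": "Sum",  # total bytes received during each period
    "Period": PERIOD_SECONDS,
    "EvaluationPeriods": 1,
    "Threshold": gbps_to_bytes_per_period(1, PERIOD_SECONDS),
    "ComparisonOperator": "GreaterThanThreshold",
    # Hypothetical SNS topic that notifies the administrator.
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:network-alerts"],
}

print(alarm_params["Threshold"])  # 37500000000

# With boto3 installed and credentials configured, the dict could be passed as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```

The arithmetic is 1,000,000,000 bits/s ÷ 8 = 125,000,000 bytes/s, times 300 s = 37,500,000,000 bytes per period.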
Worked Examples
Scenario: Troubleshooting "Connection Refused"
Problem: A web server in a private subnet cannot reach an external database.
- Step 1: Reachability Analyzer: Run a test from the Web Server ENI to the Database endpoint. If it fails, it will identify the specific Security Group or Route Table entry that is blocking traffic.
- Step 2: VPC Flow Logs: Query the logs using CloudWatch Logs Insights.

```sql
filter srcAddr = '10.0.1.5' and dstAddr = '203.0.113.10' | stats count(*) by action
```

If action is `REJECT`, a security rule is the culprit. If `ACCEPT` but no reply is seen, the return path or the database itself is the issue.
- Step 3: CloudTrail: Check if any recent changes were made to the Security Groups by searching for `ModifySecurityGroupRules` events.
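Step 3 can also be done offline against a delivered CloudTrail log file. The following is a rough sketch that scans the `Records` array of a log for security group changes. The sample records are hypothetical and heavily trimmed, and the set of event names goes slightly beyond the step above by also including the `Authorize*` and `Revoke*` EC2 API calls, which change rules as well.

```python
import json

# EC2 API calls that add, remove, or modify security group rules.
SG_CHANGE_EVENTS = {
    "AuthorizeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress",
    "RevokeSecurityGroupEgress",
    "ModifySecurityGroupRules",
}

# Hypothetical, trimmed CloudTrail log file content.
SAMPLE_LOG = json.dumps({
    "Records": [
        {
            "eventTime": "2024-01-15T09:12:44Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "DescribeInstances",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
        },
        {
            "eventTime": "2024-01-15T09:14:02Z",
            "eventSource": "ec2.amazonaws.com",
            "eventName": "ModifySecurityGroupRules",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"},
        },
    ]
})


def security_group_changes(log_text: str) -> list:
    """Return (time, event name, caller ARN) for each security group change."""
    records = json.loads(log_text).get("Records", [])
    return [
        (
            r.get("eventTime"),
            r.get("eventName"),
            r.get("userIdentity", {}).get("arn"),
        )
        for r in records
        if r.get("eventSource") == "ec2.amazonaws.com"
        and r.get("eventName") in SG_CHANGE_EVENTS
    ]


for change in security_group_changes(SAMPLE_LOG):
    print(change)
```

Matching the timestamp of a change against the time the connectivity problem started is usually the quickest way to confirm or rule out a configuration change as the cause.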
Checkpoint Questions
- Which service provides a graphical map of your global network and monitors the health of your site-to-site VPNs?
- To perform a full packet capture for a forensic audit using Wireshark, which AWS feature would you enable?
- What is the difference between "Base Fields" and "Extended Fields" in VPC Flow Logs?
- True or False: Reachability Analyzer sends actual packets between resources to test connectivity.
Answers
- Transit Gateway Network Manager.
- VPC Traffic Mirroring.
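- Base fields are the default format (addresses, ports, protocol, packet and byte counts, start/end timestamps, action, log status); Extended fields are custom-format additions such as TCP flags, VPC/subnet IDs, and the original packet source and destination addresses.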
- False. It performs a logical analysis of the configuration without generating traffic.
Muddy Points & Cross-Refs
- Flow Logs vs. Traffic Mirroring: Students often confuse these. Remember: Flow Logs are the "phone bill" (who called whom, for how long); Traffic Mirroring is the "wiretap" (everything that was said during the call).
- Security Groups vs. NACLs in Flow Logs: Because Security Groups are stateful, return traffic for an allowed connection is permitted automatically, so an `ACCEPT` in one direction paired with a `REJECT` for the reply usually points to a Network ACL. NACLs are stateless and require explicit rules for both directions.
- Latency Monitoring: CloudWatch provides standard metrics, but for granular network latency, you might need to use the CloudWatch Agent or Enhanced Networking metrics.
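The stateful-versus-stateless distinction can be turned into a simple heuristic over flow log records: when a request was accepted but the reply on the reversed 5-tuple was rejected, a Network ACL is the likely cause, since a security group would have allowed the reply automatically. The sketch below assumes records already parsed into dictionaries; the sample flows and function names are hypothetical.

```python
def flow_key(flow):
    """5-tuple identifying a flow in the direction it was logged."""
    return (flow["srcaddr"], flow["srcport"],
            flow["dstaddr"], flow["dstport"], flow["protocol"])


def reverse_key(flow):
    """5-tuple of the opposite direction (the request this flow would answer)."""
    return (flow["dstaddr"], flow["dstport"],
            flow["srcaddr"], flow["srcport"], flow["protocol"])


def likely_nacl_blocks(flows):
    """Rejected flows whose reverse direction was accepted."""
    accepted = {flow_key(f) for f in flows if f["action"] == "ACCEPT"}
    return [
        f for f in flows
        if f["action"] == "REJECT" and reverse_key(f) in accepted
    ]


# Hypothetical parsed records for one network interface.
FLOWS = [
    # Inbound HTTPS request, allowed by the security group and inbound NACL.
    {"srcaddr": "203.0.113.12", "srcport": 49152, "dstaddr": "10.0.1.5",
     "dstport": 443, "protocol": 6, "action": "ACCEPT"},
    # Reply to that request, rejected on the way out.
    {"srcaddr": "10.0.1.5", "srcport": 443, "dstaddr": "203.0.113.12",
     "dstport": 49152, "protocol": 6, "action": "REJECT"},
    # Unrelated SSH probe with no accepted counterpart.
    {"srcaddr": "198.51.100.7", "srcport": 50000, "dstaddr": "10.0.1.5",
     "dstport": 22, "protocol": 6, "action": "REJECT"},
]

suspects = likely_nacl_blocks(FLOWS)
print(len(suspects), suspects[0]["srcport"])
```

The SSH probe is not flagged because nothing in the opposite direction was accepted; that rejection could equally come from the security group.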
Comparison Tables
Monitoring vs. Troubleshooting Tools
| Tool | Best Use Case | Primary Output |
|---|---|---|
| VPC Flow Logs | High-level traffic patterns, Security Group auditing | Metadata records (plain text or Parquet) |
| Traffic Mirroring | Deep Packet Inspection, IDS/IPS | Raw packet copies (VXLAN-encapsulated) |
| Reachability Analyzer | Debugging "No Route to Host" errors | Logical Path Result |
| Transit Gateway Network Manager | Global dashboard for Hybrid Cloud | Topology Map & Health Metrics |