Monitor and Analyze: AWS Advanced Networking Study Guide
Monitor and analyze network traffic to troubleshoot and optimize connectivity patterns
Monitor and Analyze Network Traffic: Troubleshooting & Optimization
This guide covers the skills needed to monitor, analyze, and optimize connectivity patterns within AWS environments, focusing on the requirements of the AWS Certified Advanced Networking Specialty (ANS-C01) exam.
Learning Objectives
After studying this guide, you should be able to:
- Configure and analyze VPC Flow Logs to identify traffic patterns and security issues.
- Use CloudWatch metrics and alarms to establish performance baselines.
- Troubleshoot connectivity with Reachability Analyzer, and audit unintended access paths with Network Access Analyzer.
- Visualize complex network topologies using Transit Gateway Network Manager.
- Optimize throughput and latency by identifying network impairments like jitter and packet loss.
Key Terms & Glossary
- VPC Flow Logs: A feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC.
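- Jitter: The variation in latency (delay) between successive packets in the same flow. For example, packets that take 40 ms, 55 ms, and 42 ms to arrive show jitter even if the average latency looks acceptable.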
- Throughput: The actual amount of data transmitted over a network in a given time period (e.g., Gbps).
- Reachability Analyzer: A configuration analysis tool that enables you to perform static connectivity testing between resources.
- Traffic Mirroring: An Amazon VPC feature used to copy network traffic from an elastic network interface and send it to security/monitoring appliances for deep packet inspection.
The "Big Idea"
Network monitoring in AWS is not just about fixing what is broken; it is about moving from a reactive state (fixing outages) to a proactive state (optimizing for performance and cost). By integrating logging services like VPC Flow Logs with analytical tools like CloudWatch and Reachability Analyzer, engineers can maintain a "single pane of glass" view of global connectivity, ensuring that application performance remains consistent despite underlying network complexity.
Formula / Concept Box
| Concept | Metric / Rule | Impact of Poor Performance |
|---|---|---|
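| Latency | $T_{\text{recv}} - T_{\text{sent}}$ (one-way) or round-trip time (RTT), in ms | Slower application response times. |
| Throughput | $\text{Data size} / \text{Time}$ (e.g., Gbps) | Buffering in video, slow file transfers. |
| MTU | 1500 bytes (standard) vs. 9001 bytes (jumbo) | Fragmentation, or silently dropped large packets ("black hole") if mismatched. |
| Packet Loss | $(\text{Packets}_{\text{sent}} - \text{Packets}_{\text{recv}}) / \text{Packets}_{\text{sent}}$ | Retransmissions and reduced effective bandwidth. |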
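The throughput, packet loss, and jitter formulas can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and jitter is computed here as the mean absolute difference between consecutive latency samples, a simplification of the smoothed interarrival-jitter estimator defined for RTP in RFC 3550.

```python
from statistics import mean


def throughput_bps(bytes_transferred: int, seconds: float) -> float:
    """Throughput = data size / time, returned in bits per second."""
    return bytes_transferred * 8 / seconds


def packet_loss_ratio(sent: int, received: int) -> float:
    """(Packets_sent - Packets_recv) / Packets_sent, as a fraction from 0 to 1."""
    return (sent - received) / sent


def mean_jitter_ms(latencies_ms: list[float]) -> float:
    """Mean absolute change in latency between consecutive packets (simplified jitter)."""
    deltas = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return mean(deltas)


# Illustrative numbers: 1.25 GB moved in 10 s, 1000 packets sent, 990 received.
print(throughput_bps(1_250_000_000, 10) / 1e9)   # 1.0 (Gbps)
print(packet_loss_ratio(1000, 990))              # 0.01 (1% loss)
print(mean_jitter_ms([40.0, 42.0, 39.0, 41.0]))  # about 2.33 ms
```

Note that jitter and latency are independent: the samples above average about 40.5 ms, yet a real-time application would still notice the 2 to 3 ms swings between packets.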
Hierarchical Outline
- Data Collection Services
  - VPC Flow Logs: Captures metadata (Src/Dst IP, Port, Protocol, Action).
  - CloudWatch Logs: Central repository for storing and searching log data.
  - Route 53 Resolver Query Logs: Tracking DNS resolution patterns.
- Diagnostic & Analysis Tools
  - Reachability Analyzer: Static path analysis (no packets sent).
  - Network Access Analyzer: Identifies unintended network access paths to your resources, based on access requirements you define.
  - VPC Traffic Mirroring: Full-packet capture (headers and payload) for deep inspection.
- Visualization & Management
  - Transit Gateway Network Manager: Global topology mapping and health monitoring.
  - CloudWatch Dashboards: Visualizing trends in latency, bytes in/out, and error counts.
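- Optimization Techniques
  - Global Accelerator: Moving user traffic onto the AWS global network at the edge location closest to the user, instead of routing it across the public internet.
  - Enhanced Networking: Using the Elastic Network Adapter (ENA) or Elastic Fabric Adapter (EFA) for higher packets per second (PPS) and lower jitter.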
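To make the Flow Log metadata concrete, the sketch below parses records in the default (version 2) format and ranks top talkers by bytes. The sample records are made up for illustration, and `parse_record`, `top_talkers`, and `rejected` are hypothetical helpers, not part of any AWS SDK. At scale you would more likely query the logs with CloudWatch Logs Insights or Amazon Athena.

```python
from collections import Counter

# Field order of the default (version 2) VPC Flow Log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> dict[str, str]:
    """Split one space-separated flow log record into named fields."""
    return dict(zip(FIELDS, line.split()))


def top_talkers(lines: list[str], n: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Sum bytes per (source, destination) pair and return the n largest."""
    totals: Counter = Counter()
    for line in lines:
        rec = parse_record(line)
        if rec["log_status"] != "OK":  # NODATA/SKIPDATA records carry '-' for bytes
            continue
        totals[(rec["srcaddr"], rec["dstaddr"])] += int(rec["bytes"])
    return totals.most_common(n)


def rejected(lines: list[str]) -> list[dict[str, str]]:
    """Return records whose traffic was rejected by a security group or NACL."""
    records = (parse_record(line) for line in lines)
    return [rec for rec in records if rec["action"] == "REJECT"]


# Made-up sample records for illustration only.
sample = [
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 10.0.2.10 49152 3306 6 20 4000 1700000000 1700000060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 10.0.1.5 10.0.2.10 49153 3306 6 10 1500 1700000000 1700000060 ACCEPT OK",
    "2 123456789010 eni-0a1b2c3d 10.0.1.7 10.0.2.10 49200 3306 6 5 300 1700000000 1700000060 REJECT OK",
    "2 123456789010 eni-0a1b2c3d - - - - - - - 1700000000 1700000060 - NODATA",
]
print(top_talkers(sample))  # [(('10.0.1.5', '10.0.2.10'), 5500), (('10.0.1.7', '10.0.2.10'), 300)]
print([r["srcaddr"] for r in rejected(sample)])  # ['10.0.1.7']
```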
Visual Anchors
Troubleshooting Flowchart
VPC Traffic Mirroring Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={fill=white, font=\footnotesize}, scale=0.8]
  \draw[thick] (0,0) rectangle (10,5) node[pos=0.1] {VPC};
  \node (src) [draw, rectangle, minimum width=1.5cm] at (2,3) {Source ENI};
  \node (dest) [draw, rectangle, minimum width=1.5cm, fill=blue!10] at (8,3) {Target ENI};
  \node (mirror) [draw, circle, inner sep=2pt, fill=orange!20] at (5,3) {Mirror Session};
  \draw[->, thick] (src) -- (mirror) node[midway, above] {Copied Traffic};
  \draw[->, thick] (mirror) -- (dest);
  \draw[->, dashed] (src) -- (2,0.5) node[below] {Actual Destination};
  \node at (5,0.8) {Deep Packet Inspection (IDS/IPS)};
\end{tikzpicture}
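At the target, mirrored packets arrive VXLAN-encapsulated on UDP port 4789, with the VNI set to the virtual network ID configured on the mirror session. The sketch below shows how a monitoring appliance might strip the 8-byte VXLAN header (RFC 7348) before inspection. `parse_vxlan_header` is a hypothetical helper, and a real appliance would also have to handle the outer IP and UDP headers, which are assumed to be removed already.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port; mirrored copies arrive on it


def parse_vxlan_header(udp_payload: bytes) -> tuple[int, bytes]:
    """Return (VNI, inner frame) from a VXLAN-encapsulated UDP payload (RFC 7348)."""
    if len(udp_payload) < 8:
        raise ValueError("payload shorter than the 8-byte VXLAN header")
    # Layout: 1 byte of flags, 3 reserved bytes, then a 24-bit VNI + 8 reserved bits.
    flags, vni_field = struct.unpack("!B3xI", udp_payload[:8])
    if not flags & 0x08:  # the I flag must be set for the VNI to be valid
        raise ValueError("VXLAN I flag not set")
    return vni_field >> 8, udp_payload[8:]


# Build a fake payload with VNI 123 to show the round trip.
fake = struct.pack("!B3xI", 0x08, 123 << 8) + b"<inner ethernet frame>"
vni, inner = parse_vxlan_header(fake)
print(vni, inner)  # 123 b'<inner ethernet frame>'
```

Matching on the VNI lets one appliance serve several mirror sessions and still tell which source each copy came from.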
Definition-Example Pairs
- Metric Baseline: A record of "normal" performance levels used for comparison during issues.
  - Example: Recording that a VPN tunnel usually has 40 ms latency so that an alarm can trigger if it reaches 80 ms.
- Reachability Constraints: Factors that prevent a packet from reaching its destination.
  - Example: A missing route in a subnet route table or an explicit 'Deny' in a Network ACL.
- Packet Shaping: Managing traffic to ensure performance for high-priority applications.
  - Example: Using VPC Traffic Mirroring to identify an application that is consuming bandwidth with non-critical backups, so those backups can be rescheduled or rate-limited.
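The 40 ms / 80 ms baseline example can be written as a simple rule. This local sketch assumes the baseline is the median of historical samples and that an alarm should fire when at least 3 recent samples reach twice the baseline. Both choices are illustrative assumptions that loosely mimic a CloudWatch alarm's "M out of N" evaluation; the helpers are hypothetical and do not call the CloudWatch API.

```python
from statistics import median


def baseline_ms(history_ms: list[float]) -> float:
    """Treat the median of historical samples as 'normal' latency."""
    return median(history_ms)


def should_alarm(recent_ms: list[float], baseline: float,
                 factor: float = 2.0, datapoints_to_alarm: int = 3) -> bool:
    """Alarm if at least `datapoints_to_alarm` recent samples reach factor x baseline."""
    threshold = baseline * factor
    breaching = sum(1 for v in recent_ms if v >= threshold)
    return breaching >= datapoints_to_alarm


normal = baseline_ms([39.0, 40.0, 41.0, 40.0, 42.0])  # 40.0 ms
print(should_alarm([85.0, 82.0, 90.0, 41.0], normal))  # True: 3 samples >= 80 ms
print(should_alarm([85.0, 41.0, 40.0], normal))        # False: only 1 breach
```

Requiring several breaching samples, rather than one, keeps a single latency spike from paging anyone.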
Worked Examples
Scenario: The "Invisible" Block
Problem: An EC2 instance in Subnet A cannot connect to an RDS instance in Subnet B. All Security Groups seem correct.
Step-by-Step Resolution:
- Run Reachability Analyzer: Set the EC2 instance's ENI as the source and the RDS instance's ENI as the destination.
- Analyze Output: The tool indicates "Not Reachable" due to a NACL rule.
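- Inspect NACLs: You find that Subnet B's inbound NACL allows the database port, but Subnet B's outbound NACL has no rule for the ephemeral port range, so responses from RDS never reach the EC2 instance. Because NACLs are stateless, return traffic must be allowed explicitly.
- Fix: Add an outbound rule to Subnet B's NACL allowing the return traffic range (1024-65535) to Subnet A's CIDR, and confirm that Subnet A's inbound NACL allows the same range.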
- Verify: Re-run Reachability Analyzer; status changes to "Reachable."
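Because NACLs are stateless, the request and the response are evaluated separately. The request is checked against Subnet A's outbound and Subnet B's inbound NACLs, and the response against Subnet B's outbound and Subnet A's inbound NACLs. The sketch below models that logic under simplifying assumptions: rules match only on TCP destination port (CIDRs and protocols are ignored), the database listens on 3306 (MySQL, assumed because the scenario does not name the engine), and the client picked ephemeral port 50000. `NaclRule`, `nacl_allows`, and `round_trip_ok` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int      # rules are evaluated lowest number first
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules: list[NaclRule], port: int) -> bool:
    """First matching rule wins; no match falls through to the implicit deny (*)."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False


def round_trip_ok(a_out, b_in, b_out, a_in, db_port: int, ephemeral_port: int) -> bool:
    """Stateless NACLs: check the request and the response on both subnets."""
    request = nacl_allows(a_out, db_port) and nacl_allows(b_in, db_port)
    response = nacl_allows(b_out, ephemeral_port) and nacl_allows(a_in, ephemeral_port)
    return request and response


a_out = [NaclRule(100, 3306, 3306, True)]   # Subnet A (EC2) outbound
b_in = [NaclRule(100, 3306, 3306, True)]    # Subnet B (RDS) inbound
a_in = [NaclRule(100, 1024, 65535, True)]   # Subnet A inbound, ephemeral range
b_out = []                                  # Subnet B outbound: implicit deny only
print(round_trip_ok(a_out, b_in, b_out, a_in, 3306, 50000))  # False

b_out_fixed = [NaclRule(100, 1024, 65535, True)]
print(round_trip_ok(a_out, b_in, b_out_fixed, a_in, 3306, 50000))  # True
```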
Checkpoint Questions
- What is the primary difference between a VPC Flow Log and VPC Traffic Mirroring?
- Which service would you use to see a global map of your Transit Gateway attachments across regions?
- If an application is experiencing intermittent slowdowns without complete disconnection, which metric should you check first: Latency or Reachability?
- How does AWS Global Accelerator improve connectivity patterns compared to the public internet?
Muddy Points & Cross-Refs
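- Reachability Analyzer vs. Network Access Analyzer: Reachability Analyzer tests whether a specific source can reach a specific destination. Network Access Analyzer identifies unintended network access paths (for example, resources reachable from the internet) by checking your network against access requirements you define. Do not confuse either with IAM Access Analyzer, which finds unintended public or cross-account access granted through resource policies.
- MTU Mismatches: Often a "muddy point" because small packets (e.g., ping) succeed while large packets (e.g., bulk data transfers) fail. Always verify that jumbo frames (9001 MTU) are supported along the entire path; for example, Site-to-Site VPN and internet gateway paths do not carry jumbo frames, and Transit Gateway supports at most 8500 bytes.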
- Cross-Ref: Refer to Unit 1: Edge Networking for more on Global Accelerator and Chapter 5: Logging for CloudWatch agent details.
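The MTU muddy point reduces to one rule: the effective path MTU is the smallest MTU of any hop. A minimal sketch with hypothetical helper names; the hop names and values are illustrative (an instance in a VPC at 9001 bytes sending through an internet gateway limited to 1500 bytes), and the MSS calculation assumes IPv4 and TCP headers without options.

```python
def path_mtu(hop_mtus: dict[str, int]) -> int:
    """The effective path MTU is the smallest MTU of any hop."""
    return min(hop_mtus.values())


def fits_without_fragmenting(packet_bytes: int, hop_mtus: dict[str, int]) -> bool:
    """True if a packet of this size can cross every hop unfragmented."""
    return packet_bytes <= path_mtu(hop_mtus)


def tcp_mss(mtu: int) -> int:
    """IPv4 TCP MSS with no options: MTU minus 20-byte IP and 20-byte TCP headers."""
    return mtu - 40


hops = {"source_vpc": 9001, "internet_gateway": 1500}  # illustrative path
print(path_mtu(hops))                        # 1500
print(fits_without_fragmenting(84, hops))    # True: a default 84-byte ping packet
print(fits_without_fragmenting(9001, hops))  # False: a jumbo frame
print(tcp_mss(path_mtu(hops)))               # 1460
```

This is exactly the "ping works, transfers fail" symptom: the 84-byte probe fits everywhere, while full-size jumbo frames exceed the narrowest hop.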
Comparison Tables
| Feature | VPC Flow Logs | VPC Traffic Mirroring | Reachability Analyzer |
|---|---|---|---|
| Layer | Layer 3/4 (metadata only) | Layer 2 (full packet, headers and payload) | Configuration analysis (no packets sent) |
| Output / Destination | Amazon S3, CloudWatch Logs, or Amazon Data Firehose | Mirror target: ENI, NLB, or Gateway Load Balancer endpoint | Console/API output |
| Use Case | Security audits, top talkers | Deep packet inspection (DPI) | Troubleshooting connectivity |
| Cost | Per GB of log data | Per hour per source ENI with an active mirror session | Per reachability analysis |