Network Visibility & Performance Metrics Study Guide
Recommending appropriate metrics to provide visibility of the network status
This guide covers the critical metrics and tools required to maintain visibility, troubleshoot connectivity, and optimize network performance within the scope of the AWS Certified Advanced Networking - Specialty (ANS-C01) exam.
Learning Objectives
After studying this guide, you will be able to:
- Identify appropriate metrics (Latency, Throughput, Packet Loss, Jitter) for network health assessment.
- Select the correct AWS tool (VPC Flow Logs, Traffic Mirroring, CloudWatch) for specific visibility requirements.
- Analyze routing patterns and verify connectivity intent using Reachability Analyzer.
- Differentiate between metadata-level logging and deep packet inspection.
- Establish a performance baseline for hybrid and cloud-native architectures.
Key Terms & Glossary
- VPC Flow Logs: A feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC.
- Traffic Mirroring: An Amazon VPC feature that you can use to copy network traffic from an elastic network interface (ENI) and send it to out-of-band security and monitoring appliances.
- Reachability Analyzer: A configuration analysis tool that enables you to perform static connectivity testing between a source resource and a destination resource in your VPC.
- Jitter: The variation in the delay of received packets, which can negatively impact real-time applications like VoIP or video streaming.
- MTU (Maximum Transmission Unit): The size of the largest protocol data unit that can be communicated in a single network layer transaction (the standard Ethernet MTU is 1500 bytes; jumbo frames in AWS are up to 9001 bytes).
The "Big Idea"
Network visibility is not just about "is it up?" but "how well is it performing?" In complex AWS environments, visibility requires a layered approach: CloudWatch for high-level health, VPC Flow Logs for connection metadata (who, what, when), and Traffic Mirroring for deep forensic analysis (the actual payload). Together, these tools allow a network engineer to transition from reactive firefighting to proactive optimization.
Formula / Concept Box
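| Concept | Metric / Rule | Purpose |
|---|---|---|
| Throughput | $\text{Throughput} = \frac{\text{Data transferred}}{\text{Time}}$ | Measuring bandwidth utilization and capacity. |
| Latency (RTT) | $RTT = t_{\text{reply received}} - t_{\text{request sent}}$ | Measuring round-trip time for performance tuning. |
| Packet Loss | $\frac{\text{Packets sent} - \text{Packets received}}{\text{Packets sent}} \times 100\%$ | Identifying congestion or hardware issues. |
| Jitter | $\lvert D_{i} - D_{i-1} \rvert$, where $D_{i}$ is the delay of packet $i$ | Measuring delay variation for real-time traffic such as VoIP or video. |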
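The four metrics in the concept box can be computed from raw samples. The following is a minimal Python sketch, assuming you already have byte counts, packet counts, and per-packet delays from your own measurement tool; the function names are illustrative and not part of any AWS API. Jitter is taken here as the mean of the absolute differences between consecutive delays, which is one common simplification.

```python
from statistics import mean
from typing import Sequence


def throughput_bps(bytes_transferred: int, seconds: float) -> float:
    """Throughput in bits per second: data transferred divided by time."""
    return bytes_transferred * 8 / seconds


def mean_rtt_ms(rtt_samples_ms: Sequence[float]) -> float:
    """Average round-trip time over a set of RTT samples."""
    return mean(rtt_samples_ms)


def packet_loss_pct(sent: int, received: int) -> float:
    """Percentage of sent packets that never arrived."""
    return 100 * (sent - received) / sent


def mean_jitter_ms(delays_ms: Sequence[float]) -> float:
    """Mean of |D_i - D_(i-1)| over consecutive packet delays."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0.0


if __name__ == "__main__":
    delays = [20.0, 22.0, 19.0, 25.0]  # made-up sample delays in ms
    print("throughput (bps):", throughput_bps(1_250_000, 1.0))
    print("mean RTT (ms):", mean_rtt_ms(delays))
    print("loss (%):", packet_loss_pct(1000, 900))
    print("jitter (ms):", mean_jitter_ms(delays))
```

Keeping each metric in its own small function makes it easy to feed the same samples into a baseline later and compare new measurements against it.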
Hierarchical Outline
- Core Performance Metrics
- Latency: Critical for user experience; affected by distance and routing hops.
- Packet Loss: Often caused by congested buffers or MTU mismatches.
- Throughput: Impacted by instance type limits and ENA/EFA capabilities.
- AWS Visibility Toolset
- CloudWatch: Centralized metrics (NetworkIn/Out), logs, and alarms.
- VPC Flow Logs: Captures source/dest IP, port, protocol, and action (ACCEPT/REJECT).
- Traffic Mirroring: Full packet capture for deep packet inspection (DPI).
- Verification & Troubleshooting Tools
- Reachability Analyzer: Checks routing/ACL/Security Group logic without sending packets.
- Transit Gateway Network Manager: Provides global topology views and cross-region health.
- Route 53 Query Logging: Monitors DNS resolution patterns and failures.
Visual Anchors
Network Visibility Flow
Visualizing Latency vs. Jitter
\begin{tikzpicture}[scale=1.2]
% Latency Graph
\draw[->] (0,0) -- (5,0) node[right] {Time};
\draw[->] (0,0) -- (0,2) node[above] {Packet Arrival};
\draw[blue, thick] (1,0) -- (1,1.5);
\draw[blue, thick] (2,0) -- (2,1.5);
\draw[blue, thick] (3,0) -- (3,1.5);
\node at (2,-0.5) {Low Jitter (Consistent Delay)};
% Jitter Graph
\begin{scope}[xshift=6cm]
\draw[->] (0,0) -- (5,0) node[right] {Time};
\draw[->] (0,0) -- (0,2) node[above] {Packet Arrival};
\draw[red, thick] (0.5,0) -- (0.5,1.5);
\draw[red, thick] (2.5,0) -- (2.5,1.5);
\draw[red, thick] (3.0,0) -- (3.0,1.5);
\node at (2,-0.5) {High Jitter (Inconsistent Delay)};
\end{scope}
\end{tikzpicture}
Definition-Example Pairs
- Metric: Packet Loss
- Definition: The percentage of packets that fail to reach their destination.
- Example: A VPN tunnel showing 10% packet loss during peak hours likely indicates an ISP congestion issue or an MTU/MSS clamping problem.
- Tool: Reachability Analyzer
- Definition: A static analysis tool that tests reachability between two points in a VPC.
- Example: Your HTTPS connection to an EC2 instance times out. Reachability Analyzer identifies that a NACL (Network ACL) in the path is blocking outbound traffic on port 443.
- Log: Access Logs
- Definition: Detailed logs of requests made to load balancers or CloudFront.
- Example: Using ALB Access Logs to identify the specific client IP addresses causing a surge in 5XX error responses.
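The access log example can be sketched in Python as a tally of client IPs behind 5XX responses. This is a rough sketch under stated assumptions: it relies on the space-delimited Application Load Balancer access log layout in which the fourth field is client:port and the ninth field is the load balancer status code, and the sample lines below are shortened, made-up entries rather than real log output. The function name top_5xx_clients is hypothetical.

```python
import shlex
from collections import Counter
from typing import Iterable, List, Tuple

CLIENT_FIELD = 3      # "client:port" (assumed position in the ALB log layout)
ELB_STATUS_FIELD = 8  # status code returned by the load balancer (assumed position)


def top_5xx_clients(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n client IPs that received the most 5XX responses."""
    counts: Counter = Counter()
    for line in lines:
        fields = shlex.split(line)  # keeps quoted request/user-agent fields together
        if len(fields) <= ELB_STATUS_FIELD:
            continue  # skip lines too short to carry a status code
        if fields[ELB_STATUS_FIELD].startswith("5"):
            client_ip = fields[CLIENT_FIELD].rsplit(":", 1)[0]
            counts[client_ip] += 1
    return counts.most_common(n)


# Shortened, fabricated sample entries for illustration only.
SAMPLE_LINES = [
    'https 2024-01-01T00:00:00.000000Z app/demo/abc 203.0.113.7:40001 10.0.0.1:80 0.000 0.001 0.000 502 502 34 366 "GET https://example.com:443/ HTTP/1.1" "curl/8.0"',
    'https 2024-01-01T00:00:01.000000Z app/demo/abc 203.0.113.7:40002 10.0.0.1:80 0.000 0.001 0.000 503 503 34 366 "GET https://example.com:443/ HTTP/1.1" "curl/8.0"',
    'https 2024-01-01T00:00:02.000000Z app/demo/abc 198.51.100.4:51000 10.0.0.1:80 0.000 0.001 0.000 500 500 34 366 "GET https://example.com:443/ HTTP/1.1" "curl/8.0"',
    'https 2024-01-01T00:00:03.000000Z app/demo/abc 192.0.2.9:52000 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 "GET https://example.com:443/ HTTP/1.1" "curl/8.0"',
]

if __name__ == "__main__":
    for ip, hits in top_5xx_clients(SAMPLE_LINES):
        print(ip, hits)
```

At production volume the same question is usually answered with a query engine over the log bucket; the script only illustrates which fields matter.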
Worked Examples
Example 1: Troubleshooting "Connection Refused"
Scenario: An application on Instance A cannot connect to a database on Instance B.
- Step 1: Check VPC Flow Logs for the relevant ENIs.
- Step 2: Filter for REJECT. If found, check the Security Group or NACL rules.
- Step 3: If Flow Logs show ACCEPT but the connection fails, use Reachability Analyzer to verify the path through Route Tables and Gateways.
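Steps 1 and 2 can be sketched in Python as a filter over flow log records. This sketch assumes the default (version 2) space-separated record format; the ENI ID, addresses, and sample records are made up for illustration, and rejected_flows is a hypothetical helper name.

```python
from typing import Dict, Iterable, List

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> Dict[str, str]:
    """Split one default-format record into a field-name -> value mapping."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_flows(lines: Iterable[str], interface_id: str) -> List[Dict[str, str]]:
    """Return the REJECT records seen on one network interface."""
    records = (parse_record(line) for line in lines)
    return [
        r for r in records
        if r["interface_id"] == interface_id and r["action"] == "REJECT"
    ]


# Fabricated sample records (Instance A -> database port 5432 on Instance B).
SAMPLE = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.9 49152 5432 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.9 49153 443 6 10 5200 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0ffff000 10.0.3.7 10.0.2.9 49154 5432 6 2 120 1700000000 1700000060 REJECT OK",
]

if __name__ == "__main__":
    for rec in rejected_flows(SAMPLE, "eni-0a1b2c3d"):
        print(rec["srcaddr"], "->", rec["dstaddr"], "port", rec["dstport"], rec["action"])
```

A REJECT on the database port points to the Security Group or NACL rules for that ENI, which is where Step 2 directs the investigation.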
Example 2: Analyzing Packet Shaping Issues
Scenario: Video streaming performance is degraded despite high bandwidth.
- Step 1: Enable VPC Traffic Mirroring on the source ENI.
- Step 2: Route the mirrored traffic to a Wireshark-equipped EC2 instance.
- Step 3: Inspect for TCP Retransmissions or Out-of-Order packets that indicate packet shaping or path instability.
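The inspection in Step 3 can be approximated in Python as follows. This is a simplified sketch, assuming you have already exported (sequence number, payload length) pairs for one direction of a single TCP flow in capture order. It does not reproduce Wireshark's own analysis heuristics, and it ignores sequence number wraparound.

```python
from typing import Iterable, Tuple


def classify_segments(segments: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Count (retransmissions, out_of_order) in one direction of a TCP flow.

    A segment counts as a retransmission if the exact same (seq, length)
    was already seen, and as out-of-order if it is new but starts below
    the highest sequence number seen so far.
    """
    seen = set()
    next_expected = None  # highest seq + length observed so far
    retransmissions = 0
    out_of_order = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no payload to reorder or resend
        if (seq, length) in seen:
            retransmissions += 1
        elif next_expected is not None and seq < next_expected:
            out_of_order += 1
        seen.add((seq, length))
        end = seq + length
        if next_expected is None or end > next_expected:
            next_expected = end
    return retransmissions, out_of_order


if __name__ == "__main__":
    # Made-up capture: one segment arrives late, one is sent twice.
    capture = [(1000, 100), (1100, 100), (1300, 100), (1200, 100), (1100, 100)]
    print(classify_segments(capture))
```

A high count of either kind relative to total segments suggests that the problem is path quality rather than available bandwidth, which matches the scenario.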
Checkpoint Questions
- What is the main difference between VPC Flow Logs and Traffic Mirroring in terms of data captured?
- Which tool would you use to visualize the global topology of your Transit Gateway network?
- If an instance has a high NetworkPacketsOut metric but low throughput, what might this indicate about the packet size?
- How does CloudWatch help in establishing a network performance baseline?
Muddy Points & Cross-Refs
- Flow Logs vs. Mirroring: Flow logs are "after-the-fact" metadata (cheap, long retention). Mirroring is "real-time" packet capture (expensive, high overhead). Always start with Flow Logs.
- Reachability Analyzer vs. Route Analyzer: Reachability Analyzer is for VPC resources (SG/NACL/Route Tables). Transit Gateway Route Analyzer specifically tests routes across the TGW.
- MTU Mismatches: Often occur in hybrid VPN setups where the tunnel overhead reduces the effective MTU below 1500 bytes. Look for "ICMP Destination Unreachable" packets, specifically Type 3, Code 4 ("Fragmentation Needed").
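The arithmetic behind the MTU point can be sketched in Python. The 64-byte tunnel overhead below is a hypothetical figure chosen only for illustration; real IPsec overhead depends on the cipher, integrity algorithm, and whether NAT traversal is in use. The 20-byte IPv4 and TCP header sizes assume no options.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def effective_mtu(link_mtu: int, tunnel_overhead: int) -> int:
    """MTU left for the inner packet after tunnel encapsulation."""
    return link_mtu - tunnel_overhead


def tcp_mss(path_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return path_mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


if __name__ == "__main__":
    HYPOTHETICAL_TUNNEL_OVERHEAD = 64  # assumption, not a measured value
    inner_mtu = effective_mtu(1500, HYPOTHETICAL_TUNNEL_OVERHEAD)
    print("effective MTU inside tunnel:", inner_mtu)
    print("MSS to clamp to:", tcp_mss(inner_mtu))
    print("MSS on a plain 1500-byte path:", tcp_mss(1500))
```

If endpoints keep advertising the MSS for a plain 1500-byte path, full-size segments no longer fit inside the tunnel, which is why MSS clamping is a common remedy.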
Comparison Tables
| Feature | VPC Flow Logs | Traffic Mirroring | CloudWatch Metrics |
|---|---|---|---|
| Data Type | IP Metadata (L3/L4) | Full Raw Packet (L2-L7) | Aggregated Statistics |
| Real-time? | No (5-10 min delay) | Yes (Streamed) | Near Real-time |
| Cost | Low (Per GB processed) | High (Per hour/ENI) | Included / Low |
| Use Case | Security Auditing | Malware/Deep Analysis | Operational Dashboarding |