AWS Network Performance Metrics & Reachability Study Guide
Network performance metrics and reachability constraints (for example, routing, packet size)
This study guide focuses on the critical factors influencing network performance and connectivity within AWS environments, specifically covering latency, packet size, routing optimization, and the diagnostic tools available to AWS Advanced Networking specialists.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between key performance metrics: latency, jitter, packet loss, and throughput.
- Identify how packet size (MTU) affects network performance, and troubleshoot the connectivity problems that MTU mismatches cause.
- Utilize AWS tools like Reachability Analyzer and VPC Flow Logs to diagnose reachability issues.
- Recommend optimization strategies such as Global Accelerator and Jumbo Frames for specific use cases.
Key Terms & Glossary
- MTU (Maximum Transmission Unit): The size of the largest protocol data unit that can be communicated in a single network layer transaction (default is 1500 bytes for Ethernet).
- Jitter: The variation in the delay of received packets, often caused by network congestion or route changes.
- Throughput: The actual amount of data transmitted over the network in a given time period (e.g., Gbps).
- Reachability: The ability of a source to send a packet to a destination through the existing routing and security infrastructure.
- BGP (Border Gateway Protocol): The standardized exterior gateway protocol designed to exchange routing and reachability information among autonomous systems.
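The glossary metrics can be made concrete with a small calculation. The following is a minimal sketch, not an AWS API: the function name `summarize_metrics` and its inputs are hypothetical, and jitter is assumed here to mean the average absolute difference between consecutive packet delays (other definitions exist).

```python
from statistics import mean


def summarize_metrics(delays_ms, packets_sent, bytes_received, duration_s):
    """Summarize latency, jitter, packet loss, and throughput for one test run.

    delays_ms: delay of each packet that arrived, in arrival order.
    All names and inputs are illustrative assumptions.
    """
    received = len(delays_ms)
    # Latency: average delay of the packets that arrived.
    latency_ms = mean(delays_ms)
    # Jitter (assumed definition): mean absolute change between consecutive delays.
    changes = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    jitter_ms = mean(changes) if changes else 0.0
    # Packet loss: share of sent packets that never arrived.
    loss_pct = 100.0 * (packets_sent - received) / packets_sent
    # Throughput: bits actually delivered per second, expressed in Mbps.
    throughput_mbps = bytes_received * 8 / duration_s / 1e6
    return {
        "latency_ms": latency_ms,
        "jitter_ms": jitter_ms,
        "loss_pct": loss_pct,
        "throughput_mbps": throughput_mbps,
    }


if __name__ == "__main__":
    print(summarize_metrics([10, 12, 11, 15], packets_sent=5,
                            bytes_received=1_000_000, duration_s=2))
```

Note that two runs with the same average latency can have very different jitter, which is why real-time applications track both.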
The "Big Idea"
Network performance is not just about "speed" (throughput); it is a multi-dimensional balance between reachability (is the path open?) and efficiency (is the path optimal?). In AWS, performance is constrained by physical limits (speed of light/latency), configuration limits (MTU/routing), and service quotas. Success involves shifting traffic from the public internet to the AWS global backbone and ensuring that packet sizes are optimized for the underlying infrastructure.
Formula / Concept Box
| Concept | Details / Values |
|---|---|
| Standard Ethernet MTU | 1500 bytes |
| AWS Jumbo Frames | 9001 bytes (Supported within VPC, Direct Connect, and some peering) |
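| Throughput Formula | $\text{Max TCP throughput} \approx \dfrac{\text{TCP window size}}{\text{RTT}}$ |
| Packet Rate | $\text{Packets per second} = \dfrac{\text{Throughput (bits/s)}}{\text{Packet size (bits)}}$ |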
[!IMPORTANT] Increasing packet size (Jumbo Frames) reduces the CPU overhead for the same amount of data but can cause connectivity failures if any hop in the path does not support the larger size.
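To see why larger frames reduce per-packet overhead, the sketch below counts how many packets a transfer needs at each MTU. It assumes 40 bytes of headers per packet (IPv4 plus TCP, no options); the function names are illustrative only.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def packets_needed(payload_bytes, mtu):
    """Number of packets required to carry payload_bytes at a given MTU."""
    payload_per_packet = mtu - IP_TCP_HEADER_BYTES
    return math.ceil(payload_bytes / payload_per_packet)


def header_overhead_pct(mtu):
    """Share of each full-size packet that is spent on headers."""
    return 100.0 * IP_TCP_HEADER_BYTES / mtu


if __name__ == "__main__":
    transfer = 10**9  # a 1 GB transfer, chosen only for illustration
    for mtu in (1500, 9001):
        print(mtu, packets_needed(transfer, mtu), round(header_overhead_pct(mtu), 2))
```

Fewer packets for the same data means fewer per-packet operations on the sender, the receiver, and every device in between.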
Hierarchical Outline
- Network Performance Metrics
  - Latency: Time for data to travel from source to destination.
  - Packet Loss: Packets failing to reach the destination due to congestion or errors.
  - Jitter: Variance in delay; critical for real-time apps (VoIP/Streaming).
  - Throughput: Volume of data over time.
- Reachability Constraints
  - Routing Protocols: Determining the "best path" (Static vs. Dynamic/BGP).
  - Packet Size (MTU): Mismatches lead to fragmentation or silent packet drops.
- AWS Diagnostic Tools
  - VPC Flow Logs: IP traffic metadata (Src/Dst, Port, Accept/Reject).
  - Reachability Analyzer: Static configuration analysis (no packets sent).
  - VPC Traffic Mirroring: Deep packet inspection for troubleshooting.
  - CloudWatch: Metrics like `NetworkIn`, `NetworkOut`, `NetworkPacketsIn`.
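The VPC Flow Logs entry in the outline can be illustrated with a small parser. This is a sketch that assumes records in the default (version 2) space-separated format; the sample lines are made-up values in that format, and the helper names are hypothetical.

```python
# Field order of the default (version 2) flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# Illustrative sample records, not real traffic.
SAMPLE_LINES = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 "
    "49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]


def parse_flow_log_line(line):
    """Split one default-format record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_flows(lines):
    """Return only the records whose action is REJECT."""
    records = (parse_flow_log_line(line) for line in lines if line.strip())
    return [r for r in records if r["action"] == "REJECT"]


if __name__ == "__main__":
    for r in rejected_flows(SAMPLE_LINES):
        print(r["srcaddr"], "->", r["dstaddr"], "port", r["dstport"], r["action"])
```

Flow logs configured with a custom format use a different field order, so `FIELDS` would need to match that configuration.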
Visual Anchors
Troubleshooting Flowchart
Latency and Reachability Components
\begin{tikzpicture}
  % The brace below needs \usetikzlibrary{decorations.pathreplacing} in the preamble.
  \draw[thick,->] (0,0) -- (10,0) node[anchor=north] {Time};
  \draw[fill=blue!20] (0.5,0.5) rectangle (2,1.5) node[pos=.5] {Processing};
  \draw[fill=green!20] (2,0.5) rectangle (5,1.5) node[pos=.5] {Propagation};
  \draw[fill=red!20] (5,0.5) rectangle (6.5,1.5) node[pos=.5] {Queuing};
  \draw[fill=orange!20] (6.5,0.5) rectangle (8,1.5) node[pos=.5] {Transmission};
  \node at (4.25, 2) {Components of Network Latency};
  \draw [decorate,decoration={brace,amplitude=10pt}] (0.5,1.6) -- (8,1.6)
    node [black,midway,yshift=0.6cm] {Total Latency (Delay)};
\end{tikzpicture}
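The four components in the figure add up to the total one-way delay. The sketch below estimates that sum under stated assumptions: signals travel through fiber at roughly 200,000 km/s (about two thirds of the speed of light), and processing and queuing delays are supplied by the caller. The function name and parameters are hypothetical.

```python
def total_latency_ms(distance_km, packet_bytes, link_bps,
                     processing_ms=0.0, queuing_ms=0.0,
                     propagation_speed_km_s=200_000):
    """Estimate one-way delay as processing + propagation + queuing + transmission.

    propagation_speed_km_s defaults to an assumed ~2/3 of the speed of light.
    """
    # Propagation: time for the signal to cover the physical distance.
    propagation_ms = distance_km / propagation_speed_km_s * 1000
    # Transmission: time to push every bit of the packet onto the link.
    transmission_ms = packet_bytes * 8 / link_bps * 1000
    return processing_ms + propagation_ms + queuing_ms + transmission_ms


if __name__ == "__main__":
    # 2,000 km path, 1,500-byte packet, 1 Gbps link (illustrative values).
    print(total_latency_ms(2000, 1500, 1e9, processing_ms=0.1, queuing_ms=0.5))
```

On long paths propagation dominates, which is why distance and routing choices matter more than link speed for latency.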
Definition-Example Pairs
- Packet Shaping: The practice of regulating network data transfer to ensure a certain level of performance or Quality of Service (QoS).
  - Example: A router or virtual appliance rate-limiting outbound traffic so that it does not saturate a lower-bandwidth on-premises link reached over AWS Direct Connect.
- Path MTU Discovery (PMTUD): A technique to determine the MTU size on the network path between two IP hosts so that IP fragmentation can be avoided.
  - Example: A web server sends a large packet with the "Don't Fragment" (DF) bit set; if a router cannot forward it, the router drops the packet and returns an ICMP "Destination Unreachable: Fragmentation Needed" message (Type 3, Code 4) so the sender can reduce its packet size.
- Route Summarization: Consolidating multiple routes into a single advertisement to reduce routing table size.
  - Example: Advertising `10.0.0.0/16` via BGP instead of 256 individual `10.0.x.0/24` subnets.
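Route summarization can be checked with Python's standard `ipaddress` module, which collapses contiguous prefixes into the smallest covering set. This is a sketch of the arithmetic only, not of how a BGP speaker is configured; `summarize_routes` is a hypothetical helper name.

```python
import ipaddress


def summarize_routes(cidrs):
    """Collapse a list of CIDR strings into the fewest equivalent prefixes."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # All 256 /24 networks inside 10.0.0.0/16.
    all_subnets = [f"10.0.{i}.0/24" for i in range(256)]
    print(summarize_routes(all_subnets))

    # Only the first 64 /24 networks: these fit in a /18, not a /16.
    first_64 = [f"10.0.{i}.0/24" for i in range(64)]
    print(summarize_routes(first_64))
```

Prefixes that are not contiguous stay separate, so summarization only helps when address space has been allocated in aligned blocks.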
Worked Examples
Scenario: Troubleshooting a Packet Size Mismatch
Problem: A fleet of EC2 instances in VPC-A can ping instances in VPC-B (via Peering), but large file transfers (SCP/HTTP) hang indefinitely or fail after a few seconds.
Step-by-Step Breakdown:
- Analyze Ping: Ping uses small ICMP packets (usually ~64 bytes). A successful ping proves only that basic reachability and routing exist.
- Test Large Packets: Run `ping -s 1472 -M do <destination_ip>`, then repeat with `-s 8973`. The `-M do` flag sets the DF bit, and each payload size plus 28 bytes of IP and ICMP headers produces a 1500-byte or 9001-byte packet.
- Identify Failure: If the ping fails for large sizes but works for small ones, an MTU mismatch is present somewhere on the path.
- Verification: Check if one VPC uses Jumbo Frames (MTU 9001) while the Peering connection or the destination instance only supports MTU 1500.
- Solution: Adjust the MTU on the source EC2 instances to 1500 or ensure the entire path (including VPN/Direct Connect) supports the larger frame size.
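The test in the steps above comes down to simple arithmetic: a DF-marked ping gets through only if payload plus headers fits the smallest MTU on the path. The sketch below models that under the assumption of IPv4 with a 20-byte IP header and an 8-byte ICMP header; the hop MTU lists are invented for illustration and the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # assumption: no IP options
ICMP_HEADER_BYTES = 8


def ping_payload_for_mtu(mtu):
    """Largest ping payload (the -s value) that fits in one packet of this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def path_mtu(hop_mtus):
    """The usable MTU of a path is the smallest MTU of any hop."""
    return min(hop_mtus)


def df_ping_succeeds(payload_bytes, hop_mtus):
    """True if a ping with the DF bit set fits through every hop unfragmented."""
    packet_size = payload_bytes + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    return packet_size <= path_mtu(hop_mtus)


if __name__ == "__main__":
    hops = [9001, 1500, 9001]  # illustrative: one 1500-byte hop in the middle
    for mtu in (1500, 9001):
        size = ping_payload_for_mtu(mtu)
        print(f"-s {size}:", "ok" if df_ping_succeeds(size, hops) else "dropped")
```

A single 1500-byte hop is enough to break jumbo-sized packets, even when both endpoints are configured for MTU 9001.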
Checkpoint Questions
- Which AWS tool would you use to perform a "dry run" check of network paths without sending actual traffic? (Answer: Reachability Analyzer)
- If an application requires sub-millisecond latency for high-performance computing, which network interface should be used? (Answer: Elastic Fabric Adapter - EFA)
- What is the main benefit of AWS Global Accelerator for a globally distributed user base? (Answer: It routes traffic over the AWS global network rather than the public internet, reducing latency and jitter.)
- VPC Flow Logs show a `REJECT` status for traffic that you believe should be allowed. Which two resources should you check first? (Answer: Security Groups and Network ACLs)
Muddy Points & Cross-Refs
- MTU vs. MSS: MTU is at Layer 3 (IP), while MSS (Maximum Segment Size) is at Layer 4 (TCP). They are related but distinct. Cross-ref: TCP/IP Fundamentals.
- Network Insights vs. Reachability Analyzer: Reachability Analyzer is for point-to-point connectivity; Network Insights (via Transit Gateway Network Manager) provides a broader topology view.
- Jumbo Frames on Direct Connect: Note that while AWS supports MTU 9001, your on-premises hardware and ISP must also support it across the entire path.
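The MTU and MSS relationship noted above is a fixed subtraction of header sizes. The sketch below assumes base headers without options (20 bytes for IPv4, 40 for IPv6, 20 for TCP) and that a connection uses the smaller of the two MSS values announced during the handshake; the function names are hypothetical.

```python
TCP_HEADER_BYTES = 20   # assumption: no TCP options
IPV4_HEADER_BYTES = 20  # assumption: no IP options
IPV6_HEADER_BYTES = 40  # fixed IPv6 base header


def tcp_mss(mtu, ipv6=False):
    """Largest TCP payload per segment for an interface with the given MTU."""
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    return mtu - ip_header - TCP_HEADER_BYTES


def negotiated_mss(local_mtu, remote_mtu, ipv6=False):
    """Each side announces its own MSS; the connection uses the smaller value."""
    return min(tcp_mss(local_mtu, ipv6), tcp_mss(remote_mtu, ipv6))


if __name__ == "__main__":
    print(tcp_mss(1500), tcp_mss(9001), negotiated_mss(9001, 1500))
```

The negotiated MSS reflects only the two endpoints, so a smaller MTU on an intermediate hop still has to be found through PMTUD.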
Comparison Tables
Monitoring Tool Comparison
| Tool | Primary Data Type | Best For | Level of Detail |
|---|---|---|---|
| VPC Flow Logs | Metadata (Src/Dst IP/Port) | Compliance, Security Audit | High (Logs every flow) |
| CloudWatch Metrics | Aggregate Counters (Bytes In/Out) | Baselining, Alarms | Medium (Summary) |
| Reachability Analyzer | Configuration Analysis | Connectivity Troubleshooting | N/A (Predictive) |
| Traffic Mirroring | Full Packet Payloads | Deep Packet Inspection, Security | Very High (Raw data) |
Network Performance Bottlenecks
| Issue | Primary Metric | Common Root Cause |
|---|---|---|
| Buffering/Lags | Latency | Physical distance or sub-optimal routing |
| Choppy Audio/Video | Jitter | Congestion on intermediate hops |
| Data Corruption | Packet Loss | Faulty hardware or link saturation |
| Slow File Transfer | Throughput | MTU mismatch or TCP Window sizing |
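The "TCP Window sizing" root cause in the last row can be quantified. A single TCP connection cannot have more than one window of unacknowledged data in flight per round trip, so its throughput is bounded by window size divided by RTT. The sketch below shows that bound and the related bandwidth-delay product; the function names and sample values are illustrative assumptions.

```python
def max_tcp_throughput_bps(window_bytes, rtt_s):
    """Upper bound on single-connection throughput: one window per round trip."""
    return window_bytes * 8 / rtt_s


def bandwidth_delay_product_bytes(link_bps, rtt_s):
    """Window size needed to keep a link of this speed full at this RTT."""
    return link_bps * rtt_s / 8


if __name__ == "__main__":
    # 64 KiB-class window (65,535 bytes) over a 100 ms round trip.
    print(max_tcp_throughput_bps(65_535, 0.1) / 1e6, "Mbps")
    # Window required to fill a 1 Gbps link at the same RTT.
    print(bandwidth_delay_product_bytes(1e9, 0.1) / 1e6, "MB")
```

This is why a high-bandwidth, high-latency path can deliver slow transfers even when no packets are lost: the window, not the link, is the limit.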