
Mastering MTU: Troubleshooting Packet Size Mismatches in AWS VPCs

Troubleshooting packet size mismatches in a VPC to restore network connectivity

Learning Objectives

After completing this guide, you should be able to:

  • Identify the symptoms of Path MTU Discovery (PMTUD) failure in a VPC environment.
  • Distinguish between standard Ethernet frames (1500 bytes) and Jumbo frames (9001 bytes).
  • Configure appropriate MTU settings for EC2 instances, VPNs, and Direct Connect.
  • Utilize AWS-native tools like VPC Flow Logs and Traffic Mirroring to diagnose packet-level issues.

Key Terms & Glossary

  • MTU (Maximum Transmission Unit): The size of the largest protocol data unit (PDU) that can be communicated in a single network layer transaction.
    • Example: Standard Ethernet MTU is 1500 bytes.
  • Jumbo Frames: Ethernet frames with more than 1500 bytes of payload, typically 9001 bytes in AWS.
  • PMTUD (Path MTU Discovery): A technique for determining the MTU size on the network path between two IP hosts to avoid IP fragmentation.
  • MSS (Maximum Segment Size): The largest amount of TCP payload, in bytes, that a host is willing to receive in a single segment. It excludes the TCP and IP headers and is advertised by each endpoint in the TCP SYN.
  • ICMP Type 3 Code 4: The "Destination Unreachable; Fragmentation Needed and DF set" message essential for PMTUD to function.

The "Big Idea"

Network connectivity isn't just about "up or down"; it's also about how large a packet each link on the path can carry. A common "silent killer" of network performance is the MTU mismatch. When a source sends a packet larger than an intermediate hop can forward, and that hop cannot signal the source to shrink its packets (often because over-aggressive firewalls block ICMP), the connection simply hangs. This is known as a "Black Hole" connection: small packets (like a TCP handshake) pass through, but large data transfers fail.
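The black-hole behavior can be sketched as a small simulation. This is an illustrative model only, not how any AWS component is implemented; the `Hop` class, the `send_with_df` function, and the hop names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    name: str
    mtu: int
    # True if "Fragmentation Needed" replies from this hop never reach the sender
    icmp_blocked: bool = False


def send_with_df(packet_size: int, path: List[Hop], max_attempts: int = 10) -> str:
    """Simulate a sender that sets the DF bit and honors PMTUD feedback."""
    size = packet_size
    for _ in range(max_attempts):
        for hop in path:
            if size > hop.mtu:
                if hop.icmp_blocked:
                    # No ICMP Type 3 Code 4 arrives, so the sender never shrinks its packets
                    return f"black hole at {hop.name}: {size}-byte packet dropped silently"
                # ICMP Type 3 Code 4 reports the next-hop MTU; the sender retries smaller
                size = hop.mtu
                break
        else:
            return f"delivered with packet size {size}"
    return "gave up"


path_ok = [Hop("vpc", 9001), Hop("vpn-tunnel", 1436)]
path_blocked = [Hop("vpc", 9001), Hop("vpn-tunnel", 1436, icmp_blocked=True)]
print(send_with_df(9001, path_ok))
print(send_with_df(9001, path_blocked))
print(send_with_df(60, path_blocked))
```

The last call shows why the problem is easy to miss: a handshake-sized packet is delivered even on the path where large packets vanish.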

Formula / Concept Box

| Connection Type | Maximum MTU | Notes |
| --- | --- | --- |
| Internet Gateway | 1500 bytes | Jumbo frames are NOT supported over the internet. |
| Inter-VPC (Peering) | 9001 bytes | Supported within the same region. |
| AWS VPN | 1436 bytes | Reduced due to IPsec encapsulation overhead. |
| Direct Connect | 1500 bytes (default), or jumbo frames | Jumbo frames: 9001 bytes on private VIFs, 8500 bytes on transit VIFs. |
| Transit Gateway | 8500 bytes | MTU for traffic between VPCs and TGW. |
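The usable MTU of an end-to-end path is the smallest MTU of any segment on it. A minimal sketch, using a few values from the table above; the dictionary keys and the `path_mtu` function are made-up names for illustration.

```python
# Per-segment limits in bytes, copied from the table above (keys are hypothetical labels)
SEGMENT_MTU = {
    "internet_gateway": 1500,
    "vpc_peering_same_region": 9001,
    "site_to_site_vpn": 1436,
    "transit_gateway": 8500,
}


def path_mtu(segments):
    """Return the effective MTU of a path: the minimum across its segments."""
    return min(SEGMENT_MTU[name] for name in segments)


print(path_mtu(["transit_gateway"]))
print(path_mtu(["transit_gateway", "site_to_site_vpn"]))
```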

Hierarchical Outline

  • I. Understanding MTU in AWS
    • Standard MTU (1500): Default for most internet-bound traffic.
    • Jumbo Frames (9001): Used for high-throughput, low-latency requirements (e.g., HPC, Big Data).
  • II. Troubleshooting Path MTU Discovery (PMTUD)
    • The Role of ICMP: Why blocking all ICMP breaks network connectivity.
    • The Don't Fragment (DF) Bit: How it forces PMTUD.
  • III. Diagnostic Tools
    • VPC Flow Logs: Checking per-flow packet and byte counts and REJECT actions.
    • VPC Traffic Mirroring: Capturing raw packets for Wireshark analysis.
    • Reachability Analyzer: Verifying the path and identifying blocking Security Groups.
  • IV. Resolution Strategies
    • MSS Clamping: Rewriting the MSS option in TCP SYN packets so that endpoints never send oversized segments.
    • Security Group Rules: Allowing ICMP Type 3 Code 4.
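For the VPC Flow Logs item in the outline, the sketch below parses one record in the default (version 2) field order and derives an average packet size from the `bytes` and `packets` fields. The sample record is made up for illustration, and the function names are hypothetical. Flow logs describe flows, not individual packets, so this is an estimate at best.

```python
# Field order of the default (version 2) VPC Flow Logs format
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

ICMP_PROTOCOL = 1  # IANA protocol number for ICMP


def parse_record(line):
    """Split a space-separated flow log record into a dict of named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("record does not match the default format")
    record = dict(zip(FIELDS, values))
    if record["log_status"] != "OK":
        raise ValueError("record carries no traffic data")
    for key in ("protocol", "packets", "bytes"):
        record[key] = int(record[key])
    return record


def summarize(record):
    """Estimate average packet size and flag rejected ICMP flows."""
    average = record["bytes"] / record["packets"] if record["packets"] else 0.0
    rejected_icmp = record["protocol"] == ICMP_PROTOCOL and record["action"] == "REJECT"
    return {"avg_packet_bytes": average, "rejected_icmp": rejected_icmp}


# Made-up sample record, not taken from a real account
sample = ("2 123456789010 eni-0123456789abcdef0 203.0.113.12 10.0.1.25 "
          "0 0 1 4 336 1700000000 1700000060 REJECT OK")
print(summarize(parse_record(sample)))
```

A rejected ICMP flow is only a hint. The default fields do not identify the ICMP type and code, so confirming that the dropped message was Type 3 Code 4 takes a packet capture, for example through Traffic Mirroring.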

Visual Anchors

PMTUD Failure Logic

[Diagram not available in this copy: PMTUD failure flow]

Packet Encapsulation Overhead

\begin{tikzpicture}[scale=0.8]
  % Inner (original) packet that has to fit inside the tunnel
  \draw[fill=blue!10] (0,0) rectangle (8,1) node[pos=.5] {Original IP Packet (e.g., 1436 bytes)};
  % Headers added by the VPN endpoint
  \draw[fill=red!20] (-2,0) rectangle (0,1) node[pos=.5] {ESP/IPsec};
  \draw[fill=green!20] (-4,0) rectangle (-2,1) node[pos=.5] {New IP Header};
  % Span of the full encapsulated packet
  \draw[<->, thick] (-4,-0.5) -- (8,-0.5) node[midway, below] {Total Frame Size must fit MTU (1500)};
  % $\sim$ prints a tilde; a bare ~ would be a non-breaking space in LaTeX
  \node at (2, 2) {\textbf{VPN Encapsulation adds $\sim$64 bytes of overhead}};
\end{tikzpicture}

Definition-Example Pairs

  • MSS Clamping: A technique used by routers to alter the Maximum Segment Size in the TCP SYN packet.
    • Example: A VPN concentrator reduces the MSS of incoming SYN packets to 1396 bytes so that the resulting 1436-byte IP packet fits perfectly inside a 1500-byte MTU after adding encryption headers.
  • Black Hole Router: A router that drops packets exceeding the MTU without sending an ICMP response.
    • Example: A security group or network ACL that drops inbound ICMP Type 3 Code 4 messages keeps the EC2 instance from ever learning the path MTU, so the path behaves like a black hole for PMTUD even when the router does send the ICMP response.
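The MSS clamping example is plain subtraction, and working through it once makes the numbers easier to remember. A minimal sketch, assuming IPv4 and TCP headers without options (20 bytes each) and the roughly 64 bytes of tunnel overhead used in this guide; `clamped_mss` is a hypothetical helper name.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options


def clamped_mss(link_mtu, tunnel_overhead):
    """Return (inner packet MTU, MSS to write into the SYN) for a tunnel."""
    inner_mtu = link_mtu - tunnel_overhead
    mss = inner_mtu - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError("overhead leaves no room for TCP payload")
    return inner_mtu, mss


inner_mtu, mss = clamped_mss(link_mtu=1500, tunnel_overhead=64)
print(inner_mtu, mss)  # 1436 1396
```

Actual IPsec overhead depends on the negotiated algorithms, so the 64-byte figure should be treated as an example value rather than a constant.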

Worked Examples

Scenario: The "Hanging" SSH Connection

Issue: A user can connect to an EC2 instance via SSH, but when they run a command that produces a lot of output (like cat large_file.txt), the session freezes.

Step-by-Step Diagnosis:

  1. Test Ping with Size: On Linux, run ping -s 1472 -M do <IP>. The -M do option sets the DF bit, and 1472 bytes of payload + 8-byte ICMP header + 20-byte IPv4 header = 1500 bytes.
  2. Observe Result: If the ping fails but a standard ping <IP> works, an MTU bottleneck exists.
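  3. Check Flow Logs: VPC Flow Logs record per-flow metadata, not packet contents or individual packet sizes. Use the packets and bytes fields to estimate the average packet size, and look for REJECT in the action field on ICMP (protocol 1) traffic.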
  4. Analyze Security Groups: Ensure the inbound rules allow ICMP Type 3, Code 4 from 0.0.0.0/0.
  5. Solution: Enable Jumbo frames only for traffic that stays within the VPC; keep internet-bound traffic at a 1500-byte MTU, and apply MSS clamping on the VPN device or router so that TCP segments fit the tunnel.
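Steps 1 and 2 can be repeated at different sizes to home in on the actual path MTU. The sketch below builds the Linux ping command for a given MTU and binary-searches for the largest size that gets through. The `probe` callable is an assumption: it stands in for whatever actually sends the DF-marked packet, and both function names are hypothetical.

```python
from typing import Callable, List

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def ping_command(host: str, mtu: int) -> List[str]:
    """Build a Linux (iputils) ping that tests one packet of exactly `mtu` bytes with DF set."""
    return ["ping", "-c", "1", "-M", "do", "-s", str(mtu - ICMP_OVERHEAD), host]


def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 9001) -> int:
    """Binary-search the largest packet size for which probe(size) succeeds."""
    if not probe(low):
        raise RuntimeError("even the smallest probe failed; this is not an MTU problem")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print(ping_command("10.0.0.5", 1500))
# A fake probe standing in for a path that tops out at 1436 bytes
print(find_path_mtu(lambda size: size <= 1436))
```

On a real host, the probe could run `ping_command(host, size)` with `subprocess.run` and treat a zero exit code as success. That wiring is left out here because it depends on the local ping implementation.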

Checkpoint Questions

  1. What is the default MTU for an AWS Site-to-Site VPN connection?
  2. Why does blocking all ICMP traffic cause "Black Hole" connectivity issues?
  3. Which AWS tool allows you to see the actual bytes of a packet to verify if headers are being stripped?
  4. Can you use Jumbo Frames (9001 bytes) to send data from a VPC to an on-premises server over the public internet?

Muddy Points & Cross-Refs

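  • MSS vs. MTU: Many students confuse the two. Remember: MTU is the limit for the whole IP packet (L3), headers included, while MSS is the limit for the TCP payload only (L4). For IPv4 without options, MSS = MTU - 40 bytes (for example, 1500 - 40 = 1460).
  • Where to change MTU: You can change the MTU at the OS level (e.g., ifconfig eth0 mtu 1500, or ip link set dev eth0 mtu 1500 on current Linux distributions), but the value must not exceed what the path supports.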
  • Cross-Ref: See Unit 4: Network Security for details on how NACLs (stateless) can inadvertently block ICMP responses needed for PMTUD.
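Before changing an interface MTU, it helps to check what the OS is currently using. A minimal sketch for Linux, which exposes the value in sysfs under /sys/class/net/<interface>/mtu; the function names and the `sysfs_root` parameter (included so the code can be exercised without a real interface) are assumptions made for illustration.

```python
from pathlib import Path


def interface_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the configured MTU of a Linux network interface from sysfs."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def exceeds_path(iface_mtu: int, path_mtu: int) -> bool:
    """True when the interface may emit packets larger than the path can carry."""
    return iface_mtu > path_mtu


# Example: an interface set to 9001 bytes talking across a 1500-byte path
print(exceeds_path(9001, 1500))  # True
```

When `exceeds_path` returns True, the connection depends on PMTUD or MSS clamping to work; lowering the interface MTU removes that dependency at the cost of jumbo frames inside the VPC.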

Comparison Tables

MTU Support Comparison

| Feature | Standard Frame | Jumbo Frame |
| --- | --- | --- |
| MTU (bytes) | 1500 | 9001 |
| Efficiency | Lower (Higher overhead %) | Higher (More data per header) |
| Use Case | Internet, VPNs, General | Database clusters, HPC, Storage |
| Fragmentation | Less likely | Common if misconfigured |
| Compatibility | Universal | Within VPC / Direct Connect only |
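The efficiency row can be made concrete with a little arithmetic. A rough sketch, assuming 40 bytes of IPv4 and TCP headers per packet and ignoring Ethernet framing; both helper names are hypothetical.

```python
HEADERS = 40  # bytes of IPv4 + TCP headers per packet, no options


def header_overhead_pct(mtu: int) -> float:
    """Share of each full-size packet spent on IPv4 and TCP headers, in percent."""
    return 100.0 * HEADERS / mtu


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-size packets needed to carry the payload (ceiling division)."""
    per_packet = mtu - HEADERS
    return -(-payload_bytes // per_packet)


for mtu in (1500, 9001):
    print(mtu, round(header_overhead_pct(mtu), 2), packets_needed(1_000_000, mtu))
# 1500 2.67 685
# 9001 0.44 112
```

Moving 1 MB takes about six times fewer packets with jumbo frames, which is where the throughput gain for HPC and storage traffic comes from.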
