Mastering Network Connectivity Troubleshooting: VPC Reachability Analyzer & Network Manager
Troubleshooting connectivity issues that are caused by network misconfiguration (for example, Reachability Analyzer)
This guide covers the essential tools and methodologies for diagnosing and resolving network misconfigurations within AWS, specifically focusing on the VPC Reachability Analyzer and Transit Gateway Network Manager.
Learning Objectives
By the end of this study guide, you will be able to:
- Configure and execute VPC Reachability Analyzer tests to identify the component blocking a connectivity path.
- Differentiate between logical path analysis and data plane testing.
- Identify common misconfigurations in Security Groups, NACLs, and Route Tables using tool outputs.
- Automate connectivity verification to maintain a "known good" network state.
- Utilize Transit Gateway Network Manager for global topology visualization and monitoring.
Key Terms & Glossary
- Logical Model Analysis: The process Reachability Analyzer uses to evaluate connectivity against the network configuration rather than by sending actual packets.
- Hop-by-Hop Analysis: A detailed breakdown of every networking component (gateways, interfaces, rules) a packet encounters between source and destination.
- Connectivity Intent: The desired state of the network (e.g., "Subnet A should always be able to reach Subnet B").
- SD-WAN (Software-Defined Wide Area Network): Software-based management of branch-office WAN links; SD-WAN devices can be integrated with Transit Gateway Network Manager to manage branch office connectivity.
- ENI (Elastic Network Interface): Often the source or destination point for reachability tests.
The "Big Idea"
In modern cloud architectures, networking issues are rarely caused by physical cable failures and are far more often caused by misconfiguration and configuration drift. The "Big Idea" is to move away from reactive "trial and error" pinging and toward proactive, automated verification. By using logical models (Reachability Analyzer), engineers can prove whether connectivity exists without even spinning up an application, treating network reachability as a verifiable piece of code.
Formula / Concept Box
| Input Parameter | Description | Required? |
|---|---|---|
| Source | The starting point (e.g., Instance ID, ENI ID, VPN Gateway). | Yes |
| Destination | The end point (e.g., IP Address, Instance ID, VPC Endpoint). | Yes |
| Protocol | TCP or UDP. | Yes |
| Destination Port | The specific port traffic is targeting (e.g., 80, 443). | Optional |
[!NOTE] Reachability Analyzer does not send real traffic. It performs a static analysis of your configuration metadata.
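The table's inputs map onto the parameters of the EC2 CreateNetworkInsightsPath API. The sketch below only builds and validates the keyword arguments that boto3's `create_network_insights_path` accepts (`Source`, `Destination`, `Protocol`, `DestinationPort`); the helper name `build_path_request` and the sample ENI IDs are hypothetical, and nothing is sent to AWS.

```python
from typing import Optional


def build_path_request(source: str, destination: str, protocol: str,
                       destination_port: Optional[int] = None) -> dict:
    """Hypothetical helper: validate the table's inputs and return kwargs
    for boto3's ec2 create_network_insights_path. No AWS call is made."""
    proto = protocol.lower()
    if proto not in ("tcp", "udp"):
        raise ValueError("Reachability Analyzer paths use tcp or udp")
    if destination_port is not None and not 0 <= destination_port <= 65535:
        raise ValueError("destination port out of range")
    request = {"Source": source, "Destination": destination, "Protocol": proto}
    if destination_port is not None:  # optional, so only include it when set
        request["DestinationPort"] = destination_port
    return request


if __name__ == "__main__":
    # Sample IDs are placeholders, not real resources.
    req = build_path_request("eni-0abc1234", "eni-0def5678", "TCP", 3306)
    print(req)
    # With credentials configured, the dict could be passed on as:
    # boto3.client("ec2").create_network_insights_path(**req)
```

Keeping request construction separate from the API call makes the path definition easy to review and test before any analysis is billed.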
Hierarchical Outline
- VPC Reachability Analyzer
  - Core Functionality: Troubleshooting, verifying, and automating connectivity checks.
  - Analysis Mechanism: Static analysis of a logical model of the configuration (automated reasoning); no probe packets or traceroutes are sent.
  - Key Identifiers: Detects blocking Security Groups, missing Routes, or NACL Deny rules.
- Automation & Ongoing Monitoring
  - Connectivity Intent: Scripting tests to run after CloudFormation/CDK deployments.
  - Verification: Ensuring changes didn't break existing paths.
- Transit Gateway Network Manager
  - Global Visibility: Single dashboard for hybrid and multi-region setups.
  - Integration: Works with SD-WAN vendors (Cisco, Aruba, etc.).
  - Metrics: Tracks packet drops, bytes sent/received, and topology changes.
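A post-deployment connectivity-intent check can be sketched as a gate over analysis results. This assumes the analyses were already started (for example with boto3's `start_network_insights_analysis`) and described with `describe_network_insights_analyses`, whose entries carry `NetworkInsightsPathId`, `Status`, and `NetworkPathFound`. The `check_intents` function, the intent map, and the sample path IDs are hypothetical.

```python
from typing import Dict, List


def check_intents(intents: Dict[str, bool], analyses: List[dict]) -> List[str]:
    """Compare each path's expected reachability (True = must reach,
    False = must NOT reach) with the analysis result. Returns a list of
    violations; an empty list means every intent still holds."""
    by_path = {a["NetworkInsightsPathId"]: a for a in analyses}
    violations = []
    for path_id, should_reach in intents.items():
        analysis = by_path.get(path_id)
        if analysis is None:
            violations.append(f"{path_id}: no analysis result")
        elif analysis["Status"] != "succeeded":
            violations.append(f"{path_id}: analysis status {analysis['Status']}")
        elif analysis["NetworkPathFound"] != should_reach:
            expected = "reachable" if should_reach else "not reachable"
            violations.append(f"{path_id}: expected {expected}")
    return violations


if __name__ == "__main__":
    # Hypothetical results shaped like describe_network_insights_analyses output.
    analyses = [
        {"NetworkInsightsPathId": "nip-web-to-db", "Status": "succeeded", "NetworkPathFound": True},
        {"NetworkInsightsPathId": "nip-internet-to-db", "Status": "succeeded", "NetworkPathFound": False},
    ]
    problems = check_intents({"nip-web-to-db": True, "nip-internet-to-db": False}, analyses)
    print(problems or "all intents hold")
```

Encoding "must not reach" intents alongside "must reach" ones lets the same gate catch a change that accidentally opens a path, not only one that breaks a path. In a CI pipeline, a non-empty result would typically fail the deployment step.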
Visual Anchors
Reachability Analyzer Workflow
Logical Path Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, align=center, minimum height=1cm}]
\node (src) [fill=blue!10] {Source\\(EC2/ENI)};
\node (sg) [right of=src, xshift=1cm, fill=orange!10] {Security Group\\(Stateful)};
\node (nacl) [right of=sg, xshift=1cm, fill=red!10] {NACL\\(Stateless)};
\node (rt) [right of=nacl, xshift=1cm, fill=green!10] {Route Table\\(L3)};
\node (dest) [right of=rt, xshift=1cm, fill=blue!10] {Destination\\(RDS/S3/EC2)};
\draw[->, thick] (src) -- (sg);
\draw[->, thick] (sg) -- (nacl);
\draw[->, thick] (nacl) -- (rt);
\draw[->, thick] (rt) -- (dest);
\node[draw=none, below of=nacl, yshift=0.5cm] {\textit{Reachability Analyzer checks each hop in code}};
\end{tikzpicture}
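The hop-by-hop idea in the diagram can be illustrated with a toy Python model: each hop is a predicate over a packet description, and the analysis reports the first hop that drops it. The `Packet` fields, the `HOPS` rules, the assumed VPC range 10.0.0.0/16, and `first_blocking_hop` are all hypothetical; this is not AWS's reasoning engine, only the shape of its output.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int


# A hop is (component name, predicate returning True if the packet passes).
Hop = Tuple[str, Callable[[Packet], bool]]

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # assumed VPC range

# Hypothetical configuration, one predicate per hop in the diagram.
HOPS: List[Hop] = [
    ("Security Group", lambda p: p.protocol == "tcp" and p.dst_port in (443, 3306)),
    ("NACL", lambda p: p.dst_port != 3306),  # e.g. a deny rule on 3306
    ("Route Table", lambda p: ipaddress.ip_address(p.dst_ip) in VPC_CIDR),
]


def first_blocking_hop(packet: Packet, hops: List[Hop]) -> Optional[str]:
    """Evaluate hops in path order and return the first one that drops the
    packet, or None if every hop passes it ("Reachable")."""
    for name, passes in hops:
        if not passes(packet):
            return name
    return None


print(first_blocking_hop(Packet("10.0.1.5", "10.0.2.20", "tcp", 3306), HOPS))  # NACL
```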
Definition-Example Pairs
- Configuration Drift: When a manual change (e.g., a developer adding a temporary security group rule) makes the network inconsistent with the original design.
  - Example: An administrator deletes a route to a NAT Gateway to save costs, accidentally breaking outbound internet access for private subnets.
- Formal Reasoning: The mathematical logic (AWS calls it automated reasoning) used to prove reachability from the configuration alone, without sending data.
  - Example: Reachability Analyzer "knows" a packet is blocked because it sees a Deny 0.0.0.0/0 rule in the NACL, numbered lower than any matching allow rule, even if no server is currently trying to send data.
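Why a Deny 0.0.0.0/0 rule is conclusive follows from how NACLs are evaluated: rules are checked in ascending rule-number order, the first match decides, and an implicit catch-all rule denies anything unmatched. The Python sketch below models that for a single remote address and port; protocol matching is omitted for brevity, and `NaclRule` and `evaluate_nacl` are hypothetical names.

```python
import ipaddress
from typing import List, NamedTuple


class NaclRule(NamedTuple):
    number: int      # lower numbers are evaluated first
    cidr: str        # remote address range the rule applies to
    port_from: int
    port_to: int
    action: str      # "allow" or "deny"


def evaluate_nacl(rules: List[NaclRule], ip: str, port: int) -> str:
    """Return the action of the lowest-numbered matching rule, falling back
    to the implicit catch-all deny when nothing matches."""
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if addr in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"


rules = [
    NaclRule(100, "0.0.0.0/0", 0, 65535, "deny"),
    NaclRule(200, "10.0.1.0/24", 3306, 3306, "allow"),
]
print(evaluate_nacl(rules, "10.0.1.5", 3306))  # deny: rule 100 matches first
```

Because the allow rule is numbered 200, it is never reached for this traffic; renumbering it below 100 would flip the result to allow.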
Worked Examples
Scenario: Web Server cannot reach Database
Symptoms: The EC2 Web Server (10.0.1.5) cannot connect to the database on port 3306.
- Step 1: Create Analysis Path
  - Source: Web Server ENI
  - Destination: the database's ENI (an RDS DB instance is specified through its network interface or IP address, not its DB instance ID)
  - Protocol: TCP, Port: 3306
- Step 2: Run Analysis
  - Output: Not Reachable.
  - Hop Analysis: The path reached the RDS instance's Security Group and was dropped.
- Step 3: Root Cause Identification
  - The RDS Security Group only allowed traffic from 10.0.1.100/32 (the old web server IP) instead of the whole Web Subnet or the Web SG.
- Step 4: Resolution
  - Update the RDS Security Group to allow port 3306 from the Web Server SG ID, then re-run the analysis to confirm a Reachable status.
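The root cause and fix can be replayed with a small Python model of security group evaluation: security groups are allow-lists, so traffic passes if any inbound rule matches, where a rule's source is either a CIDR or a referenced security group. Port ranges are reduced to a single port, and `SgRule`, `sg_allows`, and the SG ID "sg-web" are hypothetical.

```python
import ipaddress
from typing import FrozenSet, List, NamedTuple, Optional


class SgRule(NamedTuple):
    port: int
    cidr: Optional[str] = None       # e.g. "10.0.1.100/32"
    source_sg: Optional[str] = None  # a referenced security group ID


def sg_allows(rules: List[SgRule], src_ip: str, src_sgs: FrozenSet[str], port: int) -> bool:
    """Security groups have no deny rules: traffic passes if any inbound
    rule matches the port and either the source CIDR or a source SG."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.port != port:
            continue
        if rule.cidr and addr in ipaddress.ip_network(rule.cidr):
            return True
        if rule.source_sg and rule.source_sg in src_sgs:
            return True
    return False


WEB_SGS = frozenset({"sg-web"})                   # hypothetical web server SG
BEFORE = [SgRule(3306, cidr="10.0.1.100/32")]     # old web server IP only
AFTER = [SgRule(3306, source_sg="sg-web")]        # reference the Web SG

print(sg_allows(BEFORE, "10.0.1.5", WEB_SGS, 3306))  # False (Not Reachable)
print(sg_allows(AFTER, "10.0.1.5", WEB_SGS, 3306))   # True (Reachable)
```

Referencing the Web SG instead of an IP means the rule keeps working when the web server is replaced and gets a new private address, which is exactly the drift that caused this incident.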
Checkpoint Questions
- Does Reachability Analyzer charge for tests that result in a "Not Reachable" status? (Yes, you are charged per analysis).
- What is the main difference between Reachability Analyzer and VPC Flow Logs? (Analyzer is proactive/predictive based on config; Flow Logs are reactive/historical based on actual traffic).
- True or False: Reachability Analyzer can identify latency issues between two on-premises routers. (False; it checks reachability, not latency, and it only models AWS-side configuration, so its view ends at the VPN or Direct Connect gateway.)
Muddy Points & Cross-Refs
- Reachability Analyzer vs. Route Analyzer:
  - Reachability Analyzer evaluates the full stack (SGs, NACLs, route tables) for a specific source-to-destination path.
  - Route Analyzer (part of Transit Gateway Network Manager) checks only Transit Gateway route tables, i.e. whether a route exists between two attachments.
- Cross-Account Limitations: Reachability tests are scoped to a single account by default; analyzing paths that span accounts requires multi-account setup through AWS Organizations.
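The question Route Analyzer answers, "does a route exist between these attachments?", comes down to a longest-prefix-match lookup in a Transit Gateway route table, where a matching route may point to an attachment or be a blackhole. The Python sketch below checks one direction only; the `lookup` function and the attachment IDs are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def lookup(route_table: Dict[str, str], dst_ip: str) -> Optional[str]:
    """Longest-prefix match over {cidr: target}. Returns the target
    attachment ID, 'blackhole', or None when no route exists."""
    addr = ipaddress.ip_address(dst_ip)
    best = None  # (network, target) of the most specific match so far
    for cidr, target in route_table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


# Hypothetical TGW route table: a broad route plus a more specific blackhole.
TGW_ROUTES = {
    "10.0.0.0/8": "tgw-attach-vpc-b",
    "10.1.0.0/16": "blackhole",
}

print(lookup(TGW_ROUTES, "10.2.0.1"))     # tgw-attach-vpc-b
print(lookup(TGW_ROUTES, "10.1.2.3"))     # blackhole (more specific route wins)
print(lookup(TGW_ROUTES, "192.168.1.1"))  # None (no route)
```

Return traffic is looked up in the route table associated with the destination's attachment, so a full check repeats the lookup in the opposite direction.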
Comparison Tables
| Feature | Reachability Analyzer | VPC Flow Logs | Traffic Mirroring |
|---|---|---|---|
| Primary Use | Configuration Troubleshooting | Auditing & Traffic Analysis | Deep Packet Inspection (DPI) |
| Data Source | AWS Configuration Metadata | IP Traffic Metadata | Actual Packet Copies |
| Speed | Near-instant (Seconds) | 1-15 Minute Delay | Real-time Stream |
| Cost Driver | Per Analysis Run | Per GB Ingested | Per Hour per ENI |
| Detects SGs? | Yes | Yes (REJECT) | Yes |
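The "Indirectly" entry for Flow Logs can be made concrete: a default-format (version 2) record carries the fields `version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status`, so you learn that a packet was rejected but not which rule rejected it. The Python sketch below filters REJECT records for one port; `FlowRecord`, `rejected`, and the sample lines (using the AWS documentation example account ID) are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class FlowRecord(NamedTuple):
    interface_id: str
    srcaddr: str
    dstaddr: str
    dstport: int
    action: str


def rejected(lines: Iterable[str], dstport: int) -> List[FlowRecord]:
    """Parse default-format (version 2) flow log lines and keep REJECTs
    to the given destination port."""
    out = []
    for line in lines:
        f = line.split()
        if len(f) != 14 or f[0] != "2" or f[13] != "OK":
            continue  # skip headers, custom formats, NODATA/SKIPDATA records
        rec = FlowRecord(f[2], f[3], f[4], int(f[6]), f[12])
        if rec.action == "REJECT" and rec.dstport == dstport:
            out.append(rec)
    return out


SAMPLE_LOG = [
    "2 123456789012 eni-0abc1234 10.0.1.5 10.0.2.20 49152 3306 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc1234 10.0.1.5 10.0.2.20 49153 443 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc1234 - - - - - - - 1700000000 1700000060 - NODATA",
]

for rec in rejected(SAMPLE_LOG, 3306):
    print(rec)
```

A matching REJECT confirms that traffic was dropped after the fact; running Reachability Analyzer on the same path is what identifies the responsible security group or NACL.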