AWS Network Performance Analysis & Troubleshooting Study Guide

Analyzing tool output to assess network performance and troubleshoot connectivity (for example, VPC Flow Logs, Amazon CloudWatch Logs)

This guide covers the essential tools and techniques required to analyze network performance and troubleshoot connectivity within AWS, specifically focusing on the ANS-C01 curriculum.

Learning Objectives

By the end of this module, you should be able to:

  • Configure and Interpret VPC Flow Logs to identify traffic patterns and security rejections.
  • Utilize Amazon CloudWatch to create alarms and dashboards for network health monitoring.
  • Perform Deep Packet Inspection (DPI) using VPC Traffic Mirroring for complex troubleshooting.
  • Validate Connectivity Pathing using AWS Reachability Analyzer and Transit Gateway Network Manager.
  • Identify Root Causes of connectivity failures such as Security Group/NACL misconfigurations or MTU mismatches.

Key Terms & Glossary

  • VPC Flow Logs: A feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC.
  • CloudWatch Logs: A managed service to monitor, store, and access log files from AWS resources.
  • Traffic Mirroring: An Amazon VPC feature that you can use to copy network traffic from an elastic network interface (ENI).
  • Reachability Analyzer: A configuration analysis tool that enables you to perform connectivity testing between a source and destination in your VPC.
  • 5-Tuple: The five pieces of information that uniquely identify a network connection (Source IP, Destination IP, Source Port, Destination Port, Protocol).

The "Big Idea"

In cloud networking, visibility is often obscured by the shared responsibility model. You cannot plug a physical sniffer into an AWS rack. Therefore, troubleshooting depends on telemetry aggregation. By correlating VPC Flow Logs (Layer 4 metadata) with Reachability Analyzer (Control Plane logic) and Traffic Mirroring (Data Plane reality), you create a comprehensive observability stack to solve complex hybrid networking issues.

Formula / Concept Box

| Log/Metric Component | Purpose | Key Identifier |
| --- | --- | --- |
| Flow Log Format | Base log structure (5-tuple fields) | ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} |
| Action Status | Result of security check | ACCEPT (permitted) or REJECT (denied) |
| Log Status | Quality of log data | OK (normal), NODATA (no traffic during the aggregation interval), or SKIPDATA (some records skipped during the interval) |
| Reachability Status | Logical path check | Reachable or Unreachable (with failure point) |
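
To make the record layout concrete, here is a minimal Python sketch that splits one flow log record into named fields and pulls out the 5-tuple. It assumes the default (version 2) field order; custom formats would need a different field list. The helper names `parse_flow_log` and `five_tuple` and the sample record values are illustrative, not part of any AWS SDK.

```python
# Field order of the default (version 2) flow log format.
DEFAULT_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log(line: str) -> dict:
    """Split one space-separated record into a field-name -> value dict."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(DEFAULT_FIELDS, values))


def five_tuple(record: dict) -> tuple:
    """Return (src IP, dst IP, src port, dst port, protocol) as strings."""
    return (
        record["srcaddr"], record["dstaddr"],
        record["srcport"], record["dstport"], record["protocol"],
    )


# Illustrative record: SSH (TCP, protocol 6) traffic that was accepted.
sample = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_log(sample)
print(five_tuple(record), record["action"], record["log_status"])
```

Values are kept as strings because NODATA and SKIPDATA records carry "-" in place of the address, port, and counter fields.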

Hierarchical Outline

  1. Network Observability Tools
    • VPC Flow Logs: Capture IP traffic metadata; can be published to CloudWatch Logs, Amazon S3, or Amazon Data Firehose.
    • CloudWatch Metrics: Monitor throughput, latency, and packet loss.
    • CloudWatch Logs Insights: Query language for searching and aggregating millions of log lines.
  2. Path Analysis & Routing
    • Reachability Analyzer: Tests logical paths without sending packets (Dry run).
    • TGW Network Manager: Visualizes global topology across regions.
  3. Advanced Troubleshooting
    • VPC Traffic Mirroring: Captures raw L2-L7 packets for Wireshark analysis.
    • MTU Verification: Troubleshooting "jumbo frame" issues (9001 bytes) vs standard internet MTU (1500 bytes).
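
As a sketch of the query-language item above, the following Python helper assembles a CloudWatch Logs Insights query that ranks sources by rejected flow log records. It assumes the log group holds default-format flow logs, for which Logs Insights discovers fields such as `srcAddr`, `dstPort`, and `action`. The function name `top_rejections_query` is hypothetical, and actually running the query (in the console or through the StartQuery API) is not shown.

```python
def top_rejections_query(limit: int = 20) -> str:
    """Build a Logs Insights query that ranks sources by REJECT records."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    stages = [
        # Keep only records that a Security Group or NACL denied.
        'filter action = "REJECT"',
        # Count rejections per source address and destination port.
        "stats count(*) as rejections by srcAddr, dstPort",
        "sort rejections desc",
        f"limit {limit}",
    ]
    # Logs Insights chains commands with the pipe character.
    return " | ".join(stages)


print(top_rejections_query(10))
```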

Visual Anchors

Log Aggregation Workflow

[Diagram: Log Aggregation Workflow]

Logical Reachability Check

[Diagram: Logical Reachability Check]

Definition-Example Pairs

  • REJECT in Flow Logs: Indicates traffic was blocked by a Security Group or NACL.
    • Example: A web server's flow log shows REJECT on port 22; this implies that either the Security Group lacks an ingress rule for SSH or a NACL rule denies the traffic.
  • MTU Mismatch: Occurs when a packet is larger than the MTU of a link along its path, so it must be fragmented or is dropped.
    • Example: A Direct Connect link drops packets larger than 1500 bytes because Jumbo Frames (9001) were enabled on the EC2 instance but not supported by the router.
  • Packet Shaping/Throttling: The intentional slowing of traffic to meet limits.
    • Example: Monitoring the NetworkOut metric in CloudWatch to see if an instance is hitting its baseline bandwidth limit.
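
The MTU arithmetic behind the mismatch example can be sketched as follows. This assumes IPv4 with a 20-byte header (no options) and an 8-byte ICMP header; the function names and the hop list are illustrative. The resulting payload size is what you would pass to a don't-fragment ping (for example `ping -M do -s 1472` on Linux) to probe a 1500-byte path.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def path_mtu(hop_mtus: list) -> int:
    """The effective MTU of a path is its smallest per-hop MTU."""
    return min(hop_mtus)


def needs_fragmentation(packet_bytes: int, hop_mtus: list) -> bool:
    """True if the packet exceeds the smallest MTU along the path."""
    return packet_bytes > path_mtu(hop_mtus)


# Hypothetical path: jumbo-frame instance, then a router limited to 1500 bytes.
hops = [9001, 1500]
print(max_ping_payload(1500))           # 1472
print(max_ping_payload(9001))           # 8973
print(needs_fragmentation(9001, hops))  # True
```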

Worked Examples

Example 1: The "Unreachable" Database

Scenario: An EC2 instance in a private subnet cannot reach a database in another VPC via VPC Peering.

  1. Check VPC Flow Logs on the source ENI. A REJECT there means an outbound Security Group rule or a NACL rule is blocking the traffic and needs updating.
  2. If Flow Logs show ACCEPT but traffic still fails, run Reachability Analyzer.
  3. Reachability Analyzer reports "Unreachable" due to a missing route in the Route Table for the Peering Connection.
  4. Solution: Add the destination CIDR to the source subnet route table pointing to the pcx-xxxx ID.
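
The missing-route finding can be illustrated with a small longest-prefix-match lookup. This is only a sketch of the routing logic, not how Reachability Analyzer works internally; the CIDRs, the destination address, and the `lookup_route` helper are assumptions for illustration, and `pcx-xxxx` is the placeholder ID from the steps above.

```python
import ipaddress


def lookup_route(route_table: dict, destination_ip: str):
    """Return the target of the most specific matching route, or None."""
    dst = ipaddress.ip_address(destination_ip)
    best = None  # (network, target) of the longest matching prefix so far
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if dst in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, target)
    return best[1] if best else None


# Hypothetical source subnet route table before and after the fix.
before = {"10.0.0.0/16": "local"}
after = {"10.0.0.0/16": "local", "10.1.0.0/16": "pcx-xxxx"}

database_ip = "10.1.5.20"  # assumed address in the peer VPC
print(lookup_route(before, database_ip))  # None
print(lookup_route(after, database_ip))   # pcx-xxxx
```

If the route table also had a default route (0.0.0.0/0), the lookup before the fix would return that target instead of None, so the traffic would be sent somewhere other than the peering connection.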

Example 2: High Latency over Direct Connect

Scenario: A hybrid application connected over Direct Connect is experiencing high latency.

  1. Analyze: Use the CloudWatch metrics for the Direct Connect connection (AWS/DX namespace).
  2. Discovery: ConnectionBpsEgress is peaking at the provisioned limit.
  3. Solution: Implement CloudFront for static assets or upgrade the DX connection bandwidth.
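
The "peaking at the provisioned limit" check amounts to a simple utilization calculation. The sketch below assumes you have already pulled bits-per-second datapoints from CloudWatch; the sample values, the 1 Gbps port speed, and the 90% threshold are made up for illustration.

```python
def peak_utilization(bps_samples: list, provisioned_bps: int) -> float:
    """Peak throughput as a percentage of the provisioned bandwidth."""
    if not bps_samples:
        raise ValueError("no datapoints to evaluate")
    return 100.0 * max(bps_samples) / provisioned_bps


def is_saturated(bps_samples: list, provisioned_bps: int,
                 threshold_percent: float = 90.0) -> bool:
    """True if the peak reaches the (assumed) saturation threshold."""
    return peak_utilization(bps_samples, provisioned_bps) >= threshold_percent


ONE_GBPS = 1_000_000_000
samples = [420_000_000, 760_000_000, 985_000_000]  # hypothetical datapoints

print(f"{peak_utilization(samples, ONE_GBPS):.1f}%")  # 98.5%
print(is_saturated(samples, ONE_GBPS))                # True
```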

Checkpoint Questions

  1. What is the main difference between a REJECT recorded by a Security Group versus a NACL in Flow Logs? (Hint: Security groups are stateful; NACLs are stateless).
  2. Which tool would you use to verify if a packet is being malformed during transit?
  3. If CloudWatch Metrics show 0% packet loss but the application reports timeouts, which AWS tool should you use next?

Muddy Points & Cross-Refs

  • Flow Logs vs. Traffic Mirroring: Remember, Flow Logs are metadata (like a phone bill: who called whom and for how long). Traffic Mirroring is the actual recording of the conversation. Use Flow Logs first; use Mirroring only for deep protocol errors.
  • Security Groups vs. NACLs: If you see a REJECT in the Flow Log, it doesn't specify which one blocked it. Check the Security Group rules first, then the subnet's NACL rules, for the direction of the rejected traffic.
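
Although a single REJECT record does not name the control that blocked it, comparing the request and response records on the same ENI can narrow it down. The sketch below encodes that heuristic: Security Groups are stateful, so once a request is accepted its reply is allowed automatically, and a rejected reply therefore points to the stateless NACL. The function name and return strings are hypothetical, and this is a rule of thumb rather than an AWS feature.

```python
from typing import Optional


def likely_blocker(request_action: str, response_action: Optional[str]) -> str:
    """Guess which control produced a REJECT for one request/response pair."""
    if request_action == "REJECT":
        # Either control can drop the first packet; the record cannot say which.
        return "security group or NACL"
    if response_action == "REJECT":
        # The request was accepted, so a stateful Security Group would allow
        # the reply automatically; only a stateless NACL can still drop it.
        return "NACL"
    return "none"


print(likely_blocker("ACCEPT", "REJECT"))  # NACL
print(likely_blocker("REJECT", None))      # security group or NACL
print(likely_blocker("ACCEPT", "ACCEPT"))  # none
```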

Comparison Tables

| Feature | VPC Flow Logs | Reachability Analyzer | VPC Traffic Mirroring |
| --- | --- | --- | --- |
| Layer | Layer 4 (Metadata) | Control Plane (Logic) | Layer 2-7 (Packets) |
| Cost | Low (per GB ingested) | Per analysis ($0.10) | High (per hour + throughput) |
| Use Case | Security auditing / Trending | Debugging pathing/routing | Forensic analysis / IDS |
| Real-time? | Delayed (1-10 mins) | Instant (On-demand) | Real-time stream |
