AWS and Hybrid Network Logging and Monitoring: A Comprehensive Study Guide

Learning Objectives

After studying this guide, you should be able to:

Identify the core AWS services used for network logging and monitoring (CloudWatch, CloudTrail, VPC Flow Logs).
Differentiate between traffic metadata analysis (Flow Logs) and deep packet inspection (Traffic Mirroring).
Define a multi-account and hybrid audit strategy using AWS Config and Audit Manager.
Select the appropriate visibility tool for specific scenarios, such as path validation with Reachability Analyzer or global topology mapping with Transit Gateway Network Manager.
Implement log delivery architectures that leverage S3, Kinesis, and CloudWatch Logs for real-time and historical analysis.

Key Terms & Glossary

VPC Flow Logs: A feature that enables you to capture information about the IP traffic to and from network interfaces in your VPC.
Amazon CloudWatch: A monitoring and observability service that provides data and actionable insights for AWS, hybrid, and on-premises applications.
CloudWatch Logs Insights: A feature of CloudWatch Logs used to interactively search and analyze your log data with a purpose-built query language.
AWS CloudTrail: A service that enables governance, compliance, operational auditing, and risk auditing by recording API activity in your account (management events by default; data events when you enable them).
Traffic Mirroring: An Amazon VPC feature that you can use to copy network traffic from an elastic network interface of type interface and send it to out-of-band security and monitoring appliances.
Reachability Analyzer: A configuration analysis tool that enables you to perform static connectivity testing between a source and a destination in your VPC.
AWS Config: A service that enables you to assess, audit, and evaluate the configurations of your AWS resources.

The "Big Idea"

In a modern AWS or hybrid environment, visibility is security. Without a centralized, automated logging and monitoring strategy, troubleshooting network latency or detecting unauthorized traffic becomes slow and unreliable. The "Big Idea" is to move from reactive troubleshooting to proactive observability by correlating data across the entire stack, from API calls (CloudTrail) to flow metadata (VPC Flow Logs) to actual payloads (Traffic Mirroring), in support of compliance and operational excellence.

Formula / Concept Box

Concept	Data Provided	Storage Targets
VPC Flow Logs	5-tuple (Src/Dest IP, Src/Dest Port, Protocol), Bytes, Packets, Action (ACCEPT/REJECT)	S3, CloudWatch Logs, Amazon Data Firehose (formerly Kinesis Data Firehose)
CloudTrail	Identity (IAM user or role), Time, API Action, Source IP, Resource affected	S3, CloudWatch Logs
CloudWatch Metrics	Numeric performance data (CPU, Latency, Throughput)	CloudWatch Metrics Store
ALB/CLB Logs	HTTP Request details (URL, User Agent, Response Time)	S3
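To make the Flow Logs row concrete, the sketch below parses one record in the default (version 2) flow log format, which is a space-separated line of 14 fields. The `FlowRecord` class, the `parse_flow_record` helper, and the `byte_count` attribute name are illustrative choices, not part of any AWS SDK, and custom log formats would need a different field list.

```python
from dataclasses import dataclass
from typing import Optional

# Field order of the default (version 2) flow log record format.
DEFAULT_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def _to_int(value: str) -> Optional[int]:
    # NODATA and SKIPDATA records use "-" for fields that have no value.
    return None if value == "-" else int(value)


@dataclass(frozen=True)
class FlowRecord:
    """Illustrative container for the fields most often used in analysis."""
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: Optional[int]
    dstport: Optional[int]
    protocol: Optional[int]
    packets: Optional[int]
    byte_count: Optional[int]
    action: str
    log_status: str


def parse_flow_record(line: str) -> FlowRecord:
    """Parse one default-format flow log line into a FlowRecord."""
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}")
    raw = dict(zip(DEFAULT_FIELDS, parts))
    return FlowRecord(
        interface_id=raw["interface_id"],
        srcaddr=raw["srcaddr"],
        dstaddr=raw["dstaddr"],
        srcport=_to_int(raw["srcport"]),
        dstport=_to_int(raw["dstport"]),
        protocol=_to_int(raw["protocol"]),
        packets=_to_int(raw["packets"]),
        byte_count=_to_int(raw["bytes"]),
        action=raw["action"],
        log_status=raw["log_status"],
    )


# Sample record: accepted SSH traffic (TCP is protocol 6, destination port 22).
SAMPLE = (
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
record = parse_flow_record(SAMPLE)
print(record.srcaddr, record.dstport, record.action)
```

Note that the record carries only metadata: addresses, ports, counters, and the accept/reject decision, with no payload.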

Hierarchical Outline

I. Foundational Monitoring Services
- A. Amazon CloudWatch
  - Metrics: Near-real-time performance monitoring for EC2, RDS, and S3.
  - Logs: Aggregating application and system-level log files.
  - Alarms: Triggering actions (SNS, Auto Scaling) based on thresholds.
- B. AWS CloudTrail
  - Management Events: Control plane actions (e.g., creating a VPC).
  - Data Events: Resource-level actions (e.g., S3 GetObject).
II. Network-Specific Visibility Tools
- A. VPC Flow Logs
  - Troubleshooting security group and NACL rules.
  - Monitoring for anomalous traffic patterns.
- B. VPC Reachability Analyzer
  - Static analysis of routing and security configurations without sending packets.
- C. Transit Gateway Network Manager
  - Centralized dashboard for global network topology and telemetry.
III. Audit and Compliance Strategy
- A. AWS Config: Tracking configuration changes over time and enforcing rules (e.g., "SGs must not allow SSH from 0.0.0.0/0").
- B. AWS Audit Manager: Automating the assessment of controls against frameworks (HIPAA, PCI DSS).
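The AWS Config rule mentioned in III.A ("SGs must not allow SSH from 0.0.0.0/0") comes down to a check over a security group's inbound permissions. The sketch below shows that check in isolation. It assumes the permissions are shaped like the `IpPermissions` list returned by the EC2 `DescribeSecurityGroups` API; the function names are hypothetical, and a real custom rule would also need the Lambda handler and reporting code that are omitted here.

```python
def allows_public_ssh(ip_permissions):
    """Return True if any inbound permission opens TCP port 22 to the internet."""
    for perm in ip_permissions:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            covers_ssh = True
        elif protocol == "tcp":
            covers_ssh = perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", 65535)
        else:
            covers_ssh = False
        if not covers_ssh:
            continue
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
        if open_v4 or open_v6:
            return True
    return False


def evaluate_security_group(ip_permissions):
    """Map the check onto the compliance values that AWS Config uses."""
    return "NON_COMPLIANT" if allows_public_ssh(ip_permissions) else "COMPLIANT"


# Hypothetical inputs for illustration.
open_group = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
               "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]
restricted_group = [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]

print(evaluate_security_group(open_group))
print(evaluate_security_group(restricted_group))
```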

Visual Anchors

Log Aggregation Flowchart

[Diagram: log aggregation flow]

Hybrid Monitoring Architecture

[Diagram: hybrid monitoring architecture]

Definition-Example Pairs

CloudWatch Alarm: A mechanism that watches a single metric over a specified period.
- Example: Setting an alarm to notify the NetOps team via SNS if the BytesIn metric on a Transit Gateway stays above the equivalent of 1 GB/s for more than 5 minutes.
VPC Traffic Mirroring: A feature that enables out-of-band packet capture.
- Example: Copying all UDP traffic from a specific web server to a Suricata IDS appliance to check for deep-packet exploits that Flow Logs (which only show metadata) would miss.
AWS Config Rule: A specific desired configuration for a resource.
- Example: A rule that flags any Security Group as "Non-compliant" if it has a rule allowing ingress traffic on port 22 (SSH) from the public internet.
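The CloudWatch alarm example above can be sketched as the parameter set for the CloudWatch `PutMetricAlarm` API. This version uses the Transit Gateway `BytesIn` metric in the `AWS/TransitGateway` namespace and, because that metric is a byte count per period rather than a rate, converts 1 GB/s into a per-period `Sum` threshold. The helper name, alarm name, and identifiers are hypothetical, and 1 GB is taken to be 10^9 bytes. The code only builds the dictionary; it does not call AWS.

```python
def build_tgw_throughput_alarm(tgw_id, sns_topic_arn,
                               bytes_per_second=1_000_000_000,
                               period_seconds=60, sustained_minutes=5):
    """Build PutMetricAlarm parameters for sustained high Transit Gateway throughput."""
    return {
        "AlarmName": f"tgw-{tgw_id}-high-throughput",  # hypothetical naming scheme
        "Namespace": "AWS/TransitGateway",
        "MetricName": "BytesIn",
        "Dimensions": [{"Name": "TransitGateway", "Value": tgw_id}],
        "Statistic": "Sum",
        "Period": period_seconds,
        # Breach every period for the whole window before alarming.
        "EvaluationPeriods": sustained_minutes * 60 // period_seconds,
        # Bytes per period that correspond to the target rate.
        "Threshold": float(bytes_per_second * period_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical identifiers for illustration only.
params = build_tgw_throughput_alarm(
    "tgw-0123456789abcdef0",
    "arn:aws:sns:us-east-1:111122223333:netops-alerts",
)
print(params["Threshold"], params["EvaluationPeriods"])

# With credentials configured, the parameters could be passed to boto3:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```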

Worked Examples

Example 1: Troubleshooting a Connection Timeout in a VPC

Scenario: A web server in a private subnet cannot reach a database in another private subnet.

Step 1: Use Reachability Analyzer: Specify the Source (Web ENI) and Destination (DB ENI/Port).
Step 2: Analysis: The tool reports the path as not reachable and identifies the DB Security Group as the blocking component: the SG lacks an inbound rule allowing the Web Server's CIDR on the database port.
Step 3: Verification: Check VPC Flow Logs for that specific ENI. You see records ending in REJECT OK (action REJECT, log status OK), which is consistent with the Security Group dropping the traffic.
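Step 3 can be sketched as a small filter over default-format flow log lines that counts rejected flows seen on one ENI. The ENI ID, account ID, addresses, and port in the sample data are hypothetical, and the function assumes the 14-field default record layout.

```python
from collections import Counter


def rejected_flows(lines, interface_id):
    """Count REJECT records for one ENI, keyed by (source, destination, dest port)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 14:
            continue  # not a default-format record
        eni, action, status = fields[2], fields[12], fields[13]
        if eni != interface_id or action != "REJECT" or status != "OK":
            continue
        counts[(fields[3], fields[4], int(fields[6]))] += 1
    return counts


# Hypothetical log lines: the web server (10.0.1.10) is rejected on port 3306.
LOG_LINES = [
    "2 111122223333 eni-0db 10.0.1.10 10.0.2.20 49152 3306 6 3 180 1700000000 1700000060 REJECT OK",
    "2 111122223333 eni-0db 10.0.1.10 10.0.2.20 49153 3306 6 3 180 1700000060 1700000120 REJECT OK",
    "2 111122223333 eni-0db 10.0.3.30 10.0.2.20 49154 3306 6 10 900 1700000060 1700000120 ACCEPT OK",
    "2 111122223333 eni-0db - - - - - - - 1700000120 1700000180 - NODATA",
]

for flow, count in rejected_flows(LOG_LINES, "eni-0db").items():
    print(flow, count)
```

A REJECT record shows that traffic was denied at the interface but not whether a security group or a network ACL denied it, which is why the Reachability Analyzer result from Step 2 is still needed.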

Example 2: Analyzing Traffic Spikes for a Hybrid App

Scenario: A hybrid app is experiencing latency across a Direct Connect (DX) connection.

Step 1: Check CloudWatch Metrics: Review ConnectionState and VirtualInterfaceBpsEgress for the DX connection.
Step 2: CloudWatch Agent: On the on-premises servers, ensure the CloudWatch Agent is sending netstat and custom latency metrics to the AWS Region.
Step 3: Correlation: Use CloudWatch Dashboards to overlay the DX bandwidth metrics with the on-premises server CPU metrics to determine if the bottleneck is the pipe or the server.
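The correlation in Step 3 is usually done by eye on a dashboard, but the underlying reasoning can be sketched as a simple heuristic: compare peak link utilization with average server CPU. The function name, the 80% thresholds, and the sample values are assumptions for illustration, not AWS guidance.

```python
from statistics import mean


def diagnose_bottleneck(bps_samples, cpu_samples, port_speed_bps,
                        link_threshold=0.8, cpu_threshold=80.0):
    """Classify the likely bottleneck as 'link', 'server', 'both', or 'neither'.

    bps_samples: VirtualInterfaceBpsEgress datapoints (bits per second).
    cpu_samples: on-premises CPU utilization datapoints (percent).
    port_speed_bps: provisioned Direct Connect port speed (bits per second).
    """
    link_saturated = max(bps_samples) / port_speed_bps >= link_threshold
    server_busy = mean(cpu_samples) >= cpu_threshold
    if link_saturated and server_busy:
        return "both"
    if link_saturated:
        return "link"
    if server_busy:
        return "server"
    return "neither"


ONE_GBPS = 1_000_000_000  # assumed port speed for the example

# Link near capacity while the servers are idle: the pipe is the suspect.
print(diagnose_bottleneck([900e6, 950e6], [20.0, 30.0], ONE_GBPS))
# Link mostly idle while the servers are busy: the servers are the suspect.
print(diagnose_bottleneck([100e6, 200e6], [90.0, 95.0], ONE_GBPS))
```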

Checkpoint Questions

What is the main difference between CloudTrail and VPC Flow Logs?
Which storage target is most cost-effective for long-term retention of log data that only needs to be queried occasionally via SQL?
True or False: VPC Reachability Analyzer sends actual test packets through your network to verify connectivity.
Which service would you use to find out which IAM user deleted a specific VPC Peering connection?

Muddy Points & Cross-Refs

Flow Logs vs. Traffic Mirroring: Students often confuse these. Remember: Flow Logs = Metadata (who, when, how much). Traffic Mirroring = Data (the actual content/payload of the packets).
CloudWatch Logs vs. Metrics: Metrics are numbers (counters/gauges). Logs are text (sentences/JSON events). You can turn logs into metrics using Metric Filters.
Cross-Ref: For more on how to secure the logs themselves, see the chapter on S3 Bucket Policies and KMS Encryption.
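As an illustration of turning logs into metrics, the sketch below builds the parameters for the CloudWatch Logs `PutMetricFilter` API so that each rejected TCP connection attempt to a given port in a flow log group increments a custom metric. The filter pattern follows the space-delimited syntax for default-format flow log records; the helper name, filter name, metric name, namespace, and log group name are hypothetical. The code only builds the dictionary and does not call AWS.

```python
def build_reject_metric_filter(log_group_name, dest_port=22):
    """Build PutMetricFilter parameters that count rejected TCP flows to dest_port."""
    # Space-delimited pattern: one name per field, with conditions on three fields.
    pattern = (
        "[version, account, eni, source, destination, srcport, "
        f'destport="{dest_port}", protocol="6", packets, bytes, '
        'windowstart, windowend, action="REJECT", flowlogstatus]'
    )
    return {
        "logGroupName": log_group_name,
        "filterName": f"rejected-tcp-{dest_port}",  # hypothetical name
        "filterPattern": pattern,
        "metricTransformations": [{
            "metricName": f"RejectedTcp{dest_port}",    # hypothetical name
            "metricNamespace": "Custom/VPCFlowLogs",    # hypothetical namespace
            "metricValue": "1",  # add 1 for every matching log event
            "defaultValue": 0.0,
        }],
    }


filter_params = build_reject_metric_filter("/vpc/flow-logs")
print(filter_params["filterPattern"])

# With credentials configured, the parameters could be passed to boto3:
#   boto3.client("logs").put_metric_filter(**filter_params)
```

The resulting metric can then drive a CloudWatch alarm like any built-in metric.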

Comparison Tables

Feature	VPC Flow Logs	CloudTrail	Traffic Mirroring
Layer of Operation	Layer 3/4 (Network/Transport)	Layer 7 (API/Management)	Layer 2-7 (Raw Packets)
Primary Goal	Traffic Troubleshooting	Security Auditing/Who did what	Deep Packet Inspection
Performance Impact	None (Out-of-band)	None (Out-of-band)	Low (mirrored traffic counts toward instance bandwidth)
Cost Driver	Volume of log data	First copy of management events free; data events billed	Hourly per ENI mirrored