Lab: Designing and Troubleshooting Network Security Controls on AWS
This hands-on lab focuses on implementing and debugging network security layers within an Amazon VPC. You will build a multi-tier architecture, intentionally misconfigure security controls, and use the AWS Reachability Analyzer to diagnose and resolve connectivity issues.
[!WARNING] This lab involves creating billable AWS resources (EC2, VPC). Remember to run the teardown commands at the end to avoid ongoing charges.
Prerequisites
- An AWS Account with administrative access or permissions for VPC, EC2, and Network Insights.
- AWS CLI installed and configured with credentials for <YOUR_REGION>.
- Basic knowledge of IPv4 CIDR notation (e.g., /16, /24).
- Familiarity with SSH/Terminal usage.
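As a quick refresher on the CIDR prerequisite, the sketch below computes how many addresses a prefix length covers. The function names `cidr_size` and `aws_usable_hosts` are illustrative helpers, not part of any AWS tooling; the subtraction of 5 reflects the addresses AWS reserves in every subnet (the first four and the last one).

```shell
# Number of IPv4 addresses covered by a CIDR block with the given prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

# Addresses left for your own resources in an AWS subnet:
# AWS reserves 5 per subnet (the first four addresses and the last one).
aws_usable_hosts() {
  echo $(( $(cidr_size "$1") - 5 ))
}

cidr_size 16          # the lab VPC, 10.0.0.0/16       -> 65536
cidr_size 24          # a lab subnet, e.g. 10.0.1.0/24 -> 256
aws_usable_hosts 24   # usable addresses in a /24 subnet -> 251
```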
Learning Objectives
- Design a secure multi-tier network using Public and Private subnets.
- Implement stateful (Security Groups) and stateless (Network ACLs) security controls.
- Troubleshoot complex connectivity blocks using static configuration analysis tools.
- Differentiate between SG-level blocks and NACL-level blocks.
Architecture Overview
We will build a simple Web-to-Database architecture. The Web server resides in a public subnet, and the Database server resides in a private subnet.
The Layered Defense Model
\begin{tikzpicture}[node distance=1.5cm, every node/.style={fill=white, font=\footnotesize}, scale=0.8]
\draw[thick, fill=blue!10] (0,0) rectangle (10,6);
\node at (5,5.5) {\textbf{Network Security Layers}};
% Subnet Layer
\draw[dashed, fill=green!5] (1,1) rectangle (9,5);
\node at (5,4.5) {Subnet Layer (Stateless NACL)};
% Instance Layer
\draw[dotted, fill=orange!5] (2,1.5) rectangle (8,3.5);
\node at (5,3.2) {Instance Layer (Stateful Security Group)};
% Resource
\draw[fill=red!20] (4,2) rectangle (6,2.8);
\node at (5,2.4) {Target Resource};
% Arrows
\draw[->, ultra thick, red] (5,7) -- (5,2.9) node[midway, left, black] {Incoming Traffic};
\end{tikzpicture}
Step-by-Step Instructions
Step 1: Initialize the VPC Infrastructure
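Create a VPC and two subnets (public and private). Each command returns the new resource's ID in its JSON output; record it and substitute it for placeholders such as <VPC_ID> in later commands. A subnet is only public once its route table has a route to an internet gateway, which the CLI commands here do not create.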
# Create VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=brainybee-lab-vpc}]'
# Create Public Subnet
aws ec2 create-subnet --vpc-id <VPC_ID> --cidr-block 10.0.1.0/24 --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=public-sn}]'
# Create Private Subnet
aws ec2 create-subnet --vpc-id <VPC_ID> --cidr-block 10.0.2.0/24 --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=private-sn}]'
Console alternative
Navigate to VPC > Your VPCs > Create VPC. Select "VPC and more". Set Name tag to brainybee-lab-vpc, CIDR to 10.0.0.0/16, and configure 2 subnets (1 public, 1 private).
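To avoid copying IDs by hand, the create commands can be wrapped so that they print only the new resource's ID. This is a minimal sketch using the AWS CLI's global `--query` and `--output text` options; the function names and the shell variable names in the usage comment are illustrative choices, not part of the lab.

```shell
# Create the lab VPC and print only its ID.
create_lab_vpc() {
  aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
    --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=brainybee-lab-vpc}]' \
    --query 'Vpc.VpcId' --output text
}

# Create a subnet and print only its ID.
# Arguments: VPC ID, CIDR block, Name tag value.
create_lab_subnet() {
  aws ec2 create-subnet --vpc-id "$1" --cidr-block "$2" \
    --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$3}]" \
    --query 'Subnet.SubnetId' --output text
}

# Usage (commented out because it creates real resources):
# VPC_ID=$(create_lab_vpc)
# PUBLIC_SUBNET_ID=$(create_lab_subnet "$VPC_ID" 10.0.1.0/24 public-sn)
# PRIVATE_SUBNET_ID=$(create_lab_subnet "$VPC_ID" 10.0.2.0/24 private-sn)
```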
Step 2: Implement "Broken" Security Controls
We will now create a Security Group for the DB instance but intentionally misconfigure it to block the Web server's traffic.
# Create DB Security Group
aws ec2 create-security-group --group-name db-sg --description "Database security group" --vpc-id <VPC_ID>
# INCORRECT RULE: Allowing MySQL (3306) from a wrong IP (e.g., 1.1.1.1/32)
# instead of the Web Server's IP or SG.
aws ec2 authorize-security-group-ingress --group-id <DB_SG_ID> --protocol tcp --port 3306 --cidr 1.1.1.1/32
[!NOTE] Real-world example: A developer updates a security group to allow their own home IP for testing but forgets to include the Application Load Balancer's subnet range, breaking the production site.
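Later steps use <WEB_SG_ID>, <WEB_INSTANCE_ID>, and <DB_INSTANCE_ID>. One possible way to create those resources is sketched below. Everything specific here is an assumption: the group name `web-sg`, the `t3.micro` instance type, the function names, and the AMI ID, which you must supply for your own region. The instance names WebServer and DBServer match the ones used in the console instructions.

```shell
# Create a security group and print only its ID.
# Arguments: group name, description, VPC ID.
create_lab_sg() {
  aws ec2 create-security-group --group-name "$1" --description "$2" --vpc-id "$3" \
    --query 'GroupId' --output text
}

# Launch one instance and print only its ID.
# Arguments: AMI ID, subnet ID, security group ID, Name tag value.
# t3.micro is an assumed instance type; any small type would do.
launch_lab_instance() {
  aws ec2 run-instances --image-id "$1" --instance-type t3.micro \
    --subnet-id "$2" --security-group-ids "$3" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$4}]" \
    --query 'Instances[0].InstanceId' --output text
}

# Usage (commented out because it creates billable resources):
# WEB_SG_ID=$(create_lab_sg web-sg "Web server security group" "$VPC_ID")
# WEB_INSTANCE_ID=$(launch_lab_instance "$AMI_ID" "$PUBLIC_SUBNET_ID" "$WEB_SG_ID" WebServer)
# DB_INSTANCE_ID=$(launch_lab_instance "$AMI_ID" "$PRIVATE_SUBNET_ID" "$DB_SG_ID" DBServer)
```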
Step 3: Run Reachability Analyzer to Diagnose
Instead of manually checking every rule, we will use the automated analyzer.
# Create a path between Web Instance and DB Instance
aws ec2 create-network-insights-path \
--source <WEB_INSTANCE_ID> \
--destination <DB_INSTANCE_ID> \
--destination-port 3306 \
--protocol tcp
# Start the analysis
aws ec2 start-network-insights-analysis --network-insights-path-id <PATH_ID>
Console alternative
Navigate to VPC > Reachability Analyzer. Click Create and analyze path. Set Source as WebServer and Destination as DBServer. Set destination port to 3306.
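Starting an analysis only queues it; the verdict has to be read back with `describe-network-insights-analyses`. The sketch below wraps the two commands above and polls until the analysis leaves the `running` state. The function names and the 5-second polling interval are illustrative assumptions.

```shell
# Create a TCP/3306 path from a source to a destination and print the path ID.
create_db_path() {
  aws ec2 create-network-insights-path --source "$1" --destination "$2" \
    --destination-port 3306 --protocol tcp \
    --query 'NetworkInsightsPath.NetworkInsightsPathId' --output text
}

# Start an analysis for a path and print the analysis ID.
start_analysis() {
  aws ec2 start-network-insights-analysis --network-insights-path-id "$1" \
    --query 'NetworkInsightsAnalysis.NetworkInsightsAnalysisId' --output text
}

# Print a single field of an analysis. Arguments: analysis ID, field name.
analysis_field() {
  aws ec2 describe-network-insights-analyses --network-insights-analysis-ids "$1" \
    --query "NetworkInsightsAnalyses[0].$2" --output text
}

# Poll until the analysis is no longer running, then print NetworkPathFound.
wait_for_analysis() {
  while [ "$(analysis_field "$1" Status)" = "running" ]; do
    sleep 5
  done
  analysis_field "$1" NetworkPathFound
}

# Usage (commented out because it calls AWS):
# PATH_ID=$(create_db_path "$WEB_INSTANCE_ID" "$DB_INSTANCE_ID")
# ANALYSIS_ID=$(start_analysis "$PATH_ID")
# wait_for_analysis "$ANALYSIS_ID"
```

With the broken rule from Step 2 in place, the final call should report that no path was found. Querying `Explanations[].ExplanationCode` through `analysis_field` lists the reasons the analyzer gives for the block.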
Step 4: Fix the Security Control
Identify the failing component (Security Group) in the analysis results and correct it.
# Remove the bad rule
aws ec2 revoke-security-group-ingress --group-id <DB_SG_ID> --protocol tcp --port 3306 --cidr 1.1.1.1/32
# Add the correct rule (Allowing traffic from the Web SG ID)
aws ec2 authorize-security-group-ingress --group-id <DB_SG_ID> --protocol tcp --port 3306 --source-group <WEB_SG_ID>
Checkpoints
- Analysis Status: Check the Reachability Analyzer output.
  - Expected Result: Status changed from unreachable to reachable.
- SG Rule Verification: Run aws ec2 describe-security-groups --group-ids <DB_SG_ID>.
  - Expected Result: Ingress rule shows the Web Security Group ID as the source.
- Network Path: In the Console, the hop-by-hop diagram should now show green checkmarks across all components.
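The security group checkpoint can also be scripted rather than read by eye. This sketch filters the `describe-security-groups` output with a JMESPath query and checks whether the Web security group appears as a source for port 3306. The function names are hypothetical helpers.

```shell
# Print the IDs of security groups allowed as sources on port 3306.
# Argument: the DB security group ID.
db_ingress_sources() {
  aws ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[0].IpPermissions[?ToPort==`3306`].UserIdGroupPairs[].GroupId' \
    --output text
}

# Succeed only if the Web security group is among those sources.
# Arguments: DB security group ID, Web security group ID.
check_db_rule() {
  case " $(db_ingress_sources "$1") " in
    *"$2"*) echo "OK: $2 is allowed on port 3306" ;;
    *) echo "MISSING: $2 is not allowed on port 3306"; return 1 ;;
  esac
}

# Usage (commented out because it calls AWS):
# check_db_rule "$DB_SG_ID" "$WEB_SG_ID"
```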
Troubleshooting
| Issue | Possible Cause | Fix |
|---|---|---|
| Path remains unreachable | NACL is blocking traffic | Check the Subnet NACL. Remember NACLs are stateless and need return traffic rules. |
| "UnauthorizedOperation" | Missing IAM permissions | Ensure your IAM user has ec2:CreateNetworkInsightsPath and ec2:StartNetworkInsightsAnalysis. |
| Analysis takes too long | Pending analysis | Analysis usually takes 1-2 minutes. Ensure the VPC is in an 'available' state. |
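For the NACL row above, the rules attached to a subnet can be listed directly. This is a minimal sketch; the function name `subnet_nacl_entries` is illustrative, and the column selection is just one reasonable choice.

```shell
# List the NACL entries associated with a subnet as a table.
# Argument: subnet ID.
# Columns: rule number, egress flag, protocol number, action, CIDR, port range.
subnet_nacl_entries() {
  aws ec2 describe-network-acls \
    --filters "Name=association.subnet-id,Values=$1" \
    --query 'NetworkAcls[0].Entries[].[RuleNumber,Egress,Protocol,RuleAction,CidrBlock,PortRange.From,PortRange.To]' \
    --output table
}

# Usage (commented out because it calls AWS):
# subnet_nacl_entries "$PRIVATE_SUBNET_ID"
```

In the output, an Egress value of True marks outbound rules, protocol 6 is TCP, and protocol -1 means all protocols. Rules are evaluated in ascending rule-number order, so a low-numbered deny takes precedence over a later allow.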
Clean-Up / Teardown
[!IMPORTANT] Failure to delete these resources will result in continued charges for the EC2 instances.
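# Delete the analysis, then the Network Insights Path
# (path deletion is rejected while the path still has analyses)
aws ec2 delete-network-insights-analysis --network-insights-analysis-id <ANALYSIS_ID>
aws ec2 delete-network-insights-path --network-insights-path-id <PATH_ID>
# Terminate Instances and wait for termination to complete
aws ec2 terminate-instances --instance-ids <WEB_INSTANCE_ID> <DB_INSTANCE_ID>
aws ec2 wait instance-terminated --instance-ids <WEB_INSTANCE_ID> <DB_INSTANCE_ID>
# Delete Security Groups (after instances are terminated)
aws ec2 delete-security-group --group-id <DB_SG_ID>
aws ec2 delete-security-group --group-id <WEB_SG_ID>
# Delete Subnets (delete-vpc fails while the VPC still contains subnets or an
# attached internet gateway; only the Console deletes them along with the VPC)
aws ec2 delete-subnet --subnet-id <PUBLIC_SUBNET_ID>
aws ec2 delete-subnet --subnet-id <PRIVATE_SUBNET_ID>
# Delete VPC
aws ec2 delete-vpc --vpc-id <VPC_ID>
Challenge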
The Stateless Trap: Modify the Network ACL of the Private Subnet to allow inbound traffic on port 3306 but remove the Outbound Rule for ephemeral ports (1024-65535).
- Run Reachability Analyzer again.
- Observe how the tool identifies that the connection is blocked even though the Security Group (stateful) and Inbound NACL (stateless) are correct.
- Fix the NACL by adding an outbound rule.
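The NACL changes in this challenge could be made with commands along these lines. This is a sketch under assumptions: the function names are hypothetical, the rule numbers depend on how your NACL is set up, and the CIDR passed to the allow rule should be the range your return traffic goes to (for this lab, the public subnet 10.0.1.0/24).

```shell
# Remove an outbound (egress) rule from a NACL.
# Arguments: NACL ID, rule number.
remove_egress_rule() {
  aws ec2 delete-network-acl-entry --network-acl-id "$1" --egress --rule-number "$2"
}

# Allow outbound TCP return traffic to ephemeral ports 1024-65535.
# Arguments: NACL ID, rule number, destination CIDR.
allow_ephemeral_egress() {
  aws ec2 create-network-acl-entry --network-acl-id "$1" --egress \
    --rule-number "$2" --protocol tcp --port-range From=1024,To=65535 \
    --cidr-block "$3" --rule-action allow
}

# Usage (commented out because it changes live network rules):
# remove_egress_rule "$NACL_ID" 100          # break return traffic
# allow_ephemeral_egress "$NACL_ID" 100 10.0.1.0/24   # restore it
```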