Secure ML Infrastructure: VPCs, Subnets, and Security Groups
Building VPCs, subnets, and security groups to securely isolate ML systems
Learning Objectives
After studying this guide, you should be able to:
- Define the role of a Virtual Private Cloud (VPC) in isolating Machine Learning (ML) workflows.
- Differentiate between Security Groups (stateful) and Network ACLs (stateless).
- Explain how VPC Endpoints (Interface and Gateway) enable private access to AWS services like S3 and SageMaker.
- Design a multi-tier network architecture to protect sensitive training data and model endpoints.
Key Terms & Glossary
- VPC (Virtual Private Cloud): A logically isolated section of the AWS Cloud where you launch resources in a virtual network you define.
- Subnet: A range of IP addresses in your VPC. ML resources are typically placed in private subnets to prevent direct internet access.
- Security Group (SG): A virtual firewall for your instance (e.g., SageMaker Notebook) that controls inbound and outbound traffic.
- Network Access Control List (NACL): An optional layer of security for your VPC that acts as a firewall for controlling traffic in and out of one or more subnets.
- AWS PrivateLink: Technology that provides private connectivity between VPCs, AWS services, and on-premises applications, securely on the Amazon network.
The "Big Idea"
Think of your ML infrastructure as a high-security research facility. The VPC is the outer perimeter fence. Subnets are different rooms or wings (some public for visitors, some private for researchers). NACLs are the security guards at the wing doors, checking everyone against a list. Security Groups are the electronic locks on individual office doors. To move data (the research) out safely without going through the public street (the Internet), you use a dedicated underground tunnel (VPC Endpoints).
Formula / Concept Box
| Feature | Security Group (SG) | Network ACL (NACL) |
|---|---|---|
| Scope | Instance/Resource level | Subnet level |
| State | Stateful: Return traffic is automatically allowed. | Stateless: Return traffic must be explicitly allowed. |
| Rules | Support "Allow" rules only. | Support "Allow" and "Deny" rules. |
| Evaluation | All rules are evaluated before a decision is made. | Rules are processed in number order (lowest first); the first matching rule is applied. |
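The evaluation row of the table can be made concrete with a small simulation. This is a minimal sketch in plain Python, not an AWS API: the `Rule` class, the helper names, and the example IP ranges are all hypothetical and exist only to contrast ordered first-match evaluation (NACL) with any-match allow-only evaluation (Security Group).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One simplified firewall rule (hypothetical model, not an AWS type)."""
    cidr: str                # source range the rule applies to
    port: int                # single destination port, for brevity
    action: str = "allow"    # "allow" or "deny"; SG rules are always "allow"
    number: int = 0          # rule number; only meaningful for NACLs


def _matches(rule, src_ip, port):
    return ip_address(src_ip) in ip_network(rule.cidr) and rule.port == port


def nacl_allows(rules, src_ip, port):
    """NACL: walk rules in ascending number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if _matches(rule, src_ip, port):
            return rule.action == "allow"
    return False  # nothing matched, so the implicit final deny applies


def sg_allows(rules, src_ip, port):
    """Security Group: allow if any rule matches; there are no deny rules."""
    return any(_matches(rule, src_ip, port) for rule in rules)


nacl_rules = [
    Rule("203.0.113.0/24", 443, "deny", number=90),   # block one range first
    Rule("0.0.0.0/0", 443, "allow", number=100),      # then allow the rest
]
sg_rules = [Rule("10.0.0.0/16", 443)]

print(nacl_allows(nacl_rules, "203.0.113.7", 443))   # rule 90 matches first
print(nacl_allows(nacl_rules, "198.51.100.7", 443))  # falls through to rule 100
print(sg_allows(sg_rules, "10.0.1.25", 443))         # inside the allowed range
print(sg_allows(sg_rules, "198.51.100.7", 443))      # no allow rule matches
```

Because the deny rule carries the lower number, it takes effect before the broad allow. Swapping the two numbers would let the blocked range through, which is why rule ordering matters for NACLs and not for Security Groups.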
Hierarchical Outline
- Network Isolation (The Foundation)
- VPC CIDR Blocks: Defining the IP address range (e.g., `10.0.0.0/16`).
- Subnet Segmentation: Dividing the VPC into Public (has route to IGW) and Private (no direct route) subnets.
- Layered Defense (Firewalls)
- NACLs: First line of defense at the subnet boundary; used for broad IP blocking.
- Security Groups: Granular control; can reference other SGs (e.g., "Allow SageMaker to talk to RDS").
- Private Connectivity (The Tunnels)
- Interface Endpoints: Powered by PrivateLink; assigns a private IP to the service (e.g., SageMaker API).
- Gateway Endpoints: Specific to S3 and DynamoDB; uses route table entries instead of private IPs.
- Shared Responsibility
- AWS: Security of the cloud (physical, global infrastructure).
- Customer: Security in the cloud (VPC config, IAM, data encryption).
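The CIDR and subnet segmentation items in the outline can be explored with Python's standard `ipaddress` module. This is only an illustrative sketch: the `/24` subnet size, the route table dictionaries, and the `igw-EXAMPLE` target are assumptions made for the example, not values from an actual AWS account.

```python
from ipaddress import ip_network

vpc = ip_network("10.0.0.0/16")      # the example range from the outline

# Split the /16 into /24 blocks and take the first two as subnets.
subnets = vpc.subnets(new_prefix=24)
public_subnet = next(subnets)
private_subnet = next(subnets)

# Hypothetical route tables: only the public subnet has a default
# route pointing at an internet gateway (IGW).
route_tables = {
    public_subnet: {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-EXAMPLE"},
    private_subnet: {"10.0.0.0/16": "local"},
}


def is_public(subnet):
    """A subnet counts as public when a route targets an internet gateway."""
    return any(t.startswith("igw-") for t in route_tables[subnet].values())


for subnet in (public_subnet, private_subnet):
    kind = "public" if is_public(subnet) else "private"
    print(subnet, kind, subnet.num_addresses, "addresses")
```

The sketch shows that "public" and "private" are properties of the route table associated with a subnet, not of the address range itself. Note that AWS reserves five addresses in every subnet, so the usable count is lower than `num_addresses` reports.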
Visual Anchors
Traffic Flow Architecture
Security Group vs. NACL Scope
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, minimum width=3cm, minimum height=1cm, align=center}]
% Draw the layers
\draw[dashed, blue, thick] (-4,-3) rectangle (4,3);
\node[text=blue] at (0, 2.7) {VPC};
\draw[thick, black] (-3.5,-2.5) rectangle (3.5,2);
\node at (0, 1.7) {Subnet};
\node (nacl) [fill=orange!20] at (0, 0.8) {NACL (Stateless Filter)};
\draw[thick, red] (-2,-2) rectangle (2,0);
\node at (0, -0.3) {EC2 / SageMaker Instance};
\node (sg) [fill=green!20] at (0, -1.2) {Security Group (Stateful)};
% Connectors
\draw[<->] (nacl) -- (sg);
\end{tikzpicture}
Definition-Example Pairs
- Stateful Firewall: A firewall that remembers the state of active connections.
- Example: If you send a request from a SageMaker Notebook (outbound) to an API, the response (inbound) is automatically allowed back through the Security Group without a specific inbound rule.
- Stateless Firewall: A firewall that treats every packet in isolation.
- Example: In a NACL, if you allow Port 80 inbound, you must also create an outbound rule for the ephemeral port range (usually 1024-65535) for the response to reach the user.
- VPC Interface Endpoint: An Elastic Network Interface (ENI) with a private IP address from your subnet's IP range.
- Example: Accessing the SageMaker Runtime API from within a private VPC without the traffic ever touching the public internet.
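The stateful and stateless definitions above can be contrasted with a toy model. The two classes below are hypothetical simplifications written for this guide, not AWS behavior in full: they track only ports and a single connection tuple, and they model the notebook-to-API case, where the reply arrives on the client's ephemeral port.

```python
class StatefulFirewall:
    """Security-group-like: remembers connections it allowed outbound."""

    def __init__(self, outbound_ports):
        self.outbound_ports = set(outbound_ports)
        self.connections = set()   # (remote_ip, remote_port, local_port)

    def send(self, remote_ip, remote_port, local_port):
        if remote_port not in self.outbound_ports:
            return False
        self.connections.add((remote_ip, remote_port, local_port))
        return True

    def receive(self, remote_ip, remote_port, local_port):
        # No inbound rules exist here: only replies to tracked connections pass.
        return (remote_ip, remote_port, local_port) in self.connections


class StatelessFirewall:
    """NACL-like: each direction is checked against its own rule set."""

    def __init__(self, outbound_ports, inbound_ports):
        self.outbound_ports = outbound_ports
        self.inbound_ports = inbound_ports

    def send(self, remote_ip, remote_port, local_port):
        return remote_port in self.outbound_ports

    def receive(self, remote_ip, remote_port, local_port):
        # A reply is addressed to the client's ephemeral local port.
        return local_port in self.inbound_ports


EPHEMERAL = range(1024, 65536)       # the 1024-65535 range from the example
api = ("198.51.100.10", 443)         # placeholder API address
local_port = 50000                   # ephemeral port chosen by the client

sg = StatefulFirewall(outbound_ports={443})
sg.send(*api, local_port)
print(sg.receive(*api, local_port))            # reply to a tracked connection

nacl_missing = StatelessFirewall(outbound_ports={443}, inbound_ports=set())
nacl_missing.send(*api, local_port)
print(nacl_missing.receive(*api, local_port))  # no inbound rule for the reply

nacl_ok = StatelessFirewall(outbound_ports={443}, inbound_ports=EPHEMERAL)
print(nacl_ok.receive(*api, local_port))       # ephemeral range is allowed
```

The inbound Port 80 case from the NACL example is the mirror image: the request arrives on port 80 and the reply leaves toward the caller's ephemeral port, so the ephemeral range must appear in the outbound rules instead.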
Worked Examples
Scenario: Securing a SageMaker Training Job
Problem: You need to run a SageMaker Training job that pulls data from S3. The company policy forbids any traffic from traversing the public internet.
Solution Steps:
- VPC Setup: Create a VPC with a private subnet (its route table has no route to an Internet Gateway).
- S3 Access: Create a VPC Gateway Endpoint for S3. Add a route in the private subnet's route table pointing S3 traffic to the endpoint ID (`vpce-xxxx`).
- SageMaker Config: When launching the training job, specify the `SecurityGroupIds` and `Subnets` (Private Subnet IDs) in the `VpcConfig` parameter.
- Security Group Rules:
- Inbound: None (unless specific debugging is needed).
- Outbound: Allow HTTPS (Port 443) to the S3 Gateway Endpoint prefix list.
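The request fragments implied by these steps can be sketched as plain Python dictionaries. The helper names and the `sg-EXAMPLE`, `subnet-EXAMPLE`, and `pl-EXAMPLE` identifiers are placeholders invented for illustration. The dictionary shapes are intended to match what boto3 accepts for `VpcConfig` in SageMaker's `create_training_job` and for `IpPermissions` in EC2's `authorize_security_group_egress`; no AWS call is made here.

```python
def build_vpc_config(security_group_ids, subnet_ids):
    """Return a VpcConfig structure for a SageMaker training job request."""
    if not security_group_ids or not subnet_ids:
        raise ValueError("VpcConfig needs at least one security group and one subnet")
    return {
        "SecurityGroupIds": list(security_group_ids),
        "Subnets": list(subnet_ids),
    }


def https_egress_to_prefix_list(prefix_list_id):
    """Return one IpPermissions entry allowing HTTPS to a prefix list."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "PrefixListIds": [{"PrefixListId": prefix_list_id}],
    }


# Placeholder identifiers; substitute the real IDs from your account.
vpc_config = build_vpc_config(["sg-EXAMPLE"], ["subnet-EXAMPLE"])
egress_rule = https_egress_to_prefix_list("pl-EXAMPLE")

print(vpc_config)
print(egress_rule)
```

A newly created Security Group allows all outbound traffic by default, so restricting egress to the S3 prefix list also means removing that default rule.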
Checkpoint Questions
- Which AWS resource acts as a stateful firewall at the instance level?
- True or False: A single subnet can be associated with multiple NACLs simultaneously.
- Which two AWS services support Gateway Endpoints?
- Why is a private subnet preferred for training ML models containing PII (Personally Identifiable Information)?
Muddy Points & Cross-Refs
- Stateful vs. Stateless: This is the most common point of confusion. Remember: Security Groups are Smart (Stateful) — they remember your connection. NACLs are Not (Stateless) — they forget immediately.
- Endpoint Types: If you see "S3" or "DynamoDB," think Gateway. For almost everything else (SageMaker, EC2 API, Kinesis), think Interface/PrivateLink.
- Cross-Ref: For more on managing the keys used to encrypt this data, see the AWS KMS (Key Management Service) study guide.
Comparison Tables
| Feature | Interface Endpoint | Gateway Endpoint |
|---|---|---|
| Services Supported | Most AWS Services (SageMaker, etc.) | Only S3 and DynamoDB |
| Cost | Hourly charge + data processing charge | Free |
| Implementation | Uses an ENI with a private IP | Uses a Route Table entry |
| Technology | Powered by AWS PrivateLink | VPC Routing Mechanism |
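The "Implementation" row can be illustrated with a simplified route lookup. This is a rough sketch under stated assumptions: `198.51.100.0/24` stands in for the AWS-managed S3 prefix list (real ranges are region-specific), `vpce-EXAMPLE` and the ENI address `10.0.1.50` are placeholders, and `next_hop` is a hypothetical helper that performs a longest-prefix match.

```python
from ipaddress import ip_address, ip_network

# Stand-in for the AWS-managed S3 prefix list (assumption for this sketch).
S3_PREFIX_LIST = ["198.51.100.0/24"]

# A private subnet's route table: only the local VPC route to begin with.
private_route_table = [("10.0.0.0/16", "local")]

# Conceptually, a gateway endpoint adds a route covering every range
# in the service's prefix list, targeting the endpoint ID.
private_route_table += [(cidr, "vpce-EXAMPLE") for cidr in S3_PREFIX_LIST]


def next_hop(route_table, dest_ip):
    """Longest-prefix match; None means there is no route to the destination."""
    dest = ip_address(dest_ip)
    matches = [
        (ip_network(cidr), target)
        for cidr, target in route_table
        if dest in ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop(private_route_table, "198.51.100.20"))  # S3 range: gateway endpoint
print(next_hop(private_route_table, "10.0.1.50"))      # interface endpoint ENI: local
print(next_hop(private_route_table, "203.0.113.9"))    # other internet address: no route
```

The interface endpoint needs no extra route because its ENI already holds a private IP inside the VPC range, whereas the gateway endpoint works purely through the added route table entry.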