Mastering Hybrid Routing: BGP, Direct Connect, and VPN in AWS
Managing routing protocols for AWS and hybrid connectivity options (for example, over a Direct Connect connection, VPN)
This study guide covers the implementation and management of routing protocols necessary for connecting on-premises environments to AWS, focusing on BGP over Direct Connect and VPN solutions.
Learning Objectives
After studying this guide, you should be able to:
- Configure and troubleshoot BGP (Border Gateway Protocol) over AWS Direct Connect and Site-to-Site VPN.
- Understand the hierarchy of route selection in AWS (Longest Prefix Match first, then AWS's route-type priority, which plays the role of administrative distance).
- Design redundant hybrid connectivity models using active/passive or load-sharing patterns.
- Manage route propagation and handle CIDR overlaps in complex multi-account architectures.
Key Terms & Glossary
- Autonomous System (AS): A collection of IP networks under the control of one or more network operators on behalf of a single administrative entity.
- ASN (Autonomous System Number): A unique number assigned to an AS. AWS supports both 16-bit and 32-bit ASNs (AWS side is usually 64512–65534 for private).
- BGP Peering: A TCP session (port 179) between two BGP-speaking routers over which they exchange routing information.
- VIF (Virtual Interface): A logical connection on a Direct Connect link. Types include Private (to VPC), Public (to AWS public endpoints), and Transit (to Transit Gateway).
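- Route Propagation: The process where a Virtual Private Gateway (VGW) automatically pushes learned routes into a VPC route table. A Transit Gateway (TGW) propagates routes into its own TGW route tables; VPC route tables still need routes that point at the TGW.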
The "Big Idea"
The core of AWS hybrid networking is industry-standard BGP. With BGP, on-premises data centers and VPCs exchange reachability information dynamically, so they behave as if they were part of one contiguous network. This enables dynamic failover: if a high-speed Direct Connect link fails, BGP can automatically reroute traffic over a backup Site-to-Site VPN, ensuring business continuity without manual route table updates.
Formula / Concept Box
| Concept | Rule / Limit |
|---|---|
| Route Priority (Same Prefix) | Direct Connect > VPN (Static) > VPN (Dynamic/BGP) |
| Selection Rule | Longest Prefix Match always wins first. |
| BGP Route Limit | Typically 100 routes per BGP session on a VIF (Hard Limit). |
| AS-Path Prepending | Used to influence Inbound traffic to on-premises (longer path = less preferred). |
| MED (Multi-Exit Disc.) | Used to influence Inbound traffic from AWS to on-premises (lower = preferred). |
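
The first two rows of the table can be sketched in a few lines of Python. This is a minimal illustration, not AWS's implementation: the `Route` class, the `select_route` helper, and the source labels are hypothetical names, and the priority order is simply the one listed in the table above.

```python
import ipaddress
from dataclasses import dataclass

# Lower number = more preferred when two routes have the same prefix.
# The order mirrors the "Route Priority (Same Prefix)" row; labels are illustrative.
SOURCE_PRIORITY = {"direct_connect": 0, "vpn_static": 1, "vpn_bgp": 2}


@dataclass(frozen=True)
class Route:
    prefix: str  # CIDR block, e.g. "10.0.0.0/16"
    source: str  # one of the keys in SOURCE_PRIORITY


def select_route(routes, destination):
    """Return the route chosen for `destination`, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    # Longest prefix is compared first; source priority only breaks ties.
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix).prefixlen,
            SOURCE_PRIORITY[r.source],
        ),
    )


routes = [
    Route("10.0.0.0/16", "direct_connect"),
    Route("10.0.0.0/16", "vpn_bgp"),
    Route("10.0.5.0/24", "vpn_bgp"),
]

specific = select_route(routes, "10.0.5.9")  # matched by the /24 over VPN
general = select_route(routes, "10.0.9.1")   # only the /16 routes match
```

Note that the more specific /24 learned over VPN is chosen even though a Direct Connect route also covers the destination, which is why advertising asymmetric prefixes can unintentionally pull traffic onto the backup path.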
Hierarchical Outline
- Physical & Data Link Layers (Layer 1 & 2)
  - Direct Connect (DX): Dedicated fiber, 802.1Q VLANs, Link Aggregation Groups (LAG).
  - Jumbo Frames: Supported on DX (9001 MTU on private VIFs, 8500 MTU on transit VIFs); VPN is limited to 1500 MTU.
- Network Layer (Layer 3 - Routing)
  - Static Routing: Manual entries; best for simple, small-scale VPNs.
  - Dynamic Routing (BGP): Required for Direct Connect and highly recommended for VPN redundancy.
- Hybrid Connectivity Components
  - Direct Connect Gateway (DXGW): Global resource to connect one DX link to multiple VPCs across regions.
  - Transit Gateway (TGW): A hub-and-spoke router for connecting thousands of VPCs and on-premises networks.
- Traffic Engineering
  - AWS to On-Prem: Influenced by what the customer router advertises to AWS: AS-Path Prepending, MED, or more specific prefixes.
  - On-Prem to AWS: Controlled on the customer router, typically with Local Preference on the routes learned from AWS.
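As a rough illustration of how AS-Path Prepending and MED interact, the sketch below ranks two advertisements of the same prefix the way a BGP receiver might. It is deliberately simplified: real BGP best-path selection evaluates more attributes (Local Preference, origin, and others) before and between these two steps. The `Advertisement` class and the `prepend` and `preferred` helpers are hypothetical names, and ASN 65001 is only an example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    link: str       # label for the connection the route arrived on
    as_path: tuple  # sequence of ASNs, nearest AS first
    med: int = 0    # Multi-Exit Discriminator; lower is preferred


def prepend(ad, asn, times):
    """Return a copy of `ad` with `asn` repeated `times` extra times in front."""
    return Advertisement(ad.link, (asn,) * times + ad.as_path, ad.med)


def preferred(ads):
    """Simplified ranking: shortest AS path wins, lowest MED breaks ties."""
    return min(ads, key=lambda a: (len(a.as_path), a.med))


primary = Advertisement("dx-primary", (65001,))
backup = prepend(Advertisement("dx-backup", (65001,)), 65001, 3)

winner = preferred([primary, backup])  # the unprepended primary link
```

Prepending only makes a path less attractive; it does not withdraw it, so the backup link still takes over if the primary advertisement disappears.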
Visual Anchors
Hybrid Connectivity Architecture
BGP Peering Detail
% Requires \usetikzlibrary{positioning} for the "right=... of" syntax.
\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, align=center, minimum height=1cm}]
  \node (customer) {Customer Router \\ (ASN 65001)};
  \node (aws) [right=3cm of customer] {AWS DX Router \\ (ASN 64512)};
  \draw [<->, thick] (customer) -- node[above] {BGP Session (eBGP)} node[below] {VLAN ID / IP Peering} (aws);
  \node (prefixes) [below=0.5cm of customer, draw=none] {Advertises: 10.0.0.0/16};
  \node (aws_p) [below=0.5cm of aws, draw=none] {Advertises: 172.31.0.0/16};
\end{tikzpicture}
Definition-Example Pairs
- Route Summarization: The process of advertising a single large CIDR block instead of multiple smaller ones.
  - Example: Instead of advertising 10.0.1.0/24, 10.0.2.0/24, and 10.0.3.0/24, you advertise 10.0.0.0/22 to stay under the 100-route BGP limit.
- AS-Path Prepending: Artificially increasing the length of the AS-Path attribute by repeating your own ASN.
  - Example: If you have two DX links, you prepend your ASN three times on the "Backup" link so AWS sees a path of 65001 65001 65001 65001 vs the primary's 65001. AWS will prefer the shorter path.
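The summarization example can be checked with Python's standard `ipaddress` module. The sketch below finds the smallest single prefix that covers a list of CIDR blocks; `covering_prefix` is a hypothetical helper name, and it assumes all inputs are the same IP version.

```python
import ipaddress


def covering_prefix(cidrs):
    """Return the smallest single network that contains every CIDR in `cidrs`."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    candidate = nets[0]
    # Widen the candidate one bit at a time until it contains every input.
    while not all(n.subnet_of(candidate) for n in nets):
        candidate = candidate.supernet()
    return candidate


specifics = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
summary = covering_prefix(specifics)

# Addresses covered by the summary that were not in the original list.
extra = summary.num_addresses - sum(
    ipaddress.ip_network(c).num_addresses for c in specifics
)
```

Here the summary is 10.0.0.0/22, which also covers 10.0.0.0/24. A summary route attracts traffic for every address it contains, so it is worth confirming that any extra range is either owned on-premises or unused.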
Worked Example: Redundant Design
Scenario: A company needs a 10Gbps Direct Connect link for primary traffic and a 1Gbps Site-to-Site VPN as a backup.
- Direct Connect Setup: Create a Private VIF. Establish BGP peering. Ensure Route Propagation is enabled on the VPC Route Table.
- VPN Setup: Create a Customer Gateway (CGW) and Virtual Private Gateway (VGW). Establish a BGP-based VPN.
- Conflict Resolution: Both links advertise the prefix 10.0.0.0/16.
- AWS Behavior: AWS automatically prioritizes the Direct Connect path because it has a higher internal preference than VPN in the route selection hierarchy.
- Failure Event: If the DX fiber is cut, the BGP session drops. The DX route is removed from the VPC route table. The VPN-learned BGP route is now the best path, and traffic shifts automatically.
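The failover sequence above can be modeled as a toy simulation. This is a sketch under stated assumptions, not how AWS stores routes: `PropagatedRoutes` is a hypothetical class, prefixes are matched exactly rather than by longest prefix, and the preference order is the one given in this guide.

```python
# Most preferred source first, per the route priority described in this guide.
PREFERENCE = ["direct_connect", "vpn_static", "vpn_bgp"]


class PropagatedRoutes:
    """Toy model of a route table fed by several connections."""

    def __init__(self):
        self._routes = {}  # prefix -> set of sources currently advertising it

    def advertise(self, prefix, source):
        self._routes.setdefault(prefix, set()).add(source)

    def withdraw_all(self, source):
        """Model a BGP session drop: remove every route learned from `source`."""
        for sources in self._routes.values():
            sources.discard(source)

    def active_path(self, prefix):
        """Return the preferred source for `prefix`, or None if none remains."""
        sources = self._routes.get(prefix, set())
        for source in PREFERENCE:
            if source in sources:
                return source
        return None


table = PropagatedRoutes()
table.advertise("10.0.0.0/16", "direct_connect")
table.advertise("10.0.0.0/16", "vpn_bgp")

before = table.active_path("10.0.0.0/16")  # Direct Connect while both are up
table.withdraw_all("direct_connect")       # fiber cut: the DX session drops
after = table.active_path("10.0.0.0/16")   # the VPN route takes over
```

The key point the model captures is that no route table entry is edited by hand: the backup path becomes active only because the preferred route was withdrawn.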
Checkpoint Questions
- What is the maximum number of routes you can advertise from on-premises to AWS over a single Direct Connect Private VIF?
- If a VPC route table has a static route for 10.0.0.0/24 and a BGP-propagated route for 10.0.0.0/16, which route will traffic take?
- How does AWS determine the best path when receiving the same prefix from two different Direct Connect Gateways?
[!TIP] Answers: (1) 100 routes. (2) 10.0.0.0/24 because of Longest Prefix Match. (3) It uses BGP attributes, typically favoring the one with the shortest AS-Path or lowest MED.
Muddy Points & Cross-Refs
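- Static vs. Dynamic VPN: Static VPNs require you to manually add routes to the VPC route table. Dynamic (BGP) VPNs use route propagation. If both supply the same prefix, the static route wins: AWS ranks static VPN routes above BGP-propagated VPN routes, which plays the role that a lower administrative distance plays on a traditional router.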
- CIDR Overlap: If on-premises and VPC use the same IP space, BGP will not resolve this. Common workarounds are PrivateLink or a private NAT gateway to translate addresses.
- Quotas: Always monitor the BGP Route Limit via CloudWatch. Exceeding 100 routes will cause the BGP session to go into an IDLE state.
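Two of the points above lend themselves to a quick pre-flight check before changing what a router advertises. The sketch below is illustrative only: `check_advertisements` and `overlapping_pairs` are hypothetical helpers, the limit of 100 is the figure quoted in this guide, and the prefix lists are made-up inputs rather than data pulled from AWS.

```python
import ipaddress

ROUTE_LIMIT = 100  # per-session limit quoted in this guide


def check_advertisements(prefixes, limit=ROUTE_LIMIT):
    """Return (unique_count, headroom, over_limit) for a list of CIDR strings."""
    unique = {ipaddress.ip_network(p) for p in prefixes}
    count = len(unique)
    return count, limit - count, count > limit


def overlapping_pairs(onprem, vpc):
    """Return every (on-prem CIDR, VPC CIDR) pair whose address ranges overlap."""
    pairs = []
    for a in onprem:
        for b in vpc:
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                pairs.append((a, b))
    return pairs


# 101 distinct /16 prefixes: one more than the limit.
planned = [f"10.{i}.0.0/16" for i in range(101)]
count, headroom, over_limit = check_advertisements(planned)

conflicts = overlapping_pairs(["10.0.0.0/16", "192.168.0.0/24"], ["10.0.4.0/24"])
```

Running a check like this before a change window is cheaper than discovering the problem when the BGP session goes idle; summarizing prefixes is the usual fix when `over_limit` is true.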
Comparison Tables
| Feature | Direct Connect (DX) | Site-to-Site VPN |
|---|---|---|
| Throughput | 1Gbps, 10Gbps, 100Gbps | Up to 1.25Gbps per tunnel |
| Physical Layer | Dedicated Fiber (Private) | Public Internet (Encrypted) |
| Protocol | BGP Required | BGP or Static |
| Latency | Consistent/Low | Variable (Internet-based) |
| Cost | Port hour + Data Transfer | Hourly fee + Data Transfer |
| Deployment Time | Weeks/Months | Minutes |