Redundant Hybrid Connectivity: AWS Direct Connect and Site-to-Site VPN
Designing a redundant hybrid connectivity model with AWS services (for example, AWS Direct Connect, AWS Site-to-Site VPN)
This guide covers the architectural patterns and best practices for designing highly available and redundant connections between on-premises data centers and the AWS Cloud, a core pillar of the AWS Certified Advanced Networking Specialty (ANS-C01) exam.
Learning Objectives
After studying this guide, you will be able to:
- Differentiate between Direct Connect (DX) and Site-to-Site VPN use cases.
- Design redundant architectures using DX-plus-DX and DX-plus-VPN patterns.
- Understand BGP routing priority and failover mechanisms in hybrid environments.
- Identify single points of failure at the physical (Layer 1) and logical (Layer 3) levels.
Key Terms & Glossary
- Direct Connect (DX): A dedicated network connection from your premises to AWS that bypasses the public internet.
- Virtual Interface (VIF): A logical mapping on a DX connection; can be Private (to a VPC), Public (to AWS public endpoints), or Transit (to a Transit Gateway).
- Customer Gateway (CGW): The physical or software appliance on your side of the Site-to-Site VPN connection.
- Virtual Private Gateway (VGW): The VPN concentrator on the Amazon side of the VPN/DX connection.
- BGP (Border Gateway Protocol): The standard exterior gateway protocol used to exchange routing information between your network and AWS.
- LAG (Link Aggregation Group): A logical interface that uses the Link Aggregation Control Protocol (LACP) to aggregate multiple DX physical connections.
The "Big Idea"
Hybrid connectivity is not a "set and forget" service. To achieve true enterprise-grade resilience, you must eliminate single points of failure at three layers: the physical connection (cross-connects and circuits), the hardware (routers/CGWs), and the routing logic (BGP configurations). AWS treats Direct Connect as the primary path by default, but an architect must ensure that if the primary circuit or router fails, the system automatically shifts traffic to a secondary path without manual intervention.
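One way to make the layered check concrete is to list the components each path depends on and flag anything that every path shares. The following is a minimal Python sketch under stated assumptions: the `HybridPath` fields and the sample labels (`location-A`, `carrier-1`, `router-1`) are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridPath:
    name: str
    dx_location: str       # physical layer: where the circuit lands (hypothetical label)
    circuit_provider: str  # physical layer: carrier delivering the circuit
    router: str            # hardware layer: on-premises router / CGW


def shared_components(paths):
    """Return the components that every path depends on (candidate single points of failure)."""
    spofs = {}
    for field in ("dx_location", "circuit_provider", "router"):
        values = {getattr(p, field) for p in paths}
        if len(values) == 1:  # all paths use the same component
            spofs[field] = values.pop()
    return spofs


paths = [
    HybridPath("dx-primary", "location-A", "carrier-1", "router-1"),
    HybridPath("dx-secondary", "location-B", "carrier-2", "router-1"),
]
print(shared_components(paths))  # {'router': 'router-1'}
```

In this sample, the two circuits are diverse at the physical layer but both terminate on the same on-premises router, so the router is the remaining single point of failure.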
Formula / Concept Box
| Concept | Rule / Specification |
|---|---|
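| Route Priority | Longest prefix match is evaluated first. For equal-length prefixes on a virtual private gateway: Direct Connect (BGP) > Site-to-Site VPN (static) > Site-to-Site VPN (BGP) |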
| VPN Throughput | ~1.25 Gbps per tunnel (unless using ECMP with Transit Gateway) |
| DX Port Speeds | Dedicated connections: 1 Gbps, 10 Gbps, or 100 Gbps (hosted connections are available at lower speeds through partners) |
| High Availability | Use at least two DX locations, with connections terminating on separate devices, to survive the failure of an entire DX location |
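The route-priority rule can be sketched as a two-step comparison: the most specific matching prefix wins first, and only among equal-length prefixes does the route source decide. This is a simplified Python illustration, not AWS's implementation; the source labels (`dx-bgp`, `vpn-static`, `vpn-bgp`) are hypothetical names, and the ordering follows what AWS documents for a virtual private gateway.

```python
import ipaddress

# Lower number = more preferred, compared only among routes of equal prefix length.
# Labels are hypothetical; the order follows AWS's documented VGW behavior.
SOURCE_PRIORITY = {"dx-bgp": 0, "vpn-static": 1, "vpn-bgp": 2}


def select_route(routes, destination):
    """Pick the route for a destination: longest prefix first, then source priority."""
    dest = ipaddress.ip_address(destination)
    candidates = []
    for prefix, source in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            candidates.append((network, source))
    if not candidates:
        return None
    network, source = max(
        candidates,
        key=lambda c: (c[0].prefixlen, -SOURCE_PRIORITY[c[1]]),
    )
    return str(network), source


routes = [
    ("10.0.0.0/16", "dx-bgp"),
    ("10.0.0.0/16", "vpn-bgp"),
    ("10.0.1.0/24", "vpn-bgp"),
]
print(select_route(routes, "10.0.2.5"))  # ('10.0.0.0/16', 'dx-bgp')
print(select_route(routes, "10.0.1.5"))  # ('10.0.1.0/24', 'vpn-bgp')
```

The second lookup shows a common pitfall: a more specific prefix advertised over the VPN attracts traffic even while Direct Connect is healthy, so both paths should advertise the same prefixes.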
Hierarchical Outline
- AWS Direct Connect (DX) Foundations
- Physical Layer: Requires a cross-connect at a DX location.
- Logical Layer: VIFs (Private, Public, Transit).
- Redundancy Architectures
- DX with DX Backup: Best for high-performance requirements; uses two separate DX locations.
- DX with VPN Backup: Cost-effective; VPN serves as a standby path.
- Hardware & Path Resiliency
- Redundant Routers: Deploying two CGWs on-premises so that a single router failure does not take down connectivity.
- Circuit Diversity: Ensuring local service providers use different physical paths for redundant links.
- Routing & Failover Logic
- BGP Attributes: Using AS-Path Prepending and Local Preference to influence traffic.
- Failover Behavior: DX routes are preferred; VPN takes over only when the DX BGP session drops and its routes are withdrawn.
Visual Anchors
Route Priority Logic
Redundant Hybrid Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={fill=white, font=\small}, box/.style={rectangle, draw, minimum width=2.5cm, minimum height=1cm, align=center}]
% Requires \usetikzlibrary{positioning} for the "left=3cm of" syntax
% AWS Side
\node[box, fill=orange!20] (vpc) {Amazon VPC};
\node[box, below of=vpc, fill=orange!10] (tgw) {Transit Gateway};
% Connections
\node[box, left=3cm of tgw, fill=blue!10] (dx) {Direct Connect \\ (Primary)};
\node[box, right=3cm of tgw, fill=green!10] (vpn) {Site-to-Site VPN \\ (Backup)};
% Customer Side
\node[box, below=3cm of tgw, fill=gray!20] (onprem) {On-Premises \\ Data Center};
% Paths
\draw[thick, ->] (vpc) -- (tgw);
\draw[thick, <->] (tgw) -- (dx) node[midway, above] {BGP};
\draw[thick, <->] (tgw) -- (vpn) node[midway, above] {IPSec};
\draw[thick, <->] (dx) -- (onprem);
\draw[thick, <->, dashed] (vpn) -- (onprem) node[midway, right] {Public Internet};
\end{tikzpicture}
Definition-Example Pairs
- AS-Path Prepending: The practice of artificially lengthening the BGP path to make a route less desirable.
- Example: If you have two DX links but want one to be standby, you prepend your AS number 3 times on the standby link so AWS sees a longer (worse) path.
- Public VIF: A logical interface to access AWS public services over DX without going through the internet.
- Example: Accessing S3 buckets or DynamoDB tables via a dedicated 10 Gbps line instead of your company's general internet pipe.
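The prepending example can be sketched in a few lines of Python. This is a deliberately simplified model that compares AS-path length only (real BGP evaluates other attributes before path length); the ASN `65000` and the link names are hypothetical.

```python
def prepend(as_path, own_asn, times):
    """Return a new AS path with own_asn added `times` extra times at the front."""
    return [own_asn] * times + list(as_path)


def shortest_as_path(advertisements):
    """Return the link whose advertised AS path is shortest (the preferred one)."""
    return min(advertisements, key=lambda link: len(advertisements[link]))


CUSTOMER_ASN = 65000  # hypothetical private ASN

advertisements = {
    "dx-active": [CUSTOMER_ASN],
    "dx-standby": prepend([CUSTOMER_ASN], CUSTOMER_ASN, 3),
}

print(advertisements["dx-standby"])      # [65000, 65000, 65000, 65000]
print(shortest_as_path(advertisements))  # dx-active
```

Because the standby link advertises a path four ASNs long against one on the active link, the remote side sends traffic to the active link as long as it is advertising the prefix.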
Worked Examples
Scenario: Calculating Backup Throughput
Problem: A company has a 10 Gbps Direct Connect primary link. They want to use a Site-to-Site VPN as a backup. Their peak traffic is 4 Gbps. Will the VPN be a sufficient backup?
Step-by-Step Breakdown:
- Identify VPN Limits: A single AWS Site-to-Site VPN tunnel is limited to roughly 1.25 Gbps.
- Evaluate Peak Traffic: The peak is 4 Gbps.
- Compare: 1.25 Gbps < 4 Gbps.
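- Conclusion: A single VPN tunnel is insufficient. The architect should either use multiple VPN tunnels with ECMP (Equal-Cost Multi-Path) via a Transit Gateway (at least four tunnels, since 4 / 1.25 = 3.2, keeping in mind that each individual flow is still capped at one tunnel's throughput) or provision a second DX connection sized for the load (for example, 10 Gbps; a single 1 Gbps connection could not carry 4 Gbps).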
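The sizing arithmetic above can be checked with a short Python sketch. It assumes the approximate 1.25 Gbps per-tunnel ceiling used in this guide and the fact that each Site-to-Site VPN connection provides two tunnels; the function name is illustrative.

```python
import math

VPN_TUNNEL_GBPS = 1.25      # approximate per-tunnel ceiling used in this guide
TUNNELS_PER_CONNECTION = 2  # each Site-to-Site VPN connection has two tunnels


def tunnels_needed(peak_gbps, per_tunnel_gbps=VPN_TUNNEL_GBPS):
    """Minimum number of ECMP tunnels whose aggregate ceiling covers the peak."""
    return math.ceil(peak_gbps / per_tunnel_gbps)


tunnels = tunnels_needed(4)
connections = math.ceil(tunnels / TUNNELS_PER_CONNECTION)

print(tunnels)      # 4
print(connections)  # 2
```

This is a lower bound rather than a guarantee: ECMP distributes traffic per flow, so the aggregate is only reached when traffic is spread across many flows.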
Checkpoint Questions
- Which connectivity option is prioritized by AWS if both advertise the same prefix via BGP: Direct Connect or Site-to-Site VPN?
- To achieve the highest level of resiliency (Maximum Resiliency), how many DX locations should be used?
- What protocol is required to aggregate multiple physical DX connections into a single logical LAG?
- Does a Public VIF provide access to the public internet?
[!TIP] Answers: 1. Direct Connect. 2. Two separate locations. 3. LACP. 4. No, it only provides access to AWS public services (S3, etc.).
Muddy Points & Cross-Refs
- Local Preference vs. AS-Path: Students often confuse these. Local Preference influences outbound traffic (how you leave your network), while AS-Path Prepending influences inbound traffic (how AWS reaches you).
- Direct Connect Gateway (DXGW) vs. VGW: A VGW is for a single VPC. A DXGW allows a single DX VIF to connect to multiple VPCs (via VGWs) or a Transit Gateway across different regions.
Comparison Tables
| Feature | Direct Connect (DX) | Site-to-Site VPN |
|---|---|---|
| Medium | Private Fiber | Public Internet (Encrypted) |
| Consistency | High / Predictable | Variable (Best-effort) |
| Setup Time | Weeks/Months (Physical) | Minutes (Logical) |
| Encryption | Optional (via MACsec or VPN-over-DX) | Mandatory (IPSec) |
| Cost | Port-hours + Data Transfer Out | Connection-hours + Data Transfer Out |
[!IMPORTANT] When using VPN as a backup to DX, make sure your on-premises router does not prefer the VPN path, for example by assigning a higher local preference to routes learned over DX. On the AWS side, DX routes are preferred over VPN routes for the same prefix while the DX BGP session is up. If the DX session drops, the DX routes are withdrawn and the VPN route (if advertised) becomes the active path.
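The failover behavior on the on-premises side can be illustrated with a toy Python model in which the router keeps one candidate route per path and picks the one with the highest local preference. The local-preference values (200 and 100) and the path names are hypothetical; a real router would apply the full BGP best-path algorithm.

```python
def best_path(routes):
    """Return the path with the highest local preference, or None if no routes remain."""
    if not routes:
        return None
    return max(routes, key=lambda path: routes[path])


# Hypothetical local-preference values: higher wins, so DX is preferred.
routes = {"dx": 200, "vpn": 100}
before = best_path(routes)

# The DX BGP session drops and its route is withdrawn.
routes.pop("dx")
after = best_path(routes)

print(before)  # dx
print(after)   # vpn
```

No manual step is involved in the switch: once the DX route disappears from the table, the VPN route is simply the best remaining candidate.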