Curriculum Overview: AWS Global Infrastructure & AZ Fault Independence

This curriculum focuses on the architectural design of the AWS Global Infrastructure, specifically exploring how Availability Zones (AZs) are engineered to eliminate single points of failure. Students will learn the physical and logical separation techniques that ensure high availability and reliability for cloud-based applications.

Prerequisites

Before beginning this module, students should have a baseline understanding of the following:

Basic Cloud Concepts: Knowledge of the AWS Shared Responsibility Model.
AWS Regions: Understanding that a Region is a physical location in the world where AWS clusters data centers.
Redundancy: The general IT principle of duplicating critical components to increase system reliability.
Networking Basics: A high-level understanding of IP addresses and subnets.

Module Breakdown

Module	Title	Topic Focus	Difficulty
1	AZ Physical Architecture	Data center clusters, discrete facilities, and floodplains.	Beginner
2	Utility Independence	Power grids, UPS systems, and onsite backup generation.	Intermediate
3	Network Resilience	Tier 1 transit providers and low-latency interconnects.	Intermediate
4	Designing for High Availability	Multi-AZ deployments, ELB, and Auto Scaling integration.	Advanced

Learning Objectives per Module

Module 1: AZ Physical Architecture

Define an Availability Zone as one or more discrete data centers with redundant power and networking.
Explain how AZs within a Region are physically separated by a meaningful distance (many kilometers, yet all within about 100 km of each other) to mitigate local disasters (e.g., fires, floods).

Module 2: Utility Independence

Identify how AZs use independent power grids from different utilities.
Describe the role of Uninterruptible Power Supplies (UPS) and onsite backup generation in preventing failure propagation.

Module 3: Network Resilience

Recognize that AZs are redundantly connected to multiple Tier 1 transit providers.
Diagram the relationship between Regions and their constituent AZs.

Module 4: Designing for High Availability

Evaluate the use of horizontal scaling across multiple AZs to remove single points of failure.
Demonstrate how Elastic Load Balancing (ELB) distributes traffic across multiple failure zones.
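
The two Module 4 objectives can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a description of how ELB or Auto Scaling are implemented: the function names `spread_instances` and `route_requests`, the AZ labels, and the instance names are all hypothetical, and routing is plain round-robin.

```python
from collections import Counter
from itertools import cycle


def spread_instances(count, zones):
    """Place `count` instances round-robin so each AZ gets an even share."""
    placement = {zone: [] for zone in zones}
    for index, zone in zip(range(count), cycle(zones)):
        placement[zone].append(f"instance-{index}")
    return placement


def route_requests(placement, healthy_zones, request_count):
    """Send requests round-robin to instances in healthy AZs only."""
    targets = [
        instance
        for zone, instances in placement.items()
        if zone in healthy_zones
        for instance in instances
    ]
    if not targets:
        raise RuntimeError("no healthy targets: the application is offline")
    served = Counter()
    for _, target in zip(range(request_count), cycle(targets)):
        served[target] += 1
    return served


zones = ["az-a", "az-b", "az-c"]  # hypothetical AZ labels
placement = spread_instances(6, zones)

# Simulate az-a becoming unavailable: only az-b and az-c receive traffic.
served = route_requests(placement, healthy_zones={"az-b", "az-c"}, request_count=8)
print(placement)
print(dict(served))
```

With six instances spread over three AZs, losing one AZ removes a third of the capacity but leaves four instances serving traffic. The same placement in a single AZ would leave none.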

Visual Anchors

AWS Global Infrastructure Hierarchy

[Figure: AWS Global Infrastructure Hierarchy; diagram not rendered]

AZ Fault Independence Components

[Figure: AZ Fault Independence Components; diagram not rendered]

Examples Section

[!TIP] Real-World Scenario: The Utility Grid Failure. Imagine a power grid failure affecting one AZ in Northern Virginia (us-east-1). Because AZs such as us-east-1a and us-east-1b are designed to be fed by independent utility substations, a failure in the grid supplying AZ-A should not affect the grid supplying AZ-B. A multi-AZ application with enough capacity in the unaffected AZ therefore remains operational.

Comparative Examples of Availability

Deployment Strategy	Resilience Level	Description
Single Instance in 1 AZ	Low	If the instance or its AZ fails, the application goes offline (Single Point of Failure).
Multiple Instances in 1 AZ	Medium	Protects against individual hardware failure, but not data center/AZ failure.
Multi-AZ Deployment	High	Application traffic is shifted to healthy AZs if one becomes unavailable.
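
The three rows of the table can be compared with a short sketch that fails one AZ and checks what is left. The placements, AZ labels, and the helper `surviving_instances` are hypothetical and exist only to mirror the table; real failover also depends on health checks and spare capacity.

```python
def surviving_instances(placement, failed_az):
    """Return the instances still running after `failed_az` goes down."""
    return [
        instance
        for zone, instances in placement.items()
        if zone != failed_az
        for instance in instances
    ]


# Hypothetical placements mirroring the three deployment strategies.
strategies = {
    "single instance, 1 AZ": {"az-a": ["web-1"]},
    "multiple instances, 1 AZ": {"az-a": ["web-1", "web-2"]},
    "multi-AZ": {"az-a": ["web-1"], "az-b": ["web-2"]},
}

results = {
    name: surviving_instances(placement, failed_az="az-a")
    for name, placement in strategies.items()
}
for name, survivors in results.items():
    status = "still serving" if survivors else "offline"
    print(f"{name}: {status} ({len(survivors)} instance(s) left)")
```

Only the multi-AZ placement keeps an instance running when `az-a` fails, which is why adding instances inside one AZ rates "Medium" rather than "High".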

Success Metrics

To demonstrate mastery of this curriculum, students must be able to:

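Articulate why an AZ name such as us-east-1a can map to a different physical AZ in each AWS account (to keep resources from concentrating in the first listed AZ), and why AZ IDs are used to identify the same AZ across accounts.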
Calculate the potential uptime of a Multi-AZ architecture and show that it can exceed the target: $\text{Uptime} > 99.99\%$
Identify the specific components that make an AZ an "independent failure zone" (Power, Networking, Cooling).
Distinguish between horizontal scaling (adding instances across AZs) and vertical scaling (adding resources to a single instance).
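
The uptime calculation in the list above can be sketched under a simplifying assumption: AZs fail independently, and the application stays up as long as at least one AZ is up. The per-AZ figure of 99.9% is a hypothetical input chosen for illustration, not a published AWS number, and the function names are illustrative.

```python
def combined_availability(per_az_availability, az_count):
    """Availability of N independent AZs where any one AZ can serve traffic.

    The application is down only if every AZ is down at the same time,
    so A = 1 - (1 - a) ** n.
    """
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return 1.0 - (1.0 - per_az_availability) ** az_count


def downtime_minutes_per_year(availability):
    """Convert an availability fraction to expected downtime in minutes per year."""
    return (1.0 - availability) * 365 * 24 * 60


per_az = 0.999  # hypothetical per-AZ availability, for illustration only
for n in (1, 2, 3):
    a = combined_availability(per_az, n)
    print(f"{n} AZ(s): {a:.7%} available, "
          f"about {downtime_minutes_per_year(a):.2f} min/year of downtime")
```

Under these assumptions, two AZs at 99.9% each already exceed 99.99%. Real systems fall short of the idealized figure when failures are correlated or when shared components (load balancers, data stores, deployment pipelines) sit outside the per-AZ model.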

Real-World Application

Disaster Recovery (DR): Architects replicate data and run services across multiple AZs so that a facility-level disaster in one AZ does not cause data loss or take the service offline.
Business Continuity: For mission-critical banking or healthcare applications, Multi-AZ deployment is the standard for maintaining 24/7 operations.
Compliance: Many regulatory frameworks require data redundancy across geographically separate locations. Because AZs are physically separate sites within a single Region, a Multi-AZ design can help meet such requirements.

[!IMPORTANT] An AZ is not just a single data center. It can be a cluster of multiple data centers. The key is that they share the same failure boundary but remain isolated from other AZs.