# Lab: Designing and Testing a Reliable Multi-AZ Web Architecture

*Design a strategy to meet reliability requirements*
This lab guides you through designing a resilient strategy to meet reliability requirements, specifically focusing on the AWS Well-Architected Framework principles of horizontal scaling, automatic recovery from failure, and testing recovery procedures.
> [!WARNING]
> Remember to run the teardown commands at the end of this lab to avoid ongoing charges for the Application Load Balancer and EC2 instances.
## Prerequisites
- An active AWS Account.
- AWS CLI configured with administrator-level permissions.
- A default VPC in your region with at least two public subnets.
- Basic knowledge of EC2, VPC, and Load Balancing.
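The steps below use placeholder values such as `<YOUR_VPC_ID>` and `<SUBNET_ID_AZ1>`. A quick way to look them up before starting (the `--query` expressions here are illustrative, not part of the lab):

```shell
# Find the default VPC's ID
aws ec2 describe-vpcs \
  --filters Name=isDefault,Values=true \
  --query 'Vpcs[0].VpcId' \
  --output text

# List that VPC's subnets with their Availability Zones
aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=<YOUR_VPC_ID> \
  --query 'Subnets[].[SubnetId,AvailabilityZone]' \
  --output text
```

Pick two subnets in different Availability Zones for the multi-AZ steps.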
## Learning Objectives
- Deploy a Multi-AZ Application Load Balancer (ALB) to eliminate single points of failure.
- Configure an Auto Scaling Group (ASG) with health checks to enable self-healing.
- Test recovery procedures by simulating an instance failure and observing automatic replacement.
- Implement infrastructure as code (via CLI) to ensure repeatable, automated change management.
## Architecture Overview
We will build a highly available web tier that spans two Availability Zones (AZs). The Load Balancer will distribute traffic, while the Auto Scaling Group ensures the desired number of healthy instances is maintained.
## Step-by-Step Instructions

### Step 1: Create a Security Group
We need a security group that allows HTTP traffic (Port 80) from the internet to our load balancer and instances.
```bash
# Create the Security Group
aws ec2 create-security-group \
  --group-name brainybee-lab-sg \
  --description "Allow HTTP traffic" \
  --vpc-id <YOUR_VPC_ID>

# Authorize Inbound HTTP
aws ec2 authorize-security-group-ingress \
  --group-name brainybee-lab-sg \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0
```

**Console alternative:** Navigate to EC2 → Security Groups and create the group there. Add an Inbound rule for HTTP (80) with source 0.0.0.0/0.
### Step 2: Create an Application Load Balancer
The ALB provides the entry point for our application and performs health checks to ensure reliability.
```bash
# Create the ALB
aws elbv2 create-load-balancer \
  --name brainybee-lab-alb \
  --subnets <SUBNET_ID_AZ1> <SUBNET_ID_AZ2> \
  --security-groups <SG_ID_FROM_STEP_1>
```

### Step 3: Create a Launch Template
A launch template defines the exact environment (AMI, instance type, networking) for our workloads, so every instance the Auto Scaling Group launches is identical and the deployment is repeatable.
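The checkpoint later in this lab assumes a web server is running on each instance. If your AMI doesn't ship one, the launch template can carry a user-data script; a minimal sketch, assuming an Amazon Linux AMI (the page content is illustrative):

```shell
#!/bin/bash
# User-data script: install Apache and serve a page identifying the instance
yum install -y httpd
echo "<h1>Served from $(hostname -f)</h1>" > /var/www/html/index.html
systemctl enable --now httpd
```

To use it, add a `UserData` field containing the base64-encoded script to the `--launch-template-data` JSON in the command below.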
```bash
# Create the Launch Template
aws ec2 create-launch-template \
  --launch-template-name ReliabilityLabTemplate \
  --version-description version1 \
  --launch-template-data '{"NetworkInterfaces":[{"AssociatePublicIpAddress":true,"DeviceIndex":0,"Groups":["<SG_ID_FROM_STEP_1>"]}],"ImageId":"ami-0c55b159cbfafe1f0","InstanceType":"t2.micro"}'
```

Note: AMI IDs are Region-specific. Replace `ami-0c55b159cbfafe1f0` with a current Amazon Linux AMI ID for your Region.

### Step 4: Create the Auto Scaling Group (ASG)
This is the core of our "Automatic Recovery" strategy. The ASG will maintain a minimum of 2 instances across 2 AZs.
```bash
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name brainybee-lab-asg \
  --launch-template LaunchTemplateName=ReliabilityLabTemplate \
  --min-size 2 \
  --max-size 4 \
  --desired-capacity 2 \
  --vpc-zone-identifier "<SUBNET_ID_AZ1>,<SUBNET_ID_AZ2>"
```

## Checkpoints
- **Deployment Verification:** Run `aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names brainybee-lab-asg`. Ensure `DesiredCapacity` is 2 and the `Instances` list has 2 entries in the `InService` state.
- **DNS Reachability:** Copy the DNS Name of your ALB from the `describe-load-balancers` output and paste it into your browser. You should see the default web page (ensure your AMI has a web server running, or use a UserData script to install one).
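Note: as the steps above are written, nothing connects the ALB to the ASG's instances, so the DNS checkpoint may return a 503. A sketch of the missing wiring, with an illustrative target group name (substitute your own VPC ID and the ARNs returned by the first two commands):

```shell
# Capture the ALB's ARN for the listener step
aws elbv2 describe-load-balancers \
  --names brainybee-lab-alb \
  --query 'LoadBalancers[0].LoadBalancerArn' \
  --output text

# Create a target group for the web instances
aws elbv2 create-target-group \
  --name brainybee-lab-tg \
  --protocol HTTP \
  --port 80 \
  --vpc-id <YOUR_VPC_ID>

# Forward all port-80 traffic from the ALB to the target group
aws elbv2 create-listener \
  --load-balancer-arn <ALB_ARN> \
  --protocol HTTP \
  --port 80 \
  --default-actions Type=forward,TargetGroupArn=<TARGET_GROUP_ARN>

# Attach the ASG so current and replacement instances auto-register
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name brainybee-lab-asg \
  --target-group-arns <TARGET_GROUP_ARN>
```

If you add these, also delete the target group (`aws elbv2 delete-target-group --target-group-arn <TARGET_GROUP_ARN>`) after the ALB during teardown.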
## Testing Recovery (Failure Simulation)
To meet the reliability requirement of "testing recovery procedures," we will manually terminate an instance.
- **Identify Instance:** `aws ec2 describe-instances --filters "Name=tag:aws:autoscaling:groupName,Values=brainybee-lab-asg"`
- **Terminate Instance:** Pick one `InstanceId` and run: `aws ec2 terminate-instances --instance-ids <INSTANCE_ID>`
- **Observe:** Within 1-2 minutes, the ASG will detect the failure (via EC2 status checks) and automatically launch a replacement instance to maintain the desired capacity of 2.
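To watch the replacement happen, a simple polling loop works (the interval and output fields are my own choices):

```shell
# Print each instance's lifecycle state every 15 seconds (Ctrl+C to stop)
while true; do
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names brainybee-lab-asg \
    --query 'AutoScalingGroups[0].Instances[].[InstanceId,LifecycleState,HealthStatus]' \
    --output table
  sleep 15
done
```

You should see the terminated instance disappear and a new one move from `Pending` to `InService`.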
## Clean-Up / Teardown
To avoid costs, you must delete the resources in this specific order:
```bash
# 1. Delete the ASG (This terminates the instances)
aws autoscaling delete-auto-scaling-group --auto-scaling-group-name brainybee-lab-asg --force-delete

# 2. Delete the ALB
aws elbv2 delete-load-balancer --load-balancer-arn <ALB_ARN>

# 3. Delete the Launch Template
aws ec2 delete-launch-template --launch-template-name ReliabilityLabTemplate

# 4. Delete the Security Group
aws ec2 delete-security-group --group-id <SG_ID>
```

## Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| Instance in wrong AZ | Subnets provided to ASG were in a single AZ. | Recreate the ASG with subnets from two different AZs. |
| ALB 503 Service Unavailable | Target Group not yet healthy or registered. | Wait 2-3 minutes for health checks to pass; verify the Security Group allows ALB-to-instance traffic. |
| CLI: Permission Denied | IAM user or role lacks EC2/Auto Scaling permissions. | Attach the AmazonEC2FullAccess and AutoScalingFullAccess policies to your IAM user or role. |
## Stretch Challenge
Implement Dynamic Scaling: Instead of static capacity, use the AWS CLI to attach a Target Tracking scaling policy to your ASG that maintains a target average CPU utilization of 50%. This addresses the principle of scaling to match demand. (AWS also offers a separate Predictive Scaling policy type that forecasts demand from historical patterns.)
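A hedged sketch of such a policy via the CLI (the policy name is illustrative):

```shell
# Keep average CPU across the ASG near 50% by adding/removing instances
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name brainybee-lab-asg \
  --policy-name brainybee-cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":50.0}'
```

Target tracking creates the required CloudWatch alarms for you; generate CPU load on an instance to watch the group scale out.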
## Cost Estimate
| Service | Usage | Estimated Cost (Monthly/Pro-rated) |
|---|---|---|
| EC2 t2.micro | 2 Instances for 1 hour | $0.00 (Free Tier) or ~$0.02 |
| Application Load Balancer | 1 ALB for 1 hour | ~$0.025 |
| Data Transfer | Minimal | $0.00 |
| **Total** | | <$0.10 |
## Concept Review
This lab implemented several reliability pillars from the AWS SAP-C02 guide:
- **Horizontal Scaling:** We used an ASG and ALB to distribute load across multiple instances rather than relying on one large instance.
- **Self-Healing:** By setting `min-size` and `desired-capacity`, the ASG acts as a control loop that automatically recovers from instance-level failures.
- **Foundation Requirements:** We leveraged the AWS Global Infrastructure (Multi-AZ) to protect against data center-level outages.