Mastering Application Rollbacks and Deployment Strategies (AWS DVA-C02)

This guide covers the mechanics of rolling back application updates within AWS, focusing on how different deployment strategies influence the speed, cost, and availability of the rollback process.

Learning Objectives

After studying this guide, you should be able to:

Identify the specific rollback mechanism for Rolling, Immutable, and Traffic Splitting strategies.
Compare the cost and availability trade-offs of various rollback scenarios.
Determine the appropriate deployment strategy based on the risk profile of an application update.
Explain how AWS Elastic Beanstalk handles environment transitions during a failure.

Key Terms & Glossary

Rollback: The process of returning an application to its previous stable version after a failed or buggy deployment.
Immutable Deployment: A strategy where a completely new set of resources is provisioned for the new version, leaving the old version untouched until the new one is verified.
Canary Testing: A practice of directing a small percentage of traffic to a new version to test its stability before a full rollout.
In-place Update: A deployment where the application code is updated directly on existing instances (common in basic Rolling deployments).
Fleet: The collection of compute instances (e.g., EC2) currently serving the application.

The "Big Idea"

[!IMPORTANT] A rollback is not merely an "undo" button; how it works depends on the deployment strategy that preceded it. The more a deployment alters the original environment, the slower and more complex the rollback becomes. Strategies that preserve the original environment (like Immutable or Blue/Green) offer the fastest and safest rollbacks, while strategies that modify existing resources (like Rolling) require more time to revert.

Formula / Concept Box

Strategy	Rollback Mechanism	Rollback Speed	Additional Cost	Impact on Availability
All at Once	Redeploy previous version	Slow	Zero	High (Downtime)
Rolling	Redeploy previous version in batches	Moderate	Zero	Medium (Reduced capacity)
Rolling with Additional Batch	Redeploy previous version in batches	Moderate	Low	Low (Full capacity maintained)
Immutable	Terminate new ASG; keep old	Fastest	High	None
Traffic Splitting	Shift traffic 100% back to old	Fast	Moderate	None
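
The table can also be read as a lookup. The following is a minimal Python sketch that encodes the speed, cost, and availability columns as data and filters on them. The names `RollbackProfile`, `ROLLBACK_PROFILES`, and `matching` are illustrative and not part of any AWS SDK; the dictionary keys borrow the Elastic Beanstalk deployment policy names purely as labels.

```python
from typing import NamedTuple


class RollbackProfile(NamedTuple):
    speed: str
    extra_cost: str
    availability_impact: str


# Values taken from the table columns; keys are labels chosen for this sketch.
ROLLBACK_PROFILES = {
    "AllAtOnce": RollbackProfile("Slow", "Zero", "High"),
    "Rolling": RollbackProfile("Moderate", "Zero", "Medium"),
    "RollingWithAdditionalBatch": RollbackProfile("Moderate", "Low", "Low"),
    "Immutable": RollbackProfile("Fastest", "High", "None"),
    "TrafficSplitting": RollbackProfile("Fast", "Moderate", "None"),
}


def matching(**criteria):
    """Return strategy names whose profile equals every given field value."""
    return [
        name
        for name, profile in ROLLBACK_PROFILES.items()
        if all(getattr(profile, field) == value for field, value in criteria.items())
    ]


print(matching(availability_impact="None"))  # ['Immutable', 'TrafficSplitting']
print(matching(extra_cost="Zero"))           # ['AllAtOnce', 'Rolling']
```

Filtering this way makes the trade-off visible: the strategies with no availability impact are exactly the ones that cost extra.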

Hierarchical Outline

In-Place Deployment Rollbacks
- Rolling Updates: Updates nodes in subsets. Rollback requires a new deployment of the old version to be pushed back through the fleet.
- Rolling with Additional Batch: Maintains full capacity. Rollback is similar to standard rolling but avoids performance degradation during the revert.
Resource-Based Rollbacks
- Immutable Strategy: Uses a temporary Auto Scaling Group (ASG). Rollback is achieved by simply deleting the new ASG and its instances.
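- Infrastructure as Code (IaC): With CloudFormation or SAM, a failed stack update rolls back to the last stable stack state. An update that succeeded but shipped a bug is reverted by redeploying the previous template.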
Traffic-Based Rollbacks
- Canary / Traffic Splitting: Percentage-based traffic shifting. Rollback is a simple configuration change in the Load Balancer or Route 53 to redirect 100% of traffic to the original version.
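
In Elastic Beanstalk, the strategy in this outline is selected through environment option settings. The sketch below only builds the settings list as plain Python data and makes no AWS call. The helper name `deployment_option_settings` and its default values are hypothetical, and the namespace and option names are written from memory of the Elastic Beanstalk configuration options, so check them against the current documentation before relying on them.

```python
ALLOWED_POLICIES = {
    "AllAtOnce",
    "Rolling",
    "RollingWithAdditionalBatch",
    "Immutable",
    "TrafficSplitting",
}


def deployment_option_settings(policy, new_version_percent=10, evaluation_minutes=5):
    """Build a list of option settings that selects a deployment policy."""
    if policy not in ALLOWED_POLICIES:
        raise ValueError(f"unknown deployment policy: {policy}")

    settings = [
        {
            "Namespace": "aws:elasticbeanstalk:command",
            "OptionName": "DeploymentPolicy",
            "Value": policy,
        }
    ]
    if policy == "TrafficSplitting":
        # Share of traffic sent to the new version, and how long it is
        # evaluated before the shift completes (illustrative defaults).
        namespace = "aws:elasticbeanstalk:trafficsplitting"
        settings.append(
            {"Namespace": namespace, "OptionName": "NewVersionPercent",
             "Value": str(new_version_percent)}
        )
        settings.append(
            {"Namespace": namespace, "OptionName": "EvaluationTime",
             "Value": str(evaluation_minutes)}
        )
    return settings


for setting in deployment_option_settings("TrafficSplitting"):
    print(setting["OptionName"], "=", setting["Value"])
```

A list of this shape is what the boto3 Elastic Beanstalk client accepts as `OptionSettings` in `update_environment`; the call itself is left out so the sketch runs without credentials.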

Visual Anchors

Rollback Decision Logic

[Diagram: Rollback Decision Logic]

Immutable Architecture Comparison

[Diagram: Immutable Architecture Comparison]

Definition-Example Pairs

Inconsistency Period: The window during a rolling deployment where some users see the old version and others see the new.
- Example: A banking app updates its UI. During the 15-minute rolling window, User A sees a blue login button while User B sees a green one. If a bug is found, the rollback will also have an inconsistency period.
Additional Batch: A buffer of instances added before starting an update to prevent capacity drops.
- Example: An ASG has 10 instances and a batch size of 2. Before the update starts, 2 extra instances are launched (Total: 12). The old instances are then updated 2 at a time, so there are always at least 10 healthy instances serving traffic.
Health Check Grace Period: The time after launch during which a new instance is given a chance to start up before failed health checks mark it as "failed."
- Example: In an Immutable deployment, Elastic Beanstalk waits for the new instances to pass Load Balancer health checks before terminating the old ones. If they fail, the rollback triggers automatically.
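
The capacity arithmetic in the Additional Batch example can be checked with a short simulation. This is a simplified model, not Elastic Beanstalk's actual scheduler: it assumes every batch is fully out of service while it is updated, and the function name `in_service_during_update` is made up for this sketch.

```python
def in_service_during_update(fleet_size, batch_size, additional_batch=False):
    """Return the in-service instance count while each batch is being updated."""
    if not 0 < batch_size <= fleet_size:
        raise ValueError("batch_size must be between 1 and fleet_size")

    # With an additional batch, extra instances are launched before any
    # existing instance is taken out of service.
    total = fleet_size + (batch_size if additional_batch else 0)

    counts = []
    remaining = fleet_size  # original instances still waiting to be updated
    while remaining > 0:
        out_of_service = min(batch_size, remaining)
        counts.append(total - out_of_service)
        remaining -= out_of_service
    return counts


print(in_service_during_update(10, 2))                         # [8, 8, 8, 8, 8]
print(in_service_during_update(10, 2, additional_batch=True))  # [10, 10, 10, 10, 10]
```

Because a rollback of either strategy is another pass through the fleet, the same counts apply while reverting: 8 of 10 instances for standard Rolling, the full 10 with an additional batch.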

Worked Examples

Scenario: Rolling Back an Immutable Deployment in Elastic Beanstalk

The Problem: You are deploying version 2.0 of a web application using the Immutable strategy. Halfway through the deployment, the new instances fail their health checks because of a database connection error in the new code.

The Step-by-Step Rollback:

Detection: Elastic Beanstalk monitors the new instances in the temporary Auto Scaling Group and sees that they never pass their health checks.
Isolation: Because this is an Immutable deployment, the original instances (Version 1.0) are still running in their original ASG, untouched.
Cleanup: Elastic Beanstalk automatically terminates the new ASG and the failed instances.
Restoration: Since the old instances never stopped serving traffic, the rollback is essentially "instant" from the user's perspective. No code needs to be re-uploaded.

[!NOTE] While this is the safest method, the worked example above still incurs a cost spike: while the new instances run alongside the old ones, you pay for up to double the EC2 capacity.
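
The four rollback steps can be traced in a small simulation. This sketch does not call AWS and simplifies what Elastic Beanstalk actually does: it assumes the new fleet is the same size as the list of health check results and that any single failure aborts the deployment. The function `immutable_deploy` and the instance names are hypothetical.

```python
def immutable_deploy(old_instances, new_health_checks):
    """Simulate an immutable deployment.

    Returns (serving_fleet, peak_instance_count, events).
    """
    events = []
    new_instances = [f"new-{i}" for i in range(len(new_health_checks))]
    events.append(f"launched {len(new_instances)} instances in temporary group")

    # Old and new instances run side by side, which is the cost spike.
    peak = len(old_instances) + len(new_instances)

    # Detection: a failed health check aborts the deployment.
    if not all(new_health_checks):
        # Isolation: the old fleet was never modified and is still serving.
        # Cleanup: the temporary group and its instances are discarded.
        events.append("health checks failed; terminated temporary group")
        # Restoration: nothing is redeployed; the old fleet stays in service.
        return list(old_instances), peak, events

    events.append("health checks passed; new version promoted")
    return new_instances, peak, events


serving, peak, events = immutable_deploy(["old-0", "old-1"], [True, False])
print(serving)  # ['old-0', 'old-1']
print(peak)     # 4
```

The failure path never touches `old_instances`, which is why the rollback appears instant to users, and `peak` shows where the extra cost comes from.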

Checkpoint Questions

Which deployment strategy results in the highest additional cost during the update and rollback process?
In a Rolling deployment, why is the rollback speed considered "Moderate" rather than "Instant"?
True/False: A Traffic Splitting deployment requires you to maintain two sets of environment resources simultaneously.
What is the primary benefit of using Rolling with Additional Batch compared to a standard Rolling deployment during a failure scenario?

Answers

Immutable Deployment (it doubles the capacity for a period).
Because the old version must be re-applied to the nodes in batches, just like the initial update was.
True. You need both the original and the new versions active to split traffic between them.
It maintains full capacity/throughput. A standard rolling deployment reduces the number of active nodes, which could lead to performance issues if the rollback takes a long time.