Unit 3: Deployment and Orchestration of ML Workflows - Study Guide
This study guide covers the critical transition from model development to production environments. It focuses on choosing the right AWS infrastructure for inference, orchestrating complex workflows, and implementing robust CI/CD pipelines.
Learning Objectives
- Select appropriate deployment infrastructure (Real-time, Serverless, Asynchronous, or Batch) based on latency and cost requirements.
- Distinguish between AWS orchestration services including SageMaker Pipelines, AWS Step Functions, and Amazon MWAA.
- Implement CI/CD principles for ML using AWS CodePipeline, CodeBuild, and CodeDeploy.
- Evaluate deployment strategies such as Blue/Green and Canary to ensure high availability and safe rollbacks.
Key Terms & Glossary
- Inference Pipeline: A linear sequence of containers (up to 15) that processes requests to provide predictions, often including preprocessing and post-processing steps.
- SageMaker Neo: A service that optimizes ML models for deployment on edge devices (e.g., IoT) by compiling them to run faster with a smaller footprint.
- DAG (Directed Acyclic Graph): A mathematical structure used in MWAA/Airflow to represent a workflow where tasks flow in one direction without loops.
- Model Registry: A central repository within SageMaker to catalog models, manage versions, and track approval status.
- Artifact: Any file or data generated during the pipeline, such as a trained model file (`model.tar.gz`) or a container image.
The "Big Idea"
Moving a model from a notebook to a production environment is not just about the code; it is about operationalizing the lifecycle. Effective deployment and orchestration transform a manual, fragile process into a repeatable, automated system that handles scale, ensures consistency across environments, and enables rapid iteration through safe deployment patterns.
Formula / Concept Box
| Endpoint Type | Best For... | Key Characteristic |
|---|---|---|
| Real-time | Low latency, steady traffic | Persistent instances, sub-second response |
| Serverless | Intermittent/Spiky traffic | Automatic scaling, pay-per-use, cold starts possible |
| Asynchronous | Large payloads (up to 1GB) | Queued requests, processing up to 1 hour |
| Batch Transform | Non-real-time, offline data | Processes entire datasets, shuts down after completion |
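The table above can be condensed into a simple decision helper. The thresholds below (60-second real-time timeout, the ~6 MB real-time payload cap, 1 GB async limit) reflect the table and the worked example later in this guide; the function itself is an illustrative sketch, not an AWS API.

```python
def choose_endpoint_type(payload_mb: float, latency_s: float, traffic: str) -> str:
    """Pick a SageMaker inference option from coarse constraints.

    traffic: "steady", "spiky", or "offline" (illustrative categories).
    """
    if traffic == "offline":
        return "Batch Transform"          # whole datasets, endpoint shuts down after
    if payload_mb > 6 or latency_s > 60:  # beyond real-time limits
        return "Asynchronous"             # queued, up to ~1 GB / ~1 hour
    if traffic == "spiky":
        return "Serverless"               # scales to zero, cold starts possible
    return "Real-time"                    # persistent instances, sub-second

print(choose_endpoint_type(500, 300, "steady"))  # Asynchronous
```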
Hierarchical Outline
- I. Inference Infrastructure
- Persistent Endpoints: Real-time for low latency; Asynchronous for long-running tasks.
- On-Demand Endpoints: Serverless for cost-optimization on intermittent traffic.
- Edge Deployment: Using SageMaker Neo and IoT Greengrass for local execution.
- II. Orchestration Tools
- SageMaker Pipelines: Native ML orchestration; integrated with SageMaker Studio.
- AWS Step Functions: General-purpose serverless state machines; great for multi-service logic.
- Amazon MWAA: Managed Airflow for complex, Python-defined data engineering workflows.
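At its core, an Airflow DAG is just a set of tasks with one-way dependencies. A minimal, framework-free sketch of the idea using the standard library (the task names are hypothetical; real MWAA DAGs use the `airflow` library's `DAG` and operator classes):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each key depends on the tasks in its value set; edges flow one way, no loops.
dag = {
    "preprocess": {"extract"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# A topological sort yields a valid execution order for the workflow.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'preprocess', 'train', 'evaluate', 'register']
```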
- III. Continuous Delivery (CI/CD)
- CodePipeline: The "glue" that automates the stages from Source to Deploy.
- Deployment Strategies:
- Blue/Green: Full swap of traffic to a new environment.
- Canary: Small percentage of traffic tested on the new version first.
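The canary pattern above amounts to weighted routing between two versions. A toy router, assuming a fixed 10% canary weight (the fraction and version labels are illustrative, not an AWS API):

```python
import random

def route(canary_fraction: float = 0.10, rng=random.random) -> str:
    """Send a small fraction of requests to the new model version."""
    return "v2-canary" if rng() < canary_fraction else "v1-stable"

# Deterministic checks: force the random draw.
assert route(0.10, rng=lambda: 0.05) == "v2-canary"
assert route(0.10, rng=lambda: 0.50) == "v1-stable"

# Over many requests, roughly 10% land on the canary version.
random.seed(0)
hits = sum(route() == "v2-canary" for _ in range(10_000))
print(f"canary share: {hits / 10_000:.1%}")
```

If the canary's error rate stays healthy, traffic is shifted fully; otherwise the rollback only affects the small canary slice.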
Visual Anchors
ML Workflow Pipeline
Blue/Green Deployment Strategy
\begin{tikzpicture}[node distance=2cm]
  \node (LB) [draw, rectangle, rounded corners, fill=gray!20] {Load Balancer};
  \node (Blue) [draw, rectangle, fill=blue!30, below left of=LB, xshift=-1cm] {Blue (V1.0)};
  \node (Green) [draw, rectangle, fill=green!30, below right of=LB, xshift=1cm] {Green (V1.1)};
  \draw [->, thick] (LB) -- (Blue) node[midway, left] {100\% (Old)};
  \draw [->, dashed, thick] (LB) -- (Green) node[midway, right] {Swap to 100\% (New)};
  \draw [decoration={brace,mirror,raise=5pt},decorate] (Blue.south west) -- (Blue.south east) node [midway, below=10pt] {Production};
  \draw [decoration={brace,mirror,raise=5pt},decorate] (Green.south west) -- (Green.south east) node [midway, below=10pt] {Staging Prod};
\end{tikzpicture}
Definition-Example Pairs
- Rollback Strategy: A plan to revert to a previous stable version if the new deployment fails.
- Example: If a newly deployed model shows a 10% drop in accuracy in production, CodePipeline automatically triggers a rollback to the previous "Blue" environment.
- Multi-Model Endpoint (MME): Hosting multiple models on a single serving container to save costs.
- Example: Hosting 50 different language translation models for small niche languages on one instance because each is used infrequently.
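The cost saving in an MME comes from loading models lazily and keeping only recently used ones in memory. A sketch of that caching behavior (the loader and model names are hypothetical; the real MME serving container manages loading and eviction for you):

```python
from collections import OrderedDict

class ModelCache:
    """Lazily load models; evict the least recently used when full."""

    def __init__(self, max_models: int, loader):
        self.max_models = max_models
        self.loader = loader          # e.g. fetches model.tar.gz from S3
        self.cache = OrderedDict()

    def get(self, name: str):
        if name in self.cache:
            self.cache.move_to_end(name)        # mark as recently used
        else:
            if len(self.cache) >= self.max_models:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[name] = self.loader(name)
        return self.cache[name]

cache = ModelCache(max_models=2, loader=lambda name: f"<model {name}>")
cache.get("fr-en"); cache.get("de-en"); cache.get("ja-en")  # "fr-en" evicted
print(list(cache.cache))  # ['de-en', 'ja-en']
```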
- Shadow Deployment: Deploying a new model to production but only sending it a copy of live traffic without returning its results to users.
- Example: Running a new fraud detection algorithm alongside the old one to compare performance on live data without risking false denials.
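In shadow mode, both models score the request but only the primary's answer reaches the caller. A sketch with stand-in scoring functions (both model functions and thresholds are hypothetical):

```python
shadow_log = []

def handle(request, primary, shadow):
    """Return the primary result; record the shadow result for offline comparison."""
    result = primary(request)
    try:
        shadow_log.append({"request": request, "primary": result, "shadow": shadow(request)})
    except Exception:
        pass  # a failing shadow model must never affect the user
    return result

old_model = lambda r: {"fraud": r["amount"] > 10_000}  # current production rule
new_model = lambda r: {"fraud": r["amount"] > 8_000}   # candidate, more aggressive

resp = handle({"amount": 9_000}, old_model, new_model)
print(resp)                      # {'fraud': False} — user sees only the old model
print(shadow_log[0]["shadow"])   # {'fraud': True} — logged for comparison
```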
Worked Examples
Scenario: High-Latency Audio Transcription
Problem: You need to deploy a model that transcribes 30-minute audio files. Requests come in throughout the day, but the transcription takes 5 minutes per file. Real-time endpoints time out at 60 seconds.
Solution Step-by-Step:
- Analyze Constraints: Payload is large; processing time is long (> 60s); real-time is not viable.
- Select Endpoint: Asynchronous Inference is the best fit.
- Mechanism: The request is placed in an internal SQS queue. SageMaker processes it and stores the output in an S3 bucket.
- Notification: Use an Amazon SNS topic to notify the application when the transcription is complete.
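The queue-then-notify flow above can be mimicked with stdlib pieces: here an in-memory queue stands in for SQS, a dict for the S3 output location, and a list for the SNS topic (all stand-ins, not AWS APIs):

```python
import queue

request_queue = queue.Queue()   # stands in for the internal SQS queue
output_bucket = {}              # stands in for the S3 output location
notifications = []              # stands in for an SNS topic

def submit(job_id: str, audio_uri: str) -> None:
    """The async invocation returns immediately; work is queued."""
    request_queue.put((job_id, audio_uri))

def worker(transcribe) -> None:
    """Drain the queue, write results 'to S3', then notify completion."""
    while not request_queue.empty():
        job_id, audio_uri = request_queue.get()
        output_bucket[f"s3://outputs/{job_id}.json"] = transcribe(audio_uri)
        notifications.append(f"job {job_id} complete")

submit("job-1", "s3://audio/call-1.wav")
worker(transcribe=lambda uri: {"text": f"transcript of {uri}"})
print(notifications)  # ['job job-1 complete']
```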
Checkpoint Questions
- Which service is best for a team that wants to define their ML pipeline entirely in Python using Directed Acyclic Graphs (DAGs)?
- What is the main difference between a Canary deployment and a Linear deployment?
- When would you choose SageMaker Serverless Inference over Real-time Endpoints?
- Which AWS service is used to compile a model for a specific hardware target like an Ambarella chipset?
Muddy Points & Cross-Refs
- Step Functions vs. SageMaker Pipelines: Use SageMaker Pipelines if you are staying within the SageMaker ecosystem (Training, Processing, Tuning). Use Step Functions if your workflow involves non-ML services like AWS Glue or Lambda for complex business logic.
- Inference Pipeline vs. SageMaker Pipeline: Don't confuse these! An Inference Pipeline is a sequence of containers inside a single endpoint serving a single request. A SageMaker Pipeline is a series of workflow steps (e.g., Processing, Training, Registration) that builds the model.
Comparison Tables
Orchestration Tool Comparison
| Feature | SageMaker Pipelines | AWS Step Functions | Amazon MWAA (Airflow) |
|---|---|---|---|
| Primary Goal | ML Lifecycle Automation | General App Workflows | Data/ETL Pipelines |
| Logic Definition | Python SDK / JSON | Amazon States Lang (JSON) | Python (DAGs) |
| Best Integration | SageMaker native | All AWS Services | Open-source ecosystem |
| Management | Serverless | Serverless | Managed Clusters |
> [!TIP]
> For the exam, remember: SageMaker Neo = Edge, Asynchronous = Long-running/Large payloads, and Blue/Green = Zero-downtime deployments.