Unit 3: Deployment and Orchestration of ML Workflows - Study Guide
This study guide covers the critical transition from model development to production environments. It focuses on choosing the right AWS infrastructure for inference, orchestrating complex workflows, and implementing robust CI/CD pipelines.
Learning Objectives
- Select appropriate deployment infrastructure (Real-time, Serverless, Asynchronous, or Batch) based on latency and cost requirements.
- Distinguish between AWS orchestration services including SageMaker Pipelines, AWS Step Functions, and Amazon MWAA.
- Implement CI/CD principles for ML using AWS CodePipeline, CodeBuild, and CodeDeploy.
- Evaluate deployment strategies such as Blue/Green and Canary to ensure high availability and safe rollbacks.
Key Terms & Glossary
- Inference Pipeline: A linear sequence of containers (up to 15) that processes requests to provide predictions, often including preprocessing and post-processing steps.
- SageMaker Neo: A service that optimizes ML models for deployment on edge devices (e.g., IoT) by compiling them to run faster with a smaller footprint.
- DAG (Directed Acyclic Graph): A mathematical structure used in MWAA/Airflow to represent a workflow where tasks flow in one direction without loops.
- Model Registry: A central repository within SageMaker to catalog models, manage versions, and track approval status.
- Artifact: Any file or data generated during the pipeline, such as a trained model file (`model.tar.gz`) or a container image.
The "Big Idea"
Moving a model from a notebook to a production environment is not just about the code; it is about operationalizing the lifecycle. Effective deployment and orchestration transform a manual, fragile process into a repeatable, automated system that handles scale, ensures consistency across environments, and enables rapid iteration through safe deployment patterns.
Formula / Concept Box
| Endpoint Type | Best For... | Key Characteristic |
|---|---|---|
| Real-time | Low latency, steady traffic | Persistent instances, sub-second response |
| Serverless | Intermittent/Spiky traffic | Automatic scaling, pay-per-use, cold starts possible |
| Asynchronous | Large payloads (up to 1GB) | Queued requests, processing up to 1 hour |
| Batch Transform | Non-real-time, offline data | Processes entire datasets, shuts down after completion |
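The table above can be condensed into a simple decision helper. The thresholds below (60-second real-time timeout, the ~6 MB real-time payload cap, 1 GB async limit) reflect the table and the worked example later in this guide; the function itself is an illustrative sketch, not an AWS API.

```python
def choose_endpoint_type(payload_mb: float, latency_s: float, traffic: str) -> str:
    """Pick a SageMaker inference option from coarse constraints.

    traffic: "steady", "spiky", or "offline" (illustrative categories).
    """
    if traffic == "offline":
        return "Batch Transform"          # whole datasets, endpoint shuts down after
    if payload_mb > 6 or latency_s > 60:  # beyond real-time limits
        return "Asynchronous"             # queued, up to ~1 GB / ~1 hour
    if traffic == "spiky":
        return "Serverless"               # scales to zero, cold starts possible
    return "Real-time"                    # persistent instances, sub-second

print(choose_endpoint_type(500, 300, "steady"))  # Asynchronous
```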
Hierarchical Outline
- I. Inference Infrastructure
- Persistent Endpoints: Real-time for low latency; Asynchronous for long-running tasks.
- On-Demand Endpoints: Serverless for cost-optimization on intermittent traffic.
- Edge Deployment: Using SageMaker Neo and IoT Greengrass for local execution.
- II. Orchestration Tools
- SageMaker Pipelines: Native ML orchestration; integrated with SageMaker Studio.
- AWS Step Functions: General-purpose serverless state machines; great for multi-service logic.
- Amazon MWAA: Managed Airflow for complex, Python-defined data engineering workflows.
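At its core, an Airflow DAG is just a set of tasks with one-way dependencies. A minimal, framework-free sketch of the idea using the standard library (the task names are hypothetical; real MWAA DAGs use the `airflow` library's `DAG` and operator classes):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each key depends on the tasks in its value set; edges flow one way, no loops.
dag = {
    "preprocess": {"extract"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# A topological sort yields a valid execution order for the workflow.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'preprocess', 'train', 'evaluate', 'register']
```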
- III. Continuous Delivery (CI/CD)
- CodePipeline: The "glue" that automates the stages from Source to Deploy.
- Deployment Strategies:
- Blue/Green: Full swap of traffic to a new environment.
- Canary: Small percentage of traffic tested on the new version first.
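The canary pattern above amounts to weighted routing between two versions. A toy router, assuming a fixed 10% canary weight (the fraction and version labels are illustrative, not an AWS API):

```python
import random

def route(canary_fraction: float = 0.10, rng=random.random) -> str:
    """Send a small fraction of requests to the new model version."""
    return "v2-canary" if rng() < canary_fraction else "v1-stable"

# Deterministic checks: force the random draw.
assert route(0.10, rng=lambda: 0.05) == "v2-canary"
assert route(0.10, rng=lambda: 0.50) == "v1-stable"

# Over many requests, roughly 10% land on the canary version.
random.seed(0)
hits = sum(route() == "v2-canary" for _ in range(10_000))
print(f"canary share: {hits / 10_000:.1%}")
```

If the canary's error rate stays healthy, traffic is shifted fully; otherwise the rollback only affects the small canary slice.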
Visual Anchors
ML Workflow Pipeline
Blue/Green Deployment Strategy
\begin{tikzpicture}[node distance=2cm]
  \node (LB) [draw, rectangle, rounded corners, fill=gray!20] {Load Balancer};
  \node (Blue) [draw, rectangle, fill=blue!30, below left of=LB, xshift=-1cm] {Blue (V1.0)};
  \node (Green) [draw, rectangle, fill=green!30, below right of=LB, xshift=1cm] {Green (V1.1)};
  \draw [->, thick] (LB) -- (Blue) node[midway, left] {100\% (Old)};
  \draw [->, dashed, thick] (LB) -- (Green) node[midway, right] {Swap to 100\% (New)};
  \draw [decoration={brace,mirror,raise=5pt},decorate] (Blue.south west) -- (Blue.south east) node [midway, below=10pt] {Production};
  \draw [decoration={brace,mirror,raise=5pt},decorate] (Green.south west) -- (Green.south east) node [midway, below=10pt] {Staging Prod};
\end{tikzpicture}
Definition-Example Pairs
- Rollback Strategy: A plan to revert to a previous stable version if the new deployment fails.
- Example: If a newly deployed model shows a 10% drop in accuracy in production, CodePipeline automatically triggers a rollback to the previous "Blue" environment.
- Multi-Model Endpoint (MME): Hosting multiple models on a single serving container to save costs.
- Example: Hosting 50 different language translation models for small niche languages on one instance because each is used infrequently.
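The cost saving in an MME comes from loading models lazily and keeping only recently used ones in memory. A sketch of that caching behavior (the loader and model names are hypothetical; the real MME serving container manages loading and eviction for you):

```python
from collections import OrderedDict

class ModelCache:
    """Lazily load models; evict the least recently used when full."""

    def __init__(self, max_models: int, loader):
        self.max_models = max_models
        self.loader = loader          # e.g. fetches model.tar.gz from S3
        self.cache = OrderedDict()

    def get(self, name: str):
        if name in self.cache:
            self.cache.move_to_end(name)        # mark as recently used
        else:
            if len(self.cache) >= self.max_models:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[name] = self.loader(name)
        return self.cache[name]

cache = ModelCache(max_models=2, loader=lambda name: f"<model {name}>")
cache.get("fr-en"); cache.get("de-en"); cache.get("ja-en")  # "fr-en" evicted
print(list(cache.cache))  # ['de-en', 'ja-en']
```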
- Shadow Deployment: Deploying a new model to production but only sending it a copy of live traffic without returning its results to users.
- Example: Running a new fraud detection algorithm alongside the old one to compare performance on live data without risking false denials.
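In shadow mode, both models score the request but only the primary's answer reaches the caller. A sketch with stand-in scoring functions (both model functions and thresholds are hypothetical):

```python
shadow_log = []

def handle(request, primary, shadow):
    """Return the primary result; record the shadow result for offline comparison."""
    result = primary(request)
    try:
        shadow_log.append({"request": request, "primary": result, "shadow": shadow(request)})
    except Exception:
        pass  # a failing shadow model must never affect the user
    return result

old_model = lambda r: {"fraud": r["amount"] > 10_000}  # current production rule
new_model = lambda r: {"fraud": r["amount"] > 8_000}   # candidate, more aggressive

resp = handle({"amount": 9_000}, old_model, new_model)
print(resp)                      # {'fraud': False} — user sees only the old model
print(shadow_log[0]["shadow"])   # {'fraud': True} — logged for comparison
```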
Worked Examples
Scenario: High-Latency Audio Transcription
Problem: You need to deploy a model that transcribes 30-minute audio files. Requests come in throughout the day, but the transcription takes 5 minutes per file. Real-time endpoints time out at 60 seconds.
Solution Step-by-Step:
- Analyze Constraints: Payload is large; processing time is long (> 60s); real-time is not viable.
- Select Endpoint: Asynchronous Inference is the best fit.
- Mechanism: The request is placed in an internal SQS queue. SageMaker processes it and stores the output in an S3 bucket.
- Notification: Use an Amazon SNS topic to notify the application when the transcription is complete.
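The queue-then-notify flow above can be mimicked with stdlib pieces: here an in-memory queue stands in for SQS, a dict for the S3 output location, and a list for the SNS topic (all stand-ins, not AWS APIs):

```python
import queue

request_queue = queue.Queue()   # stands in for the internal SQS queue
output_bucket = {}              # stands in for the S3 output location
notifications = []              # stands in for an SNS topic

def submit(job_id: str, audio_uri: str) -> None:
    """The async invocation returns immediately; work is queued."""
    request_queue.put((job_id, audio_uri))

def worker(transcribe) -> None:
    """Drain the queue, write results 'to S3', then notify completion."""
    while not request_queue.empty():
        job_id, audio_uri = request_queue.get()
        output_bucket[f"s3://outputs/{job_id}.json"] = transcribe(audio_uri)
        notifications.append(f"job {job_id} complete")

submit("job-1", "s3://audio/call-1.wav")
worker(transcribe=lambda uri: {"text": f"transcript of {uri}"})
print(notifications)  # ['job job-1 complete']
```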
Checkpoint Questions
- Which service is best for a team that wants to define their ML pipeline entirely in Python using Directed Acyclic Graphs (DAGs)?
- What is the main difference between a Canary deployment and a Linear deployment?
- When would you choose SageMaker Serverless Inference over Real-time Endpoints?
- Which AWS service is used to compile a model for a specific hardware target like an Ambarella chipset?
Muddy Points & Cross-Refs
- Step Functions vs. SageMaker Pipelines: Use SageMaker Pipelines if you are staying within the SageMaker ecosystem (Training, Processing, Tuning). Use Step Functions if your workflow involves non-ML services like AWS Glue or Lambda for complex business logic.
- Inference Pipeline vs. SageMaker Pipeline: Don't confuse these! An Inference Pipeline is a sequence of containers inside a single endpoint serving a single request. A SageMaker Pipeline is a series of workflow steps (e.g., Processing, Training, Registration) that builds the model.
Comparison Tables
Orchestration Tool Comparison
| Feature | SageMaker Pipelines | AWS Step Functions | Amazon MWAA (Airflow) |
|---|---|---|---|
| Primary Goal | ML Lifecycle Automation | General App Workflows | Data/ETL Pipelines |
| Logic Definition | Python SDK / JSON | Amazon States Lang (JSON) | Python (DAGs) |
| Best Integration | SageMaker native | All AWS Services | Open-source ecosystem |
| Management | Serverless | Serverless | Managed Clusters |
> [!TIP]
> For the exam, remember: SageMaker Neo = Edge, Asynchronous = Long-running/Large payloads, and Blue/Green = Zero-downtime deployments.