
Unit 3: Deployment and Orchestration of ML Workflows - Study Guide

This study guide covers the critical transition from model development to production environments. It focuses on choosing the right AWS infrastructure for inference, orchestrating complex workflows, and implementing robust CI/CD pipelines.

Learning Objectives

  • Select appropriate deployment infrastructure (Real-time, Serverless, Asynchronous, or Batch) based on latency and cost requirements.
  • Distinguish between AWS orchestration services including SageMaker Pipelines, AWS Step Functions, and Amazon MWAA.
  • Implement CI/CD principles for ML using AWS CodePipeline, CodeBuild, and CodeDeploy.
  • Evaluate deployment strategies such as Blue/Green and Canary to ensure high availability and safe rollbacks.

Key Terms & Glossary

  • Inference Pipeline: A linear sequence of containers (up to 15) that processes requests to provide predictions, often including preprocessing and post-processing steps.
  • SageMaker Neo: A service that optimizes ML models for deployment on edge devices (e.g., IoT) by compiling them to run faster with a smaller footprint.
  • DAG (Directed Acyclic Graph): A mathematical structure used in MWAA/Airflow to represent a workflow where tasks flow in one direction without loops.
  • Model Registry: A central repository within SageMaker to catalog models, manage versions, and track approval status.
  • Artifact: Any file or data generated during the pipeline, such as a trained model file (model.tar.gz) or a container image.
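The DAG concept from the glossary can be sketched with nothing but the Python standard library. This is an illustrative toy, not Airflow or SageMaker Pipelines code, and the task names are hypothetical; it only shows why "directed and acyclic" matters: dependencies resolve into a valid execution order.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical ML workflow: each task maps to the set of tasks it depends on.
# Because the graph is directed and acyclic, a valid run order always exists.
workflow = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # → ['preprocess', 'train', 'evaluate', 'register']
```

If a cycle were introduced (e.g., `"preprocess": {"register"}`), `static_order()` would raise `CycleError`, which is exactly the property Airflow enforces on its DAGs.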

The "Big Idea"

Moving a model from a notebook to a production environment is not just about the code; it is about operationalizing the lifecycle. Effective deployment and orchestration transform a manual, fragile process into a repeatable, automated system that handles scale, ensures consistency across environments, and enables rapid iteration through safe deployment patterns.

Formula / Concept Box

| Endpoint Type | Best For... | Key Characteristic |
| --- | --- | --- |
| Real-time | Low latency, steady traffic | Persistent instances, sub-second response |
| Serverless | Intermittent/spiky traffic | Automatic scaling, pay-per-use, cold starts possible |
| Asynchronous | Large payloads (up to 1 GB) | Queued requests, processing up to 1 hour |
| Batch Transform | Non-real-time, offline data | Processes entire datasets, shuts down after completion |
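The selection logic in the table can be captured as a small decision helper. This is an illustrative sketch, not an AWS API: the thresholds reflect commonly cited SageMaker limits (about a 60-second invocation timeout and 6 MB payload cap for real-time, 1 GB payload for asynchronous), and the function signature is invented for this example.

```python
def pick_endpoint(latency_sensitive: bool, payload_mb: float,
                  runtime_s: float, traffic: str) -> str:
    """Illustrative decision helper mirroring the endpoint table.

    traffic is one of: 'steady', 'intermittent', 'offline'.
    Thresholds are approximations of documented SageMaker limits.
    """
    if runtime_s > 60 or payload_mb > 6:        # beyond real-time limits
        return "Asynchronous" if payload_mb <= 1024 else "Batch Transform"
    if not latency_sensitive and traffic == "offline":
        return "Batch Transform"                # whole-dataset, no endpoint needed
    if traffic == "intermittent":
        return "Serverless"                     # pay-per-use, tolerate cold starts
    return "Real-time"                          # persistent, sub-second

print(pick_endpoint(False, 500, 300, "steady"))       # → Asynchronous
print(pick_endpoint(True, 1, 0.1, "steady"))          # → Real-time
print(pick_endpoint(False, 2, 5, "intermittent"))     # → Serverless
```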

Hierarchical Outline

  • I. Inference Infrastructure
    • Persistent Endpoints: Real-time for low latency; Asynchronous for long-running tasks.
    • On-Demand Endpoints: Serverless for cost-optimization on intermittent traffic.
    • Edge Deployment: Using SageMaker Neo and IoT Greengrass for local execution.
  • II. Orchestration Tools
    • SageMaker Pipelines: Native ML orchestration; integrated with SageMaker Studio.
    • AWS Step Functions: General-purpose serverless state machines; great for multi-service logic.
    • Amazon MWAA: Managed Airflow for complex, Python-defined data engineering workflows.
  • III. Continuous Delivery (CI/CD)
    • CodePipeline: The "glue" that automates the stages from Source to Deploy.
    • Deployment Strategies:
      • Blue/Green: Full swap of traffic to a new environment.
      • Canary: Small percentage of traffic tested on the new version first.
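The canary strategy above can be simulated without any AWS service. The sketch below (all names illustrative) routes a configurable percentage of requests to the new version, which is the core of what a weighted traffic-shifting policy does:

```python
import random

def route(request_id: int, canary_pct: float = 10.0) -> str:
    """Send roughly canary_pct% of requests to the new ('canary') version.

    Seeding per request keeps routing deterministic for a given request,
    the way a hash-based traffic splitter would behave.
    """
    rng = random.Random(request_id)
    return "canary" if rng.random() * 100 < canary_pct else "stable"

# Over many requests, the split should approximate the configured weight.
hits = sum(route(i, canary_pct=10) == "canary" for i in range(10_000))
print(f"{hits / 100:.1f}% of traffic went to the canary")
```

A linear deployment would then raise `canary_pct` in fixed increments on a schedule, while a full blue/green swap is the degenerate case of jumping straight from 0 to 100.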

Visual Anchors

ML Workflow Pipeline


Blue/Green Deployment Strategy

\begin{tikzpicture}[node distance=2cm]
  \node (LB) [draw, rectangle, rounded corners, fill=gray!20] {Load Balancer};
  \node (Blue) [draw, rectangle, fill=blue!30, below left of=LB, xshift=-1cm] {Blue (V1.0)};
  \node (Green) [draw, rectangle, fill=green!30, below right of=LB, xshift=1cm] {Green (V1.1)};
  \draw [->, thick] (LB) -- (Blue) node[midway, left] {100\% (Old)};
  \draw [->, dashed, thick] (LB) -- (Green) node[midway, right] {Swap to 100\% (New)};
  \draw [decoration={brace,mirror,raise=5pt},decorate] (Blue.south west) -- (Blue.south east) node [midway,below=10pt] {Production};
  \draw [decoration={brace,mirror,raise=5pt},decorate] (Green.south west) -- (Green.south east) node [midway,below=10pt] {Staging $\rightarrow$ Prod};
\end{tikzpicture}

Definition-Example Pairs

  • Rollback Strategy: A plan to revert to a previous stable version if the new deployment fails.
    • Example: If a newly deployed model shows a 10% drop in accuracy in production, CodePipeline automatically triggers a rollback to the previous "Blue" environment.
  • Multi-Model Endpoint (MME): Hosting multiple models on a single serving container to save costs.
    • Example: Hosting 50 different language translation models for small niche languages on one instance because each is used infrequently.
  • Shadow Deployment: Deploying a new model to production but only sending it a copy of live traffic without returning its results to users.
    • Example: Running a new fraud detection algorithm alongside the old one to compare performance on live data without risking false denials.
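The shadow deployment pattern can be sketched in plain Python: mirror each request to the new model, record the comparison, and return only the primary model's answer. Both "models" below are toy stand-ins invented for this example.

```python
import concurrent.futures

def serve(request, primary, shadow):
    """Shadow pattern: the shadow model sees a copy of live traffic,
    but the caller only ever receives the primary model's result."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        primary_future = pool.submit(primary, request)
        shadow_future = pool.submit(shadow, request)   # mirrored copy
        result = primary_future.result()
        try:
            shadow_result = shadow_future.result(timeout=1.0)
            # Log agreement for offline analysis; never affects the response.
            print("models agree:", result == shadow_result)
        except Exception:
            pass  # a failing shadow must not impact production
    return result

v1 = lambda r: "legit" if r["amount"] < 1000 else "fraud"   # current model
v2 = lambda r: "legit" if r["amount"] < 800 else "fraud"    # candidate model
print(serve({"amount": 900}, v1, v2))  # → legit (v1's answer, not v2's)
```

In SageMaker this pattern is productized as shadow variants, but the invariant is the same as in the sketch: the shadow's output is logged, never returned.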

Worked Examples

Scenario: Long-Running Audio Transcription

Problem: You need to deploy a model that transcribes 30-minute audio files. Requests come in throughout the day, but the transcription takes 5 minutes per file. Real-time endpoints time out at 60 seconds.

Solution Step-by-Step:

  1. Analyze Constraints: Payload is large; processing time is long (> 60s); real-time is not viable.
  2. Select Endpoint: Asynchronous Inference is the best fit.
  3. Mechanism: The request is placed in an internal SQS queue. SageMaker processes it and stores the output in an S3 bucket.
  4. Notification: Use an Amazon SNS topic to notify the application when the transcription is complete.
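The invocation in steps 2-3 maps to the `invoke_endpoint_async` call on boto3's `sagemaker-runtime` client. The sketch below only assembles that call's arguments, so it runs without AWS credentials; the endpoint name and S3 URI are hypothetical.

```python
def async_invocation_kwargs(endpoint_name: str, input_s3_uri: str,
                            timeout_s: int = 3600) -> dict:
    """Build arguments for sagemaker-runtime invoke_endpoint_async.

    Note the key difference from real-time inference: the payload stays
    in S3 and only its location is sent with the request.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,          # S3 URI, not raw audio bytes
        "ContentType": "audio/wav",
        "InvocationTimeoutSeconds": timeout_s,  # async allows up to 3600 s
    }

kwargs = async_invocation_kwargs(
    "transcribe-endpoint", "s3://example-bucket/audio/meeting-001.wav")
# In a real application:
#   boto3.client("sagemaker-runtime").invoke_endpoint_async(**kwargs)
print(kwargs["InputLocation"])
```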

Checkpoint Questions

  1. Which service is best for a team that wants to define their ML pipeline entirely in Python using Directed Acyclic Graphs (DAGs)?
  2. What is the main difference between a Canary deployment and a Linear deployment?
  3. When would you choose SageMaker Serverless Inference over Real-time Endpoints?
  4. Which AWS service is used to compile a model for a specific hardware target like an Ambarella chipset?

Muddy Points & Cross-Refs

  • Step Functions vs. SageMaker Pipelines: Use SageMaker Pipelines if you are staying within the SageMaker ecosystem (Training, Processing, Tuning). Use Step Functions if your workflow involves non-ML services like AWS Glue or Lambda for complex business logic.
  • Inference Pipeline vs. SageMaker Pipeline: Don't confuse these! An Inference Pipeline is a sequence of containers inside a single endpoint for a single request. A SageMaker Pipeline is a series of steps (Training → Registration) to build the model.

Comparison Tables

Orchestration Tool Comparison

| Feature | SageMaker Pipelines | AWS Step Functions | Amazon MWAA (Airflow) |
| --- | --- | --- | --- |
| Primary Goal | ML lifecycle automation | General app workflows | Data/ETL pipelines |
| Logic Definition | Python SDK / JSON | Amazon States Language (JSON) | Python (DAGs) |
| Best Integration | SageMaker native | All AWS services | Open-source ecosystem |
| Management | Serverless | Serverless | Managed clusters |

[!TIP] For the exam, remember: SageMaker Neo = Edge, Asynchronous = Long-running/Large payloads, and Blue/Green = Zero-downtime deployments.

This study guide supports preparation for the AWS Certified Machine Learning Engineer - Associate (MLA-C01) exam.