Study Guide: CI/CD Pipelines and ML Orchestration (MLA-C01)
Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines
This guide covers Task 3.3 of the AWS Certified Machine Learning Engineer – Associate (MLA-C01) exam. It focuses on automating the lifecycle of machine learning models through robust CI/CD practices and orchestration tools.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between AWS CodePipeline, CodeBuild, and CodeDeploy capabilities.
- Select the appropriate orchestration tool (SageMaker Pipelines, Step Functions, or MWAA) for a specific ML workflow.
- Implement deployment strategies such as Blue/Green and Canary for ML endpoints.
- Configure automated retraining triggers using Amazon EventBridge and SageMaker.
- Apply Infrastructure as Code (IaC) principles using AWS CDK and CloudFormation to ML environments.
Key Terms & Glossary
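- CI/CD: Continuous Integration (automatically building and testing every change), followed by either Continuous Delivery (every passing change is automatically packaged and kept ready to release, with the production push usually gated by an approval) or Continuous Deployment (every passing change is automatically released to production).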
- MLOps: The integration of DevOps principles into ML to ensure consistency, reproducibility, and reliability.
- DAG (Directed Acyclic Graph): A collection of all tasks you want to run, organized in a way that reflects their relationships and dependencies (used heavily in Apache Airflow).
- Blue/Green Deployment: A strategy where you have two identical production environments; only one (Blue) serves traffic while the other (Green) is updated and tested.
- Infrastructure as Code (IaC): Managing and provisioning infrastructure through machine-readable definition files (e.g., YAML, JSON, or Python via CDK) rather than manual console configuration.
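The DAG entry above can be made concrete with a few lines of standard-library Python. This is a minimal sketch, not Airflow code: the task names are illustrative, and `graphlib` (Python 3.9+) stands in for a real scheduler to show how dependencies determine run order and why cycles are rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
# Task names are hypothetical and not tied to any Airflow or SageMaker API.
ml_dag = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(dag):
    """Return one valid run order; raises CycleError if the graph contains a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(ml_dag)
```

A graph such as `{"a": {"b"}, "b": {"a"}}` raises `CycleError`, which is the "acyclic" part of the definition: a task can never depend on itself, directly or indirectly.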
The "Big Idea"
In traditional software, CI/CD focuses on code. In MLOps, CI/CD must account for a "triad" of changes: Code, Data, and Models. A change in any one of these three must trigger the pipeline to ensure the deployed model remains accurate and secure. Orchestration is the "glue" that ensures these complex, multi-step processes (data ingestion → preprocessing → training → evaluation → deployment) happen automatically and reliably.
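One way to picture the "triad" trigger is to fingerprint each of the three inputs and compare against the last run. The sketch below is an assumption-laden illustration: the function names are hypothetical, and a real pipeline would more likely compare commit SHAs, S3 object versions, and model package versions rather than hashing raw bytes.

```python
import hashlib


def fingerprint(code: bytes, data: bytes, model_config: bytes) -> dict:
    """Hash each member of the code/data/model triad separately."""
    parts = (("code", code), ("data", data), ("model", model_config))
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in parts}


def changed_components(previous: dict, current: dict) -> list:
    """Return the sorted names of components whose fingerprint differs."""
    return sorted(name for name in current if previous.get(name) != current[name])


def should_trigger_pipeline(previous: dict, current: dict) -> bool:
    """A change in any one component is enough to start a new run."""
    return bool(changed_components(previous, current))
```

For example, if only the training data changes between two runs, `changed_components` returns `["data"]` and `should_trigger_pipeline` returns `True`.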
Formula / Concept Box
| Pipeline Phase | Primary AWS Tool | Core Responsibility |
|---|---|---|
| Source | AWS CodeCommit / GitHub | Version control for code and configuration (IaC). |
| Build | AWS CodeBuild | Compiling code, running unit tests, and building Docker images. |
| Orchestrate | SageMaker Pipelines | Managing the ML-specific workflow steps (Training, Tuning). |
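| Deploy | AWS CodeDeploy / SageMaker deployment guardrails | Executing deployment strategies (Canary, Linear, Blue/Green): CodeDeploy for EC2, Lambda, and ECS targets; SageMaker's built-in guardrails for real-time endpoints. |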
| Trigger | Amazon EventBridge | Scheduling or event-based execution (e.g., on S3 data upload). |
Hierarchical Outline
- Version Control & Repository Management
- Gitflow/GitHub Flow: Branching strategies for managing feature development and production releases.
- AWS CodeCommit: Managed source control service.
- AWS Developer Tools for CI/CD
- CodePipeline: The visual workflow manager that connects source, build, and deploy stages.
- CodeBuild: Serverless build service that scales to handle multiple builds concurrently.
- CodeDeploy: Automates code deployments to EC2, Lambda, or ECS/Fargate.
- ML Workflow Orchestrators
- SageMaker Pipelines: Purpose-built for ML; includes a Model Registry and lineage tracking.
- AWS Step Functions: Serverless state machines for general-purpose orchestration; great for cross-service coordination.
- Amazon MWAA: Managed Airflow for teams requiring Python-based DAG flexibility.
- Deployment & Rollback Strategies
- Blue/Green: Full swap of traffic between environments.
- Canary: Small percentage of traffic is shifted to the new model first to monitor for errors.
- Linear: Traffic is shifted in equal increments over a set period.
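The three strategies in the outline differ only in how the share of traffic on the new model grows over time. The following sketch expresses each as a list of percentages, one per shifting step. The function names are hypothetical helpers for study purposes, not part of any AWS SDK.

```python
def all_at_once_schedule() -> list:
    """Blue/Green with a full swap: one step moves everything."""
    return [100]


def canary_schedule(canary_percent: int) -> list:
    """Canary: a small slice first, then the remainder once it looks healthy."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    return [canary_percent, 100]


def linear_schedule(step_percent: int) -> list:
    """Linear: equal increments until all traffic is on the new model."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be between 1 and 100")
    return list(range(step_percent, 100, step_percent)) + [100]
```

`canary_schedule(10)` gives `[10, 100]`, while `linear_schedule(25)` gives `[25, 50, 75, 100]`. In a real rollout each step would be followed by a wait interval during which alarms are checked before continuing.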
Visual Anchors
ML CI/CD Pipeline Flow
Blue/Green Traffic Shifting
\begin{tikzpicture}
  % Blue environment (current model, V1)
  \draw[blue, thick] (0,0) rectangle (2,1) node[midway] {Blue (V1)};
  % Green environment (new model, V2)
  \draw[green!60!black, thick] (4,0) rectangle (6,1) node[midway] {Green (V2)};
  % Traffic splitter
  \draw (3,2.5) circle (0.3cm) node {\small Router};
  % Traffic arrows: most traffic stays on Blue while the canary slice goes to Green
  \draw[->, thick] (3,2.2) -- (1,1) node[midway, left] {90\%};
  \draw[->, thick] (3,2.2) -- (5,1) node[midway, right] {10\%};
  % Annotation
  \node at (3,-1) {\small Canary Deployment: Shifting traffic from Blue to Green};
\end{tikzpicture}
Definition-Example Pairs
- Continuous Deployment: Automatically deploying every change that passes the pipeline directly to production.
- Example: A retail site automatically updating its "Recommended for You" model every time a new training dataset is uploaded to S3 and performance exceeds the threshold.
- Model Registry: A central repository for managing model versions and their metadata.
- Example: A data scientist marks Version 4 of a "Churn Prediction" model as "Approved" in the SageMaker Model Registry, which automatically triggers a deployment to the staging environment.
- Automated Retraining: Using events to trigger a new training job.
- Example: A CloudWatch alarm on a SageMaker Model Monitor metric fires when model accuracy drops sharply; an EventBridge rule matches the alarm's state change and triggers a SageMaker Pipeline to retrain the model on the most recent 30 days of data.
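The Model Registry and Automated Retraining examples above both come down to an EventBridge event pattern. The sketch below builds the two patterns as plain dictionaries. It assumes the event formats as commonly documented (`SageMaker Model Package State Change` and `CloudWatch Alarm State Change`), and it assumes the accuracy drop is surfaced through a CloudWatch alarm; the group and alarm names are placeholders, so check field names against the current AWS documentation before relying on them.

```python
import json


def model_approved_pattern(model_package_group: str) -> dict:
    """Match a model version being marked Approved in the Model Registry."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def alarm_fired_pattern(alarm_name: str) -> dict:
    """Match a CloudWatch alarm (assumed to watch a model-quality metric) entering ALARM."""
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": [alarm_name],
            "state": {"value": ["ALARM"]},
        },
    }


# EventBridge expects the pattern as a JSON string.
approval_json = json.dumps(model_approved_pattern("churn-prediction"))
retrain_json = json.dumps(alarm_fired_pattern("churn-accuracy-drop"))
```

The first pattern would typically target a deployment pipeline, the second a retraining pipeline.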
Worked Examples
Scenario: Triggering an ML Pipeline on Data Arrival
Goal: Set up an automated workflow that starts a SageMaker Pipeline whenever a new CSV file is uploaded to an S3 bucket.
- Configure S3 Event Notifications: Set the S3 bucket to send a message to Amazon EventBridge when an `ObjectCreated` event occurs.
- Define EventBridge Rule: Create a rule that filters for the specific bucket and file suffix (e.g., `.csv`).
- Set Target: Set the target of the EventBridge rule to be the SageMaker Pipeline ARN.
- Verification: Upload a test file. Check the SageMaker console under "Pipelines" to confirm a new execution has started.
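The steps above can be sketched with boto3. This is one possible wiring, not the only one: the rule name, bucket, and ARNs are placeholders you would supply, and `create_trigger` is a hypothetical helper. Note that `put_bucket_notification_configuration` replaces the bucket's existing notification settings, so merge rather than overwrite on a bucket that already has notifications.

```python
import json


def csv_upload_pattern(bucket: str) -> dict:
    """EventBridge pattern for new objects ending in .csv in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"suffix": ".csv"}]},
        },
    }


def create_trigger(rule_name: str, bucket: str, pipeline_arn: str, role_arn: str) -> None:
    """Wire S3 -> EventBridge -> SageMaker Pipeline (requires AWS credentials)."""
    import boto3  # imported lazily so the pattern builder works without AWS access

    # Step 1: have the bucket publish its events to EventBridge.
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration={"EventBridgeConfiguration": {}},
    )
    events = boto3.client("events")
    # Step 2: create the filtering rule.
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(csv_upload_pattern(bucket)),
        State="ENABLED",
    )
    # Step 3: point the rule at the pipeline, using a role EventBridge can assume.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "start-pipeline", "Arn": pipeline_arn, "RoleArn": role_arn}],
    )
```

Only `csv_upload_pattern` runs offline; `create_trigger` makes live API calls and should be tried in a sandbox account first.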
[!TIP] Always use IAM roles that follow the principle of least privilege. The IAM role that the EventBridge rule assumes needs the `sagemaker:StartPipelineExecution` permission, scoped to the specific pipeline.
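A least-privilege role for this trigger needs two documents: a trust policy that lets EventBridge assume the role, and a permissions policy limited to one action on one pipeline. The sketch below builds both as dictionaries; the helper names and the example ARN are illustrative assumptions.

```python
import json


def eventbridge_trust_policy() -> dict:
    """Allow only the EventBridge service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "events.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def start_pipeline_policy(pipeline_arn: str) -> dict:
    """Grant a single action on a single pipeline, nothing broader."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sagemaker:StartPipelineExecution",
                "Resource": pipeline_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
example_arn = "arn:aws:sagemaker:us-east-1:111122223333:pipeline/example-pipeline"
policy_json = json.dumps(start_pipeline_policy(example_arn), indent=2)
```

Avoiding `"Resource": "*"` is the point: if the rule is misconfigured, it still cannot start any other pipeline.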
Checkpoint Questions
- Which service is best for orchestrating a workflow that involves AWS Lambda, AWS Glue, and SageMaker in a serverless state machine?
- What is the difference between a Canary deployment and a Blue/Green deployment?
- In AWS CodePipeline, what is the role of the "Artifact Store" (usually an S3 bucket)?
- How does AWS CDK differ from AWS CloudFormation for defining infrastructure?
Answers
- AWS Step Functions.
- Blue/Green swaps all traffic (or most) at once after the green environment is ready. Canary shifts a tiny fraction (e.g., 5%) first to test "in the wild" before proceeding.
- It stores the output files (code, build results) from one stage so they can be used by the next stage.
- CloudFormation uses static YAML/JSON templates. CDK allows you to define infrastructure using familiar programming languages like Python or TypeScript, which then synthesizes into CloudFormation templates.
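To see how the artifact store ties stages together, it can help to look at the shape of a pipeline definition. The sketch below builds a two-stage definition in the structure accepted by CodePipeline's `create_pipeline` call, as far as I understand it, plus a small hypothetical checker that confirms every input artifact was produced by an earlier stage. All names passed in are placeholders.

```python
def build_pipeline_definition(name, role_arn, artifact_bucket, repo, branch, build_project):
    """Source -> Build pipeline; artifacts are handed over via the S3 artifact store."""
    return {
        "name": name,
        "roleArn": role_arn,
        "artifactStore": {"type": "S3", "location": artifact_bucket},
        "stages": [
            {
                "name": "Source",
                "actions": [{
                    "name": "Checkout",
                    "actionTypeId": {"category": "Source", "owner": "AWS",
                                     "provider": "CodeCommit", "version": "1"},
                    "configuration": {"RepositoryName": repo, "BranchName": branch},
                    "outputArtifacts": [{"name": "SourceOutput"}],
                }],
            },
            {
                "name": "Build",
                "actions": [{
                    "name": "BuildAndTest",
                    "actionTypeId": {"category": "Build", "owner": "AWS",
                                     "provider": "CodeBuild", "version": "1"},
                    "configuration": {"ProjectName": build_project},
                    "inputArtifacts": [{"name": "SourceOutput"}],
                    "outputArtifacts": [{"name": "BuildOutput"}],
                }],
            },
        ],
    }


def missing_artifacts(definition):
    """List input artifacts that no earlier stage has written to the store."""
    produced, missing = set(), []
    for stage in definition["stages"]:
        stage_outputs = set()
        for action in stage["actions"]:
            for artifact in action.get("inputArtifacts", []):
                if artifact["name"] not in produced:
                    missing.append(artifact["name"])
            stage_outputs.update(a["name"] for a in action.get("outputArtifacts", []))
        produced |= stage_outputs
    return missing
```

The Build stage never talks to the Source stage directly; it reads `SourceOutput` from the S3 bucket named in `artifactStore`, which is why that bucket is required.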
Muddy Points & Cross-Refs
- Step Functions vs. SageMaker Pipelines: Use SageMaker Pipelines if your workflow is 100% focused on ML steps and you want built-in model lineage. Use Step Functions if you need to coordinate non-ML services (like calling an external API or running complex Lambda logic) as part of the flow.
- CodeBuild vs. SageMaker Training Jobs: CodeBuild is for building software/images and running tests. SageMaker Training Jobs are for intensive mathematical model training on specialized GPU/CPU instances.
- Rollbacks: To reduce the need for rollbacks, include a "Manual Approval" gate before the production stage whenever you aren't fully confident in your automated integration tests.
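For the Step Functions case, a cross-service workflow is described in Amazon States Language. The sketch below assembles a three-state definition (Glue job, SageMaker training job, Lambda notification) as a Python dictionary. The state names and helper functions are hypothetical, and the training request is passed in by the caller because `CreateTrainingJob` has many required fields that depend on your setup.

```python
import json


def build_state_machine(glue_job: str, training_job_request: dict, lambda_function: str) -> dict:
    """Glue -> SageMaker training -> Lambda, using service integrations."""
    return {
        "Comment": "Illustrative cross-service ML workflow",
        "StartAt": "PrepareData",
        "States": {
            "PrepareData": {
                "Type": "Task",
                # .sync makes the state wait until the Glue job finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "Next": "TrainModel",
            },
            "TrainModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": training_job_request,
                "Next": "NotifyTeam",
            },
            "NotifyTeam": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": lambda_function, "Payload.$": "$"},
                "End": True,
            },
        },
    }


def dangling_transitions(definition: dict) -> list:
    """Return any StartAt/Next targets that are not defined states."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [t for t in targets if t not in states]


definition_json = json.dumps(
    build_state_machine("prep-job", {"TrainingJobName": "placeholder"}, "notify-fn")
)
```

The resulting JSON string is what you would pass as the state machine definition. The training request here is deliberately a stub and would be rejected by the service as written.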
Comparison Tables
Orchestration Tools Comparison
| Feature | SageMaker Pipelines | AWS Step Functions | Amazon MWAA (Airflow) |
|---|---|---|---|
| Focus | Native ML Workflows | General App Integration | Complex Data Engineering |
| Language | Python (SageMaker SDK) | Amazon States Language (JSON) | Python (DAGs) |
| Visualizer | SageMaker Studio | Step Functions Console | Airflow UI |
| Best For | Standardized ML Lifecycle | Event-driven microservices | Multi-cloud or complex dependencies |
Deployment Strategies
| Strategy | Downtime | Risk | Implementation Complexity |
|---|---|---|---|
| All-at-once | High | High | Low |
| Blue/Green | Zero | Low | Medium |
| Canary | Zero | Lowest | High |
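For SageMaker real-time endpoints, the canary row of this table maps to a deployment configuration passed when updating the endpoint. The sketch below assumes the `DeploymentConfig` shape accepted by the SageMaker `update_endpoint` API as I understand it. The alarm name, wait times, and helper names are placeholders, and the service may enforce tighter bounds on canary size than this helper checks.

```python
def canary_deployment_config(alarm_name: str, canary_percent: int = 10,
                             wait_seconds: int = 600) -> dict:
    """Blue/green update that shifts a canary slice first and rolls back on alarm."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # How long the canary bakes before the remaining traffic shifts.
                "WaitIntervalInSeconds": wait_seconds,
            },
            # How long the old fleet is kept after the shift completes.
            "TerminationWaitInSeconds": 300,
        },
        # If this alarm fires during the rollout, traffic returns to the old fleet.
        "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": alarm_name}]},
    }


def update_endpoint_with_canary(endpoint_name: str, endpoint_config_name: str,
                                alarm_name: str):
    """Apply the config to a live endpoint (requires AWS credentials)."""
    import boto3  # imported lazily so the config builder works without AWS access

    return boto3.client("sagemaker").update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
        DeploymentConfig=canary_deployment_config(alarm_name),
    )
```

The higher implementation complexity in the table shows up here as the extra pieces a canary needs: a meaningful alarm, a bake time, and a rollback path.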