Lab: Automating ML Workflows with AWS CodePipeline and SageMaker
Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines
This lab provides a hands-on experience in setting up a Continuous Integration and Continuous Delivery (CI/CD) pipeline for Machine Learning. You will use AWS CodePipeline to orchestrate a workflow that triggers a SageMaker Pipeline execution whenever new code is pushed.
[!WARNING] Remember to run the teardown commands at the end of the lab to avoid ongoing charges for AWS resources.
Prerequisites
Before starting this lab, ensure you have:
- An AWS Account with administrative access.
- AWS CLI installed and configured with `<YOUR_ACCESS_KEY>` and `<YOUR_SECRET_KEY>`.
- Basic knowledge of Git and Python.
- IAM permissions to create CodePipeline, CodeBuild, S3, and SageMaker resources.
Learning Objectives
By the end of this lab, you will be able to:
- Configure AWS CodePipeline to automate ML workflows.
- Use AWS CodeBuild to run unit tests on ML code.
- Trigger SageMaker Pipelines for model training and registration.
- Implement a basic CI/CD flow using AWS native tools.
Architecture Overview
A code push lands in the S3 source bucket, which triggers AWS CodePipeline. The Build stage runs AWS CodeBuild to execute unit tests and then start a SageMaker Pipeline execution for model training and registration.
Step-by-Step Instructions
Step 1: Create an S3 Bucket for Artifacts
AWS CodePipeline requires an S3 bucket to store artifacts between stages.
```bash
# Replace <UNIQUE_SUFFIX> with a random string and <YOUR_REGION> with your region
aws s3 mb s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> --region <YOUR_REGION>
```

▶ Console alternative
- Navigate to S3 in the AWS Console.
- Click Create bucket.
- Enter a name like `brainybee-ml-artifacts-<UNIQUE_SUFFIX>`.
- Leave other settings as default and click Create bucket.
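The `<UNIQUE_SUFFIX>` placeholder can be generated however you like; here is a small Python helper as one option (the prefix matches the lab's bucket name, and the 8-character suffix length is an arbitrary choice):

```python
import random
import string

def unique_bucket_name(prefix="brainybee-ml-artifacts"):
    """Generate an S3-safe bucket name with a random 8-character suffix.

    S3 bucket names must be globally unique, lowercase, and 3-63 characters.
    """
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
    return f"{prefix}-{suffix}"

# Pass the result to `aws s3 mb` or boto3's create_bucket
print(unique_bucket_name())
```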
Step 2: Prepare the Buildspec File
CodeBuild uses a `buildspec.yml` file to define the commands to run. Create a file named `buildspec.yml` in your local directory:

```yaml
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.9
    commands:
      - pip install sagemaker boto3 pytest
  build:
    commands:
      - echo "Running unit tests..."
      - pytest tests/
      - echo "Triggering SageMaker Pipeline..."
      - python trigger_pipeline.py
```

Step 3: Create the CodeBuild Project
You need a project that will execute the logic defined in your buildspec.
```bash
aws codebuild create-project \
  --name "BrainyBee-ML-Build" \
  --source '{"type": "NO_SOURCE"}' \
  --artifacts '{"type": "NO_ARTIFACTS"}' \
  --environment '{"type": "LINUX_CONTAINER", "image": "aws/codebuild/amazonlinux2-x86_64-standard:3.0", "computeType": "BUILD_GENERAL1_SMALL"}' \
  --service-role "<YOUR_CODEBUILD_ROLE_ARN>"
```

[!NOTE] Ensure the service role provided has the `AmazonSageMakerFullAccess` policy attached to trigger pipelines.
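The buildspec in Step 2 calls a `trigger_pipeline.py` script that the lab does not list. Here is a minimal sketch, assuming a SageMaker Pipeline named `BrainyBeeTrainingPipeline` already exists; the pipeline name and the `InputDataUrl` parameter are placeholders you must adapt to your own pipeline definition:

```python
# trigger_pipeline.py -- minimal sketch of the script the buildspec calls.
import os

PIPELINE_NAME = "BrainyBeeTrainingPipeline"  # hypothetical pipeline name

def build_execution_args(pipeline_name, tag="manual"):
    """Assemble keyword arguments for SageMaker's start_pipeline_execution."""
    return {
        "PipelineName": pipeline_name,
        "PipelineExecutionDisplayName": f"codebuild-{tag}",
        "PipelineParameters": [
            # Hypothetical parameter; it must exist in your pipeline definition
            {"Name": "InputDataUrl",
             "Value": "s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX>/data/"},
        ],
    }

def main():
    import boto3  # imported here so build_execution_args stays testable offline
    sm = boto3.client("sagemaker")
    # Tag the execution with the source version when available (alphanumeric only)
    raw = os.environ.get("CODEBUILD_RESOLVED_SOURCE_VERSION", "manual")
    tag = "".join(c for c in raw if c.isalnum())[:8] or "manual"
    resp = sm.start_pipeline_execution(**build_execution_args(PIPELINE_NAME, tag))
    print("Started:", resp["PipelineExecutionArn"])

# CODEBUILD_BUILD_ID is set automatically inside CodeBuild containers,
# so the AWS call only fires when the script runs in the build environment
if os.environ.get("CODEBUILD_BUILD_ID"):
    main()
```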
Step 4: Configure the Orchestration Pipeline
Now, define the pipeline that connects your source to the build project.
▶ Console Walkthrough (Recommended for Pipeline Setup)
- Navigate to CodePipeline > Create pipeline.
- Pipeline settings: Name it `ML-Orchestration-Pipeline`.
- Source stage: Choose S3 and select the bucket/object you created.
- Build stage: Choose AWS CodeBuild and select `BrainyBee-ML-Build`.
- Deploy stage: Skip this for now, as our "deploy" happens inside the Build stage via the SageMaker SDK.
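If you prefer the SDK over the console, the same pipeline can be defined programmatically. This is a sketch only: the service role ARN and source object key are placeholders, and the stage/action names are assumptions mirroring the console walkthrough above.

```python
# Programmatic sketch of the Step 4 pipeline definition.

def build_pipeline_definition(bucket, role_arn, source_key="source.zip"):
    """Return a CodePipeline definition mirroring the console walkthrough."""
    return {
        "name": "ML-Orchestration-Pipeline",
        "roleArn": role_arn,
        "artifactStore": {"type": "S3", "location": bucket},
        "stages": [
            {
                "name": "Source",
                "actions": [{
                    "name": "S3Source",
                    "actionTypeId": {"category": "Source", "owner": "AWS",
                                     "provider": "S3", "version": "1"},
                    "configuration": {"S3Bucket": bucket,
                                      "S3ObjectKey": source_key,
                                      "PollForSourceChanges": "true"},
                    "outputArtifacts": [{"name": "SourceOutput"}],
                }],
            },
            {
                "name": "Build",
                "actions": [{
                    "name": "RunCodeBuild",
                    "actionTypeId": {"category": "Build", "owner": "AWS",
                                     "provider": "CodeBuild", "version": "1"},
                    "configuration": {"ProjectName": "BrainyBee-ML-Build"},
                    "inputArtifacts": [{"name": "SourceOutput"}],
                }],
            },
        ],
    }

# Usage (requires AWS credentials and a CodePipeline service role):
#   import boto3
#   boto3.client("codepipeline").create_pipeline(
#       pipeline=build_pipeline_definition(
#           "brainybee-ml-artifacts-<UNIQUE_SUFFIX>", "<YOUR_PIPELINE_ROLE_ARN>"))
```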
Checkpoints
| Checkpoint | Action | Expected Result |
|---|---|---|
| S3 Check | aws s3 ls s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> | Bucket exists and is accessible. |
| CodeBuild Check | Check CodeBuild console. | The project BrainyBee-ML-Build is visible. |
| Pipeline Check | Push a new file to your source. | CodePipeline status transitions from "In Progress" to "Succeeded". |
Clean-Up / Teardown
To avoid ongoing charges, delete all resources created during this lab:
```bash
# 1. Delete the pipeline
aws codepipeline delete-pipeline --name ML-Orchestration-Pipeline

# 2. Delete the CodeBuild project
aws codebuild delete-project --name BrainyBee-ML-Build

# 3. Empty and delete the S3 bucket
aws s3 rb s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> --force
```

Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `AccessDenied` | IAM role lacks SageMaker permissions. | Attach `AmazonSageMakerFullAccess` to the CodeBuild service role. |
| Build Failed | `pytest` failed or `buildspec.yml` syntax error. | Check CodeBuild logs in CloudWatch for specific tracebacks. |
| Bucket Already Exists | S3 bucket names must be globally unique. | Add a random suffix to your bucket name. |
Stretch Challenge
Add a Manual Approval Step: Modify the pipeline to include a Manual Approval stage before the final deployment. Use the console to add a stage after the "Build" stage called "QA-Approval".
Cost Estimate
| Service | Estimated Cost (Monthly/Free Tier) |
|---|---|
| AWS CodePipeline | First pipeline is free (within Free Tier), then $1.00 per active pipeline. |
| AWS CodeBuild | 100 build minutes (build.general1.small) free per month. |
| Amazon S3 | $0.023 per GB (Standard), first 5GB free. |
| SageMaker | Costs vary by instance type; use ml.t3.medium for training to stay low cost. |
Concept Review
| Tool | Primary ML Use Case | Comparison |
|---|---|---|
| SageMaker Pipelines | ML workflow orchestration (Preprocessing, Training, Registration). | Built specifically for ML model lineage. |
| AWS CodePipeline | Application CI/CD orchestration. | Better for managing code, testing, and multi-service deployment. |
| AWS Step Functions | Generic serverless workflow orchestration. | Best for complex branching and event-driven architectures. |
Why Orchestrate?
As ML workflows grow in complexity, manual management becomes impractical. Automated orchestration ensures:
- Repeatability: Every training run uses the same environment and logic.
- Versioning: Both code and models are tracked.
- Reliability: Automated tests catch errors before deployment.
```latex
% Learning-curve sketch: efficiency grows over time with automated CI/CD
\begin{tikzpicture}
  \draw[->] (0,0) -- (4,0) node[right] {Time};
  \draw[->] (0,0) -- (0,3) node[above] {Efficiency};
  \draw[domain=0.5:3.5, smooth, variable=\x, blue, thick]
    plot (\x, {ln(\x)+1});
  \node at (2,2.5) [blue] {Automated CI/CD};
\end{tikzpicture}
```