Lab: Automating ML Workflows with AWS CodePipeline and SageMaker

Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines

This lab gives you hands-on practice setting up a continuous integration and continuous delivery (CI/CD) pipeline for machine learning. You will use AWS CodePipeline to orchestrate a workflow that runs unit tests in AWS CodeBuild and then triggers a SageMaker Pipeline execution whenever a new source revision arrives.

[!WARNING] Remember to run the teardown commands at the end of the lab to avoid ongoing charges for AWS resources.

Prerequisites

Before starting this lab, ensure you have:

  • An AWS Account with administrative access.
  • AWS CLI installed and configured with <YOUR_ACCESS_KEY> and <YOUR_SECRET_KEY>.
  • Basic knowledge of Git and Python.
  • IAM permissions to create CodePipeline, CodeBuild, S3, and SageMaker resources.

Learning Objectives

By the end of this lab, you will be able to:

  1. Configure AWS CodePipeline to automate ML workflows.
  2. Use AWS CodeBuild to run unit tests on ML code.
  3. Trigger SageMaker Pipelines for model training and registration.
  4. Implement a basic CI/CD flow using AWS native tools.

Architecture Overview

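[Diagram not rendered. Flow described in this lab: S3 source -> AWS CodePipeline -> AWS CodeBuild (runs pytest, then trigger_pipeline.py) -> SageMaker Pipeline execution (training and registration).]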

Step-by-Step Instructions

Step 1: Create an S3 Bucket for Artifacts

AWS CodePipeline requires an S3 bucket to store artifacts between stages.

bash
# Replace <UNIQUE_SUFFIX> with a random string
aws s3 mb s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> --region <YOUR_REGION>

Console alternative
  1. Navigate to S3 in the AWS Console.
  2. Click Create bucket.
  3. Enter a name like brainybee-ml-artifacts-<UNIQUE_SUFFIX>.
  4. Leave other settings as default and click Create bucket.
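
If you would rather generate the random suffix than invent one, the following is a minimal Python sketch. The helper name `make_bucket_name` and the 8-character hex suffix are assumptions made for illustration; the regular expression covers only the basic S3 naming rules (3 to 63 characters, lowercase letters, digits, and hyphens).

```python
import re
import secrets

# Basic S3 naming rules: 3-63 characters, lowercase letters, digits and hyphens,
# starting and ending with a letter or digit. (Dots are legal too but avoided here.)
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def make_bucket_name(prefix: str = "brainybee-ml-artifacts") -> str:
    """Return '<prefix>-<8 random hex chars>' after validating the result."""
    name = f"{prefix}-{secrets.token_hex(4)}"  # token_hex(4) yields 8 hex characters
    if not _BUCKET_RE.match(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name
```

Calling `make_bucket_name()` gives a name you can pass to `aws s3 mb s3://<name>`; a different value is produced on every call.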

Step 2: Prepare the Buildspec File

CodeBuild uses a buildspec.yml file to define the commands to run. Create a file named buildspec.yml in your local directory:

yaml
version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.9
    commands:
      - pip install sagemaker boto3 pytest
  build:
    commands:
      - echo "Running unit tests..."
      - pytest tests/
      - echo "Triggering SageMaker Pipeline..."
      - python trigger_pipeline.py
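
The buildspec calls trigger_pipeline.py, but the lab does not show that script. Below is one possible sketch using the boto3 SageMaker client's `start_pipeline_execution` call. The environment variable `SAGEMAKER_PIPELINE_NAME` is a hypothetical name you would set on the build project yourself, and the `build-<number>` display name is an illustrative choice, not something the lab prescribes.

```python
"""Sketch of trigger_pipeline.py: start a SageMaker Pipeline execution."""
import os


def start_pipeline(client, pipeline_name: str, display_name: str) -> str:
    """Start one pipeline execution and return its ARN.

    `client` is expected to behave like a boto3 SageMaker client.
    """
    response = client.start_pipeline_execution(
        PipelineName=pipeline_name,
        PipelineExecutionDisplayName=display_name,
    )
    return response["PipelineExecutionArn"]


def main() -> None:
    # Imported lazily so start_pipeline() can be unit tested without AWS access.
    import boto3

    # Hypothetical variable: set it on the CodeBuild project yourself.
    pipeline_name = os.environ["SAGEMAKER_PIPELINE_NAME"]
    # CodeBuild provides CODEBUILD_BUILD_NUMBER; fall back to "local" elsewhere.
    build_number = os.environ.get("CODEBUILD_BUILD_NUMBER", "local")
    arn = start_pipeline(
        boto3.client("sagemaker"), pipeline_name, f"build-{build_number}"
    )
    print(f"Started pipeline execution: {arn}")
```

To use this as the lab's script, call `main()` under the usual `if __name__ == "__main__":` guard at the end of the file. Passing the client as an argument keeps `start_pipeline` easy to cover with the pytest step that runs before it.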

Step 3: Create the CodeBuild Project

You need a project that will execute the logic defined in your buildspec.

bash
# Source and artifacts use type CODEPIPELINE because the pipeline in Step 4
# supplies the source (including buildspec.yml) to this project.
aws codebuild create-project \
  --name "BrainyBee-ML-Build" \
  --source '{"type": "CODEPIPELINE"}' \
  --artifacts '{"type": "CODEPIPELINE"}' \
  --environment '{"type": "LINUX_CONTAINER", "image": "aws/codebuild/amazonlinux2-x86_64-standard:3.0", "computeType": "BUILD_GENERAL1_SMALL"}' \
  --service-role "<YOUR_CODEBUILD_ROLE_ARN>"

[!NOTE] Ensure the service role provided has the AmazonSageMakerFullAccess policy attached to trigger pipelines.
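
The same project can be created from Python. This is a rough sketch around boto3's CodeBuild `create_project` call; the helper names are made up for illustration, and it assumes the source and artifact type `CODEPIPELINE`, since the project is invoked by the pipeline configured in Step 4.

```python
def build_project_config(service_role_arn: str, name: str = "BrainyBee-ML-Build") -> dict:
    """Return keyword arguments for the CodeBuild CreateProject API."""
    return {
        "name": name,
        # CODEPIPELINE: source and artifacts are handed over by the pipeline.
        "source": {"type": "CODEPIPELINE"},
        "artifacts": {"type": "CODEPIPELINE"},
        "environment": {
            "type": "LINUX_CONTAINER",
            "image": "aws/codebuild/amazonlinux2-x86_64-standard:3.0",
            "computeType": "BUILD_GENERAL1_SMALL",
        },
        "serviceRole": service_role_arn,
    }


def create_build_project(client, service_role_arn: str) -> str:
    """Create the project with a boto3 CodeBuild client and return its ARN."""
    response = client.create_project(**build_project_config(service_role_arn))
    return response["project"]["arn"]
```

With real credentials you would call `create_build_project(boto3.client("codebuild"), "<YOUR_CODEBUILD_ROLE_ARN>")`.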

Step 4: Configure the Orchestration Pipeline

Now, define the pipeline that connects your source to the build project.

Console Walkthrough (Recommended for Pipeline Setup)
  1. Navigate to CodePipeline > Create pipeline.
  2. Pipeline settings: Name it ML-Orchestration-Pipeline.
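  3. Source stage: Choose Amazon S3, then select the bucket from Step 1 and the object key of your zipped source (buildspec.yml, tests/, and trigger_pipeline.py). An S3 source bucket must have versioning enabled.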
  4. Build stage: Choose AWS CodeBuild and select BrainyBee-ML-Build.
  5. Deploy stage: Skip this for now, as our "deploy" happens inside the Build stage via the SageMaker SDK.
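
For readers who want to see what the console builds, here is a sketch of the equivalent two-stage pipeline structure as accepted by boto3's CodePipeline `create_pipeline(pipeline=...)`. The object key `source.zip`, the artifact name `SourceOutput`, and the function name are assumptions for illustration; the role ARN is a CodePipeline service role you would supply.

```python
def build_pipeline_definition(
    role_arn: str,
    bucket: str,
    source_key: str = "source.zip",  # hypothetical object key for the zipped source
    project_name: str = "BrainyBee-ML-Build",
    name: str = "ML-Orchestration-Pipeline",
) -> dict:
    """Return a Source -> Build pipeline structure for CodePipeline."""
    source_action = {
        "name": "Source",
        "actionTypeId": {
            "category": "Source", "owner": "AWS", "provider": "S3", "version": "1",
        },
        "configuration": {"S3Bucket": bucket, "S3ObjectKey": source_key},
        "outputArtifacts": [{"name": "SourceOutput"}],
    }
    build_action = {
        "name": "Build",
        "actionTypeId": {
            "category": "Build", "owner": "AWS", "provider": "CodeBuild", "version": "1",
        },
        "configuration": {"ProjectName": project_name},
        # The build consumes exactly what the source action produced.
        "inputArtifacts": [{"name": "SourceOutput"}],
    }
    return {
        "name": name,
        "roleArn": role_arn,
        "artifactStore": {"type": "S3", "location": bucket},
        "stages": [
            {"name": "Source", "actions": [source_action]},
            {"name": "Build", "actions": [build_action]},
        ],
    }
```

A call such as `boto3.client("codepipeline").create_pipeline(pipeline=build_pipeline_definition(role_arn, bucket))` would then create the pipeline. There is no Deploy stage, matching the walkthrough.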

Checkpoints

| Checkpoint | Action | Expected Result |
| --- | --- | --- |
| S3 Check | aws s3 ls s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> | Bucket exists and is accessible. |
| CodeBuild Check | Check CodeBuild console. | The project BrainyBee-ML-Build is visible. |
| Pipeline Check | Push a new file to your source. | CodePipeline status transitions from "In Progress" to "Succeeded". |
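
The pipeline checkpoint can also be verified from a script. The sketch below summarizes the dictionary returned by boto3's CodePipeline `get_pipeline_state(name=...)`; the function names and the `NotRun` placeholder for stages without an execution are illustrative assumptions.

```python
def summarize_stages(state: dict) -> dict:
    """Map each stage name to its latest execution status, or 'NotRun'."""
    return {
        stage["stageName"]: stage.get("latestExecution", {}).get("status", "NotRun")
        for stage in state.get("stageStates", [])
    }


def pipeline_succeeded(state: dict) -> bool:
    """True only if there is at least one stage and every stage succeeded."""
    statuses = list(summarize_stages(state).values())
    return bool(statuses) and all(status == "Succeeded" for status in statuses)
```

In practice you would pass in `boto3.client("codepipeline").get_pipeline_state(name="ML-Orchestration-Pipeline")` and re-check every few seconds while any stage still reports `InProgress`.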

Clean-Up / Teardown

To avoid ongoing charges, delete all resources created during this lab:

bash
# 1. Delete the Pipeline
aws codepipeline delete-pipeline --name ML-Orchestration-Pipeline

# 2. Delete the CodeBuild Project
aws codebuild delete-project --name BrainyBee-ML-Build

# 3. Empty and delete the S3 bucket
aws s3 rb s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> --force

Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| AccessDenied | IAM role lacks SageMaker permissions. | Attach AmazonSageMakerFullAccess to the CodeBuild service role. |
| Build Failed | pytest failed or buildspec.yml syntax error. | Check CodeBuild logs in CloudWatch for specific tracebacks. |
| S3 Bucket Already Exists | S3 bucket names must be globally unique. | Add a random suffix to your bucket name. |
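
To confirm the AccessDenied cause before changing anything, you could check which managed policies are attached to the CodeBuild service role. This sketch uses boto3's IAM `list_attached_role_policies` call; the helper names are hypothetical, and it reads only the first page of results, which is normally enough given the small number of managed policies a role can hold.

```python
def role_name_from_arn(role_arn: str) -> str:
    """Return the role name, which is the last path segment of a role ARN."""
    return role_arn.split("/")[-1]


def role_has_policy(
    iam, role_name: str, policy_name: str = "AmazonSageMakerFullAccess"
) -> bool:
    """Report whether a managed policy with this name is attached to the role."""
    response = iam.list_attached_role_policies(RoleName=role_name)
    return any(
        policy["PolicyName"] == policy_name
        for policy in response["AttachedPolicies"]
    )
```

For example, `role_has_policy(boto3.client("iam"), role_name_from_arn("<YOUR_CODEBUILD_ROLE_ARN>"))` returning `False` would point to the missing policy described in the table.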

Stretch Challenge

Add a Manual Approval Step: Modify the pipeline to include a manual approval stage before the final deployment. Use the console to add a stage named "QA-Approval" immediately after the "Build" stage.
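
If you want to script the challenge instead of using the console, one approach is to fetch the pipeline structure, insert an approval stage, and send it back. The sketch below covers only the insertion step; the function name and the action name `ManualApproval` are illustrative assumptions.

```python
import copy


def add_approval_stage(
    pipeline: dict, after_stage: str = "Build", stage_name: str = "QA-Approval"
) -> dict:
    """Return a copy of the pipeline structure with a manual approval stage added.

    Raises ValueError if `after_stage` is not present in the pipeline.
    """
    updated = copy.deepcopy(pipeline)  # leave the caller's structure untouched
    stage_names = [stage["name"] for stage in updated["stages"]]
    position = stage_names.index(after_stage) + 1
    approval_stage = {
        "name": stage_name,
        "actions": [
            {
                "name": "ManualApproval",
                "actionTypeId": {
                    "category": "Approval",
                    "owner": "AWS",
                    "provider": "Manual",
                    "version": "1",
                },
            }
        ],
    }
    updated["stages"].insert(position, approval_stage)
    return updated
```

With boto3 this would typically be wired up as `client.update_pipeline(pipeline=add_approval_stage(client.get_pipeline(name="ML-Orchestration-Pipeline")["pipeline"]))`.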

Cost Estimate

| Service | Estimated Cost (Monthly/Free Tier) |
| --- | --- |
| AWS CodePipeline | First pipeline is free (within Free Tier), then $1.00 per active pipeline. |
| AWS CodeBuild | 100 build minutes (build.general1.small) free per month. |
| Amazon S3 | $0.023 per GB (Standard), first 5GB free. |
| SageMaker | Costs vary by instance type; use ml.t3.medium for training to stay low cost. |
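
As a rough worked example, the figures in the table can be combined into a monthly estimate. The per-minute CodeBuild rate beyond the free minutes is not given in this lab, so the default of 0.005 USD below is an assumption you should replace with the current price for your region. SageMaker charges are left out because they depend on instance type and runtime.

```python
def estimate_monthly_cost(
    active_pipelines: int,
    build_minutes: float,
    s3_gb: float,
    build_rate_per_minute: float = 0.005,  # assumed rate, not taken from the lab
) -> float:
    """Estimate monthly USD cost for CodePipeline, CodeBuild, and S3 only."""
    pipeline_cost = max(active_pipelines - 1, 0) * 1.00  # first pipeline is free
    build_cost = max(build_minutes - 100, 0) * build_rate_per_minute  # 100 free minutes
    storage_cost = max(s3_gb - 5, 0) * 0.023  # first 5 GB free
    return pipeline_cost + build_cost + storage_cost
```

Under these assumptions, running this lab once (one pipeline, a few build minutes, well under 5 GB) stays at zero, while two pipelines, 300 build minutes, and 10 GB would come to 1.00 + 1.00 + 0.115 = 2.115 USD.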

Concept Review

| Tool | Primary ML Use Case | Comparison |
| --- | --- | --- |
| SageMaker Pipelines | ML workflow orchestration (Preprocessing, Training, Registration). | Built specifically for ML model lineage. |
| AWS CodePipeline | Application CI/CD orchestration. | Better for managing code, testing, and multi-service deployment. |
| AWS Step Functions | Generic serverless workflow orchestration. | Best for complex branching and event-driven architectures. |

Why Orchestrate?

As ML workflows grow in complexity, manual management becomes impractical. Automated orchestration ensures:

  1. Repeatability: Every training run uses the same environment and logic.
  2. Versioning: Both code and models are tracked.
  3. Reliability: Automated tests catch errors before deployment.

\begin{tikzpicture}
  % Simple coordinate plot for learning curve
  \draw[->] (0,0) -- (4,0) node[right] {Time};
  \draw[->] (0,0) -- (0,3) node[above] {Efficiency};
  \draw[domain=0.5:3.5, smooth, variable=\x, blue, thick] plot (\x, {ln(\x)+1});
  \node at (2,2.5) [blue] {Automated CI/CD};
\end{tikzpicture}
