Lab: Automating ML Workflows with AWS CodePipeline and SageMaker
Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines
This lab provides a hands-on experience in setting up a Continuous Integration and Continuous Delivery (CI/CD) pipeline for Machine Learning. You will use AWS CodePipeline to orchestrate a workflow that triggers a SageMaker Pipeline execution whenever new code is pushed.
[!WARNING] Remember to run the teardown commands at the end of the lab to avoid ongoing charges for AWS resources.
Prerequisites
Before starting this lab, ensure you have:
- An AWS Account with administrative access.
- AWS CLI installed and configured with `<YOUR_ACCESS_KEY>` and `<YOUR_SECRET_KEY>`.
- Basic knowledge of Git and Python.
- IAM permissions to create CodePipeline, CodeBuild, S3, and SageMaker resources.
Learning Objectives
By the end of this lab, you will be able to:
- Configure AWS CodePipeline to automate ML workflows.
- Use AWS CodeBuild to run unit tests on ML code.
- Trigger SageMaker Pipelines for model training and registration.
- Implement a basic CI/CD flow using AWS native tools.
Architecture Overview
A code push lands in the S3 source bucket, which triggers AWS CodePipeline. The Build stage runs AWS CodeBuild to execute unit tests and then start a SageMaker Pipeline execution for model training and registration.
Step-by-Step Instructions
Step 1: Create an S3 Bucket for Artifacts
AWS CodePipeline requires an S3 bucket to store artifacts between stages.
```bash
# Replace <UNIQUE_SUFFIX> with a random string and <YOUR_REGION> with your region
aws s3 mb s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> --region <YOUR_REGION>
```

▶ Console alternative
- Navigate to S3 in the AWS Console.
- Click Create bucket.
- Enter a name like `brainybee-ml-artifacts-<UNIQUE_SUFFIX>`.
- Leave other settings as default and click Create bucket.
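The `<UNIQUE_SUFFIX>` placeholder can be generated however you like; here is a small Python helper as one option (the prefix matches the lab's bucket name, and the 8-character suffix length is an arbitrary choice):

```python
import random
import string

def unique_bucket_name(prefix="brainybee-ml-artifacts"):
    """Generate an S3-safe bucket name with a random 8-character suffix.

    S3 bucket names must be globally unique, lowercase, and 3-63 characters.
    """
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
    return f"{prefix}-{suffix}"

# Pass the result to `aws s3 mb` or boto3's create_bucket
print(unique_bucket_name())
```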
Step 2: Prepare the Buildspec File
CodeBuild uses a `buildspec.yml` file to define the commands to run. Create a file named `buildspec.yml` in your local directory:

```yaml
version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.9
    commands:
      - pip install sagemaker boto3 pytest
  build:
    commands:
      - echo "Running unit tests..."
      - pytest tests/
      - echo "Triggering SageMaker Pipeline..."
      - python trigger_pipeline.py
```

Step 3: Create the CodeBuild Project
You need a project that will execute the logic defined in your buildspec.
```bash
aws codebuild create-project \
  --name "BrainyBee-ML-Build" \
  --source '{"type": "NO_SOURCE"}' \
  --artifacts '{"type": "NO_ARTIFACTS"}' \
  --environment '{"type": "LINUX_CONTAINER", "image": "aws/codebuild/amazonlinux2-x86_64-standard:3.0", "computeType": "BUILD_GENERAL1_SMALL"}' \
  --service-role "<YOUR_CODEBUILD_ROLE_ARN>"
```

[!NOTE] Ensure the service role provided has the `AmazonSageMakerFullAccess` policy attached to trigger pipelines.
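The buildspec in Step 2 calls a `trigger_pipeline.py` script that the lab does not list. Here is a minimal sketch, assuming a SageMaker Pipeline named `BrainyBeeTrainingPipeline` already exists; the pipeline name and the `InputDataUrl` parameter are placeholders you must adapt to your own pipeline definition:

```python
# trigger_pipeline.py -- minimal sketch of the script the buildspec calls.
import os

PIPELINE_NAME = "BrainyBeeTrainingPipeline"  # hypothetical pipeline name

def build_execution_args(pipeline_name, tag="manual"):
    """Assemble keyword arguments for SageMaker's start_pipeline_execution."""
    return {
        "PipelineName": pipeline_name,
        "PipelineExecutionDisplayName": f"codebuild-{tag}",
        "PipelineParameters": [
            # Hypothetical parameter; it must exist in your pipeline definition
            {"Name": "InputDataUrl",
             "Value": "s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX>/data/"},
        ],
    }

def main():
    import boto3  # imported here so build_execution_args stays testable offline
    sm = boto3.client("sagemaker")
    # Tag the execution with the source version when available (alphanumeric only)
    raw = os.environ.get("CODEBUILD_RESOLVED_SOURCE_VERSION", "manual")
    tag = "".join(c for c in raw if c.isalnum())[:8] or "manual"
    resp = sm.start_pipeline_execution(**build_execution_args(PIPELINE_NAME, tag))
    print("Started:", resp["PipelineExecutionArn"])

# CODEBUILD_BUILD_ID is set automatically inside CodeBuild containers,
# so the AWS call only fires when the script runs in the build environment
if os.environ.get("CODEBUILD_BUILD_ID"):
    main()
```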
Step 4: Configure the Orchestration Pipeline
Now, define the pipeline that connects your source to the build project.
▶ Console Walkthrough (Recommended for Pipeline Setup)
- Navigate to CodePipeline > Create pipeline.
- Pipeline settings: Name it `ML-Orchestration-Pipeline`.
- Source stage: Choose S3 and select the bucket/object you created.
- Build stage: Choose AWS CodeBuild and select `BrainyBee-ML-Build`.
- Deploy stage: Skip this for now, as our "deploy" happens inside the Build stage via the SageMaker SDK.
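If you prefer the SDK over the console, the same pipeline can be defined programmatically. This is a sketch only: the service role ARN and source object key are placeholders, and the stage/action names are assumptions mirroring the console walkthrough above.

```python
# Programmatic sketch of the Step 4 pipeline definition.

def build_pipeline_definition(bucket, role_arn, source_key="source.zip"):
    """Return a CodePipeline definition mirroring the console walkthrough."""
    return {
        "name": "ML-Orchestration-Pipeline",
        "roleArn": role_arn,
        "artifactStore": {"type": "S3", "location": bucket},
        "stages": [
            {
                "name": "Source",
                "actions": [{
                    "name": "S3Source",
                    "actionTypeId": {"category": "Source", "owner": "AWS",
                                     "provider": "S3", "version": "1"},
                    "configuration": {"S3Bucket": bucket,
                                      "S3ObjectKey": source_key,
                                      "PollForSourceChanges": "true"},
                    "outputArtifacts": [{"name": "SourceOutput"}],
                }],
            },
            {
                "name": "Build",
                "actions": [{
                    "name": "RunCodeBuild",
                    "actionTypeId": {"category": "Build", "owner": "AWS",
                                     "provider": "CodeBuild", "version": "1"},
                    "configuration": {"ProjectName": "BrainyBee-ML-Build"},
                    "inputArtifacts": [{"name": "SourceOutput"}],
                }],
            },
        ],
    }

# Usage (requires AWS credentials and a CodePipeline service role):
#   import boto3
#   boto3.client("codepipeline").create_pipeline(
#       pipeline=build_pipeline_definition(
#           "brainybee-ml-artifacts-<UNIQUE_SUFFIX>", "<YOUR_PIPELINE_ROLE_ARN>"))
```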
Checkpoints
| Checkpoint | Action | Expected Result |
|---|---|---|
| S3 Check | aws s3 ls s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> | Bucket exists and is accessible. |
| CodeBuild Check | Check CodeBuild console. | The project BrainyBee-ML-Build is visible. |
| Pipeline Check | Push a new file to your source. | CodePipeline status transitions from "In Progress" to "Succeeded". |
Clean-Up / Teardown
To avoid ongoing charges, delete all resources created during this lab:
```bash
# 1. Delete the pipeline
aws codepipeline delete-pipeline --name ML-Orchestration-Pipeline

# 2. Delete the CodeBuild project
aws codebuild delete-project --name BrainyBee-ML-Build

# 3. Empty and delete the S3 bucket
aws s3 rb s3://brainybee-ml-artifacts-<UNIQUE_SUFFIX> --force
```

Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `AccessDenied` | IAM role lacks SageMaker permissions. | Attach `AmazonSageMakerFullAccess` to the CodeBuild service role. |
| Build Failed | `pytest` failed or `buildspec.yml` syntax error. | Check CodeBuild logs in CloudWatch for specific tracebacks. |
| Bucket Already Exists | S3 bucket names must be globally unique. | Add a random suffix to your bucket name. |
Stretch Challenge
Add a Manual Approval Step: Modify the pipeline to include a Manual Approval stage before the final deployment. Use the console to add a stage after the "Build" stage called "QA-Approval".
Cost Estimate
| Service | Estimated Cost (Monthly/Free Tier) |
|---|---|
| AWS CodePipeline | First pipeline is free (within Free Tier), then $1.00 per active pipeline. |
| AWS CodeBuild | 100 build minutes (build.general1.small) free per month. |
| Amazon S3 | $0.023 per GB (Standard), first 5GB free. |
| SageMaker | Costs vary by instance type; use ml.t3.medium for training to stay low cost. |
Concept Review
| Tool | Primary ML Use Case | Comparison |
|---|---|---|
| SageMaker Pipelines | ML workflow orchestration (Preprocessing, Training, Registration). | Built specifically for ML model lineage. |
| AWS CodePipeline | Application CI/CD orchestration. | Better for managing code, testing, and multi-service deployment. |
| AWS Step Functions | Generic serverless workflow orchestration. | Best for complex branching and event-driven architectures. |
Why Orchestrate?
As ML workflows grow in complexity, manual management becomes impractical. Automated orchestration ensures:
- Repeatability: Every training run uses the same environment and logic.
- Versioning: Both code and models are tracked.
- Reliability: Automated tests catch errors before deployment.
```latex
% Learning-curve sketch: efficiency grows over time with automated CI/CD
\begin{tikzpicture}
  \draw[->] (0,0) -- (4,0) node[right] {Time};
  \draw[->] (0,0) -- (0,3) node[above] {Efficiency};
  \draw[domain=0.5:3.5, smooth, variable=\x, blue, thick]
    plot (\x, {ln(\x)+1});
  \node at (2,2.5) [blue] {Automated CI/CD};
\end{tikzpicture}
```