# Hands-On Lab: Navigating the ML Development Lifecycle and Governance on AWS
## Prerequisites
Before starting this lab, ensure you have the following in place to successfully navigate the Machine Learning (ML) development lifecycle and governance steps:
- AWS Account: Access to an AWS account with administrator or PowerUser access.
- AWS CLI: Installed and configured (`aws configure`) with your access keys.
- Permissions: IAM permissions to create S3 buckets and manage Amazon SageMaker resources.
- Prior Knowledge: Basic understanding of ML terminology (e.g., training, inference, classification) and JSON file structures.
## Learning Objectives
By completing this lab, you will be able to:
- Map Business Goals to ML Services: Translate a business objective (e.g., churn reduction) into an AWS ML architecture.
- Establish Data Processing Foundations: Provision secure cloud storage for ML datasets using Amazon S3.
- Implement Model Governance: Create an ML Project tracking structure using the Amazon SageMaker Model Registry.
- Execute Lifecycle Approvals: Transition a model from a "Pending" state to an "Approved" state for production readiness.
## Architecture Overview
The following diagram illustrates the lifecycle workflow you will build. We will simulate the data preparation phase, register a model, and execute a governance approval step.
Here is how this maps to the broader ML Development Lifecycle covered in the AWS Certified AI Practitioner framework:
## Step-by-Step Instructions
> [!NOTE]
> Scenario: Your company wants to reduce customer churn by 15% (Business Goal). You have framed this as a binary classification problem (ML Problem Framing). Now, you need to set up the data processing layer and govern the model lifecycle.
### Step 1: Set Up the Data Collection Environment
The first technical step in data processing is collecting and integrating data. We will create an S3 bucket to store our raw and preprocessed historical customer records.
```bash
# Define a unique bucket name using your account ID to ensure global uniqueness
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export BUCKET_NAME="brainybee-ml-data-${ACCOUNT_ID}"

# Create the S3 bucket
aws s3 mb s3://${BUCKET_NAME} --region us-east-1
```

**Console alternative:**
- Navigate to the S3 Console in AWS.
- Click Create bucket.
- Enter a globally unique bucket name (e.g., `brainybee-ml-data-12345`).
- Leave default settings (Block all public access enabled) and click Create bucket.
📸 Screenshot: A successfully created S3 bucket in the AWS Management Console.
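Bucket creation fails fast on invalid or already-taken names. As a minimal local pre-check (a sketch of the core S3 naming rules only — it omits edge cases such as IP-address-like names), you can validate a candidate name before calling `aws s3 mb`:

```shell
# Sketch: check the core S3 bucket-naming rules locally.
# Rules covered: 3-63 characters; lowercase letters, digits, dots,
# and hyphens; must start and end with a letter or digit.
check_bucket_name() {
  local name="$1"
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
  printf '%s' "$name" | grep -Eq '^[a-z0-9][a-z0-9.-]*[a-z0-9]$'
}

check_bucket_name "brainybee-ml-data-123456789012" && echo "name looks valid"
check_bucket_name "BrainyBee_ML" || echo "name is invalid"
```

Passing this check does not guarantee the name is available globally — `BucketAlreadyExists` (see Troubleshooting) can still occur.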
### Step 2: Create a SageMaker Model Package Group
To manage the model lifecycle, we use the SageMaker Model Registry. This tracks model versions, documentation, and risk categories for audit readiness.
```bash
# Create a logical group to hold versions of our churn prediction model
aws sagemaker create-model-package-group \
    --model-package-group-name "brainybee-churn-prediction" \
    --model-package-group-description "Customer Churn Classification Models"
```

**Console alternative:**
- Navigate to the Amazon SageMaker Console.
- Under the left navigation pane, go to Models > Model registry.
- Click Create model package group.
- Name it `brainybee-churn-prediction` and provide a brief description.
- Click Create.
> [!TIP]
> In a real-world MLOps workflow, this step is often automated via SageMaker Pipelines upon the approval of an ML use case.
### Step 3: Register a Model Version (Development Phase)
Once a model is trained, it must be registered for governance. We will simulate registering a trained model by pointing to a dummy container image.
First, create a JSON configuration file for the model inference specifications:
```bash
cat <<EOF > inference_spec.json
{
  "InferenceSpecification": {
    "Containers": [
      {
        "Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.2-1"
      }
    ],
    "SupportedContentTypes": ["text/csv"],
    "SupportedResponseMIMETypes": ["text/csv"]
  }
}
EOF
```

Next, register the model with a "Pending" status to trigger the governance review process:
```bash
aws sagemaker create-model-package \
    --model-package-group-name "brainybee-churn-prediction" \
    --model-package-description "XGBoost Churn Model V1" \
    --model-approval-status "PendingManualApproval" \
    --cli-input-json file://inference_spec.json
```

> [!IMPORTANT]
> Take note of the `ModelPackageArn` output in your terminal. You will need it for the next step.
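If the registration call fails with a JSON parsing error, validate the spec file first. The self-contained sketch below recreates the same file and checks that it parses, using Python's standard-library `json.tool` module (available wherever `python3` is installed):

```shell
# Re-create the spec file (identical content to the heredoc above) and
# sanity-check that it parses as JSON before registering the model.
cat <<'EOF' > inference_spec.json
{
  "InferenceSpecification": {
    "Containers": [
      {
        "Image": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.2-1"
      }
    ],
    "SupportedContentTypes": ["text/csv"],
    "SupportedResponseMIMETypes": ["text/csv"]
  }
}
EOF

# python3 -m json.tool exits non-zero on malformed JSON
if python3 -m json.tool inference_spec.json >/dev/null 2>&1; then
  echo "inference_spec.json is valid JSON"
else
  echo "inference_spec.json is malformed"
fi
```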
### Step 4: Execute Governance Approval
Governance teams review model documentation, performance metrics, and risk impact. Upon approval, the model package version is updated to reflect its readiness for production.
```bash
# Replace <YOUR_MODEL_PACKAGE_ARN> with the ARN output from Step 3
# Example ARN format: arn:aws:sagemaker:us-east-1:123456789012:model-package/brainybee-churn-prediction/1
aws sagemaker update-model-package \
    --model-package-arn "<YOUR_MODEL_PACKAGE_ARN>" \
    --model-approval-status "Approved"
```

**Console alternative:**
- In the SageMaker Console, navigate to Models > Model registry.
- Click on the `brainybee-churn-prediction` group.
- You will see Version 1 listed with a status of PendingManualApproval.
- Click on the version number.
- Select Update status, choose Approved, and add an optional comment (e.g., "Compliance checks passed").
- Click Save and update.
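The governance gate you just executed can be sketched as a small function. This is a toy model of this lab's one-way approval flow, not SageMaker's internal logic (the real API, for instance, also lets you move an approved model back to Rejected):

```shell
# Toy sketch of the lab's governance gate: only a pending package may
# transition, and only to one of the two terminal statuses.
approve() {
  local current="$1" new="$2"
  if [ "$current" != "PendingManualApproval" ]; then
    echo "error: status already finalized" >&2
    return 1
  fi
  case "$new" in
    Approved|Rejected) echo "$new" ;;
    *) echo "error: unknown status" >&2; return 1 ;;
  esac
}

approve "PendingManualApproval" "Approved"   # prints: Approved
```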
## Checkpoints
Run these verifications to ensure your lab steps were completed successfully:
- Verify Data Storage:

  ```bash
  aws s3 ls | grep brainybee-ml-data
  ```

  Expected output: Your bucket name should appear in the list.

- Verify Model Registry Setup:

  ```bash
  aws sagemaker list-model-package-groups --name-contains "brainybee"
  ```

  Expected output: JSON detailing the `brainybee-churn-prediction` package group.

- Verify Governance Approval:

  ```bash
  aws sagemaker describe-model-package --model-package-name "<YOUR_MODEL_PACKAGE_ARN>" --query "ModelApprovalStatus"
  ```

  Expected output: `"Approved"`
## Clean-Up / Teardown
> [!WARNING]
> Remember to run the teardown commands to avoid ongoing charges and to keep your AWS environment clean. While the Model Registry itself doesn't incur significant hourly costs, it's a best practice to remove unused resources.
Execute the following commands to delete all provisioned resources:
```bash
# 1. Delete the model version
aws sagemaker delete-model-package --model-package-name "<YOUR_MODEL_PACKAGE_ARN>"

# 2. Delete the model package group
aws sagemaker delete-model-package-group --model-package-group-name "brainybee-churn-prediction"

# 3. Delete the local JSON file
rm inference_spec.json

# 4. Delete the S3 bucket (--force removes all objects first)
aws s3 rb s3://${BUCKET_NAME} --force
```

## Troubleshooting
| Common Error | Cause | Fix |
|---|---|---|
| `BucketAlreadyExists` | S3 bucket names must be globally unique. | Ensure you appended your Account ID or random numbers to the bucket name. |
| `AccessDeniedException` | IAM user lacks permissions for SageMaker or S3. | Attach the AmazonSageMakerFullAccess and AmazonS3FullAccess policies to your IAM user/role. |
| `ValidationException: Could not find model package` | Incorrect or malformed ARN used in Step 4. | Copy the exact ARN string from the output of Step 3, ensuring no trailing spaces. |
| `ResourceNotFound` | Region mismatch between CLI config and requested resource. | Append `--region us-east-1` (or your chosen region) to the AWS CLI commands. |
## Concept Review
This lab walked you through bridging the gap between theoretical ML planning and technical AWS execution.
| ML Lifecycle Phase | AWS Service Used | Purpose in this Lab |
|---|---|---|
| Data Processing | Amazon S3 | Providing a secure, scalable landing zone for raw data and features before training begins. |
| Model Development | SageMaker Model Registry | Organizing iterations of built models. Keeping track of metadata, intended audience, and risk categories. |
| Deployment / Governance | SageMaker Model Registry (Status Update) | Enforcing a human-in-the-loop review process to ensure models meet regulatory and ethical compliance before production deployment. |