SageMaker Model Monitor: Detecting Data Drift in Production
Machine learning models often degrade in performance as the data they encounter in production diverges from their training data. This phenomenon, known as data drift, can lead to inaccurate predictions and poor business outcomes. In this lab, you will configure Amazon SageMaker Model Monitor to automatically detect these deviations by establishing a baseline and scheduling periodic quality checks.
[!WARNING] This lab involves provisioning AWS resources that may incur costs. Remember to run the teardown commands at the end to avoid ongoing charges.
Prerequisites
- AWS Account: An active AWS account with Administrator access.
- CLI Tools: AWS CLI configured with `<YOUR_REGION>` (e.g., `us-east-1`).
- IAM Permissions: Ensure your execution role has `AmazonSageMakerFullAccess` and `CloudWatchFullAccess`.
- S3 Bucket: A bucket named `brainybee-lab-monitor-<YOUR_ACCOUNT_ID>` to store baselines and captured data.
Learning Objectives
- Enable Data Capture on an active SageMaker Inference Endpoint.
- Execute a Baseline Job to generate statistical constraints from training data.
- Configure a Monitoring Schedule to compare production traffic against the baseline.
- Visualize Data Quality Metrics and drift alerts in Amazon CloudWatch.
Architecture Overview
Endpoint (with Data Capture enabled) → captured request/response JSONL in S3 → scheduled monitoring job compares traffic against the baseline → drift metrics and alerts surface in Amazon CloudWatch.
Step-by-Step Instructions
Step 1: Enable Data Capture on Endpoint
To monitor a model, we must first tell SageMaker to save a sample of the incoming requests and outgoing predictions to S3.
```shell
# Define the Data Capture configuration.
# Endpoint configs are immutable, so create a new config that adds capture,
# then switch the endpoint over to it. (variants.json holds the
# ProductionVariants copied from your existing endpoint config.)
aws sagemaker create-endpoint-config \
    --endpoint-config-name "MyModelConfig-capture" \
    --production-variants file://variants.json \
    --data-capture-config '{ "EnableCapture": true, "InitialSamplingPercentage": 100, "DestinationS3Uri": "s3://brainybee-lab-monitor-<YOUR_ACCOUNT_ID>/captured-data/", "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}] }'

aws sagemaker update-endpoint \
    --endpoint-name "MyModelEndpoint" \
    --endpoint-config-name "MyModelConfig-capture"
```

Console alternative:
- Navigate to SageMaker > Inference > Endpoints.
- Select your endpoint and click Update.
- Under Data Capture, toggle to Enabled.
- Set Sampling percentage to 100% and provide your S3 path.
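The same capture settings can be expressed as a boto3 payload from a notebook. A minimal sketch, assuming the lab's bucket and config names (the commented-out `create_endpoint_config` call shows where the payload goes; `ProductionVariants` must be copied from your existing config):

```python
# Build the DataCaptureConfig payload used by boto3's SageMaker client.
account_id = "<YOUR_ACCOUNT_ID>"  # placeholder, as in the CLI example

data_capture_config = {
    "EnableCapture": True,
    "InitialSamplingPercentage": 100,  # capture all traffic for the lab
    "DestinationS3Uri": f"s3://brainybee-lab-monitor-{account_id}/captured-data/",
    "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
}

# In a notebook you would pass this to create_endpoint_config:
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName="MyModelConfig-capture",
#     ProductionVariants=[...],          # copy from your existing config
#     DataCaptureConfig=data_capture_config,
# )
print(data_capture_config["DestinationS3Uri"])
```

In production you would typically lower `InitialSamplingPercentage` (e.g., 10–20%) to limit S3 storage and monitoring-job input size.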
Step 2: Establish a Baseline
Before we can detect drift, we need to know what "normal" looks like. We use a baseline job to analyze our training dataset.
```shell
# Note: This step is typically done via the SageMaker Python SDK in a notebook.
# It triggers a processing job that writes statistics.json and constraints.json
# to S3.
```

[!TIP] The baseline job calculates the mean, standard deviation, and distribution for every feature in your dataset.
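To make the tip concrete, here is a toy sketch of the per-feature statistics and constraints a baseline job produces. This is pure Python and illustrative only; the real job runs in a distributed processing container, and `suggest_baseline` here is just a name echoing the SDK's method:

```python
import math

def suggest_baseline(rows, feature_names):
    """Toy sketch: per-feature mean/std ("statistics") and a completeness
    check ("constraints"), mirroring statistics.json / constraints.json.
    Illustrative only -- not the actual Model Monitor implementation."""
    statistics, constraints = {}, {}
    for i, name in enumerate(feature_names):
        values = [row[i] for row in rows if row[i] is not None]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        statistics[name] = {"mean": mean, "std_dev": math.sqrt(variance)}
        constraints[name] = {"completeness": len(values) / len(rows)}
    return statistics, constraints

rows = [(1.0, 10.0), (2.0, 12.0), (3.0, None)]  # one missing "income" value
stats, consts = suggest_baseline(rows, ["age", "income"])
print(stats["age"])      # {'mean': 2.0, 'std_dev': 0.816...}
print(consts["income"])  # {'completeness': 0.666...}
```

The monitoring jobs in Step 3 then check each batch of captured traffic against exactly these kinds of per-feature expectations.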
Step 3: Create a Monitoring Schedule
Now, schedule a recurring job (e.g., hourly) to compare the captured production data against the baseline.
```shell
# Assumes a data-quality job definition named "DataQualityJobDef" already
# exists (created with `aws sagemaker create-data-quality-job-definition`).
# cron(0 * * * ? *) runs at the top of every hour.
aws sagemaker create-monitoring-schedule \
    --monitoring-schedule-name "DailyDataQualityShift" \
    --monitoring-schedule-config '{ "MonitoringJobDefinitionName": "DataQualityJobDef", "MonitoringType": "DataQuality", "ScheduleConfig": { "ScheduleExpression": "cron(0 * * * ? *)" } }'
```

Checkpoints
- S3 Verification: Run `aws s3 ls s3://brainybee-lab-monitor-<YOUR_ACCOUNT_ID>/captured-data/` to ensure JSONL files are appearing after you send test traffic to the endpoint.
- Status Check: Check the SageMaker console under Model Monitoring to ensure the schedule status is `Scheduled` or `Executing`.
Visualizing Drift
The following diagram illustrates the concept of a feature distribution shift that Model Monitor would flag:
```latex
\begin{tikzpicture}[scale=0.8]
  % Baseline distribution
  \draw[blue, thick] (-3,0) .. controls (-1,0) and (-0.5,3) .. (0,3)
                     .. controls (0.5,3) and (1,0) .. (3,0);
  \node[blue] at (0,3.3) {Baseline (Training)};
  % Drifted distribution
  \draw[red, dashed, thick] (-1,0) .. controls (1,0) and (1.5,3) .. (2,3)
                            .. controls (2.5,3) and (3,0) .. (5,0);
  \node[red] at (2,3.3) {Production (Drifted)};
  % Axes
  \draw[->] (-4,0) -- (6,0) node[right] {Feature Value};
  \draw[->] (-4,0) -- (-4,4) node[above] {Density};
  % Drift indicator
  \draw[<->, thick] (0,1.5) -- (2,1.5);
  \node at (1,1.8) {\textbf{DRIFT}};
\end{tikzpicture}
```
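The gap in the diagram can also be quantified numerically. A minimal pure-Python sketch of a mean-shift check (the function name and the z-score rule are illustrative assumptions, not Model Monitor's actual statistical test):

```python
import random
import statistics

def mean_shift_alert(baseline, production, z_threshold=3.0):
    """Flag drift when the production mean sits more than z_threshold
    baseline standard deviations away from the baseline mean.
    Illustrative only -- not Model Monitor's actual test."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    shift = abs(statistics.mean(production) - mu) / sigma
    return shift > z_threshold, shift

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(1000)]  # "training" feature
drifted = [random.gauss(4.0, 1.0) for _ in range(1000)]   # shifted production

print(mean_shift_alert(baseline, baseline[:500])[0])  # False: no drift
print(mean_shift_alert(baseline, drifted)[0])         # True: drift flagged
```

Real monititoring jobs compare full distributions (not just means), which is why the baseline stores per-feature distribution statistics rather than a single summary number.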
Troubleshooting
| Issue | Likely Cause | Fix |
|---|---|---|
| No Data in S3 | DataCaptureConfig not applied | Check endpoint configuration status in CLI. |
| Job Fails | IAM Role missing S3 permissions | Ensure the SageMaker execution role has s3:PutObject for your bucket. |
| Metrics missing or sparse | Low traffic volume | Send a burst of at least 50-100 requests so the next monitoring run has enough samples for a meaningful analysis. |
Clean-Up / Teardown
Avoid charges by deleting the monitoring schedule and endpoint.
```shell
# 1. Delete Monitoring Schedule
aws sagemaker delete-monitoring-schedule --monitoring-schedule-name "DailyDataQualityShift"

# 2. Delete Endpoint
aws sagemaker delete-endpoint --endpoint-name "MyModelEndpoint"

# 3. Empty S3 Bucket
aws s3 rm s3://brainybee-lab-monitor-<YOUR_ACCOUNT_ID> --recursive
```

Stretch Challenge
Model Quality Monitoring: Instead of just monitoring input data (Data Quality), set up a Model Quality monitor. This requires providing "Ground Truth" labels for the inferences made in Step 1 and comparing the model's accuracy/F1-score against the baseline performance.
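The core of the Model Quality computation is joining captured predictions with their ground-truth labels and re-deriving the quality metrics. A minimal pure-Python sketch for a binary classifier (illustrative, not the Model Quality container's implementation):

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and binary F1 from ground-truth labels and
    captured predictions -- the kind of metrics a Model Quality
    monitor compares against baseline performance."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Ground truth joined to captured inferences (conceptually by record ID)
acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(acc, 2), round(f1, 2))  # 0.6 0.67
```

Note the extra operational requirement this implies: you must upload ground-truth labels to S3 on an ongoing basis, with a delay tolerance matching how quickly true outcomes become known.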
Cost Estimate
| Service | Usage Type | Estimated Cost |
|---|---|---|
| SageMaker Endpoint | ml.m5.xlarge (On-Demand) | ~$0.23 / hour |
| Monitoring Job | ml.m5.xlarge (Processing) | ~$0.23 / job run |
| S3 Storage | Data Logs | <$0.01 (negligible for lab) |
Concept Review
Comparison Table: Monitoring Types
| Feature | Data Quality | Model Quality | Bias Drift | Feature Attribution |
|---|---|---|---|---|
| What it monitors | Distribution of input features | Accuracy, Precision, Recall | Fairness metrics over time | Change in feature importance |
| Requirement | Baseline + Live Data | Baseline + Ground Truth | Baseline + Live Data | Baseline + Live Data |
| Detection | Outliers, Missing values | Accuracy drop | Prediction bias | Changing rank of features |