Hands-On Lab

SageMaker Model Monitor: Detecting Data Drift in Production

Machine learning models often degrade in performance as the data they encounter in production diverges from their training data. This phenomenon, known as data drift, can lead to inaccurate predictions and business losses. In this lab, you will configure Amazon SageMaker Model Monitor to detect these deviations automatically by establishing a baseline and scheduling periodic quality checks.

[!WARNING] This lab involves provisioning AWS resources that may incur costs. Remember to run the teardown commands at the end to avoid ongoing charges.

Prerequisites

  • AWS Account: An active AWS account with Administrator access.
  • CLI Tools: AWS CLI configured with <YOUR_REGION> (e.g., us-east-1).
  • IAM Permissions: Ensure your execution role has AmazonSageMakerFullAccess and CloudWatchFullAccess.
  • S3 Bucket: A bucket named brainybee-lab-monitor-<YOUR_ACCOUNT_ID> to store baselines and captured data.

Learning Objectives

  • Enable Data Capture on an active SageMaker Inference Endpoint.
  • Execute a Baseline Job to generate statistical constraints from training data.
  • Configure a Monitoring Schedule to compare production traffic against the baseline.
  • Visualize Data Quality Metrics and drift alerts in Amazon CloudWatch.

Architecture Overview

Client traffic flows to a SageMaker inference endpoint, where Data Capture writes request/response pairs to S3. A scheduled monitoring job compares the captured data against the baseline and publishes drift metrics to Amazon CloudWatch.

Step-by-Step Instructions

Step 1: Enable Data Capture on Endpoint

To monitor a model, we must first tell SageMaker to save a sample of the incoming requests and outgoing predictions to S3.

```bash
# Endpoint configs are immutable, so there is no "update-endpoint-config"
# command: create a NEW config that includes data capture, then point the
# endpoint at it. Replace "MyModel" with your deployed model's name.
aws sagemaker create-endpoint-config \
  --endpoint-config-name "MyModelConfig-capture" \
  --production-variants '[{"VariantName": "AllTraffic", "ModelName": "MyModel", "InstanceType": "ml.m5.xlarge", "InitialInstanceCount": 1}]' \
  --data-capture-config '{
    "EnableCapture": true,
    "InitialSamplingPercentage": 100,
    "DestinationS3Uri": "s3://brainybee-lab-monitor-<YOUR_ACCOUNT_ID>/captured-data/",
    "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}]
  }'

aws sagemaker update-endpoint \
  --endpoint-name "MyModelEndpoint" \
  --endpoint-config-name "MyModelConfig-capture"
```
Console alternative
  1. Navigate to SageMaker > Inference > Endpoints.
  2. Select your endpoint and click Update.
  3. Under Data Capture, toggle to Enabled.
  4. Set Sampling percentage to 100% and provide your S3 path.

Step 2: Establish a Baseline

Before we can detect drift, we need to know what "normal" looks like. We use a baseline job to analyze our training dataset.

```bash
# Note: the baseline is typically created via the SageMaker Python SDK in a notebook.
# It triggers a processing job that writes statistics.json and constraints.json to S3.
```

[!TIP] The baseline job calculates the mean, standard deviation, and distribution for every feature in your dataset.
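To make the tip above concrete, here is a minimal plain-Python sketch of the kind of per-feature summary a baseline job records. This is illustrative only, not Model Monitor's actual implementation or the exact `statistics.json` schema; the feature names and toy data are invented for the example.

```python
import json
import statistics

def build_baseline(rows, feature_names):
    """Compute per-feature summary stats, loosely mirroring what a
    baseline job writes into statistics.json (illustrative only)."""
    baseline = {}
    for i, name in enumerate(feature_names):
        values = [row[i] for row in rows]
        baseline[name] = {
            "mean": statistics.mean(values),
            "std_dev": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
        }
    return baseline

# Toy "training data": two hypothetical features
training_rows = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0), (4.0, 16.0)]
stats = build_baseline(training_rows, ["age", "income"])
print(json.dumps(stats, indent=2))
```

A later monitoring run can then compare production values against these recorded ranges and distributions.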

Step 3: Create a Monitoring Schedule

Now, schedule a recurring job (e.g., hourly) to compare the captured production data against the baseline.

```bash
# cron(0 * * * ? *) fires at the top of every hour
aws sagemaker create-monitoring-schedule \
  --monitoring-schedule-name "DailyDataQualityShift" \
  --monitoring-schedule-config '{
    "MonitoringJobDefinitionName": "DataQualityJobDef",
    "MonitoringType": "DataQuality",
    "ScheduleConfig": { "ScheduleExpression": "cron(0 * * * ? *)" }
  }'
```

Checkpoints

  • S3 Verification: Run aws s3 ls s3://brainybee-lab-monitor-<YOUR_ACCOUNT_ID>/captured-data/ to ensure JSONL files are appearing after you send test traffic to the endpoint.
  • Status Check: Check the SageMaker console under Model Monitoring to ensure the schedule status is Scheduled or Executing.
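When you inspect the captured files from the S3 checkpoint above, each line is a self-contained JSON record. The sketch below parses one line in the shape data capture typically writes; verify the exact field names against your own captured JSONL files, and note the sample payload values here are invented.

```python
import json

# A sample line in the shape SageMaker data capture typically writes
# (confirm the exact fields against your own captured files).
sample_line = json.dumps({
    "captureData": {
        "endpointInput": {"observedContentType": "text/csv", "mode": "INPUT",
                          "data": "34,52000", "encoding": "CSV"},
        "endpointOutput": {"observedContentType": "text/csv", "mode": "OUTPUT",
                           "data": "0.82", "encoding": "CSV"},
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2025-01-01T00:00:00Z"},
    "eventVersion": "0",
})

def extract_request_response(line):
    """Pull the raw input and output payloads from one captured record."""
    record = json.loads(line)
    capture = record["captureData"]
    return capture["endpointInput"]["data"], capture["endpointOutput"]["data"]

request, response = extract_request_response(sample_line)
print(request, "->", response)  # 34,52000 -> 0.82
```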

Visualizing Drift

The following diagram illustrates the concept of a feature distribution shift that Model Monitor would flag:

```latex
\begin{tikzpicture}[scale=0.8]
  % Baseline distribution
  \draw[blue, thick] (-3,0) .. controls (-1,0) and (-0.5,3) .. (0,3)
    .. controls (0.5,3) and (1,0) .. (3,0);
  \node[blue] at (0,3.3) {Baseline (Training)};
  % Drifted distribution
  \draw[red, dashed, thick] (-1,0) .. controls (1,0) and (1.5,3) .. (2,3)
    .. controls (2.5,3) and (3,0) .. (5,0);
  \node[red] at (2,3.3) {Production (Drifted)};
  % Axes
  \draw[->] (-4,0) -- (6,0) node[right] {Feature Value};
  \draw[->] (-4,0) -- (-4,4) node[above] {Density};
  % Drift indicator
  \draw[<->, thick] (0,1.5) -- (2,1.5);
  \node at (1,1.8) {\textbf{DRIFT}};
\end{tikzpicture}
```
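The shift in the diagram can also be scored numerically. The sketch below is a deliberately crude drift score, how far the production mean has moved in units of baseline standard deviations, written in plain Python with invented data; Model Monitor's actual data-quality checks use richer distribution-distance statistics.

```python
import statistics

def mean_shift_in_sigmas(baseline_values, production_values):
    """How far the production mean has moved from the baseline mean,
    in units of baseline standard deviations (a crude drift score)."""
    mu = statistics.mean(baseline_values)
    sigma = statistics.pstdev(baseline_values)
    return abs(statistics.mean(production_values) - mu) / sigma

baseline = [9.8, 10.1, 10.0, 9.9, 10.2]
drifted = [11.9, 12.1, 12.0, 12.2, 11.8]  # distribution shifted right

score = mean_shift_in_sigmas(baseline, drifted)
print(f"drift score: {score:.1f} sigmas")
if score > 3:
    print("DRIFT detected: production mean far outside baseline range")
```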

Troubleshooting

| Issue | Likely Cause | Fix |
| --- | --- | --- |
| No data in S3 | DataCaptureConfig not applied | Check endpoint configuration status in the CLI. |
| Job fails | IAM role missing S3 permissions | Ensure the SageMaker execution role has `s3:PutObject` for your bucket. |
| Low metrics | Low traffic volume | Send a burst of at least 50-100 requests to trigger a meaningful analysis. |

Clean-Up / Teardown

Avoid charges by deleting the monitoring schedule and endpoint.

```bash
# 1. Delete the monitoring schedule
aws sagemaker delete-monitoring-schedule --monitoring-schedule-name "DailyDataQualityShift"

# 2. Delete the endpoint
aws sagemaker delete-endpoint --endpoint-name "MyModelEndpoint"

# 3. Empty the S3 bucket
aws s3 rm s3://brainybee-lab-monitor-<YOUR_ACCOUNT_ID> --recursive
```

Stretch Challenge

Model Quality Monitoring: Instead of just monitoring input data (Data Quality), set up a Model Quality monitor. This requires providing "Ground Truth" labels for the inferences made in Step 1 and comparing the model's accuracy/F1-score against the baseline performance.
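As a reference point for the metrics a Model Quality monitor tracks, here is a minimal plain-Python sketch computing accuracy and F1 from predictions joined with later-arriving ground-truth labels. The label values are invented; in the real setup these metrics are computed inside a SageMaker processing job, not by hand.

```python
def classification_metrics(predictions, ground_truth):
    """Accuracy and F1 for binary labels, the kind of metrics a Model
    Quality monitor compares against its baseline (illustrative only)."""
    tp = sum(1 for p, y in zip(predictions, ground_truth) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, ground_truth) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, ground_truth) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(predictions, ground_truth) if p == y)
    accuracy = correct / len(ground_truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Captured predictions joined with ground-truth labels (toy data)
preds = [1, 0, 1, 1, 0, 1]
truth = [1, 0, 0, 1, 0, 1]
acc, f1 = classification_metrics(preds, truth)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

A drop in these scores relative to the baseline performance is what the Model Quality monitor would flag.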

Cost Estimate

| Service | Usage Type | Estimated Cost |
| --- | --- | --- |
| SageMaker Endpoint | ml.m5.xlarge (On-Demand) | ~$0.23 / hour |
| Monitoring Job | ml.m5.xlarge (Processing) | ~$0.23 / job run |
| S3 Storage | Data logs | <$0.01 (negligible for lab) |

Concept Review

Comparison Table: Monitoring Types

| Feature | Data Quality | Model Quality | Bias Drift | Feature Attribution |
| --- | --- | --- | --- | --- |
| What it monitors | Distribution of input features | Accuracy, precision, recall | Fairness metrics over time | Change in feature importance |
| Requirement | Baseline + live data | Baseline + ground truth | Baseline + live data | Baseline + live data |
| Detection | Outliers, missing values | Accuracy drop | Prediction bias | Changing rank of features |
