Monitoring Model Performance and Data Distribution Shifts
Detecting changes in the distribution of data that can affect model performance (for example, by using SageMaker Clarify)
This study guide focuses on the critical task of ensuring machine learning models remain accurate and fair after deployment. You will learn to identify different types of drift and leverage AWS services like SageMaker Model Monitor and SageMaker Clarify to maintain model integrity.
Learning Objectives
By the end of this guide, you should be able to:
- Distinguish between Data Drift, Concept Drift, and Bias Drift.
- Configure SageMaker Model Monitor to detect deviations from training baselines.
- Use SageMaker Clarify to identify pre-training and post-training bias.
- Interpret Feature Attribution Drift to understand changing model logic.
- Integrate monitoring with Amazon CloudWatch for automated alerting.
Key Terms & Glossary
- Data Drift: A change in the statistical distribution of input data (features) over time.
- Concept Drift: A change in the relationship between input features and the target variable (the "concept" the model learned has changed).
- Feature Attribution: A technique (often using SHAP values) that assigns a score to each feature based on how much it influenced a specific prediction.
- Baseline: A reference dataset (usually the training set) used to calculate statistics that production data is compared against.
- Facet: A specific attribute or dimension of the data (e.g., age, gender, or region) used to analyze bias.
The "Big Idea"
Machine learning models are not "set and forget" assets. They are built on a snapshot of the world (training data). As the real world evolves—due to changing consumer behavior, economic shifts, or new sensor hardware—the model's assumptions become outdated. Monitoring is the immune system of an ML system, detecting "infections" like bias or drift before they impact business outcomes.
Formula / Concept Box
| Metric | Description | Purpose |
|---|---|---|
| Class Imbalance (CI) | Measures the difference in the number of samples between facets. | Pre-training bias detection. |
| Difference in Proportions of Labels (DPL) | Measures the difference in the proportion of positive labels between facets. | Pre-training bias detection (its post-training analogue, DPPL, compares predicted labels). |
| SHAP (SHapley Additive exPlanations) | Mathematically attributes the contribution of each feature to a prediction. | Explainability & Attribution Drift. |
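The first two metrics in the box above reduce to simple ratios. Below is a minimal sketch of how CI and DPL are computed; the hiring numbers are hypothetical, and SageMaker Clarify computes these (and many more metrics) for you from a dataset and facet definition.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the sample
    counts for the two facets. Ranges from -1 to 1; 0 means balanced."""
    return (n_a - n_d) / (n_a + n_d)

def dpl(pos_a, n_a, pos_d, n_d):
    """DPL = q_a - q_d: the difference in the proportion of positive
    labels between the two facets."""
    return pos_a / n_a - pos_d / n_d

# Hypothetical hiring dataset: 90 male applicants (30 hired),
# 10 female applicants (2 hired).
print(class_imbalance(90, 10))  # 0.8 -> severe facet imbalance
print(dpl(30, 90, 2, 10))       # ~0.133 -> positive labels skew to one facet
```

A CI near ±1 (as here) is exactly the Class Imbalance situation flagged in the pre-training analysis.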
Hierarchical Outline
- I. Understanding Drift Types
- A. Data Drift (Covariate Shift): Features change (e.g., users get younger).
- B. Concept Drift: Labels change for the same features (e.g., what was "spam" is now "ham").
- C. Bias Drift: The model becomes less fair over time due to data shifts.
- II. Amazon SageMaker Model Monitor
- A. Data Quality: Compares production feature statistics to training baselines.
- B. Model Quality: Compares predictions to actual ground-truth labels.
- C. Bias/Explainability: Integrates with Clarify to monitor fairness and attribution.
- III. Amazon SageMaker Clarify
- A. Pre-training: Analyzing the dataset before training for imbalances.
- B. Post-training: Explaining model behavior and detecting prediction bias.
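To make the Data Quality item (II.A) concrete, here is a toy sketch of what a baseline-versus-production comparison does. The feature values and the 2-standard-deviation rule are illustrative; Model Monitor generates the real baseline statistics and constraints for you.

```python
import statistics

def baseline_stats(values):
    """Mimics the per-feature statistics Model Monitor records in a
    baseline: mean, standard deviation, and null count."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),
        "null_count": len(values) - len(present),
    }

def violates(baseline, prod_values, n_std=2.0):
    """Flag drift when the production mean leaves the baseline
    mean +/- n_std * std band."""
    prod_mean = statistics.mean(v for v in prod_values if v is not None)
    return abs(prod_mean - baseline["mean"]) > n_std * baseline["std"]

# Illustrative: training incomes cluster near 50,000...
base = baseline_stats([48000, 50000, 52000, None, 49000, 51000])
# ...but production incomes have shifted upward.
print(violates(base, [80000, 82000]))  # True -> drift violation
```

In production, this comparison runs as a scheduled processing job against the data captured from the endpoint, not inline in your application code.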
Visual Anchors
- Figure: Monitoring Lifecycle
- Figure: Visualizing Data Drift (Distribution Shift)
Definition-Example Pairs
- Feature Attribution Drift: When the importance of features shifts significantly between the training environment and production.
- Example: In a housing price model, "Square Footage" was the top predictor during training, but in production, "Proximity to Public Transit" suddenly becomes the dominant factor due to a gas price hike.
- Pre-Training Bias: Bias inherent in the raw data before the model ever sees it.
- Example: A hiring dataset containing 90% male applicants for engineering roles, leading to a model that unfairly penalizes female candidates (Class Imbalance).
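Feature attribution drift is typically detected by comparing feature *rankings*, not raw scores; SageMaker's explainability monitor uses an NDCG-based comparison for this. The sketch below is a simplified version with made-up housing attributions: a score of 1.0 means the live ranking matches training, and lower values signal drift.

```python
import math

def attribution_ndcg(baseline, production):
    """Score the live feature ranking against baseline attributions
    using NDCG. 1.0 = identical ranking; lower = attribution drift."""
    # Order features by their live (production) attribution, descending.
    live_order = sorted(production, key=production.get, reverse=True)
    dcg = sum(baseline[f] / math.log2(i + 2) for i, f in enumerate(live_order))
    # Ideal ordering: the baseline's own ranking.
    ideal_order = sorted(baseline, key=baseline.get, reverse=True)
    idcg = sum(baseline[f] / math.log2(i + 2) for i, f in enumerate(ideal_order))
    return dcg / idcg

train = {"sqft": 0.6, "transit": 0.2, "age": 0.1}
prod  = {"sqft": 0.2, "transit": 0.6, "age": 0.1}  # transit now dominates
print(attribution_ndcg(train, prod))  # < 1.0 -> attribution drift
```

Note that, as the checkpoint questions point out, this check needs only inputs and attributions, never ground-truth labels.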
Worked Examples
Scenario: Setting up a Drift Check for a Credit Scoring Model
- Baseline: Run a SageMaker Model Monitor baseline job on your training S3 path. This generates a `statistics.json` and a `constraints.json` file.
- Deployment: Deploy the model to a Production Variant on a SageMaker Endpoint with `DataCaptureConfig` enabled.
- Monitoring Schedule: Create a `MonitoringSchedule` that runs hourly. It will automatically grab the last hour of captured S3 data.
- Analysis: Model Monitor compares the hourly data against `constraints.json`. If the "mean income" in production shifts by more than 2 standard deviations from the baseline, it logs a violation.
- Action: An Amazon CloudWatch Alarm triggers an AWS Lambda function to start a new training job with the most recent data.
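For orientation, the generated constraints file looks roughly like the abridged sketch below. This is illustrative only: the `income` feature and threshold values are invented, and you should treat the file your own baseline job emits as the authoritative schema.

```json
{
  "version": 0.0,
  "features": [
    {
      "name": "income",
      "inferred_type": "Fractional",
      "completeness": 1.0
    }
  ],
  "monitoring_config": {
    "evaluate_constraints": "Enabled",
    "emit_metrics": "Enabled",
    "distribution_constraints": {
      "perform_comparison": "Enabled",
      "comparison_threshold": 0.1,
      "comparison_method": "Robust"
    }
  }
}
```

You can edit this file (for example, to loosen a threshold on a noisy feature) before attaching it to the monitoring schedule.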
Checkpoint Questions
- Which service would you use to find out why a specific insurance claim was denied? (Answer: SageMaker Clarify via SHAP values).
- What is the difference between Data Quality and Model Quality monitoring? (Answer: Data Quality checks inputs; Model Quality checks predictions against actual outcomes).
- True or False: Feature Attribution Drift can be detected without ground-truth labels. (Answer: True; you only need the model's predictions and inputs to calculate attribution).
Muddy Points & Cross-Refs
> [!TIP]
> Data Drift vs. Concept Drift: It is easy to confuse these.
> - Data Drift is about the input (the weather is getting hotter).
> - Concept Drift is about the logic (people used to buy coats when it was 50°F, but now they only buy them at 40°F).
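Because concept drift changes the input-to-label relationship, it surfaces as a drop in Model Quality once delayed ground-truth labels arrive. A minimal sketch of that check, with an assumed accuracy baseline and tolerance:

```python
def concept_drift_alert(y_true, y_pred, baseline_accuracy, tolerance=0.05):
    """Model Quality-style check: compare live accuracy (computed once
    ground-truth labels arrive) against the training baseline. A sustained
    drop suggests concept drift: same inputs, new 'logic'."""
    live_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return live_accuracy < baseline_accuracy - tolerance

# Illustrative: the model trained at 90% accuracy, but recent labeled
# traffic shows only 50% agreement.
print(concept_drift_alert([1, 1, 0, 0], [1, 0, 1, 0], baseline_accuracy=0.90))
```

This is why Model Quality monitoring requires ground-truth labels while Data Quality monitoring does not: the "concept" can only be checked against real outcomes.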
Comparison Tables
| Feature | SageMaker Model Monitor | SageMaker Clarify |
|---|---|---|
| Primary Goal | Operational health & drift detection. | Fairness, bias detection, & explainability. |
| Timing | Continuous monitoring of live endpoints. | Pre-training, Post-training, and Monitoring. |
| Input Required | Data captures from S3. | Data, Models, and Facet definitions. |
| Key Metric | Mean, StDev, Null count. | CI, DPL, SHAP values. |