Monitoring Model Performance and Data Distribution Shifts
Detecting changes in the distribution of data that can affect model performance (for example, by using SageMaker Clarify)
This study guide focuses on the critical task of ensuring machine learning models remain accurate and fair after deployment. You will learn to identify different types of drift and leverage AWS services like SageMaker Model Monitor and SageMaker Clarify to maintain model integrity.
Learning Objectives
By the end of this guide, you should be able to:
- Distinguish between Data Drift, Concept Drift, and Bias Drift.
- Configure SageMaker Model Monitor to detect deviations from training baselines.
- Use SageMaker Clarify to identify pre-training and post-training bias.
- Interpret Feature Attribution Drift to understand changing model logic.
- Integrate monitoring with Amazon CloudWatch for automated alerting.
Key Terms & Glossary
- Data Drift: A change in the statistical distribution of input data (features) over time.
- Concept Drift: A change in the relationship between input features and the target variable (the "concept" the model learned has changed).
- Feature Attribution: A technique (often using SHAP values) that assigns a score to each feature based on how much it influenced a specific prediction.
- Baseline: A reference dataset (usually the training set) used to calculate statistics that production data is compared against.
- Facet: A specific attribute or dimension of the data (e.g., age, gender, or region) used to analyze bias.
The "Big Idea"
Machine learning models are not "set and forget" assets. They are built on a snapshot of the world (training data). As the real world evolves—due to changing consumer behavior, economic shifts, or new sensor hardware—the model's assumptions become outdated. Monitoring is the immune system of an ML system, detecting "infections" like bias or drift before they impact business outcomes.
Formula / Concept Box
| Metric | Description | Purpose |
|---|---|---|
| Class Imbalance (CI) | Measures the difference in the number of samples between facets. | Pre-training bias detection. |
| Difference in Proportions of Labels (DPL) | Measures the difference in the proportion of positive labels between facets. | Pre-training bias detection (its post-training analogue, DPPL, compares predicted labels). |
| SHAP (SHapley Additive exPlanations) | Mathematically attributes the contribution of each feature to a prediction. | Explainability & Attribution Drift. |
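The first two metrics in the box above reduce to simple ratios. Below is a minimal sketch of how CI and DPL are computed; the hiring numbers are hypothetical, and SageMaker Clarify computes these (and many more metrics) for you from a dataset and facet definition.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the sample
    counts for the two facets. Ranges from -1 to 1; 0 means balanced."""
    return (n_a - n_d) / (n_a + n_d)

def dpl(pos_a, n_a, pos_d, n_d):
    """DPL = q_a - q_d: the difference in the proportion of positive
    labels between the two facets."""
    return pos_a / n_a - pos_d / n_d

# Hypothetical hiring dataset: 90 male applicants (30 hired),
# 10 female applicants (2 hired).
print(class_imbalance(90, 10))  # 0.8 -> severe facet imbalance
print(dpl(30, 90, 2, 10))       # ~0.133 -> positive labels skew to one facet
```

A CI near ±1 (as here) is exactly the Class Imbalance situation flagged in the pre-training analysis.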
Hierarchical Outline
- I. Understanding Drift Types
- A. Data Drift (Covariate Shift): Features change (e.g., users get younger).
- B. Concept Drift: Labels change for the same features (e.g., what was "spam" is now "ham").
- C. Bias Drift: The model becomes less fair over time due to data shifts.
- II. Amazon SageMaker Model Monitor
- A. Data Quality: Compares production feature statistics to training baselines.
- B. Model Quality: Compares predictions to actual ground-truth labels.
- C. Bias/Explainability: Integrates with Clarify to monitor fairness and attribution.
- III. Amazon SageMaker Clarify
- A. Pre-training: Analyzing the dataset before training for imbalances.
- B. Post-training: Explaining model behavior and detecting prediction bias.
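To make the Data Quality item (II.A) concrete, here is a toy sketch of what a baseline-versus-production comparison does. The feature values and the 2-standard-deviation rule are illustrative; Model Monitor generates the real baseline statistics and constraints for you.

```python
import statistics

def baseline_stats(values):
    """Mimics the per-feature statistics Model Monitor records in a
    baseline: mean, standard deviation, and null count."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),
        "null_count": len(values) - len(present),
    }

def violates(baseline, prod_values, n_std=2.0):
    """Flag drift when the production mean leaves the baseline
    mean +/- n_std * std band."""
    prod_mean = statistics.mean(v for v in prod_values if v is not None)
    return abs(prod_mean - baseline["mean"]) > n_std * baseline["std"]

# Illustrative: training incomes cluster near 50,000...
base = baseline_stats([48000, 50000, 52000, None, 49000, 51000])
# ...but production incomes have shifted upward.
print(violates(base, [80000, 82000]))  # True -> drift violation
```

In production, this comparison runs as a scheduled processing job against the data captured from the endpoint, not inline in your application code.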
Visual Anchors
- Figure: Monitoring Lifecycle
- Figure: Visualizing Data Drift (Distribution Shift)
Definition-Example Pairs
- Feature Attribution Drift: When the importance of features shifts significantly between the training environment and production.
- Example: In a housing price model, "Square Footage" was the top predictor during training, but in production, "Proximity to Public Transit" suddenly becomes the dominant factor due to a gas price hike.
- Pre-Training Bias: Bias inherent in the raw data before the model ever sees it.
- Example: A hiring dataset containing 90% male applicants for engineering roles, leading to a model that unfairly penalizes female candidates (Class Imbalance).
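Feature attribution drift is typically detected by comparing feature *rankings*, not raw scores; SageMaker's explainability monitor uses an NDCG-based comparison for this. The sketch below is a simplified version with made-up housing attributions: a score of 1.0 means the live ranking matches training, and lower values signal drift.

```python
import math

def attribution_ndcg(baseline, production):
    """Score the live feature ranking against baseline attributions
    using NDCG. 1.0 = identical ranking; lower = attribution drift."""
    # Order features by their live (production) attribution, descending.
    live_order = sorted(production, key=production.get, reverse=True)
    dcg = sum(baseline[f] / math.log2(i + 2) for i, f in enumerate(live_order))
    # Ideal ordering: the baseline's own ranking.
    ideal_order = sorted(baseline, key=baseline.get, reverse=True)
    idcg = sum(baseline[f] / math.log2(i + 2) for i, f in enumerate(ideal_order))
    return dcg / idcg

train = {"sqft": 0.6, "transit": 0.2, "age": 0.1}
prod  = {"sqft": 0.2, "transit": 0.6, "age": 0.1}  # transit now dominates
print(attribution_ndcg(train, prod))  # < 1.0 -> attribution drift
```

Note that, as the checkpoint questions point out, this check needs only inputs and attributions, never ground-truth labels.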
Worked Examples
Scenario: Setting up a Drift Check for a Credit Scoring Model
- Baseline: Run a SageMaker Model Monitor baseline job on your training S3 path. This generates a `statistics.json` and a `constraints.json` file.
- Deployment: Deploy the model to a Production Variant on a SageMaker Endpoint with `DataCaptureConfig` enabled.
- Monitoring Schedule: Create a `MonitoringSchedule` that runs hourly. It will automatically grab the last hour of captured S3 data.
- Analysis: Model Monitor compares the hourly data against `constraints.json`. If the "mean income" in production shifts by more than 2 standard deviations from the baseline, it logs a violation.
- Action: An Amazon CloudWatch Alarm triggers an AWS Lambda function to start a new training job with the most recent data.
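For orientation, the generated constraints file looks roughly like the abridged sketch below. This is illustrative only: the `income` feature and threshold values are invented, and you should treat the file your own baseline job emits as the authoritative schema.

```json
{
  "version": 0.0,
  "features": [
    {
      "name": "income",
      "inferred_type": "Fractional",
      "completeness": 1.0
    }
  ],
  "monitoring_config": {
    "evaluate_constraints": "Enabled",
    "emit_metrics": "Enabled",
    "distribution_constraints": {
      "perform_comparison": "Enabled",
      "comparison_threshold": 0.1,
      "comparison_method": "Robust"
    }
  }
}
```

You can edit this file (for example, to loosen a threshold on a noisy feature) before attaching it to the monitoring schedule.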
Checkpoint Questions
- Which service would you use to find out why a specific insurance claim was denied? (Answer: SageMaker Clarify via SHAP values).
- What is the difference between Data Quality and Model Quality monitoring? (Answer: Data Quality checks inputs; Model Quality checks predictions against actual outcomes).
- True or False: Feature Attribution Drift can be detected without ground-truth labels. (Answer: True; you only need the model's predictions and inputs to calculate attribution).
Muddy Points & Cross-Refs
> [!TIP]
> Data Drift vs. Concept Drift: It is easy to confuse these.
> - Data Drift is about the input (the weather is getting hotter).
> - Concept Drift is about the logic (people used to buy coats when it was 50°F, but now they only buy them at 40°F).
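Because concept drift changes the input-to-label relationship, it surfaces as a drop in Model Quality once delayed ground-truth labels arrive. A minimal sketch of that check, with an assumed accuracy baseline and tolerance:

```python
def concept_drift_alert(y_true, y_pred, baseline_accuracy, tolerance=0.05):
    """Model Quality-style check: compare live accuracy (computed once
    ground-truth labels arrive) against the training baseline. A sustained
    drop suggests concept drift: same inputs, new 'logic'."""
    live_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return live_accuracy < baseline_accuracy - tolerance

# Illustrative: the model trained at 90% accuracy, but recent labeled
# traffic shows only 50% agreement.
print(concept_drift_alert([1, 1, 0, 0], [1, 0, 1, 0], baseline_accuracy=0.90))
```

This is why Model Quality monitoring requires ground-truth labels while Data Quality monitoring does not: the "concept" can only be checked against real outcomes.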
Comparison Tables
| Feature | SageMaker Model Monitor | SageMaker Clarify |
|---|---|---|
| Primary Goal | Operational health & drift detection. | Fairness, bias detection, & explainability. |
| Timing | Continuous monitoring of live endpoints. | Pre-training, Post-training, and Monitoring. |
| Input Required | Data captures from S3. | Data, Models, and Facet definitions. |
| Key Metric | Mean, StDev, Null count. | CI, DPL, SHAP values. |