Lab: Analyzing Model Performance with Amazon SageMaker Clarify
This lab provides hands-on experience in evaluating machine learning model performance using Amazon SageMaker. You will focus on interpreting key metrics, detecting model bias, and understanding model behavior using SageMaker Clarify.
Prerequisites
- An active AWS Account.
- IAM Permissions: Administrator access, or the `AmazonSageMakerFullAccess` and `AmazonS3FullAccess` policies.
- AWS CLI configured with your credentials.
- Familiarity with Python and basic Machine Learning concepts (Precision, Recall, F1 Score).
Learning Objectives
- Configure and run a SageMaker Clarify processing job to analyze model performance.
- Interpret classification metrics including Confusion Matrices, F1 Score, and AUC-ROC.
- Identify post-training bias across different data slices.
- Evaluate model explainability using SHAP (Lundberg and Lee) values.
Architecture Overview
Step-by-Step Instructions
Step 1: Prepare the S3 Environment
You need an S3 bucket to store the training data and the output from SageMaker Clarify.
# Create a unique bucket name
export BUCKET_NAME=brainybee-lab-ml-eval-<YOUR_ACCOUNT_ID>
aws s3 mb s3://$BUCKET_NAME --region <YOUR_REGION>

Console alternative: Navigate to the Amazon S3 console and choose Create bucket. Name it `brainybee-lab-ml-eval-[your-id]` and keep the default settings.
Step 2: Configure the Model Performance Analysis
We will define a ModelConfig and AnalysisConfig for SageMaker Clarify. This configuration tells SageMaker which model to evaluate and which metrics to calculate.
[!NOTE] In a production scenario, you would point this to an existing Model Name in the SageMaker Model Registry.
# Create the analysis configuration file (analysis_config.json)
cat <<EOF > analysis_config.json
{
"methods": {
"report": {"name": "report", "title": "Model Performance Report"},
"shap": {"num_samples": 100},
"post_training_bias": {"methods": "all"}
},
"predictor": {
"model_name": "your-xgboost-model",
"instance_type": "ml.m5.xlarge",
"initial_instance_count": 1
}
}
EOF

Step 3: Launch the Clarify Processing Job
Run the processing job to generate the evaluation metrics. This step calculates the Confusion Matrix and Precision-Recall curves.
aws sagemaker create-processing-job \
  --processing-job-name "clarify-perf-analysis-$(date +%s)" \
  --role-arn "<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>" \
  --processing-resources '{"ClusterConfig": {"InstanceCount": 1, "InstanceType": "ml.m5.xlarge", "VolumeSizeInGB": 20}}' \
  --app-specification '{"ImageUri": "<CLARIFY_IMAGE_URI>"}'

[!TIP] The `<CLARIFY_IMAGE_URI>` varies by region. Check the AWS documentation for the specific URI for SageMaker Clarify in your region.
Checkpoints
- Job Status Check: Run `aws sagemaker describe-processing-job --processing-job-name [your-job-name]` and ensure `ProcessingJobStatus` is `Completed`.
- Artifact Verification: Navigate to your S3 bucket. You should see a folder named `analysis_results` containing `report.pdf` and `analysis.json`.
Concept Review
Key Metrics for Model Evaluation
| Metric | Definition | Best Used For... |
|---|---|---|
| Accuracy | $(TP + TN) / Total$ | Balanced datasets. |
| Precision | $TP / (TP + FP)$ | Minimizing False Positives (e.g., spam detection). |
| Recall | $TP / (TP + FN)$ | Minimizing False Negatives (e.g., cancer diagnosis). |
| F1 Score | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ | Imbalanced datasets; harmonic mean of P & R. |
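The formulas in the table can be checked with a few lines of Python. The confusion-matrix counts below are illustrative, not taken from the lab's model:

```python
# Compute the metrics from the table above for an illustrative
# confusion matrix (counts are made up for demonstration).
tp, fp, fn, tn = 80, 10, 20, 90

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # (80 + 90) / 200 = 0.850
print(f"Precision: {precision:.3f}")  # 80 / 90  -> 0.889
print(f"Recall:    {recall:.3f}")     # 80 / 100 -> 0.800
print(f"F1 Score:  {f1:.3f}")         # harmonic mean -> 0.842
```

Note how F1 sits between precision and recall but closer to the lower of the two; that is why it is preferred over accuracy on imbalanced datasets.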
Visualizing the ROC Curve
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR).
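The curve can be traced by hand: sweep a decision threshold from high to low, and at each threshold compute TPR and FPR from the resulting predictions. The sketch below does this for toy scores (the labels and scores are illustrative, not from the lab's model) and estimates AUC with the trapezoidal rule:

```python
# Sketch: trace a ROC curve by sweeping a decision threshold over
# toy model scores, then estimate AUC with the trapezoidal rule.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pos = sum(labels)
neg = len(labels) - pos

points = []
for thr in sorted(set(scores), reverse=True):
    preds = [1 if s >= thr else 0 for s in scores]
    tpr = sum(p and y for p, y in zip(preds, labels)) / pos
    fpr = sum(p and not y for p, y in zip(preds, labels)) / neg
    points.append((fpr, tpr))

points = [(0.0, 0.0)] + points  # the curve starts at the origin
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.4f}")  # 0.9375 for these toy scores
```

An AUC of 0.5 corresponds to random guessing (the diagonal), while 1.0 is a perfect ranking; SageMaker Clarify reports this value in `analysis.json`.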
Troubleshooting
| Error | Likely Cause | Fix |
|---|---|---|
| `AccessDenied` | IAM role lacks S3 permissions. | Attach `AmazonS3FullAccess` to the execution role. |
| `ResourceLimitExceeded` | Too many active instances. | Check Service Quotas for `ml.m5.xlarge` processing jobs. |
| `InvalidConfig` | Syntax error in JSON config. | Use a JSON validator to ensure `analysis_config.json` is well-formed. |
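The `InvalidConfig` check in the table can be scripted. The sketch below assumes the top-level keys (`methods`, `predictor`) match the config written in Step 2; `validate_config` is a hypothetical helper, not part of any AWS SDK:

```python
import json

# Quick sanity check for the InvalidConfig error above: confirm the
# file parses as JSON and contains the sections written in Step 2.
def validate_config(path="analysis_config.json"):
    with open(path) as f:
        cfg = json.load(f)  # raises json.JSONDecodeError on a syntax error
    missing = [k for k in ("methods", "predictor") if k not in cfg]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    return cfg
```

Run `validate_config()` from the directory containing `analysis_config.json` before launching the processing job; it fails fast locally instead of after the instance has spun up.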
Stretch Challenge
Scenario: Your model is performing well on average, but you suspect it is underperforming for a specific demographic (e.g., users in a specific postal_code).
Task: Modify your analysis_config.json to include a group_variable under post_training_bias to calculate the Difference in Proportions of Labels (DPL) for that specific feature.
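To see what DPL measures before running the job, the metric can be computed by hand: it is the difference in positive-label proportions between the advantaged and disadvantaged facet groups. The labels below are an illustrative split by a hypothetical `postal_code` value, not real data:

```python
# Sketch of the DPL metric from the stretch challenge: the difference
# in positive-outcome proportions between two facet groups
# (here a hypothetical postal_code split; labels are illustrative).
group_a = [1, 1, 1, 0, 1, 0]  # observed labels for postal code A
group_d = [1, 0, 0, 0, 1, 0]  # observed labels for postal code D

p_a = sum(group_a) / len(group_a)  # 4/6
p_d = sum(group_d) / len(group_d)  # 2/6
dpl = p_a - p_d
print(f"DPL = {dpl:.3f}")  # 0.333; values far from 0 suggest label imbalance
```

Clarify computes the same quantity per facet when `post_training_bias` is configured, so a large DPL for one `postal_code` group flags exactly the underperformance described in the scenario.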
Cost Estimate
- SageMaker Processing: $0.23 per hour (for `ml.m5.xlarge` in us-east-1).
- S3 Storage: Negligible for this lab (< $0.01).
- Total Estimated Cost: < $0.50 (if teardown is completed).
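The estimate above can be sanity-checked with simple arithmetic. The 20-minute runtime is an assumption for illustration; your actual job duration will vary:

```python
# Back-of-the-envelope check of the cost estimate above, assuming
# the processing job runs for about 20 minutes (illustrative figure).
hourly_rate = 0.23       # ml.m5.xlarge processing, us-east-1 (from the list above)
runtime_minutes = 20

cost = hourly_rate * runtime_minutes / 60
print(f"Estimated processing cost: ${cost:.2f}")  # ~$0.08, well under $0.50
```

Even a job that overruns to a full hour stays within the $0.50 budget, provided the teardown steps below are completed.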
Clean-Up / Teardown
[!WARNING] Failure to delete S3 objects and processing configurations can lead to small recurring storage costs.
# Delete the analysis configuration from S3
aws s3 rm s3://$BUCKET_NAME/analysis_results --recursive
# Delete the bucket (only if empty)
aws s3 rb s3://$BUCKET_NAME

Ensure you stop any SageMaker Studio kernels or Notebook Instances used to trigger these jobs.