Lab: Analyzing Model Performance with Amazon SageMaker Clarify
This lab provides hands-on experience in evaluating machine learning model performance using Amazon SageMaker. You will focus on interpreting key metrics, detecting model bias, and understanding model behavior using SageMaker Clarify.
Prerequisites
- An active AWS Account.
- IAM Permissions: Administrator access, or the `AmazonSageMakerFullAccess` and `AmazonS3FullAccess` policies.
- AWS CLI configured with your credentials.
- Familiarity with Python and basic Machine Learning concepts (Precision, Recall, F1 Score).
Learning Objectives
- Configure and run a SageMaker Clarify processing job to analyze model performance.
- Interpret classification metrics including Confusion Matrices, F1 Score, and AUC-ROC.
- Identify post-training bias across different data slices.
- Evaluate model explainability using SHAP (Lundberg and Lee) values.
Architecture Overview
Step-by-Step Instructions
Step 1: Prepare the S3 Environment
You need an S3 bucket to store the training data and the output from SageMaker Clarify.
# Create a unique bucket name
export BUCKET_NAME=brainybee-lab-ml-eval-<YOUR_ACCOUNT_ID>
aws s3 mb s3://$BUCKET_NAME --region <YOUR_REGION>

Console alternative: Navigate to the Amazon S3 console and choose Create bucket. Name it `brainybee-lab-ml-eval-[your-id]` and keep the default settings.
Step 2: Configure the Model Performance Analysis
We will define a ModelConfig and AnalysisConfig for SageMaker Clarify. This configuration tells SageMaker which model to evaluate and which metrics to calculate.
[!NOTE] In a production scenario, you would point this to an existing Model Name in the SageMaker Model Registry.
# Create the analysis configuration file (analysis_config.json)
cat <<EOF > analysis_config.json
{
"methods": {
"report": {"name": "report", "title": "Model Performance Report"},
"shap": {"num_samples": 100},
"post_training_bias": {"methods": "all"}
},
"predictor": {
"model_name": "your-xgboost-model",
"instance_type": "ml.m5.xlarge",
"initial_instance_count": 1
}
}
EOF

Step 3: Launch the Clarify Processing Job
Run the processing job to generate the evaluation metrics. This step calculates the Confusion Matrix and Precision-Recall curves.
aws sagemaker create-processing-job \
  --processing-job-name "clarify-perf-analysis-$(date +%s)" \
  --role-arn "<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>" \
  --processing-resources '{"ClusterConfig": {"InstanceCount": 1, "InstanceType": "ml.m5.xlarge", "VolumeSizeInGB": 20}}' \
  --app-specification '{"ImageUri": "<CLARIFY_IMAGE_URI>"}'

[!TIP] The `<CLARIFY_IMAGE_URI>` varies by region. Check the AWS documentation for the specific URI for SageMaker Clarify in your region.
Checkpoints
- Job Status Check: Run `aws sagemaker describe-processing-job --processing-job-name [your-job-name]` and ensure `ProcessingJobStatus` is `Completed`.
- Artifact Verification: Navigate to your S3 bucket. You should see a folder named `analysis_results` containing `report.pdf` and `analysis.json`.
Concept Review
Key Metrics for Model Evaluation
| Metric | Definition | Best Used For... |
|---|---|---|
| Accuracy | $(TP + TN) / Total$ | Balanced datasets. |
| Precision | $TP / (TP + FP)$ | Minimizing False Positives (e.g., spam detection). |
| Recall | $TP / (TP + FN)$ | Minimizing False Negatives (e.g., cancer diagnosis). |
| F1 Score | $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$ | Imbalanced datasets; harmonic mean of P & R. |
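The formulas in the table can be checked with a few lines of Python. The confusion-matrix counts below are illustrative, not taken from the lab's model:

```python
# Compute the metrics from the table above for an illustrative
# confusion matrix (counts are made up for demonstration).
tp, fp, fn, tn = 80, 10, 20, 90

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # (80 + 90) / 200 = 0.850
print(f"Precision: {precision:.3f}")  # 80 / 90  -> 0.889
print(f"Recall:    {recall:.3f}")     # 80 / 100 -> 0.800
print(f"F1 Score:  {f1:.3f}")         # harmonic mean -> 0.842
```

Note how F1 sits between precision and recall but closer to the lower of the two; that is why it is preferred over accuracy on imbalanced datasets.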
Visualizing the ROC Curve
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR).
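The curve can be traced by hand: sweep a decision threshold from high to low, and at each threshold compute TPR and FPR from the resulting predictions. The sketch below does this for toy scores (the labels and scores are illustrative, not from the lab's model) and estimates AUC with the trapezoidal rule:

```python
# Sketch: trace a ROC curve by sweeping a decision threshold over
# toy model scores, then estimate AUC with the trapezoidal rule.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pos = sum(labels)
neg = len(labels) - pos

points = []
for thr in sorted(set(scores), reverse=True):
    preds = [1 if s >= thr else 0 for s in scores]
    tpr = sum(p and y for p, y in zip(preds, labels)) / pos
    fpr = sum(p and not y for p, y in zip(preds, labels)) / neg
    points.append((fpr, tpr))

points = [(0.0, 0.0)] + points  # the curve starts at the origin
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.4f}")  # 0.9375 for these toy scores
```

An AUC of 0.5 corresponds to random guessing (the diagonal), while 1.0 is a perfect ranking; SageMaker Clarify reports this value in `analysis.json`.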
Troubleshooting
| Error | Likely Cause | Fix |
|---|---|---|
| `AccessDenied` | IAM role lacks S3 permissions. | Attach `AmazonS3FullAccess` to the execution role. |
| `ResourceLimitExceeded` | Too many active instances. | Check Service Quotas for `ml.m5.xlarge` processing jobs. |
| `InvalidConfig` | Syntax error in JSON config. | Use a JSON validator to ensure `analysis_config.json` is well-formed. |
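The `InvalidConfig` check in the table can be scripted. The sketch below assumes the top-level keys (`methods`, `predictor`) match the config written in Step 2; `validate_config` is a hypothetical helper, not part of any AWS SDK:

```python
import json

# Quick sanity check for the InvalidConfig error above: confirm the
# file parses as JSON and contains the sections written in Step 2.
def validate_config(path="analysis_config.json"):
    with open(path) as f:
        cfg = json.load(f)  # raises json.JSONDecodeError on a syntax error
    missing = [k for k in ("methods", "predictor") if k not in cfg]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    return cfg
```

Run `validate_config()` from the directory containing `analysis_config.json` before launching the processing job; it fails fast locally instead of after the instance has spun up.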
Stretch Challenge
Scenario: Your model is performing well on average, but you suspect it is underperforming for a specific demographic (e.g., users in a specific postal_code).
Task: Modify your analysis_config.json to include a group_variable under post_training_bias to calculate the Difference in Proportions of Labels (DPL) for that specific feature.
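To see what DPL measures before running the job, the metric can be computed by hand: it is the difference in positive-label proportions between the advantaged and disadvantaged facet groups. The labels below are an illustrative split by a hypothetical `postal_code` value, not real data:

```python
# Sketch of the DPL metric from the stretch challenge: the difference
# in positive-outcome proportions between two facet groups
# (here a hypothetical postal_code split; labels are illustrative).
group_a = [1, 1, 1, 0, 1, 0]  # observed labels for postal code A
group_d = [1, 0, 0, 0, 1, 0]  # observed labels for postal code D

p_a = sum(group_a) / len(group_a)  # 4/6
p_d = sum(group_d) / len(group_d)  # 2/6
dpl = p_a - p_d
print(f"DPL = {dpl:.3f}")  # 0.333; values far from 0 suggest label imbalance
```

Clarify computes the same quantity per facet when `post_training_bias` is configured, so a large DPL for one `postal_code` group flags exactly the underperformance described in the scenario.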
Cost Estimate
- SageMaker Processing: $0.23 per hour (for `ml.m5.xlarge` in us-east-1).
- S3 Storage: Negligible for this lab (< $0.01).
- Total Estimated Cost: < $0.50 (if teardown is completed).
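The estimate above can be sanity-checked with simple arithmetic. The 20-minute runtime is an assumption for illustration; your actual job duration will vary:

```python
# Back-of-the-envelope check of the cost estimate above, assuming
# the processing job runs for about 20 minutes (illustrative figure).
hourly_rate = 0.23       # ml.m5.xlarge processing, us-east-1 (from the list above)
runtime_minutes = 20

cost = hourly_rate * runtime_minutes / 60
print(f"Estimated processing cost: ${cost:.2f}")  # ~$0.08, well under $0.50
```

Even a job that overruns to a full hour stays within the $0.50 budget, provided the teardown steps below are completed.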
Clean-Up / Teardown
[!WARNING] Failure to delete S3 objects and processing configurations can lead to small recurring storage costs.
# Delete the analysis configuration from S3
aws s3 rm s3://$BUCKET_NAME/analysis_results --recursive
# Delete the bucket (only if empty)
aws s3 rb s3://$BUCKET_NAME

Ensure you stop any SageMaker Studio kernels or Notebook Instances used to trigger these jobs.