Lab: Deploying and Scaling ML Infrastructure on AWS
Select deployment infrastructure based on existing architecture and requirements
This hands-on lab guides you through the process of selecting, deploying, and configuring auto-scaling for machine learning infrastructure on AWS, specifically focusing on Amazon SageMaker AI endpoints. You will learn to navigate the tradeoffs between different deployment targets and implement infrastructure-as-code principles.
Prerequisites
To successfully complete this lab, you need:
- An AWS account with AdministratorAccess permissions.
- AWS CLI installed and configured on your local machine with access to `<YOUR_REGION>` (e.g., `us-east-1`).
- Python 3.9+ with the `sagemaker` and `boto3` libraries installed (`pip install sagemaker boto3`).
- Basic familiarity with the AWS Management Console.
Learning Objectives
By the end of this lab, you will be able to:
- Select the appropriate deployment target based on latency and cost requirements.
- Deploy a pre-trained model to a SageMaker Real-Time Endpoint using the Python SDK.
- Configure Auto Scaling policies to handle fluctuating traffic patterns.
- Verify infrastructure scaling using CloudWatch metrics and CLI commands.
Architecture Overview
The lab provisions three main components: an IAM execution role, a SageMaker Real-Time Endpoint backed by `ml.t2.medium` instances, and an Application Auto Scaling policy that scales the endpoint between 1 and 3 instances based on CloudWatch invocation metrics.
Step-by-Step Instructions
Step 1: Create an IAM Execution Role
SageMaker requires an IAM role to access S3 buckets and CloudWatch logs on your behalf.
```bash
# Create the trust policy file
echo '{"Version": "2012-10-17","Statement": [{"Effect": "Allow","Principal": {"Service": "sagemaker.amazonaws.com"},"Action": "sts:AssumeRole"}]}' > trust-policy.json

# Create the role
aws iam create-role --role-name BrainyBee-SageMaker-Role --assume-role-policy-document file://trust-policy.json

# Attach the SageMaker Full Access policy
aws iam attach-role-policy --role-name BrainyBee-SageMaker-Role --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```

Console alternative: navigate to IAM > Roles > Create Role, select AWS Service and SageMaker, attach the AmazonSageMakerFullAccess policy, and name the role BrainyBee-SageMaker-Role.
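If you prefer scripting over the CLI, the same role can be created with `boto3`. A minimal sketch (the `sagemaker_trust_policy` helper name is illustrative; the commented calls mirror the CLI commands above):

```python
import json

def sagemaker_trust_policy() -> dict:
    """Trust policy allowing SageMaker to assume the role (same JSON as the CLI step)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

# With boto3 installed and credentials configured, the role could be created like:
#   import boto3
#   iam = boto3.client("iam")
#   iam.create_role(
#       RoleName="BrainyBee-SageMaker-Role",
#       AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
#   )
#   iam.attach_role_policy(
#       RoleName="BrainyBee-SageMaker-Role",
#       PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
#   )
```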
Step 2: Deploy a Model to a Real-Time Endpoint
We will use a sample pre-trained model provided by AWS to create a Real-Time endpoint.
```python
import sagemaker
from sagemaker.image_uris import retrieve

session = sagemaker.Session()
role = "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/BrainyBee-SageMaker-Role"
region = session.boto_region_name

# Retrieve the container URI for XGBoost
container = retrieve("xgboost", region, version="1.5-1")

# Define the model (the explicit name makes teardown easier later)
model = sagemaker.model.Model(
    image_uri=container,
    model_data="s3://sagemaker-sample-files/datasets/tabular/uci_statlog_german_credit_data/model/xgboost-model.tar.gz",
    role=role,
    name="brainybee-credit-risk-model",
)

# Deploy the endpoint (blocks until the endpoint is InService)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
    endpoint_name="brainybee-credit-risk-ep",
)

print(f"Endpoint Status: {predictor.endpoint_name} is live!")
```

> [!NOTE]
> Deployment typically takes 3–5 minutes as AWS provisions the underlying EC2 instances.
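Once the endpoint is live, you can smoke-test it with a single invocation. A minimal sketch (the `to_csv_payload` helper is illustrative; the XGBoost container accepts `text/csv` input, and the feature values below are placeholders):

```python
def to_csv_payload(features: list) -> str:
    """Serialize one feature row as CSV, the format the XGBoost container expects."""
    return ",".join(str(f) for f in features)

# With boto3 configured and the endpoint InService:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       EndpointName="brainybee-credit-risk-ep",
#       ContentType="text/csv",
#       Body=to_csv_payload([1.0, 2.0, 3.5]),  # replace with real feature values
#   )
#   print(response["Body"].read().decode())
```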
Step 3: Configure Auto Scaling Policies
Now, we will enable the infrastructure to scale from 1 to 3 instances based on the number of invocations.
```bash
# 1. Register the endpoint as a scalable target
aws application-autoscaling register-scalable-target \
    --service-namespace sagemaker \
    --resource-id endpoint/brainybee-credit-risk-ep/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --min-capacity 1 \
    --max-capacity 3

# 2. Apply a Target Tracking scaling policy
aws application-autoscaling put-scaling-policy \
    --policy-name BrainyBeeScalingPolicy \
    --service-namespace sagemaker \
    --resource-id endpoint/brainybee-credit-risk-ep/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{"TargetValue": 50.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"}, "ScaleInCooldown": 300, "ScaleOutCooldown": 60}'
```

Checkpoints
- Endpoint verification: run `aws sagemaker describe-endpoint --endpoint-name brainybee-credit-risk-ep`. The `EndpointStatus` should be `InService`.
- Auto Scaling registration: run `aws application-autoscaling describe-scalable-targets --service-namespace sagemaker`. You should see your endpoint listed with a `MinCapacity` of 1 and a `MaxCapacity` of 3.
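Both checkpoints can also be scripted against the JSON responses of those two CLI calls. A sketch (helper names are illustrative; each function inspects a parsed response dictionary):

```python
def check_endpoint(describe_endpoint: dict) -> bool:
    """Checkpoint 1: the DescribeEndpoint response should report InService."""
    return describe_endpoint.get("EndpointStatus") == "InService"

def check_scaling(describe_targets: dict,
                  resource_id: str = "endpoint/brainybee-credit-risk-ep/variant/AllTraffic") -> bool:
    """Checkpoint 2: the scalable target should exist with MinCapacity 1 / MaxCapacity 3."""
    for target in describe_targets.get("ScalableTargets", []):
        if target.get("ResourceId") == resource_id:
            return target.get("MinCapacity") == 1 and target.get("MaxCapacity") == 3
    return False
```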
Visualizing the Scaling Logic
The following graph represents how the auto-scaling policy manages instance count relative to the target metric (Invocations per Instance).
```latex
\begin{tikzpicture}
  \draw[->] (0,0) -- (6,0) node[right]{Load (Invocations/Min)};
  \draw[->] (0,0) -- (0,4) node[above]{Instances};
  \draw[dashed, red] (0,2) -- (6,2) node[right]{Target: 50};
  \draw[blue, thick] (0,1) -- (2.5,1) -- (3,2) -- (4.5,2) -- (5,3);
  \node at (1.2,0.7) {1 Instance};
  \node at (3.8,1.7) {Scaling Out};
  \node at (5.5,3.2) {3 Instances (Max)};
\end{tikzpicture}
```
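As a rough mental model, target tracking adjusts the instance count to keep invocations-per-instance near the target. A simplified steady-state approximation (the real service reacts via CloudWatch alarms and the cooldowns configured above, so transitions are not instantaneous):

```python
import math

def desired_instances(invocations_per_minute: float,
                      target_per_instance: float = 50.0,
                      min_capacity: int = 1,
                      max_capacity: int = 3) -> int:
    """Instances needed to keep invocations-per-instance at or below the target,
    clamped to the registered min/max capacity."""
    if invocations_per_minute <= 0:
        return min_capacity
    needed = math.ceil(invocations_per_minute / target_per_instance)
    return max(min_capacity, min(max_capacity, needed))

desired_instances(40)   # → 1 (under the 50/instance target)
desired_instances(90)   # → 2 (scaling out)
desired_instances(500)  # → 3 (clamped at max capacity)
```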
Troubleshooting
| Error | Cause | Solution |
|---|---|---|
| `AccessDenied` | IAM role lacks `sagemaker:CreateEndpoint` | Re-attach AmazonSageMakerFullAccess or check the role ARN. |
| `ResourceLimitExceeded` | Account limit for `ml.t2.medium` reached | Request a limit increase or use a different instance type. |
| `ValidationError` | Incorrect resource ID format in the CLI | Ensure the format is `endpoint/<name>/variant/AllTraffic`. |
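One way to catch the `ValidationError` case before calling the CLI is to sanity-check the resource ID locally. A sketch (the pattern assumes alphanumeric-and-hyphen names, which matches SageMaker endpoint naming rules):

```python
import re

# Expected shape: endpoint/<name>/variant/<variant-name>
RESOURCE_ID_PATTERN = re.compile(r"^endpoint/[A-Za-z0-9\-]+/variant/[A-Za-z0-9\-]+$")

def valid_resource_id(resource_id: str) -> bool:
    """Check the Application Auto Scaling resource ID format before calling the CLI."""
    return RESOURCE_ID_PATTERN.fullmatch(resource_id) is not None
```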
Clean-Up / Teardown
> [!WARNING]
> Failure to delete these resources will result in ongoing charges for the `ml.t2.medium` instance.
```bash
# Delete the scaling policy
aws application-autoscaling delete-scaling-policy \
    --policy-name BrainyBeeScalingPolicy \
    --service-namespace sagemaker \
    --resource-id endpoint/brainybee-credit-risk-ep/variant/AllTraffic \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount

# Delete the endpoint and configuration
aws sagemaker delete-endpoint --endpoint-name brainybee-credit-risk-ep
aws sagemaker delete-endpoint-config --endpoint-config-name brainybee-credit-risk-ep

# Delete the model (run `aws sagemaker list-models` if you deployed under a different name)
aws sagemaker delete-model --model-name brainybee-credit-risk-model
```

Stretch Challenge
Serverless Transformation: Modify your Python deployment script to use a ServerlessInferenceConfig instead of a dedicated instance type. Compare the deployment time and how AWS handles scaling without an explicit scaling policy.
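One possible starting point for the challenge, sketched against the low-level CreateEndpointConfig request shape (the `serverless_variant` helper and its default values are illustrative; the commented `sagemaker` SDK call shows the equivalent high-level route):

```python
def serverless_variant(model_name: str,
                       memory_mb: int = 2048,
                       max_concurrency: int = 5) -> dict:
    """Production variant for a serverless endpoint config (no instance type or count)."""
    return {
        "ModelName": model_name,
        "VariantName": "AllTraffic",
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,      # valid values: 1024-6144, in 1024 MB steps
            "MaxConcurrency": max_concurrency,
        },
    }

# With the sagemaker SDK installed, model.deploy accepts the equivalent:
#   from sagemaker.serverless import ServerlessInferenceConfig
#   predictor = model.deploy(
#       serverless_inference_config=ServerlessInferenceConfig(
#           memory_size_in_mb=2048, max_concurrency=5
#       ),
#       endpoint_name="brainybee-credit-risk-serverless",
#   )
```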
Cost Estimate
- SageMaker Real-Time (`ml.t2.medium`): ~$0.056 per hour per instance; running 1 instance for 24 hours costs ~$1.34.
- Storage (S3/CloudWatch): negligible for this lab (~$0.01).
- Total estimated spend: under $0.10 for this 30-minute lab if resources are cleaned up immediately.
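The estimates above follow directly from the hourly rate. A small sketch of the arithmetic (the rate shown reflects us-east-1 pricing at the time of writing; verify current rates before budgeting):

```python
def endpoint_cost(hourly_rate: float, instances: int, hours: float) -> float:
    """On-demand cost for a real-time endpoint: rate x instances x hours, in USD."""
    return round(hourly_rate * instances * hours, 2)

endpoint_cost(0.056, 1, 24)   # → 1.34 (one instance for a day)
endpoint_cost(0.056, 1, 0.5)  # → 0.03 (this 30-minute lab)
```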
Concept Review
| Deployment Type | Use Case | Scaling Behavior | Cost Model |
|---|---|---|---|
| Real-Time | Low latency, steady traffic | Manual or Auto Scaling | Hourly instance fee |
| Serverless | Intermittent traffic, cold-start okay | Managed by AWS (0 to Max) | Pay per invocation/duration |
| Asynchronous | Large payloads, long processing | Managed (scale to zero) | Hourly instance fee |
| Batch | Bulk processing, no persistent endpoint | Not applicable | Pay for job duration |
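The table can be condensed into a rule-of-thumb selector. A sketch (the function and its boolean inputs are illustrative simplifications of the table, not an official decision procedure):

```python
def pick_deployment_type(low_latency: bool, intermittent: bool,
                         large_payloads: bool, persistent_endpoint: bool) -> str:
    """Map the requirements from the table above to a deployment type."""
    if not persistent_endpoint:
        return "Batch"            # bulk processing, no standing endpoint
    if large_payloads:
        return "Asynchronous"     # large payloads, long processing times
    if intermittent and not low_latency:
        return "Serverless"       # spiky traffic where cold starts are acceptable
    return "Real-Time"            # low latency, steady traffic
```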
> [!IMPORTANT]
> Choosing between managed (SageMaker) and unmanaged (ECS/EKS) deployment targets depends on the need for control. Managed endpoints reduce operational burden, while unmanaged targets provide custom networking and deeper infrastructure control.