
Lab: Deploying and Scaling ML Infrastructure on AWS

Select deployment infrastructure based on existing architecture and requirements

This hands-on lab guides you through the process of selecting, deploying, and configuring auto-scaling for machine learning infrastructure on AWS, specifically focusing on Amazon SageMaker AI endpoints. You will learn to navigate the tradeoffs between different deployment targets and implement infrastructure-as-code principles.

Prerequisites

To successfully complete this lab, you need:

  • An AWS Account with AdministratorAccess permissions.
  • AWS CLI installed and configured on your local machine with access to <YOUR_REGION> (e.g., us-east-1).
  • Python 3.9+ and the sagemaker and boto3 libraries installed (pip install sagemaker boto3).
  • Basic familiarity with the AWS Management Console.

Learning Objectives

By the end of this lab, you will be able to:

  1. Select the appropriate deployment target based on latency and cost requirements.
  2. Deploy a pre-trained model to a SageMaker Real-Time Endpoint using the Python SDK.
  3. Configure Auto Scaling policies to handle fluctuating traffic patterns.
  4. Verify infrastructure scaling using CloudWatch metrics and CLI commands.

Architecture Overview

In this lab you will provision the following components: an IAM execution role, a SageMaker Real-Time Endpoint backed by 1–3 ml.t2.medium instances, an Application Auto Scaling target-tracking policy, and the CloudWatch metrics that drive scaling decisions.

Step-by-Step Instructions

Step 1: Create an IAM Execution Role

SageMaker requires an IAM role to access S3 buckets and CloudWatch logs on your behalf.

```bash
# Create the trust policy file
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "sagemaker.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Create the role
aws iam create-role \
  --role-name BrainyBee-SageMaker-Role \
  --assume-role-policy-document file://trust-policy.json

# Attach the SageMaker Full Access policy
aws iam attach-role-policy \
  --role-name BrainyBee-SageMaker-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```
Console alternative

Navigate to IAM > Roles > Create Role. Select AWS Service and SageMaker. Attach the AmazonSageMakerFullAccess policy and name it BrainyBee-SageMaker-Role.

Step 2: Deploy a Model to a Real-Time Endpoint

We will use a sample pre-trained model provided by AWS to create a Real-Time endpoint.

```python
import sagemaker
from sagemaker.image_uris import retrieve

session = sagemaker.Session()
role = "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/BrainyBee-SageMaker-Role"
region = session.boto_region_name

# Retrieve the container URI for XGBoost
container = retrieve("xgboost", region, "1.5-1")

# Define the model
model = sagemaker.model.Model(
    image_uri=container,
    model_data="s3://sagemaker-sample-files/datasets/tabular/uci_statlog_german_credit_data/model/xgboost-model.tar.gz",
    role=role,
)

# Deploy the endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
    endpoint_name="brainybee-credit-risk-ep",
)

print(f"Endpoint Status: {predictor.endpoint_name} is live!")
```

[!NOTE] Deployment typically takes 3–5 minutes as AWS provisions the underlying EC2 instances.

Step 3: Configure Auto Scaling Policies

Now, we will enable the infrastructure to scale from 1 to 3 instances based on the number of invocations.

```bash
# 1. Register the endpoint as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace sagemaker \
  --resource-id endpoint/brainybee-credit-risk-ep/variant/AllTraffic \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount \
  --min-capacity 1 \
  --max-capacity 3

# 2. Apply a Target Tracking scaling policy
aws application-autoscaling put-scaling-policy \
  --policy-name BrainyBeeScalingPolicy \
  --service-namespace sagemaker \
  --resource-id endpoint/brainybee-credit-risk-ep/variant/AllTraffic \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": {"PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"},
    "ScaleInCooldown": 300,
    "ScaleOutCooldown": 60
  }'
```

Checkpoints

  1. Endpoint Verification: Run `aws sagemaker describe-endpoint --endpoint-name brainybee-credit-risk-ep`. The `EndpointStatus` should be `InService`.
  2. Auto Scaling Registration: Run `aws application-autoscaling describe-scalable-targets --service-namespace sagemaker`. You should see your endpoint listed with a `MinCapacity` of 1 and a `MaxCapacity` of 3.
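Before moving on to scaling, you can also confirm that the endpoint actually serves predictions. The sketch below assumes the built-in XGBoost container, which accepts header-less CSV rows; the feature values are placeholders, not a real German-credit record:

```python
def build_csv_payload(features):
    # Built-in XGBoost containers accept header-less CSV rows (no label column)
    return ",".join(str(f) for f in features)

def invoke(endpoint_name, payload):
    # Requires AWS credentials; boto3 is imported here so the payload helper
    # above stays usable without the SDK installed
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return response["Body"].read().decode()

# Placeholder feature vector for illustration only
payload = build_csv_payload([1, 6, 4, 12, 5])
# invoke("brainybee-credit-risk-ep", payload)
```

A non-empty response body (a model score) confirms the full request path works before any scaling policy is attached.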

Visualizing the Scaling Logic

The following graph represents how the auto-scaling policy manages instance count relative to the target metric (Invocations per Instance).

```latex
\begin{tikzpicture}
  \draw[->] (0,0) -- (6,0) node[right]{Load (Invocations/Min)};
  \draw[->] (0,0) -- (0,4) node[above]{Instances};
  \draw[dashed, red] (0,2) -- (6,2) node[right]{Target: 50};
  \draw[blue, thick] (0,1) -- (2.5,1) -- (3,2) -- (4.5,2) -- (5,3);
  \node at (1.2,0.7) {1 Instance};
  \node at (3.8,1.7) {Scaling Out};
  \node at (5.5,3.2) {3 Instances (Max)};
\end{tikzpicture}
```
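The steady-state behavior of a target-tracking policy can be approximated in a few lines: the service tries to keep `SageMakerVariantInvocationsPerInstance` near the target value, so the instance count settles at roughly the total load divided by the target, clamped to the registered min/max capacity. A minimal sketch (the real service also applies cooldowns and CloudWatch alarm evaluation periods, which this ignores):

```python
import math

def desired_instances(invocations_per_minute, target=50.0, min_cap=1, max_cap=3):
    # Steady-state estimate for a target-tracking policy: keep per-instance
    # invocations near `target`, clamped to the registered capacity bounds
    raw = math.ceil(invocations_per_minute / target)
    return max(min_cap, min(max_cap, raw))

print(desired_instances(40))   # below target: stays at 1
print(desired_instances(90))   # scales out to 2
print(desired_instances(400))  # capped at max capacity: 3
```

This mirrors the graph: the blue line stays flat at 1 instance until per-instance invocations exceed 50, then steps up until it hits the max capacity of 3.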

Troubleshooting

| Error | Cause | Solution |
| --- | --- | --- |
| `AccessDenied` | IAM role lacks `sagemaker:CreateEndpoint` | Re-attach `AmazonSageMakerFullAccess` or check the role ARN. |
| `ResourceLimitExceeded` | Account limit for ml.t2.medium reached | Request a limit increase or use a different instance type. |
| `ValidationError` | Incorrect resource ID format in the CLI | Ensure the format is `endpoint/<name>/variant/AllTraffic`. |

Clean-Up / Teardown

[!WARNING] Failure to delete these resources will result in ongoing charges for the ml.t2.medium instance.

```bash
# Delete the scaling policy
aws application-autoscaling delete-scaling-policy \
  --policy-name BrainyBeeScalingPolicy \
  --service-namespace sagemaker \
  --resource-id endpoint/brainybee-credit-risk-ep/variant/AllTraffic \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount

# Delete the endpoint and its configuration
aws sagemaker delete-endpoint --endpoint-name brainybee-credit-risk-ep
aws sagemaker delete-endpoint-config --endpoint-config-name brainybee-credit-risk-ep

# Delete the model (the SDK auto-generates the model name; list models to find it)
aws sagemaker list-models --query 'Models[].ModelName'
aws sagemaker delete-model --model-name <MODEL_NAME>
```

Stretch Challenge

Serverless Transformation: Modify your Python deployment script to use a ServerlessInferenceConfig instead of a dedicated instance type. Compare the deployment time and how AWS handles scaling without an explicit scaling policy.
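As a starting point for the challenge: with boto3, a serverless endpoint is defined by swapping the instance fields of a production variant for a `ServerlessConfig` block, where memory must be 1024–6144 MB in 1 GB steps. The helper function and model name below are illustrative, not part of the lab:

```python
def serverless_variant(model_name, memory_mb=2048, max_concurrency=5):
    # Production-variant shape for the sagemaker create_endpoint_config API:
    # ServerlessConfig replaces InstanceType / InitialInstanceCount
    valid_memory = {1024, 2048, 3072, 4096, 5120, 6144}
    if memory_mb not in valid_memory:
        raise ValueError(f"memory_mb must be one of {sorted(valid_memory)}")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }

variant = serverless_variant("brainybee-credit-risk-model")
```

With the SageMaker Python SDK, the equivalent is passing a `ServerlessInferenceConfig` (from `sagemaker.serverless`) as `serverless_inference_config` to `model.deploy()` in place of `instance_type` and `initial_instance_count`.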

Cost Estimate

  • SageMaker Real-Time (ml.t2.medium): ~$0.056 per hour per instance. Running 1 instance for 24 hours costs ~$1.34.
  • Storage (S3/CloudWatch): Negligible for this lab (~$0.01).
  • Total Estimated Spend: <$0.10 for the duration of this 30-minute lab if cleaned up immediately.
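The estimates above follow from simple per-instance-hour arithmetic; here is a sketch you can adapt (the rate is the us-east-1 on-demand price quoted in this lab, so verify current pricing before relying on it):

```python
ML_T2_MEDIUM_HOURLY = 0.056  # USD per instance-hour, us-east-1; check current pricing

def endpoint_cost(hours, instances=1, hourly_rate=ML_T2_MEDIUM_HOURLY):
    # Real-time endpoints bill per instance-hour while InService
    return round(hours * instances * hourly_rate, 3)

print(endpoint_cost(24))      # one instance for a day: 1.344
print(endpoint_cost(0.5))     # the 30-minute lab: 0.028
print(endpoint_cost(24, 3))   # fully scaled out for a day: 4.032
```

Note how the fully-scaled-out figure triples the daily cost, which is why the scale-in cooldown matters for spiky workloads.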

Concept Review

| Deployment Type | Use Case | Scaling Behavior | Cost Model |
| --- | --- | --- | --- |
| Real-Time | Low latency, steady traffic | Manual or Auto Scaling | Hourly instance fee |
| Serverless | Intermittent traffic, cold starts acceptable | Managed by AWS (0 to max) | Pay per invocation/duration |
| Asynchronous | Large payloads, long processing | Managed (scale to zero) | Hourly instance fee |
| Batch | Bulk processing, no persistent endpoint | Not applicable | Pay for job duration |

[!IMPORTANT] Choosing between Managed (SageMaker) and Unmanaged (ECS/EKS) depends on the need for control. Managed endpoints reduce operational burden, while Unmanaged targets provide custom networking and deeper infrastructure control.
