
Lab: Automating Scalable ML Infrastructure with AWS CDK

Create and script infrastructure based on existing architecture and requirements


This lab focuses on Domain 3.2 of the AWS Certified Machine Learning Engineer – Associate (MLA-C01) exam. You will transition from manual resource creation to Infrastructure as Code (IaC) by scripting a SageMaker inference endpoint with an automated scaling policy.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for SageMaker instances.

Prerequisites

Before starting, ensure you have the following:

  • An AWS Account with administrative access.
  • AWS CLI installed and configured with <YOUR_CREDENTIALS>.
  • Node.js (v14+) and Python 3.8+ installed.
  • AWS CDK Toolkit installed globally: npm install -g aws-cdk.
  • Basic knowledge of Python and SageMaker hosting concepts.

Learning Objectives

By the end of this lab, you will be able to:

  1. Initialize a Python-based AWS CDK project for ML infrastructure.
  2. Script a SageMaker Endpoint including Model, Endpoint Configuration, and Production Variants.
  3. Implement Target Tracking Auto Scaling policies based on InvocationsPerInstance metrics.
  4. Deploy and verify infrastructure using the CDK CLI.

Architecture Overview

The infrastructure you will define consists of a CDK app that synthesizes a CloudFormation stack containing three SageMaker resources (a Model, an Endpoint Configuration, and an Endpoint), plus an Application Auto Scaling target and policy that adjust the endpoint's instance count in response to traffic.

Step-by-Step Instructions

Step 1: Initialize the CDK Project

First, we create a dedicated directory and initialize a new CDK project using the Python template.

```bash
mkdir brainybee-ml-infra && cd brainybee-ml-infra
cdk init app --language python
source .venv/bin/activate
pip install -r requirements.txt
```

Step 2: Define the SageMaker Resources

Open brainybee_ml_infra/brainybee_ml_infra_stack.py. We will use the aws_cdk.aws_sagemaker module to define our infrastructure.

```python
from aws_cdk import (
    Stack,
    aws_sagemaker as sagemaker,
    aws_applicationautoscaling as scaling,
)
from constructs import Construct


class BrainybeeMlInfraStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # 1. Define the Model
        model = sagemaker.CfnModel(self, "MyModel",
            execution_role_arn="<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>",
            primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
                image="<YOUR_ECR_IMAGE_URI>"  # e.g., XGBoost built-in
            )
        )

        # 2. Define Endpoint Config
        config = sagemaker.CfnEndpointConfig(self, "MyConfig",
            production_variants=[
                sagemaker.CfnEndpointConfig.ProductionVariantProperty(
                    initial_instance_count=1,
                    instance_type="ml.t2.medium",
                    model_name=model.attr_model_name,
                    variant_name="AllTraffic"
                )
            ]
        )

        # 3. Define the Endpoint
        endpoint = sagemaker.CfnEndpoint(self, "MyEndpoint",
            endpoint_config_name=config.attr_endpoint_config_name
        )
```
**Console alternative:** In the AWS Console, navigate to **SageMaker AI > Inference > Models** to create the model, then to **Endpoint configurations**, and finally to **Endpoints**. However, manual creation is not repeatable and is prone to human error compared to this CDK approach.

Step 3: Configure Auto Scaling

To ensure the infrastructure is cost-effective yet scalable, we add a scaling policy based on the number of invocations.

```python
        # 4. Auto Scaling Configuration (continues inside __init__ above)
        resource_id = f"endpoint/{endpoint.attr_endpoint_name}/variant/AllTraffic"

        scalable_target = scaling.CfnScalableTarget(self, "ScalingTarget",
            max_capacity=3,
            min_capacity=1,
            resource_id=resource_id,
            scalable_dimension="sagemaker:variant:DesiredInstanceCount",
            service_namespace="sagemaker"
        )

        scaling.CfnScalingPolicy(self, "ScalingPolicy",
            policy_name="InvocationsScaling",
            policy_type="TargetTrackingScaling",
            scaling_target_id=scalable_target.ref,
            target_tracking_scaling_policy_configuration=scaling.CfnScalingPolicy.TargetTrackingScalingPolicyConfigurationProperty(
                target_value=100.0,
                predefined_metric_specification=scaling.CfnScalingPolicy.PredefinedMetricSpecificationProperty(
                    predefined_metric_type="SageMakerVariantInvocationsPerInstance"
                )
            )
        )
```

Step 4: Deploy the Infrastructure

Synthesize the CloudFormation template and deploy it to your account.

```bash
cdk synth
cdk deploy
```

[!TIP] Use cdk diff before deploying to see exactly what resources will be created or modified in your AWS environment.

Checkpoints

  1. CloudFormation Verification: Go to the CloudFormation Console. Look for BrainybeeMlInfraStack. Ensure the status is CREATE_COMPLETE.
  2. SageMaker Verification: Go to SageMaker > Inference > Endpoints. Confirm the endpoint created by the stack (its name is auto-generated from the MyEndpoint logical ID) shows InService status.
  3. Scaling Verification: Select the endpoint, go to the Settings tab. Verify that the Asynchronous/Auto Scaling section shows the policy we defined.
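The checkpoints above can also be run from the AWS CLI against your deployed account. The endpoint name placeholder is an assumption; look up the generated name with `list-endpoints` first:

```shell
# Confirm the stack reached CREATE_COMPLETE
aws cloudformation describe-stacks \
  --stack-name BrainybeeMlInfraStack \
  --query "Stacks[0].StackStatus"

# Find the generated endpoint name, then confirm it is InService
aws sagemaker list-endpoints --query "Endpoints[].EndpointName"
aws sagemaker describe-endpoint \
  --endpoint-name <YOUR_ENDPOINT_NAME> \
  --query "EndpointStatus"

# Confirm the target-tracking policy is registered for the variant
aws application-autoscaling describe-scaling-policies \
  --service-namespace sagemaker
```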

Troubleshooting

| Error | Likely Cause | Fix |
|---|---|---|
| `AccessDenied` | IAM role lacks SageMaker or ECR permissions. | Attach `AmazonSageMakerFullAccess` to your deployment user/role. |
| `ResourceLimitExceeded` | You reached the quota for `ml.t2.medium` instances. | Check Service Quotas, or change `instance_type` to a type with available quota, such as `ml.t3.medium`. |
| Model Image Error | The ECR image URI is incorrect or private. | Ensure the image URI is valid and accessible by SageMaker. |

Clean-Up / Teardown

To avoid ongoing costs for the hosted endpoint and instances, delete the stack immediately after finishing.

```bash
cdk destroy
```

[!IMPORTANT] Manually verify in the SageMaker Console that the endpoint is deleted. `cdk destroy` removes the CloudFormation stack, which should trigger deletion of the endpoint resources.

Stretch Challenge

Multi-Variant Deployment: Modify your CDK script to include two production variants (VariantA and VariantB) in a single EndpointConfig with a 50/50 traffic split. This is a common pattern for A/B Testing in production.
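A minimal sketch of the variant list for this challenge, assuming the `model` construct from Step 2 (in practice VariantB would usually point at a second model; variant weights are relative, so 0.5/0.5 yields the 50/50 split):

```python
# Config fragment: replaces the single-variant list inside CfnEndpointConfig.
# Assumes `model` already exists in the stack; a second model is an assumption.
production_variants=[
    sagemaker.CfnEndpointConfig.ProductionVariantProperty(
        initial_instance_count=1,
        instance_type="ml.t2.medium",
        model_name=model.attr_model_name,
        variant_name="VariantA",
        initial_variant_weight=0.5,
    ),
    sagemaker.CfnEndpointConfig.ProductionVariantProperty(
        initial_instance_count=1,
        instance_type="ml.t2.medium",
        model_name=model.attr_model_name,  # or a second model's attr_model_name
        variant_name="VariantB",
        initial_variant_weight=0.5,
    ),
]
```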

Cost Estimate

| Service | Resource | Estimated Cost (us-east-1) |
|---|---|---|
| SageMaker | ml.t2.medium (Real-time) | ~$0.05 per hour |
| CloudFormation | Managed Stack | $0.00 (Free) |
| CloudWatch | Metrics & Logs | ~$0.50 per month (low volume) |

Total Estimated Lab Cost: < $0.15 (if completed in 1 hour).

Concept Review

IaC Tool Comparison

| Feature | AWS CloudFormation | AWS CDK |
|---|---|---|
| Language | JSON / YAML (declarative) | Python, TypeScript, Java (imperative, synthesized to declarative) |
| Abstraction | Low (resource-level) | High (uses "Constructs") |
| Logic | Limited (Mappings/Conditions) | Full programming logic (loops/conditionals) |
| Maintenance | Verbose templates | Concise, modular code |
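The "Logic" row is the practical differentiator. As an illustration (plain Python with a hypothetical helper, no CDK required), a loop can generate the repetitive variant definitions that raw YAML forces you to write by hand:

```python
# Hypothetical helper: build N production-variant definitions with a loop,
# splitting traffic evenly -- repetition a declarative template cannot express.
def make_variants(names, instance_type="ml.t2.medium"):
    weight = round(1.0 / len(names), 2)
    return [
        {
            "VariantName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": weight,
        }
        for name in names
    ]

variants = make_variants(["VariantA", "VariantB"])  # two variants, 0.5 each
```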

Scaling Visualized

The following TikZ diagram shows the logic of Target Tracking. The system adjusts capacity to keep the metric (Invocations) near the target line.

```latex
\begin{tikzpicture}
  % Axes
  \draw[->] (0,0) -- (6,0) node[right] {Time};
  \draw[->] (0,0) -- (0,4) node[above] {Invocations};
  % Target line
  \draw[dashed, blue, thick] (0,2) -- (5.5,2) node[right] {Target (100)};
  % Traffic curve
  \draw[red, thick] plot [smooth] coordinates
    {(0,0.5) (1,1.8) (2,3.5) (3,2.2) (4,2.1) (5,0.8)};
  % Scaling action labels
  \node[draw, fill=green!10, font=\scriptsize] at (2,3.8) {Scale OUT (Add Instances)};
  \node[draw, fill=orange!10, font=\scriptsize] at (5,0.5) {Scale IN (Remove Instances)};
\end{tikzpicture}
```
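The scaling decision behind this picture can be approximated with simple arithmetic. This is a sketch of the target-tracking idea only, not Application Auto Scaling's exact algorithm, which also applies CloudWatch alarm evaluation periods and cooldowns:

```python
import math

def desired_instances(current, invocations_per_instance,
                      target=100.0, min_cap=1, max_cap=3):
    # Target tracking tries to keep metric ~= target, so required capacity
    # is roughly current * (metric / target), clamped to the capacity limits.
    raw = math.ceil(current * invocations_per_instance / target)
    return max(min_cap, min(max_cap, raw))

desired_instances(1, 250)  # spike above target -> scale out to 3
desired_instances(2, 40)   # quiet period -> scale in to 1
```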
