Lab: Automating Scalable ML Infrastructure with AWS CDK
Create and script infrastructure based on existing architecture and requirements
This lab focuses on Task Statement 3.2 (Domain 3) of the AWS Certified Machine Learning Engineer – Associate (MLA-C01) exam. You will move from manual resource creation to Infrastructure as Code (IaC) by scripting a SageMaker inference endpoint with an automated scaling policy.
[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for SageMaker instances.
Prerequisites
Before starting, ensure you have the following:
- An AWS Account with administrative access.
- AWS CLI installed and configured with <YOUR_CREDENTIALS>.
- Node.js (v14+) and Python 3.8+ installed.
- AWS CDK Toolkit installed globally (npm install -g aws-cdk).
- Basic knowledge of Python and SageMaker hosting concepts.
Learning Objectives
By the end of this lab, you will be able to:
- Initialize a Python-based AWS CDK project for ML infrastructure.
- Script a SageMaker Endpoint including Model, Endpoint Configuration, and Production Variants.
- Implement Target Tracking Auto Scaling policies based on InvocationsPerInstance metrics.
- Deploy and verify infrastructure using the CDK CLI.
Architecture Overview
The following diagram illustrates the infrastructure defined in your CDK script and how it interacts with AWS services.
Step-by-Step Instructions
Step 1: Initialize the CDK Project
First, we create a dedicated directory and initialize a new CDK project using the Python template.
mkdir brainybee-ml-infra && cd brainybee-ml-infra
cdk init app --language python
source .venv/bin/activate
pip install -r requirements.txt
Step 2: Define the SageMaker Resources
Open brainybee_ml_infra/brainybee_ml_infra_stack.py. We will use the aws_cdk.aws_sagemaker module to define our infrastructure.
from aws_cdk import (
    Stack,
    aws_sagemaker as sagemaker,
    aws_applicationautoscaling as scaling,
)
from constructs import Construct


class BrainybeeMlInfraStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # 1. Define the Model
        model = sagemaker.CfnModel(self, "MyModel",
            execution_role_arn="<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>",
            primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
                image="<YOUR_ECR_IMAGE_URI>"  # e.g., XGBoost built-in
            )
        )

        # 2. Define Endpoint Config
        config = sagemaker.CfnEndpointConfig(self, "MyConfig",
            production_variants=[
                sagemaker.CfnEndpointConfig.ProductionVariantProperty(
                    initial_instance_count=1,
                    initial_variant_weight=1.0,  # single variant, so it receives all traffic
                    instance_type="ml.t2.medium",
                    model_name=model.attr_model_name,
                    variant_name="AllTraffic"
                )
            ]
        )

        # 3. Define the Endpoint
        endpoint = sagemaker.CfnEndpoint(self, "MyEndpoint",
            endpoint_config_name=config.attr_endpoint_config_name
        )
Console alternative
In the SageMaker console, create the model first, then the endpoint configuration, and finally the endpoint. However, manual creation is not repeatable and is more prone to human error than this CDK approach.
Step 3: Configure Auto Scaling
To ensure the infrastructure is cost-effective yet scalable, we add a scaling policy based on the number of invocations.
        # 4. Auto Scaling Configuration (continues inside __init__)
        resource_id = f"endpoint/{endpoint.attr_endpoint_name}/variant/AllTraffic"
        scalable_target = scaling.CfnScalableTarget(self, "ScalingTarget",
            max_capacity=3,
            min_capacity=1,
            resource_id=resource_id,
            scalable_dimension="sagemaker:variant:DesiredInstanceCount",
            service_namespace="sagemaker"
        )
        scaling.CfnScalingPolicy(self, "ScalingPolicy",
            policy_name="InvocationsScaling",
            policy_type="TargetTrackingScaling",
            scaling_target_id=scalable_target.ref,
            target_tracking_scaling_policy_configuration=scaling.CfnScalingPolicy.TargetTrackingScalingPolicyConfigurationProperty(
                target_value=100.0,
                predefined_metric_specification=scaling.CfnScalingPolicy.PredefinedMetricSpecificationProperty(
                    predefined_metric_type="SageMakerVariantInvocationsPerInstance"
                )
            )
        )
Step 4: Deploy the Infrastructure
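Synthesize the CloudFormation template and deploy it to your account. If this account and Region have not yet been bootstrapped for CDK, run cdk bootstrap once before deploying.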
cdk synth
cdk deploy
[!TIP] Use cdk diff before deploying to see exactly what resources will be created or modified in your AWS environment.
Checkpoints
- CloudFormation Verification: Go to the CloudFormation Console. Look for BrainybeeMlInfraStack. Ensure the status is CREATE_COMPLETE.
- SageMaker Verification: Go to SageMaker > Inference > Endpoints. Confirm that the endpoint created for MyEndpoint is in InService status. Because the stack does not set endpoint_name, CloudFormation generates the physical name, so it may not read exactly MyEndpoint.
- Scaling Verification: Select the endpoint and go to the Settings tab. Verify that the Asynchronous/Auto Scaling section shows the policy we defined.
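The SageMaker checkpoint can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The helper name endpoint_in_service and the placeholder endpoint name are hypothetical. It relies on the SageMaker describe_endpoint call, whose response includes an EndpointStatus field.

```python
def endpoint_in_service(sagemaker_client, endpoint_name: str) -> bool:
    """Return True when the endpoint reports the InService status.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"] == "InService"


# Example usage (hypothetical endpoint name; needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker")
#   print(endpoint_in_service(client, "<YOUR_ENDPOINT_NAME>"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before pointing it at a real account.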
Troubleshooting
| Error | Likely Cause | Fix |
|---|---|---|
| AccessDenied | IAM role lacks SageMaker or ECR permissions. | Attach AmazonSageMakerFullAccess to your deployment user/role. |
| ResourceLimitExceeded | You reached the quota for ml.t2.medium instances. | Request an increase in Service Quotas, or change instance_type to a type you have quota for (for example, ml.m5.large). |
| Model Image Error | The ECR image URI is incorrect or private. | Ensure the image URI is valid and accessible by SageMaker. |
Clean-Up / Teardown
To avoid ongoing costs for the hosted endpoint and instances, delete the stack immediately after finishing.
cdk destroy
[!IMPORTANT] cdk destroy removes the CloudFormation stack, which should trigger the deletion of the endpoint resources. Manually verify in the SageMaker Console that the endpoint is deleted.
Stretch Challenge
Multi-Variant Deployment: Modify your CDK script to include two production variants (VariantA and VariantB) in a single EndpointConfig with a 50/50 traffic split. This is a common pattern for A/B Testing in production.
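As a starting point for the challenge, recall that SageMaker routes traffic to each variant in proportion to its weight divided by the sum of all variant weights. The sketch below only illustrates that arithmetic in plain Python; the helper name traffic_shares is hypothetical, and the values you pass to it would correspond to initial_variant_weight on each production variant.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Convert per-variant weights into the fraction of traffic each receives."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Equal weights give the 50/50 split asked for in the challenge.
shares = traffic_shares({"VariantA": 1.0, "VariantB": 1.0})
print(shares)  # {'VariantA': 0.5, 'VariantB': 0.5}
```

Only the ratio matters: weights of 1 and 1 behave the same as 50 and 50, while 9 and 1 would send 90% of requests to the first variant.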
Cost Estimate
| Service | Resource | Estimated Cost (US-East-1) |
|---|---|---|
| SageMaker | ml.t2.medium (Real-time) | ~$0.05 per hour |
| CloudFormation | Managed Stack | $0.00 (Free) |
| CloudWatch | Metrics & Logs | ~$0.50 per month (low volume) |
Total Estimated Lab Cost: < $0.15 (if completed in 1 hour).
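The endpoint portion of this estimate is simple arithmetic: hourly rate times instance count times hours. The sketch below reproduces it using the approximate $0.05 rate from the table. The function name is hypothetical, and the three-instance case assumes the auto scaling policy has scaled out to the max_capacity of 3 for the whole hour.

```python
def estimate_endpoint_cost(hourly_rate: float, instances: int, hours: float) -> float:
    """Estimate endpoint cost in USD, rounded to cents."""
    return round(hourly_rate * instances * hours, 2)


# One instance for one hour (the lab's baseline).
baseline = estimate_endpoint_cost(0.05, 1, 1.0)
# Worst case for this lab: scaled out to three instances for the full hour.
scaled_out = estimate_endpoint_cost(0.05, 3, 1.0)
print(baseline, scaled_out)  # 0.05 0.15
```

The same function shows why teardown matters: one forgotten instance left running for 30 days at this rate comes to about $36.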
Concept Review
IaC Tool Comparison
| Feature | AWS CloudFormation | AWS CDK |
|---|---|---|
| Language | JSON / YAML (Declarative) | Python, TS, Java (Imperative/Declarative) |
| Abstraction | Low (Resource-level) | High (Uses "Constructs") |
| Logic | Limited (Mappings/Conditions) | Full programming logic (Loops/Ifs) |
| Maintenance | Verbose templates | Concise, modular code |
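To make the "Logic" row concrete, the sketch below uses an ordinary loop and a conditional to generate several production variant definitions from one function. It is plain Python that mirrors the CloudFormation property names for a production variant, not CDK's actual synthesis. The model names and the production flag are hypothetical.

```python
import json
from typing import Dict, List


def build_variants(model_names: Dict[str, str], production: bool) -> List[dict]:
    """Build one variant definition per model using a loop and a conditional."""
    variants = []
    for variant_name, model_name in model_names.items():
        variants.append({
            "VariantName": variant_name,
            "ModelName": model_name,
            "InstanceType": "ml.t2.medium",
            # Conditional logic: run two instances per variant in production only.
            "InitialInstanceCount": 2 if production else 1,
            "InitialVariantWeight": 1.0,
        })
    return variants


variants = build_variants({"VariantA": "model-a", "VariantB": "model-b"}, production=False)
print(json.dumps(variants, indent=2))
```

In a hand-written CloudFormation template, each of these variants would be a separate, repeated block of YAML or JSON.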
Scaling Visualized
The following TikZ diagram shows the logic of Target Tracking. The system adjusts capacity to keep the metric (Invocations) near the target line.
\begin{tikzpicture}
\draw[->] (0,0) -- (6,0) node[right] {Time};
\draw[->] (0,0) -- (0,4) node[above] {Invocations};
% Target Line
\draw[dashed, blue, thick] (0,2) -- (5.5,2) node[right] {Target (100)};
% Traffic Curve
\draw[red, thick] plot [smooth] coordinates {(0,0.5) (1,1.8) (2,3.5) (3,2.2) (4,2.1) (5,0.8)};
% Scaling action labels
\node[draw, fill=green!10, font=\scriptsize] at (2,3.8) {Scale OUT (Add Instances)};
\node[draw, fill=orange!10, font=\scriptsize] at (5,0.5) {Scale IN (Remove Instances)};
\end{tikzpicture}
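The idea behind the diagram can be approximated in a few lines. The sketch below is a simplified proportional model, not the actual Application Auto Scaling algorithm, which also depends on CloudWatch alarms and cooldown periods and scales in more conservatively. The function name is hypothetical, and the defaults mirror the target of 100 and the capacity bounds of 1 to 3 used in this lab.

```python
import math


def desired_instances(current_instances: int,
                      invocations_per_instance: float,
                      target: float = 100.0,
                      min_capacity: int = 1,
                      max_capacity: int = 3) -> int:
    """Pick the capacity that would bring the per-instance metric back to target."""
    # Total load is current_instances * invocations_per_instance; divide by target.
    raw = math.ceil(current_instances * invocations_per_instance / target)
    # Clamp to the bounds registered on the scalable target.
    return max(min_capacity, min(max_capacity, raw))


print(desired_instances(1, 250.0))  # 3 -> scale out
print(desired_instances(3, 40.0))   # 2 -> scale in
print(desired_instances(2, 400.0))  # 3 -> capped at max_capacity
```

When the metric sits above the target line, the computed capacity rises; when it falls below, capacity drops, but never outside the configured minimum and maximum.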