Lab: Automating Scalable ML Infrastructure with AWS CDK

This lab focuses on Domain 3.2 of the AWS Certified Machine Learning Engineer – Associate (MLA-C01) exam. You will transition from manual resource creation to Infrastructure as Code (IaC) by scripting a SageMaker inference endpoint with an automated scaling policy.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for SageMaker instances.

Prerequisites

Before starting, ensure you have the following:

An AWS Account with administrative access.
AWS CLI installed and configured with <YOUR_CREDENTIALS>.
Node.js (v14+) and Python 3.8+ installed.
AWS CDK Toolkit installed globally: npm install -g aws-cdk.
Basic knowledge of Python and SageMaker hosting concepts.

Learning Objectives

By the end of this lab, you will be able to:

Initialize a Python-based AWS CDK project for ML infrastructure.
Script a SageMaker Endpoint including Model, Endpoint Configuration, and Production Variants.
Implement Target Tracking Auto Scaling policies based on InvocationsPerInstance metrics.
Deploy and verify infrastructure using the CDK CLI.

Architecture Overview

The following diagram illustrates the infrastructure defined in your CDK script and how it interacts with AWS services.

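[Diagram: CDK app > CloudFormation stack > SageMaker Model, Endpoint Configuration, and Endpoint, with an Application Auto Scaling target and policy attached to the endpoint variant]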

Step-by-Step Instructions

Step 1: Initialize the CDK Project

First, we create a dedicated directory and initialize a new CDK project using the Python template.

bash

mkdir brainybee-ml-infra && cd brainybee-ml-infra
cdk init app --language python
source .venv/bin/activate
pip install -r requirements.txt

Step 2: Define the SageMaker Resources

Open brainybee_ml_infra/brainybee_ml_infra_stack.py. We will use the aws_cdk.aws_sagemaker module to define our infrastructure.

python

from aws_cdk import (
    Stack,
    aws_sagemaker as sagemaker,
    aws_applicationautoscaling as scaling
)
from constructs import Construct

class BrainybeeMlInfraStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # 1. Define the Model
        model = sagemaker.CfnModel(self, "MyModel",
            execution_role_arn="<YOUR_SAGEMAKER_EXECUTION_ROLE_ARN>",
            primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
                image="<YOUR_ECR_IMAGE_URI>" # e.g., XGBoost built-in
            )
        )

        # 2. Define Endpoint Config
        config = sagemaker.CfnEndpointConfig(self, "MyConfig",
            production_variants=[
                sagemaker.CfnEndpointConfig.ProductionVariantProperty(
                    initial_instance_count=1,
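                    # ml.t2.medium keeps the lab cheap, but AWS documents that auto scaling is not
                    # supported for burstable instances such as T2. If Step 3 fails to register the
                    # scalable target, switch to a non-burstable type such as ml.m5.large (higher cost).
                    instance_type="ml.t2.medium",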
                    model_name=model.attr_model_name,
                    variant_name="AllTraffic"
                )
            ]
        )

        # 3. Define the Endpoint
        endpoint = sagemaker.CfnEndpoint(self, "MyEndpoint",
            endpoint_config_name=config.attr_endpoint_config_name
        )

Console alternative: Navigate to SageMaker AI > Inference > Models to create the model, then to Endpoint configurations, and finally Endpoints. However, manual creation is not repeatable and prone to human error compared to this CDK approach.

Step 3: Configure Auto Scaling

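To keep the infrastructure cost-effective yet scalable, we register the endpoint variant with Application Auto Scaling and attach a target tracking policy. The policy scales the variant between 1 and 3 instances to hold the average number of invocations per instance per minute near a target of 100.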

python

        # 4. Auto Scaling Configuration
        resource_id = f"endpoint/{endpoint.attr_endpoint_name}/variant/AllTraffic"
        
        scalable_target = scaling.CfnScalableTarget(self, "ScalingTarget",
            max_capacity=3,
            min_capacity=1,
            resource_id=resource_id,
            scalable_dimension="sagemaker:variant:DesiredInstanceCount",
            service_namespace="sagemaker"
        )

        scaling.CfnScalingPolicy(self, "ScalingPolicy",
            policy_name="InvocationsScaling",
            policy_type="TargetTrackingScaling",
            scaling_target_id=scalable_target.ref,
            target_tracking_scaling_policy_configuration=scaling.CfnScalingPolicy.TargetTrackingScalingPolicyConfigurationProperty(
                target_value=100.0,
                predefined_metric_specification=scaling.CfnScalingPolicy.PredefinedMetricSpecificationProperty(
                    predefined_metric_type="SageMakerVariantInvocationsPerInstance"
                )
            )
        )
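The `target_value=100.0` above is a lab placeholder. As a rough sketch of how a real target might be chosen, AWS's load-testing guidance derives the per-instance target from the peak requests per second one instance can sustain, reduced by a safety factor and converted to a per-minute figure. The function name and the sample numbers below are hypothetical and not part of the lab.

```python
def target_invocations_per_minute(max_rps: float, safety_factor: float = 0.5) -> float:
    """Suggest a target for SageMakerVariantInvocationsPerInstance.

    max_rps: peak requests per second a single instance sustained in a load
        test (hypothetical input; measure it for your own model).
    safety_factor: fraction of that peak to aim for, leaving headroom for
        bursts while new instances start (0 < safety_factor <= 1).
    """
    if max_rps <= 0:
        raise ValueError("max_rps must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    # The metric is expressed per minute, hence the factor of 60.
    return max_rps * safety_factor * 60


# Example: an instance that handled 4 requests/second in a load test.
print(target_invocations_per_minute(4, 0.5))  # 120.0
```

A lower safety factor scales out earlier at higher cost; a higher one runs each instance closer to its measured limit.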

Step 4: Deploy the Infrastructure

Synthesize the CloudFormation template and deploy it to your account.

bash

cdk bootstrap   # one-time setup per account and Region; skip if already bootstrapped
cdk synth
cdk deploy

[!TIP] Use cdk diff before deploying to see exactly what resources will be created or modified in your AWS environment.

Checkpoints

CloudFormation Verification: Go to the CloudFormation Console. Look for BrainybeeMlInfraStack. Ensure the status is CREATE_COMPLETE.
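SageMaker Verification: Go to SageMaker AI > Inference > Endpoints. Confirm that the endpoint created by the stack is in InService status. Because the script does not set endpoint_name, CloudFormation generates the name; you can find it in the stack's Resources tab under the logical ID MyEndpoint.
Scaling Verification: Select the endpoint and open the Settings tab. Verify that the endpoint runtime settings show automatic scaling configured for the AllTraffic variant with the policy we defined.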
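The first two checkpoints can also be checked from a script. The sketch below assumes boto3-style clients are passed in (the `describe_stacks` and `describe_endpoint` calls and their response keys follow the boto3 CloudFormation and SageMaker clients); the helper names are hypothetical, and the endpoint name must be the generated one from your stack.

```python
def stack_status(cloudformation_client, stack_name: str) -> str:
    """Return the CloudFormation stack status, e.g. CREATE_COMPLETE."""
    response = cloudformation_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def endpoint_status(sagemaker_client, endpoint_name: str) -> str:
    """Return the SageMaker endpoint status, e.g. Creating or InService."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


def checkpoints_pass(cloudformation_client, sagemaker_client,
                     stack_name: str, endpoint_name: str) -> bool:
    """True when the stack finished creating and the endpoint is serving."""
    return (
        stack_status(cloudformation_client, stack_name) == "CREATE_COMPLETE"
        and endpoint_status(sagemaker_client, endpoint_name) == "InService"
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   ok = checkpoints_pass(boto3.client("cloudformation"),
#                         boto3.client("sagemaker"),
#                         "BrainybeeMlInfraStack",
#                         "<YOUR_GENERATED_ENDPOINT_NAME>")
```

Taking the clients as parameters keeps the helpers easy to exercise with stand-in objects before pointing them at a real account.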

Troubleshooting

Error	Likely Cause	Fix
`AccessDenied`	IAM role lacks SageMaker or ECR permissions.	Attach `AmazonSageMakerFullAccess` to your deployment user/role.
`ResourceLimitExceeded`	You reached the quota for `ml.t2.medium` instances.	Request a quota increase in Service Quotas, or change the `instance_type` to one for which you have available quota.
`Model Image Error`	The ECR image URI is incorrect or private.	Ensure the image URI is valid and accessible by SageMaker.
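A quick local sanity check can catch a malformed image URI, or a placeholder left in the script, before deployment. The sketch below is a simplified pattern for private ECR URIs of the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag]`; the function name is hypothetical, and the check says nothing about whether the image exists or whether SageMaker is permitted to pull it.

```python
import re

# Simplified shape of a private ECR image URI. This is an approximation for
# catching obvious mistakes, not a full validation of ECR naming rules.
_ECR_URI = re.compile(
    r"\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?"  # registry host
    r"/[a-z0-9._/-]+"                                       # repository name
    r"(:[\w.-]+|@sha256:[0-9a-f]{64})?"                     # optional tag or digest
)


def looks_like_ecr_uri(image_uri: str) -> bool:
    """Return True if image_uri has the general shape of an ECR image URI."""
    return _ECR_URI.fullmatch(image_uri) is not None


print(looks_like_ecr_uri("123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:latest"))  # True
print(looks_like_ecr_uri("<YOUR_ECR_IMAGE_URI>"))  # False
```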

Clean-Up / Teardown

To avoid ongoing costs for the hosted endpoint and instances, delete the stack immediately after finishing.

bash

cdk destroy

[!IMPORTANT] Manually verify in the SageMaker Console that the endpoint is deleted. CDK destroy removes the CloudFormation stack, which should trigger the deletion of the endpoint resources.

Stretch Challenge

Multi-Variant Deployment: Modify your CDK script to include two production variants (VariantA and VariantB) in a single EndpointConfig with a 50/50 traffic split. This is a common pattern for A/B Testing in production.
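As a starting point for the challenge, it helps to see how variant weights become traffic shares: each variant receives its weight divided by the sum of all weights. The sketch below uses plain dictionaries whose keys mirror the `variant_name` and `initial_variant_weight` fields of `ProductionVariantProperty`; the dictionaries and the `traffic_shares` helper are illustrative only, not CDK objects.

```python
# Plain-dict stand-ins for two production variants with equal weights.
variants = [
    {"variant_name": "VariantA", "initial_variant_weight": 1.0},
    {"variant_name": "VariantB", "initial_variant_weight": 1.0},
]


def traffic_shares(variants: list[dict]) -> dict[str, float]:
    """Map each variant name to its fraction of traffic (weight / total weight)."""
    total = sum(v["initial_variant_weight"] for v in variants)
    if total <= 0:
        raise ValueError("total variant weight must be positive")
    return {v["variant_name"]: v["initial_variant_weight"] / total for v in variants}


print(traffic_shares(variants))  # {'VariantA': 0.5, 'VariantB': 0.5}
```

Because only the ratio matters, weights of 1 and 1 give the same 50/50 split as 50 and 50. If you keep auto scaling, each variant needs its own scalable target, since the resource ID includes the variant name.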

Cost Estimate

Service	Resource	Estimated Cost (US-East-1)
SageMaker	`ml.t2.medium` (Real-time)	~$0.05 per hour
CloudFormation	Managed Stack	$0.00 (Free)
CloudWatch	Metrics & Logs	~$0.50 per month (low volume)

Total Estimated Lab Cost: < $0.15 (if completed in 1 hour).
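The total above follows from simple arithmetic on the hourly rate. The sketch below uses the approximate $0.05 per hour figure from the table (an estimate, not a quoted price) and shows how the cost grows if the policy scales out to the maximum of 3 instances or if the endpoint is left running for a day. The helper name is hypothetical.

```python
HOURLY_RATE_USD = 0.05  # approximate ml.t2.medium rate taken from the table above


def endpoint_cost(hours: float, instances: int = 1,
                  hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Estimated instance cost in USD, rounded to cents."""
    return round(hours * instances * hourly_rate, 2)


print(endpoint_cost(1))      # 0.05  one instance for the one-hour lab
print(endpoint_cost(1, 3))   # 0.15  scaled out to max_capacity for the full hour
print(endpoint_cost(24))     # 1.2   one instance forgotten for a day
```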

Concept Review

IaC Tool Comparison

Feature	AWS CloudFormation	AWS CDK
Language	JSON / YAML (Declarative)	Python, TS, Java (Imperative/Declarative)
Abstraction	Low (Resource-level)	High (Uses "Constructs")
Logic	Limited (Mappings/Conditions)	Full programming logic (Loops/Ifs)
Maintenance	Verbose templates	Concise, modular code

Scaling Visualized

The following TikZ diagram shows the logic of Target Tracking. The system adjusts capacity to keep the metric (Invocations) near the target line.

[Diagram: the invocations-per-instance metric over time, staying near the target line as capacity is adjusted]
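The same logic can be sketched numerically. The model below is a deliberate simplification: it assumes the policy picks the smallest instance count that keeps the per-instance load at or below the target, clamped to the lab's `min_capacity` and `max_capacity`. The real service reacts through CloudWatch alarms and cooldown periods, so actual behavior lags and is more conservative when scaling in. The function name and sample loads are hypothetical.

```python
import math


def desired_instances(total_invocations_per_minute: float,
                      target_per_instance: float = 100.0,
                      min_capacity: int = 1,
                      max_capacity: int = 3) -> int:
    """Smallest instance count keeping per-instance load at or below the target."""
    raw = math.ceil(total_invocations_per_minute / target_per_instance)
    return max(min_capacity, min(max_capacity, raw))


# Hypothetical total invocations per minute across the endpoint over time.
loads = [40, 120, 250, 400, 180, 60]
for load in loads:
    count = desired_instances(load)
    print(f"load={load:>3}/min -> instances={count}, per-instance={load / count:.1f}/min")
```

At 400 invocations per minute the count is capped at 3, so the per-instance metric rises above the target; that is the signal to raise `max_capacity` or pick a larger instance type.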
