AWS ML Engineer Associate: Scripting & Creating ML Infrastructure (Task 3.2)
Create and script infrastructure based on existing architecture and requirements
This guide covers the essential knowledge for Task 3.2: Create and script infrastructure based on existing architecture and requirements from the MLA-C01 exam. It focuses on Infrastructure as Code (IaC), scaling strategies, and containerization for Machine Learning workloads.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between on-demand and provisioned resource models.
- Compare and apply different auto-scaling policies for SageMaker endpoints.
- Explain the tradeoffs between AWS CloudFormation and AWS CDK.
- Identify the correct container service (ECS, EKS, SageMaker) for a given ML requirement.
- Automate the provisioning of compute resources using IaC templates.
Key Terms & Glossary
- Infrastructure as Code (IaC): Managing and provisioning infrastructure through machine-readable definition files rather than manual configuration.
- CloudFormation Stack: A collection of AWS resources that you can manage as a single unit.
- AWS CDK (Cloud Development Kit): A software development framework for defining cloud infrastructure in familiar programming languages (Python, TypeScript, etc.).
- SageMaker Endpoint: A fully managed, auto-scalable HTTPS endpoint that hosts ML models for real-time inference.
- BYOC (Bring Your Own Container): The practice of using custom Docker images for SageMaker training or hosting when built-in algorithms are insufficient.
The "Big Idea"
The transition from manual setup to Automated Provisioning is the cornerstone of MLOps. By treating infrastructure as code, ML Engineers ensure that production environments are identical to testing environments, enabling repeatability, version control, and rapid scaling to meet fluctuating inference demands.
Formula / Concept Box
| Feature | AWS CloudFormation | AWS CDK |
|---|---|---|
| Language | Declarative (YAML/JSON) | Imperative/Object-Oriented (Python, JS, Go) |
| Abstraction | Low-level (Direct resource mapping) | High-level "Constructs" (Pre-configured patterns) |
| Execution | Built-in engine | Transpiles to CloudFormation templates |
| Best For | Simple, static infrastructure | Complex, logic-heavy ML pipelines |
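The table's key difference can be made concrete: imperative CDK-style code ultimately synthesizes a declarative template. A minimal sketch in plain Python (no CDK dependency; the model names and instance type are illustrative) that uses a loop to generate a CloudFormation-style template for many endpoints:

```python
import json

def synth_template(model_names, instance_type="ml.m5.xlarge"):
    """Imperatively build a declarative CloudFormation-style template:
    one EndpointConfig and one Endpoint per model, similar in spirit to
    what the CDK synthesizes from high-level constructs."""
    resources = {}
    for name in model_names:
        resources[f"{name}Config"] = {
            "Type": "AWS::SageMaker::EndpointConfig",
            "Properties": {
                "ProductionVariants": [{
                    "InitialInstanceCount": 1,
                    "InstanceType": instance_type,
                    "ModelName": name,
                    "VariantName": "AllTraffic",
                }],
            },
        }
        resources[f"{name}Endpoint"] = {
            "Type": "AWS::SageMaker::Endpoint",
            "Properties": {
                "EndpointConfigName": {
                    "Fn::GetAtt": [f"{name}Config", "EndpointConfigName"]
                }
            },
        }
    return {"Resources": resources}

# 50 endpoints from one loop -- the scenario where CDK-style logic shines.
template = synth_template([f"Model{i}" for i in range(50)])
print(len(template["Resources"]))  # → 100
print(json.dumps(template["Resources"]["Model0Endpoint"], indent=2))
```

Hand-writing the equivalent YAML would mean repeating 100 resource blocks; this is the "complex, logic-heavy" case the table assigns to CDK.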
Hierarchical Outline
- I. Infrastructure as Code (IaC) Paradigms
- Declarative: Defining the what (desired end state). Examples: CloudFormation, Terraform.
- Imperative: Defining the how (step-by-step instructions). Example: Shell scripts using AWS CLI.
- II. AWS Provisioning Tools
- CloudFormation: Uses Templates (blueprints) and Stacks (deployed resources).
- AWS CDK: Allows developers to use Python/TypeScript to generate CloudFormation templates.
- III. Scaling Policies for ML
- Target Tracking: Adjusts capacity based on a specific metric (e.g., maintain 70% CPU).
- Step Scaling: Increases/decreases capacity based on the size of the alarm breach.
- Scheduled Scaling: Scales based on known time patterns (e.g., business hours).
- IV. Containerization Services
- Amazon ECR: Registry for storing Docker images.
- Amazon ECS: Simple, AWS-native container orchestration (serverless when run on Fargate).
- Amazon EKS: Managed Kubernetes for complex, portable microservices.
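The target-tracking policy in III can be sketched numerically: Application Auto Scaling adjusts capacity roughly in proportion to how far the metric sits from the target. This is a simplified model of the documented behavior, ignoring cooldowns, min/max bounds, and alarm evaluation periods:

```python
import math

def target_tracking_desired(current_capacity, metric_value, target_value):
    """Simplified target-tracking rule: scale capacity proportionally so
    the per-instance metric returns to the target. Real Application Auto
    Scaling also applies cooldowns, capacity bounds, and alarm periods."""
    return max(1, math.ceil(current_capacity * metric_value / target_value))

# 4 instances each handling 1500 invocations against a target of 1000:
print(target_tracking_desired(4, 1500, 1000))  # → 6 (scale out)
# Metric well under target: capacity shrinks, but never below 1 here.
print(target_tracking_desired(4, 500, 1000))   # → 2 (scale in)
```

Step scaling, by contrast, would apply fixed capacity deltas per breach size rather than this proportional calculation.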
Visual Anchors
- The IaC Workflow (diagram): code/template → stack → deployed resources.
- Scaling Logic Diagram: metric breach → alarm → scaling policy → capacity change.
Definition-Example Pairs
- On-Demand Resources: Resources that are launched and paid for as they are used.
- Example: Launching a SageMaker notebook instance for a quick data exploration task.
- Provisioned Resources: Capacity that is pre-allocated and available instantly, but incurs cost even when idle.
- Example: Using Provisioned Concurrency for Lambda functions to eliminate cold starts in real-time inference.
- Metric-Based Scaling: Triggering a scale-out event based on hardware or application performance.
- Example: Scaling a SageMaker endpoint because the InvocationsPerInstance metric exceeded 1,000.
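That metric-based example can be expressed in CloudFormation by registering the endpoint variant with Application Auto Scaling and attaching a target-tracking policy on the predefined InvocationsPerInstance metric. A sketch (endpoint, variant, and role names are illustrative, and the IAM role is assumed to be defined elsewhere):

```yaml
ScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MinCapacity: 1
    MaxCapacity: 4
    ResourceId: endpoint/MyEndpoint/variant/AllTraffic
    ScalableDimension: sagemaker:variant:DesiredInstanceCount
    ServiceNamespace: sagemaker
    RoleARN: !GetAtt AutoScalingRole.Arn  # assumed role defined elsewhere
ScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: InvocationsTargetTracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref ScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      TargetValue: 1000.0
      PredefinedMetricSpecification:
        PredefinedMetricType: SageMakerVariantInvocationsPerInstance
```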
Worked Example: CloudFormation Snippet
Creating a SageMaker Endpoint requires three resources: the Model, the Endpoint Configuration, and the Endpoint itself.
```yaml
Resources:
  # MyRole and the ContainerImageUri parameter are assumed to be
  # defined elsewhere in the template.
  MyModel:
    Type: AWS::SageMaker::Model
    Properties:
      ExecutionRoleArn: !GetAtt MyRole.Arn
      PrimaryContainer:
        Image: !Ref ContainerImageUri
  MyEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - InitialInstanceCount: 1
          InstanceType: ml.m5.xlarge
          ModelName: !GetAtt MyModel.ModelName
          VariantName: AllTraffic
  MyEndpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointConfigName: !GetAtt MyEndpointConfig.EndpointConfigName
```
Checkpoint Questions
- Which IaC tool would you choose if your team wants to use loops and logic in Python to define 50 different model endpoints? (Answer: AWS CDK)
- What is the primary benefit of using Amazon ECR in an ML pipeline? (Answer: It provides a secure, managed registry to store and version Docker images used for training and inference.)
- True or False: CloudFormation can roll back all changes if a single resource in a stack fails to provision. (Answer: True)
Muddy Points & Cross-Refs
- ECS vs. EKS: Use ECS for AWS-native simplicity; use EKS if you require Kubernetes-specific APIs or are migrating from an on-premises Kubernetes cluster.
- Scaling Metrics: Choosing between CPUUtilization and InvocationsPerInstance is tricky. CPU is better for compute-heavy models, while Invocations is better for light models with high throughput.
- SageMaker Neo: Often confused with scaling. Neo optimizes the model for specific hardware (e.g., edge devices), while Auto Scaling manages the number of instances.
Comparison Tables
| Deployment Target | Use Case | Pros | Cons |
|---|---|---|---|
| SageMaker Endpoints | Standard ML Hosting | Managed, easy auto-scaling | Can be more expensive |
| AWS Lambda | Intermittent/Spiky traffic | Pay-per-use, serverless | Cold starts, 15-min limit |
| Amazon ECS/EKS | Microservices Architecture | High control, portability | Operational overhead |
| SageMaker Batch | Large non-real-time datasets | Cost-effective for bulk | High latency (not real-time) |