
AWS ML Engineer Associate: Scripting & Creating ML Infrastructure (Task 3.2)

Create and script infrastructure based on existing architecture and requirements

This guide covers the essential knowledge for Task 3.2: Create and script infrastructure based on existing architecture and requirements from the MLA-C01 exam. It focuses on Infrastructure as Code (IaC), scaling strategies, and containerization for Machine Learning workloads.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between on-demand and provisioned resource models.
  • Compare and apply different auto-scaling policies for SageMaker endpoints.
  • Explain the tradeoffs between AWS CloudFormation and AWS CDK.
  • Identify the correct container service (ECS, EKS, SageMaker) for a given ML requirement.
  • Automate the provisioning of compute resources using IaC templates.

Key Terms & Glossary

  • Infrastructure as Code (IaC): Managing and provisioning infrastructure through machine-readable definition files rather than manual configuration.
  • CloudFormation Stack: A collection of AWS resources that you can manage as a single unit.
  • AWS CDK (Cloud Development Kit): A software development framework for defining cloud infrastructure in familiar programming languages (Python, TypeScript, etc.).
  • SageMaker Endpoint: A fully managed, hosted resource that serves ML models for real-time inference.
  • BYOC (Bring Your Own Container): The practice of using custom Docker images for SageMaker training or hosting when built-in algorithms are insufficient.

The "Big Idea"

The transition from manual setup to Automated Provisioning is the cornerstone of MLOps. By treating infrastructure as code, ML Engineers ensure that production environments are identical to testing environments, enabling repeatability, version control, and rapid scaling to meet fluctuating inference demands.

Formula / Concept Box

| Feature | AWS CloudFormation | AWS CDK |
| --- | --- | --- |
| Language | Declarative (YAML/JSON) | Imperative/object-oriented (Python, JS, Go) |
| Abstraction | Low-level (direct resource mapping) | High-level "Constructs" (pre-configured patterns) |
| Execution | Built-in engine | Synthesizes CloudFormation templates |
| Best for | Simple, static infrastructure | Complex, logic-heavy ML pipelines |
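The "synthesizes CloudFormation templates" row is the key insight: CDK-style tools let imperative code emit declarative templates. A minimal sketch in plain Python (not actual CDK — the model names and resource shape are simplified for illustration) showing how a loop can generate many endpoint-config resources:

```python
import json

def endpoint_config_resource(model_name: str) -> dict:
    """Build one simplified AWS::SageMaker::EndpointConfig resource."""
    return {
        "Type": "AWS::SageMaker::EndpointConfig",
        "Properties": {
            "ProductionVariants": [{
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "ModelName": model_name,
                "VariantName": "AllTraffic",
            }]
        },
    }

# Imperative loop -> declarative template: one resource per model.
# "ChurnModel" is a placeholder name.
model_names = [f"ChurnModel{i}" for i in range(3)]
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        f"{name}EndpointConfig": endpoint_config_resource(name)
        for name in model_names
    },
}

print(json.dumps(template, indent=2))
```

This is exactly the pattern behind checkpoint question 1 below: logic in a real language, declarative JSON/YAML as the output.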

Hierarchical Outline

  • I. Infrastructure as Code (IaC) Paradigms
    • Declarative: Defining the what (desired end state). Examples: CloudFormation, Terraform.
    • Imperative: Defining the how (step-by-step instructions). Example: Shell scripts using AWS CLI.
  • II. AWS Provisioning Tools
    • CloudFormation: Uses Templates (blueprints) and Stacks (deployed resources).
    • AWS CDK: Allows developers to use Python/TypeScript to generate CloudFormation templates.
  • III. Scaling Policies for ML
    • Target Tracking: Adjusts capacity based on a specific metric (e.g., maintain 70% CPU).
    • Step Scaling: Increases/decreases capacity based on the size of the alarm breach.
    • Scheduled Scaling: Scales based on known time patterns (e.g., business hours).
  • IV. Containerization Services
    • Amazon ECR: Registry for storing Docker images.
    • Amazon ECS: Simple, AWS-native container orchestration (serverless when run on Fargate).
    • Amazon EKS: Managed Kubernetes for complex, portable microservices.
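Target tracking in the outline above follows a simple proportion: desired capacity ≈ current capacity × (observed metric ÷ target metric), rounded up when scaling out. A self-contained sketch of that arithmetic (an approximation for study purposes, not the actual Application Auto Scaling engine):

```python
import math

def desired_capacity(current_instances: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Approximate target-tracking math: scale capacity in
    proportion to how far the observed metric is from the target."""
    if current_metric <= 0:
        return current_instances  # no load signal; hold steady
    raw = current_instances * (current_metric / target_metric)
    return max(1, math.ceil(raw))

# Endpoint at 2 instances, observing 1500 invocations/instance,
# with a target of 1000 invocations/instance:
print(desired_capacity(2, 1500, 1000))  # -> 3
```

The same proportion drives scale-in: when the observed metric drops below the target, the computed capacity falls and instances are removed after the cooldown period.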

Visual Anchors

The IaC Workflow

(Diagram placeholder: IaC template → CloudFormation stack → provisioned resources)

Scaling Logic Diagram

(Diagram placeholder: CloudWatch metric breach → alarm → scaling policy adjusts endpoint instance count)

Definition-Example Pairs

  • On-Demand Resources: Resources that are launched and paid for as they are used.
    • Example: Launching a SageMaker notebook instance for a quick data exploration task.
  • Provisioned Resources: Capacity that is pre-allocated and available instantly, but incurs cost even when idle.
    • Example: Using Provisioned Concurrency for Lambda functions to eliminate cold starts in real-time inference.
  • Metric-Based Scaling: Triggering a scale-out event based on hardware or application performance.
    • Example: Scaling a SageMaker endpoint because the InvocationsPerInstance metric exceeded 1000.
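The InvocationsPerInstance example above is configured through Application Auto Scaling's target-tracking policy, keyed on the predefined `SageMakerVariantInvocationsPerInstance` metric. A sketch that builds the policy configuration dict; the boto3 call is shown as a comment because it needs live AWS credentials, and the endpoint/variant names in it are placeholders:

```python
def invocations_scaling_policy(target_per_instance: float,
                               cooldown_seconds: int = 300) -> dict:
    """Target-tracking configuration keyed on the predefined
    SageMakerVariantInvocationsPerInstance metric."""
    return {
        "TargetValue": target_per_instance,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": cooldown_seconds,
        "ScaleOutCooldown": cooldown_seconds,
    }

policy = invocations_scaling_policy(1000.0)

# With boto3 (requires credentials; "my-endpoint"/"AllTraffic"
# are placeholder names):
# client = boto3.client("application-autoscaling")
# client.put_scaling_policy(
#     PolicyName="invocations-target-tracking",
#     ServiceNamespace="sagemaker",
#     ResourceId="endpoint/my-endpoint/variant/AllTraffic",
#     ScalableDimension="sagemaker:variant:DesiredInstanceCount",
#     PolicyType="TargetTrackingScaling",
#     TargetTrackingScalingPolicyConfiguration=policy,
# )
```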

Worked Example: CloudFormation Snippet

Creating a SageMaker Endpoint requires three resources: the Model, the Endpoint Configuration, and the Endpoint itself.

```yaml
Resources:
  MyModel:
    Type: AWS::SageMaker::Model
    Properties:
      ExecutionRoleArn: !GetAtt MyRole.Arn
      PrimaryContainer:
        Image: !Ref ContainerImageUri

  MyEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - InitialInstanceCount: 1
          InstanceType: ml.m5.xlarge
          ModelName: !GetAtt MyModel.ModelName
          VariantName: AllTraffic

  MyEndpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointConfigName: !GetAtt MyEndpointConfig.EndpointConfigName
```

Checkpoint Questions

  1. Which IaC tool would you choose if your team wants to use loops and logic in Python to define 50 different model endpoints? (Answer: AWS CDK)
  2. What is the primary benefit of using Amazon ECR in an ML pipeline? (Answer: It provides a secure, managed registry to store and version Docker images used for training and inference.)
  3. True or False: CloudFormation can roll back all changes if a single resource in a stack fails to provision. (Answer: True — by default, a failed stack operation rolls back every resource in that operation.)

Muddy Points & Cross-Refs

  • ECS vs. EKS: Use ECS for AWS-native simplicity; use EKS if you require Kubernetes-specific APIs or are migrating from an on-premises Kubernetes cluster.
  • Scaling Metrics: Choosing between CPUUtilization and InvocationsPerInstance is tricky. CPU is better for compute-heavy models, while Invocations is better for light models with high throughput.
  • SageMaker Neo: Often confused with scaling. Neo is for optimizing the model for specific hardware (edge devices), while Auto Scaling is for managing the number of instances.

Comparison Tables

| Deployment Target | Use Case | Pros | Cons |
| --- | --- | --- | --- |
| SageMaker Endpoints | Standard ML hosting | Managed, easy auto-scaling | Can be more expensive |
| AWS Lambda | Intermittent/spiky traffic | Pay-per-use, serverless | Cold starts, 15-min limit |
| Amazon ECS/EKS | Microservices architecture | High control, portability | Operational overhead |
| SageMaker Batch | Large non-real-time datasets | Cost-effective for bulk | High latency (not real-time) |
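The table can be read as a small decision procedure. A toy sketch of that routing logic (a simplification of the tradeoffs above for exam recall, not an official AWS heuristic):

```python
def pick_deployment_target(real_time: bool,
                           spiky_traffic: bool,
                           needs_kubernetes: bool) -> str:
    """Toy routing logic that mirrors the comparison table."""
    if not real_time:
        return "SageMaker Batch Transform"  # bulk, latency-tolerant
    if needs_kubernetes:
        return "Amazon ECS/EKS"             # control and portability
    if spiky_traffic:
        return "AWS Lambda"                 # pay-per-use; mind cold starts
    return "SageMaker Endpoint"             # managed default for steady hosting

print(pick_deployment_target(real_time=True, spiky_traffic=False,
                             needs_kubernetes=False))  # -> SageMaker Endpoint
```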
