
AWS Developer Tools for ML: Capabilities and Quotas

Capabilities and quotas for AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy

This study guide covers the three AWS developer tools used to automate machine learning workflows: AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. Knowing their capabilities, their limits, and how they fit into the ML lifecycle is important for the AWS Certified Machine Learning Engineer - Associate (MLA-C01) exam.


Learning Objectives

After studying this guide, you will be able to:

  • Differentiate between the primary roles of CodePipeline, CodeBuild, and CodeDeploy in an ML context.
  • Identify key service quotas that impact pipeline design and execution.
  • Select appropriate deployment strategies (Canary vs. Blue/Green) for model hosting.
  • Configure build and deployment specifications (buildspec.yml and appspec.yml).

Key Terms & Glossary

  • CI/CD (Continuous Integration/Continuous Delivery): The practice of automating the building, testing, and deployment of code and models.
  • Artifact: A file or set of files (like a zipped model or code) produced by one stage and used by another.
  • Buildspec: A YAML collection of build commands and settings used by AWS CodeBuild.
  • Appspec: A YAML or JSON file that tells AWS CodeDeploy what to deploy and which lifecycle hooks to run during the deployment.
  • Webhook: A mechanism that allows an external service (like GitHub) to notify CodePipeline to start an execution when code is pushed.

The "Big Idea"

In a traditional software environment, CI/CD moves code from commit to production. In MLOps, the "Big Idea" is that the pipeline must handle data, code, and models. AWS CodePipeline acts as the "spine," coordinating the flow where CodeBuild compiles the training environment or runs tests, and CodeDeploy pushes the finalized model to an endpoint or EC2 instance. Automation ensures that model retraining is reproducible and follows strict governance.
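
The "spine" idea can be made concrete with a minimal sketch of a pipeline definition, built here as a Python dictionary in the general shape of CodePipeline's pipeline JSON. Every name in it (bucket, build project, application, role ARN, account ID) is a hypothetical placeholder, and the stage layout is one plausible arrangement rather than a prescribed one.

```python
import json


def action(name, category, provider, configuration, inputs=(), outputs=()):
    """Build one action entry in the shape used by a CodePipeline definition."""
    return {
        "name": name,
        "actionTypeId": {
            "category": category,  # e.g. Source, Build, Approval, Deploy
            "owner": "AWS",
            "provider": provider,
            "version": "1",
        },
        "configuration": configuration,
        "inputArtifacts": [{"name": n} for n in inputs],
        "outputArtifacts": [{"name": n} for n in outputs],
    }


# All resource names below are hypothetical placeholders.
pipeline = {
    "name": "ml-model-pipeline",
    "roleArn": "arn:aws:iam::111122223333:role/ExamplePipelineRole",
    "artifactStore": {"type": "S3", "location": "example-artifact-bucket"},
    "stages": [
        {"name": "Source", "actions": [action(
            "FetchCode", "Source", "S3",
            {"S3Bucket": "example-source-bucket", "S3ObjectKey": "source.zip"},
            outputs=["SourceOutput"])]},
        {"name": "Build", "actions": [action(
            "BuildAndTest", "Build", "CodeBuild",
            {"ProjectName": "example-build-project"},
            inputs=["SourceOutput"], outputs=["BuildOutput"])]},
        {"name": "Approve", "actions": [action(
            "ReviewMetrics", "Approval", "Manual", {})]},
        {"name": "Deploy", "actions": [action(
            "DeployModel", "Deploy", "CodeDeploy",
            {"ApplicationName": "example-app",
             "DeploymentGroupName": "example-group"},
            inputs=["BuildOutput"])]},
    ],
}


def stage_order(definition):
    """Return stage names in execution order."""
    return [stage["name"] for stage in definition["stages"]]


if __name__ == "__main__":
    print(json.dumps({"pipeline": pipeline}, indent=2))
```

Note how artifacts link the stages: the Build action consumes `SourceOutput` and produces `BuildOutput`, which the Deploy action then consumes.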


Formula / Concept Box

| Service | Primary Function | Core Configuration File |
| --- | --- | --- |
| AWS CodePipeline | Workflow Orchestration | Pipeline Definition (JSON/Console) |
| AWS CodeBuild | Compile, Test, Docker Image Build | buildspec.yml |
| AWS CodeDeploy | Automated Deployment & Rollback | appspec.yml |

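[!IMPORTANT] Service Quota Alert: AWS CodeBuild builds time out after 60 minutes by default, and the timeout can be raised to a maximum of 8 hours per build. Concurrent builds are also capped per account and Region (60 is a common default, but the value varies by compute type and account). If training a large model would exceed these limits, request a quota increase where one is available or move the training to SageMaker Training Jobs.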


Hierarchical Outline

  1. AWS CodePipeline (Orchestrator)
    • Stages: Logical groupings (Source, Build, Staging, Production).
    • Actions: Individual tasks within a stage (e.g., Invoke Lambda, Run CodeBuild).
    • Transitions: Links between stages that can be disabled to hold executions before they enter the next stage. Pausing for human sign-off is handled separately, by a manual approval action.
  2. AWS CodeBuild (The Worker)
    • Environment: Managed Docker containers (standard, custom, or GPU-optimized).
    • Phases: install, pre_build, build, post_build.
    • Output: Artifacts stored in Amazon S3.
  3. AWS CodeDeploy (The Deliverer)
    • Deployment Groups: Sets of EC2 instances or Lambda functions tagged for deployment.
    • Deployment Configurations: Canary, Linear, or All-at-once.
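
The four CodeBuild phases in the outline map directly onto the `phases` section of a buildspec. The sketch below renders a minimal buildspec as YAML text using only the Python standard library. The commands, the `train.py` script, and the `model.tar.gz` artifact are hypothetical, and `render_buildspec` is an illustrative helper, not part of any AWS SDK.

```python
import json

# Phase names in the order CodeBuild runs them.
PHASE_ORDER = ["install", "pre_build", "build", "post_build"]


def render_buildspec(phases, artifact_files):
    """Render a minimal buildspec as YAML text.

    `phases` maps a phase name to a list of shell commands. Commands are
    written as double-quoted strings so characters such as ':' or '#'
    cannot be misread as YAML syntax.
    """
    lines = ["version: 0.2", "", "phases:"]
    for phase in PHASE_ORDER:
        commands = phases.get(phase)
        if not commands:
            continue  # phases with no commands are simply left out
        lines.append(f"  {phase}:")
        lines.append("    commands:")
        lines.extend(f"      - {json.dumps(cmd)}" for cmd in commands)
    lines += ["", "artifacts:", "  files:"]
    lines.extend(f"    - {json.dumps(path)}" for path in artifact_files)
    return "\n".join(lines) + "\n"


# Hypothetical commands for a small model build.
example_phases = {
    "install": ["pip install -r requirements.txt"],
    "pre_build": ["pytest tests/"],
    "build": ["python train.py --output model.tar.gz"],
    "post_build": ["echo Build finished"],
}

buildspec_text = render_buildspec(example_phases, ["model.tar.gz"])

if __name__ == "__main__":
    print(buildspec_text)
```

The file listed under `artifacts` is what CodeBuild would upload to Amazon S3 for later pipeline stages to consume.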

Visual Anchors

[Figure: ML CI/CD Pipeline Flow]

Blue/Green Deployment Strategy

\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, minimum width=3cm, minimum height=1cm, align=center}]
  % The diamond shape requires \usetikzlibrary{shapes.geometric}.
  \node (V1) [fill=blue!20] {\textbf{Blue (Old Version)}\\Traffic: 0\%};
  \node (V2) [below of=V1, fill=green!20] {\textbf{Green (New Version)}\\Traffic: 100\%};
  \node (LB) [left of=V1, xshift=-2cm, yshift=-1cm, diamond, draw, fill=gray!10] {Load\\Balancer};
  \draw [->, thick] (LB) -- (V2);
  \draw [->, dashed, red] (LB) -- (V1);
  \node (Note) [right of=V2, xshift=3cm, draw=none] {Green environment is tested\\before shifting traffic.};
\end{tikzpicture}


Definition-Example Pairs

  • Canary Deployment: Releasing the new version to a small subset of users before a full rollout.
    • Example: Updating a recommendation model for 10% of users to check for latency issues before the full 100% launch.
  • Manual Approval Action: A pipeline step that pauses execution until a user with the required IAM permissions approves or rejects it.
    • Example: A Lead Data Scientist must review the accuracy metrics of a newly trained model in the Staging stage before it is deployed to the production endpoint.
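
To see how the canary pattern defined above differs from a linear rollout, the following sketch computes the traffic percentage on the new version over time for each. Both functions are illustrative helpers, not CodeDeploy APIs, and they assume the first traffic shift happens at minute 0.

```python
def canary_schedule(first_percent, wait_minutes):
    """Two steps: a small slice first, then all remaining traffic after the wait."""
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Equal increments at a fixed interval until 100% is reached."""
    schedule = []
    shifted = 0
    minute = 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


if __name__ == "__main__":
    # Each tuple is (minutes since start, percent of traffic on the new version).
    print("canary:", canary_schedule(10, 5))
    print("linear:", linear_schedule(25, 10))
```

With these inputs, the canary rollout sends 10% of traffic at the start and the remaining 90% five minutes later, while the linear rollout moves 25% every ten minutes and finishes at minute 30.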

Worked Examples

Scenario: Handling CodeBuild Timeouts

The Problem: You are using CodeBuild to train a small model. The build consistently fails after 60 minutes.

The Solution:

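  1. Confirm that the failure is a timeout. The build timeout is a project-level setting, not a buildspec.yml property, so editing buildspec.yml will not change it.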
  2. In the AWS Console, navigate to CodeBuild -> Edit Project -> Configuration.
  3. Increase the timeout value from the default (60 mins) up to the maximum (480 mins / 8 hours).
  4. If the job still fails, consider offloading the training to Amazon SageMaker Training Jobs and using CodeBuild only to trigger the SageMaker API.
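
The decision in the steps above can be sketched as a small helper. The 60-minute default and 480-minute maximum are the figures used in this guide. The `safety_factor` headroom and the function itself are assumptions made for illustration, so check the current quotas for your account before relying on the numbers.

```python
import math

DEFAULT_TIMEOUT_MIN = 60   # default build timeout, as stated in this guide
MAX_TIMEOUT_MIN = 480      # maximum build timeout, as stated in this guide


def plan_timeout(expected_minutes, safety_factor=1.25):
    """Pick where to run a job and which timeout to configure.

    Returns ("codebuild", timeout_minutes) when the job fits within the
    CodeBuild limit, or ("sagemaker", None) when it should be offloaded
    to a SageMaker Training Job.
    """
    needed = math.ceil(expected_minutes * safety_factor)
    if needed <= DEFAULT_TIMEOUT_MIN:
        return ("codebuild", DEFAULT_TIMEOUT_MIN)  # default is already enough
    if needed <= MAX_TIMEOUT_MIN:
        return ("codebuild", needed)  # raise the project timeout
    return ("sagemaker", None)  # exceeds the maximum, so offload


if __name__ == "__main__":
    for minutes in (40, 90, 600):
        print(minutes, "->", plan_timeout(minutes))
```

For the scenario above, a job expected to take about 90 minutes would stay on CodeBuild with the project timeout raised, while a 10-hour job would be routed to SageMaker.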

Checkpoint Questions

  1. Which file is required by AWS CodeDeploy to understand which scripts to run during a deployment?
  2. What is the maximum duration a single build in AWS CodeBuild can run?
  3. True/False: CodePipeline can use an S3 bucket as a source for triggering a pipeline execution.
  4. How does a "Linear" deployment differ from a "Canary" deployment in CodeDeploy?

Muddy Points & Cross-Refs

  • CodeBuild vs. SageMaker Training: People often confuse these. CodeBuild is general-purpose (compiling code, building Docker images). SageMaker Training is optimized for ML (distributed training, high-performance GPU instances, managed spot training).
  • Deployment Hooks: In appspec.yml, hooks like BeforeAllowTraffic and AfterAllowTraffic are often confused. Remember: Before is for validation/warming up; After is for final health checks.
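
To show where the two hooks sit, here is a minimal sketch of an AppSpec for a Lambda deployment, built as a Python dictionary and serialized to JSON. The function, alias, and hook function names are hypothetical, and `lambda_appspec` is an illustrative helper rather than an AWS SDK call.

```python
import json


def lambda_appspec(function_name, alias, current_version, target_version,
                   before_hook, after_hook):
    """Build an AppSpec document for shifting a Lambda alias to a new version."""
    return {
        "version": 0.0,
        "Resources": [{
            function_name: {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Name": function_name,
                    "Alias": alias,
                    "CurrentVersion": current_version,
                    "TargetVersion": target_version,
                },
            }
        }],
        "Hooks": [
            # Runs before any traffic reaches the new version (validation, warm-up).
            {"BeforeAllowTraffic": before_hook},
            # Runs after traffic has shifted (final health checks).
            {"AfterAllowTraffic": after_hook},
        ],
    }


# Hypothetical names for an inference function and its validation Lambdas.
appspec = lambda_appspec(
    function_name="example-inference-fn",
    alias="live",
    current_version="1",
    target_version="2",
    before_hook="example-validate-before-traffic",
    after_hook="example-healthcheck-after-traffic",
)

if __name__ == "__main__":
    print(json.dumps(appspec, indent=2))
```

Each hook value names a Lambda function that CodeDeploy invokes at that point in the deployment lifecycle.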

Comparison Tables

Deployment Strategies

| Strategy | Risk | Rollback Speed | Cost (During Deploy) |
| --- | --- | --- | --- |
| All-at-once | High | Slow (must redeploy) | Low |
| Blue/Green | Low | Instant (switch LB) | High (2x resources) |
| Canary | Lowest | Fast | Medium |
| Linear | Low | Fast | Medium |
