AWS Deployment Services and Amazon SageMaker AI Study Guide
This guide covers the spectrum of AWS machine learning deployment options, ranging from fully managed AI services to high-control unmanaged infrastructure, with a deep dive into Amazon SageMaker AI's hosting capabilities.
Learning Objectives
After studying this guide, you should be able to:
- Distinguish between managed (SageMaker) and unmanaged (EC2/EKS/Lambda) deployment targets.
- Select the appropriate SageMaker inference type (Real-time, Serverless, Asynchronous, Batch) based on latency and payload requirements.
- Explain the benefits of optimization tools like SageMaker Neo for edge devices.
- Identify deployment strategies such as Blue/Green, Canary, and Linear rollouts.
- Evaluate tradeoffs between cost, operational overhead, and infrastructure control.
Key Terms & Glossary
- Inference: The process of using a trained model to make predictions on new, unseen data.
- Managed Endpoint: An AWS-hosted HTTP(S) URL that routes traffic to model instances, handling provisioning and load balancing automatically.
- SageMaker Neo: A service that optimizes ML models for specific hardware platforms (e.g., NVIDIA, Intel, ARM) to reduce latency and footprint.
- Blue/Green Deployment: A strategy that reduces downtime by running two identical production environments (Blue and Green) and shifting traffic between them.
- Cold Start: The latency delay experienced in Serverless inference when a new execution environment is initialized.
The "Big Idea"
The core challenge of ML engineering is the Control vs. Convenience Tradeoff. AWS provides a spectrum: on one end, AI Services (like Rekognition) offer "ready-to-use" intelligence with zero management. In the middle, SageMaker AI provides a managed framework for custom models. On the other end, Unmanaged Services (like EKS) provide total control over the OS, network, and hardware at the cost of high operational complexity.
Formula / Concept Box
| Inference Type | Best For | Typical Pricing Metric |
|---|---|---|
| Real-Time | Low latency, persistent traffic | Instance hours (uptime) |
| Serverless | Intermittent traffic, small payloads | Duration (ms) + Data processed |
| Asynchronous | Large payloads (up to 1GB), long processing times | Instance hours (auto-scales to 0) |
| Batch Transform | Large datasets, non-real-time | Amount of data processed |
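The table above can be condensed into a small decision helper. A minimal sketch (function and parameter names are illustrative; the thresholds reflect the service quotas cited later in this guide, roughly 6 MB/60 s for real-time and about 4 MB for serverless):

```python
def choose_inference_type(payload_mb: float,
                          processing_seconds: float,
                          intermittent_traffic: bool,
                          offline_dataset: bool) -> str:
    """Map workload characteristics to a SageMaker inference type."""
    if offline_dataset:
        # Entire datasets scored on a schedule; no persistent endpoint needed
        return "Batch Transform"
    if payload_mb > 6 or processing_seconds > 60:
        # Beyond real-time limits; async queues requests (payloads up to ~1 GB)
        return "Asynchronous"
    if intermittent_traffic and payload_mb <= 4:
        # Scales to zero between bursts, at the cost of cold starts
        return "Serverless"
    # Persistent traffic with low-latency requirements
    return "Real-Time"

print(choose_inference_type(500, 300, False, False))  # → Asynchronous
```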
Hierarchical Outline
- I. AWS Pretrained AI Services
- Computer Vision: Amazon Rekognition.
- Language/Text: Amazon Comprehend, Translate, Textract.
- Speech/Audio: Amazon Polly, Transcribe.
- Generative AI: Amazon Bedrock (Foundation Models via Converse API).
- II. Amazon SageMaker Managed Hosting
- Deployment Models: Multi-model endpoints (many models sharing one container on one instance) vs. Multi-container endpoints (distinct containers, e.g. different frameworks, behind one endpoint).
- Optimization: SageMaker Neo (compilation for edge/cloud).
- III. Unmanaged Deployment Targets
- Compute Options: EC2 (Full OS control), EKS/ECS (Containers), Lambda (Event-driven).
- Use Cases: Compliance (GDPR/HIPAA), custom software dependencies, Spot Instance cost savings.
- IV. Deployment Resilience
- Autoscaling: Adjusting instance counts based on CPU/Latency metrics.
- Rollouts: All-at-once vs. Canary (partial) vs. Linear (incremental).
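The autoscaling bullet above maps onto Application Auto Scaling's target-tracking API. A hedged sketch (endpoint and variant names are placeholders; the predefined metric shown is `SageMakerVariantInvocationsPerInstance`, while CPU- or latency-based scaling would use a custom CloudWatch metric instead):

```python
ENDPOINT = "my-endpoint"   # placeholder endpoint name
VARIANT = "AllTraffic"     # placeholder production variant

resource_id = f"endpoint/{ENDPOINT}/variant/{VARIANT}"

scaling_policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
}

def apply_policy(client):
    # Requires a registered scalable target for the same ResourceId
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )
    client.put_scaling_policy(**scaling_policy)

# import boto3; apply_policy(boto3.client("application-autoscaling"))  # needs AWS credentials
```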
Visual Anchors
Deployment Target Decision Tree
Blue/Green Deployment Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, minimum width=3cm, minimum height=1cm, align=center}]
  % Define components
  \node (LB) {Load Balancer \\ (Traffic Shifting)};
  \node (Blue) [below left of=LB, xshift=-1cm, fill=blue!20] {Blue Fleet \\ (Current Version)};
  \node (Green) [below right of=LB, xshift=1cm, fill=green!20] {Green Fleet \\ (New Version)};
  \node (Config) [above of=LB] {Deployment Strategy \\ (Canary/Linear)};
  % Connections
  \draw[->, thick] (Config) -- (LB);
  \draw[->, thick] (LB) -- node[left, xshift=-0.5cm] {90\% Traffic} (Blue);
  \draw[->, thick] (LB) -- node[right, xshift=0.5cm] {10\% Traffic} (Green);
  % Labels
  \node[draw=none, below of=Blue, yshift=1cm] {\textit{Old Model}};
  \node[draw=none, below of=Green, yshift=1cm] {\textit{New Model}};
\end{tikzpicture}
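The diagram above corresponds to the `DeploymentConfig` accepted by SageMaker's `UpdateEndpoint` API. A sketch of a canary rollout (the endpoint, config, and alarm names are placeholders; switching `Type` to `LINEAR` with a `LinearStepSize` yields the linear strategy):

```python
deployment_config = {
    "BlueGreenUpdatePolicy": {
        "TrafficRoutingConfiguration": {
            "Type": "CANARY",  # or "LINEAR" / "ALL_AT_ONCE"
            "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
            "WaitIntervalInSeconds": 300,  # bake time before shifting the rest
        },
        "TerminationWaitInSeconds": 600,  # keep the blue fleet warm for rollback
    },
    "AutoRollbackConfiguration": {
        # Placeholder CloudWatch alarm; if it fires, traffic rolls back to blue
        "Alarms": [{"AlarmName": "endpoint-5xx-errors"}]
    },
}

# import boto3
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="my-endpoint",
#     EndpointConfigName="my-config-v2",
#     DeploymentConfig=deployment_config,
# )
```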
Definition-Example Pairs
- SageMaker Pipelines: A CI/CD tool for ML. Example: Automating a workflow where a new model is trained on S3 data, evaluated, and then deployed to a staging endpoint if performance exceeds 90% accuracy.
- Bring Your Own Container (BYOC): Using custom Docker images in SageMaker. Example: A financial firm needs a specific C++ library for high-speed feature engineering that is not included in standard AWS Deep Learning Containers.
- Model Monitor: A feature that detects drift in data quality. Example: An e-commerce model trained on winter data starts failing in summer; Model Monitor detects that the input feature distribution has shifted.
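The Model Monitor example can be illustrated in miniature: compare a baseline feature distribution against live traffic. This is a toy mean-shift check, not the service's actual algorithm (which compares full statistics and constraints produced by a baselining job):

```python
from statistics import mean, stdev

def mean_drift(baseline: list[float], live: list[float],
               threshold: float = 3.0) -> bool:
    """Flag drift when the live mean deviates from the baseline mean
    by more than `threshold` baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(live) - mu) > threshold * sigma

winter = [2.0, 3.0, 1.5, 2.5, 2.0]    # toy winter feature values
summer = [9.0, 10.5, 11.0, 9.5, 10.0]  # same feature observed in summer
print(mean_drift(winter, summer))  # → True
```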
Worked Examples
Scenario: The Image Processing Startup
Problem: A startup needs to process high-resolution satellite imagery. Each image takes 5 minutes to process and the payload is 500MB. They want to minimize costs when there are no images to process.
Solution:
- Choice: Asynchronous Inference.
- Reasoning: Real-time endpoints have a 60-second timeout and a 6 MB payload limit. Serverless has a 4 MB payload limit and the same 60-second timeout. Asynchronous supports payloads up to 1 GB and processing times up to 1 hour.
- Cost Optimization: Configure the endpoint's autoscaling policy to scale the instance count to zero when the request queue is empty.
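The solution above can be sketched as two request payloads. Names and S3 paths are placeholders; async endpoints take an S3 pointer (`InputLocation`) rather than an inline body, which is how a 500 MB image avoids the 6 MB real-time limit, and scale-to-zero hinges on `MinCapacity=0` plus a policy on the backlog metric:

```python
# Request for the sagemaker-runtime invoke_endpoint_async API
invoke_args = {
    "EndpointName": "satellite-async",                       # placeholder name
    "InputLocation": "s3://my-bucket/images/scene-001.tif",  # placeholder path
    "ContentType": "application/octet-stream",
}
# import boto3
# boto3.client("sagemaker-runtime").invoke_endpoint_async(**invoke_args)
# The response's OutputLocation points at the S3 result once processing finishes.

# Application Auto Scaling target allowing the fleet to drain to zero instances
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/satellite-async/variant/AllTraffic",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 0,   # permits scale-to-zero when the queue is empty
    "MaxCapacity": 5,
}
# A scaling policy on the ApproximateBacklogSizePerInstance CloudWatch metric
# then adds instances only while the request queue is non-empty.
```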
Checkpoint Questions
- What is the primary advantage of using Amazon SageMaker Neo for a model deployed on an IoT doorbell?
- Which SageMaker deployment strategy shifts traffic in fixed increments (e.g., 10% every 5 minutes)?
- Name two AWS compute services used for "unmanaged" model deployment.
- True or False: Serverless Inference is the best choice for a model that requires constant, sub-10ms latency.
Answers
- It optimizes/compiles the model for specific hardware, reducing the memory footprint and latency.
- Linear deployment strategy.
- Amazon EC2, Amazon EKS (Kubernetes), Amazon ECS, or AWS Lambda.
- False. Serverless inference can suffer from "cold starts" which increase latency during the first invocation after an idle period.
Muddy Points & Cross-Refs
- SageMaker vs. Bedrock: People often confuse these. Use SageMaker if you have your own weights/code; use Bedrock if you want to use existing models (like Claude or Llama) via an API.
- Spot Instances: While cost-effective for training, be careful using them for real-time inference in production, as they can be reclaimed by AWS with little notice. Use them for Batch Transform or Unmanaged EKS development clusters instead.
Comparison Tables
Managed vs. Unmanaged Deployment
| Feature | Managed (SageMaker) | Unmanaged (EC2/EKS/Lambda) |
|---|---|---|
| Infrastructure | Abstracted; AWS manages OS/Patching | Full root access; user manages OS |
| Scalability | Built-in via simple policies | User must configure Cluster Autoscalers |
| Cost | Premium for management | Potentially lower (Spot/Fine-tuned instances) |
| Compliance | Standard AWS compliance (SOC/ISO) | Deep customization for specific regs (GDPR/HIPAA) |
| Effort | Low (Model-focused) | High (Infrastructure-focused) |