AWS Deployment Services and Amazon SageMaker AI Study Guide
This guide covers the spectrum of AWS machine learning deployment options, ranging from fully managed AI services to high-control unmanaged infrastructure, with a deep dive into Amazon SageMaker AI's hosting capabilities.
Learning Objectives
After studying this guide, you should be able to:
- Distinguish between managed (SageMaker) and unmanaged (EC2/EKS/Lambda) deployment targets.
- Select the appropriate SageMaker inference type (Real-time, Serverless, Asynchronous, Batch) based on latency and payload requirements.
- Explain the benefits of optimization tools like SageMaker Neo for edge devices.
- Identify deployment strategies such as Blue/Green, Canary, and Linear rollouts.
- Evaluate tradeoffs between cost, operational overhead, and infrastructure control.
Key Terms & Glossary
- Inference: The process of using a trained model to make predictions on new, unseen data.
- Managed Endpoint: An AWS-hosted HTTP(S) URL that routes traffic to model instances, handling provisioning and load balancing automatically.
- SageMaker Neo: A service that optimizes ML models for specific hardware platforms (e.g., NVIDIA, Intel, ARM) to reduce latency and footprint.
- Blue/Green Deployment: A strategy that reduces downtime by running two identical production environments (Blue and Green) and shifting traffic between them.
- Cold Start: The latency delay experienced in Serverless inference when a new execution environment is initialized.
The "Big Idea"
The core challenge of ML engineering is the Control vs. Convenience Tradeoff. AWS provides a spectrum: on one end, AI Services (like Rekognition) offer "ready-to-use" intelligence with zero management. In the middle, SageMaker AI provides a managed framework for custom models. On the other end, Unmanaged Services (like EKS) provide total control over the OS, network, and hardware at the cost of high operational complexity.
Formula / Concept Box
| Inference Type | Best For | Typical Pricing Metric |
|---|---|---|
| Real-Time | Low latency, persistent traffic | Instance hours (uptime) |
| Serverless | Intermittent traffic, small payloads | Duration (ms) + Data processed |
| Asynchronous | Large payloads (up to 1GB), long processing times | Instance hours (auto-scales to 0) |
| Batch Transform | Large datasets, non-real-time | Amount of data processed |
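The table above can be condensed into a small decision helper. A minimal sketch (function and parameter names are illustrative; the thresholds reflect the service quotas cited later in this guide, roughly 6 MB/60 s for real-time and about 4 MB for serverless):

```python
def choose_inference_type(payload_mb: float,
                          processing_seconds: float,
                          intermittent_traffic: bool,
                          offline_dataset: bool) -> str:
    """Map workload characteristics to a SageMaker inference type."""
    if offline_dataset:
        # Entire datasets scored on a schedule; no persistent endpoint needed
        return "Batch Transform"
    if payload_mb > 6 or processing_seconds > 60:
        # Beyond real-time limits; async queues requests (payloads up to ~1 GB)
        return "Asynchronous"
    if intermittent_traffic and payload_mb <= 4:
        # Scales to zero between bursts, at the cost of cold starts
        return "Serverless"
    # Persistent traffic with low-latency requirements
    return "Real-Time"

print(choose_inference_type(500, 300, False, False))  # → Asynchronous
```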
Hierarchical Outline
- I. AWS Pretrained AI Services
- Computer Vision: Amazon Rekognition.
- Language/Text: Amazon Comprehend, Translate, Textract.
- Speech/Audio: Amazon Polly, Transcribe.
- Generative AI: Amazon Bedrock (Foundation Models via Converse API).
- II. Amazon SageMaker Managed Hosting
- Deployment Models: Multi-model endpoints (many models sharing one container on one instance) vs. Multi-container endpoints (distinct containers, e.g. different frameworks, behind one endpoint).
- Optimization: SageMaker Neo (compilation for edge/cloud).
- III. Unmanaged Deployment Targets
- Compute Options: EC2 (Full OS control), EKS/ECS (Containers), Lambda (Event-driven).
- Use Cases: Compliance (GDPR/HIPAA), custom software dependencies, Spot Instance cost savings.
- IV. Deployment Resilience
- Autoscaling: Adjusting instance counts based on CPU/Latency metrics.
- Rollouts: All-at-once vs. Canary (partial) vs. Linear (incremental).
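The autoscaling bullet above maps onto Application Auto Scaling's target-tracking API. A hedged sketch (endpoint and variant names are placeholders; the predefined metric shown is `SageMakerVariantInvocationsPerInstance`, while CPU- or latency-based scaling would use a custom CloudWatch metric instead):

```python
ENDPOINT = "my-endpoint"   # placeholder endpoint name
VARIANT = "AllTraffic"     # placeholder production variant

resource_id = f"endpoint/{ENDPOINT}/variant/{VARIANT}"

scaling_policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # target invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
}

def apply_policy(client):
    # Requires a registered scalable target for the same ResourceId
    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )
    client.put_scaling_policy(**scaling_policy)

# import boto3; apply_policy(boto3.client("application-autoscaling"))  # needs AWS credentials
```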
Visual Anchors
Deployment Target Decision Tree
Blue/Green Deployment Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, rounded corners, minimum width=3cm, minimum height=1cm, align=center}]
  % Define components
  \node (LB) {Load Balancer \\ (Traffic Shifting)};
  \node (Blue) [below left of=LB, xshift=-1cm, fill=blue!20] {Blue Fleet \\ (Current Version)};
  \node (Green) [below right of=LB, xshift=1cm, fill=green!20] {Green Fleet \\ (New Version)};
  \node (Config) [above of=LB] {Deployment Strategy \\ (Canary/Linear)};
  % Connections
  \draw[->, thick] (Config) -- (LB);
  \draw[->, thick] (LB) -- node[left, xshift=-0.5cm] {90\% Traffic} (Blue);
  \draw[->, thick] (LB) -- node[right, xshift=0.5cm] {10\% Traffic} (Green);
  % Labels
  \node[draw=none, below of=Blue, yshift=1cm] {\textit{Old Model}};
  \node[draw=none, below of=Green, yshift=1cm] {\textit{New Model}};
\end{tikzpicture}
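The diagram above corresponds to the `DeploymentConfig` accepted by SageMaker's `UpdateEndpoint` API. A sketch of a canary rollout (the endpoint, config, and alarm names are placeholders; switching `Type` to `LINEAR` with a `LinearStepSize` yields the linear strategy):

```python
deployment_config = {
    "BlueGreenUpdatePolicy": {
        "TrafficRoutingConfiguration": {
            "Type": "CANARY",  # or "LINEAR" / "ALL_AT_ONCE"
            "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
            "WaitIntervalInSeconds": 300,  # bake time before shifting the rest
        },
        "TerminationWaitInSeconds": 600,  # keep the blue fleet warm for rollback
    },
    "AutoRollbackConfiguration": {
        # Placeholder CloudWatch alarm; if it fires, traffic rolls back to blue
        "Alarms": [{"AlarmName": "endpoint-5xx-errors"}]
    },
}

# import boto3
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="my-endpoint",
#     EndpointConfigName="my-config-v2",
#     DeploymentConfig=deployment_config,
# )
```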
Definition-Example Pairs
- SageMaker Pipelines: A CI/CD tool for ML. Example: Automating a workflow where a new model is trained on S3 data, evaluated, and then deployed to a staging endpoint if performance exceeds 90% accuracy.
- Bring Your Own Container (BYOC): Using custom Docker images in SageMaker. Example: A financial firm needs a specific C++ library for high-speed feature engineering that is not included in standard AWS Deep Learning Containers.
- Model Monitor: A feature that detects drift in data quality. Example: An e-commerce model trained on winter data starts failing in summer; Model Monitor detects that the input feature distribution has shifted.
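The Model Monitor example can be illustrated in miniature: compare a baseline feature distribution against live traffic. This is a toy mean-shift check, not the service's actual algorithm (which compares full statistics and constraints produced by a baselining job):

```python
from statistics import mean, stdev

def mean_drift(baseline: list[float], live: list[float],
               threshold: float = 3.0) -> bool:
    """Flag drift when the live mean deviates from the baseline mean
    by more than `threshold` baseline standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(live) - mu) > threshold * sigma

winter = [2.0, 3.0, 1.5, 2.5, 2.0]    # toy winter feature values
summer = [9.0, 10.5, 11.0, 9.5, 10.0]  # same feature observed in summer
print(mean_drift(winter, summer))  # → True
```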
Worked Examples
Scenario: The Image Processing Startup
Problem: A startup needs to process high-resolution satellite imagery. Each image takes 5 minutes to process and the payload is 500MB. They want to minimize costs when there are no images to process.
Solution:
- Choice: Asynchronous Inference.
- Reasoning: Real-time endpoints have a 60-second timeout and a 6 MB payload limit. Serverless has a 4 MB payload limit and the same 60-second timeout. Asynchronous supports payloads up to 1 GB and processing times up to 1 hour.
- Cost Optimization: Configure the endpoint's autoscaling policy to scale the instance count to zero when the request queue is empty.
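The solution above can be sketched as two request payloads. Names and S3 paths are placeholders; async endpoints take an S3 pointer (`InputLocation`) rather than an inline body, which is how a 500 MB image avoids the 6 MB real-time limit, and scale-to-zero hinges on `MinCapacity=0` plus a policy on the backlog metric:

```python
# Request for the sagemaker-runtime invoke_endpoint_async API
invoke_args = {
    "EndpointName": "satellite-async",                       # placeholder name
    "InputLocation": "s3://my-bucket/images/scene-001.tif",  # placeholder path
    "ContentType": "application/octet-stream",
}
# import boto3
# boto3.client("sagemaker-runtime").invoke_endpoint_async(**invoke_args)
# The response's OutputLocation points at the S3 result once processing finishes.

# Application Auto Scaling target allowing the fleet to drain to zero instances
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/satellite-async/variant/AllTraffic",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 0,   # permits scale-to-zero when the queue is empty
    "MaxCapacity": 5,
}
# A scaling policy on the ApproximateBacklogSizePerInstance CloudWatch metric
# then adds instances only while the request queue is non-empty.
```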
Checkpoint Questions
- What is the primary advantage of using Amazon SageMaker Neo for a model deployed on an IoT doorbell?
- Which SageMaker deployment strategy shifts traffic in fixed increments (e.g., 10% every 5 minutes)?
- Name two AWS compute services used for "unmanaged" model deployment.
- True or False: Serverless Inference is the best choice for a model that requires constant, sub-10ms latency.
Answers
- It optimizes/compiles the model for specific hardware, reducing the memory footprint and latency.
- Linear deployment strategy.
- Amazon EC2, Amazon EKS (Kubernetes), Amazon ECS, or AWS Lambda.
- False. Serverless inference can suffer from "cold starts" which increase latency during the first invocation after an idle period.
Muddy Points & Cross-Refs
- SageMaker vs. Bedrock: People often confuse these. Use SageMaker if you have your own weights/code; use Bedrock if you want to use existing models (like Claude or Llama) via an API.
- Spot Instances: While cost-effective for training, be careful using them for real-time inference in production, as they can be reclaimed by AWS with little notice. Use them for Batch Transform or Unmanaged EKS development clusters instead.
Comparison Tables
Managed vs. Unmanaged Deployment
| Feature | Managed (SageMaker) | Unmanaged (EC2/EKS/Lambda) |
|---|---|---|
| Infrastructure | Abstracted; AWS manages OS/Patching | Full root access; user manages OS |
| Scalability | Built-in via simple policies | User must configure Cluster Autoscalers |
| Cost | Premium for management | Potentially lower (Spot/Fine-tuned instances) |
| Compliance | Standard AWS compliance (SOC/ISO) | Deep customization for specific regs (GDPR/HIPAA) |
| Effort | Low (Model-focused) | High (Infrastructure-focused) |