Study Guide: Selecting Compute Environments for Machine Learning
Choosing the appropriate compute environment for training and inference based on requirements (for example, GPU or CPU specifications, processor family, networking bandwidth)
This guide explores the critical decision-making process for selecting the right compute resources (CPU, GPU, or specialized silicon) for machine learning training and inference workloads within the AWS ecosystem.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between CPU and GPU architectures and their specific use cases in ML.
- Identify the appropriate EC2 instance families for training versus inference.
- Explain the cost and performance benefits of AWS Trainium and AWS Inferentia.
- Select compute resources based on workload characteristics like latency, throughput, and data type.
Key Terms & Glossary
- Sequential Processing: Executing tasks one after another; the hallmark of CPU architecture.
- Parallel Processing: Executing many calculations simultaneously; the core strength of GPUs.
- Throughput: The amount of data processed in a given time period (critical for training).
- Latency: The time taken to process a single request (critical for real-time inference).
- ASIC (Application-Specific Integrated Circuit): Custom silicon like Inferentia and Trainium designed for a single purpose (ML).
- Distributed Training: Splitting a large model or dataset across multiple compute nodes to reduce training time.
The "Big Idea"
Choosing a compute environment is a balancing act between cost, speed, and complexity. While a high-end GPU can handle almost any ML task, using one for simple linear regression is a waste of money. Conversely, training an LLM on a CPU is technically possible but practically infeasible due to the time required. The goal is to match the mathematical complexity of the algorithm to the hardware's processing style.
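The sequential-versus-parallel contrast above can be felt directly in code. The sketch below compares a naive one-multiply-at-a-time matrix product (CPU-style sequential work) against NumPy's vectorized `@` operator, which uses the bulk-math style that GPUs take even further. The timings are illustrative and will vary by machine.

```python
import time
import numpy as np

def matmul_sequential(a, b):
    """Naive triple loop: one multiply-add at a time, sequential-processing style."""
    n, k = a.shape
    _, m = b.shape
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            out[i][j] = s
    return np.array(out)

rng = np.random.default_rng(0)
a = rng.random((60, 60))
b = rng.random((60, 60))

t0 = time.perf_counter()
slow = matmul_sequential(a, b)
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
fast = a @ b  # vectorized: the kind of parallel math accelerators are built for
t_vec = time.perf_counter() - t0

print(f"sequential: {t_seq:.4f}s  vectorized: {t_vec:.6f}s")
```

Even at this tiny 60x60 size, the vectorized path is orders of magnitude faster; deep learning repeats this operation billions of times, which is why hardware choice dominates training time.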
Formula / Concept Box
| Decision Factor | Rule of Thumb |
|---|---|
| Preprocessing | Always use CPU (e.g., C5, M5) for logic-heavy data cleaning. |
| Deep Learning Training | Use GPU (P or G series) or Trainium (Trn1) for matrix multiplication. |
| Simple Inference | Use CPU or Inferentia to minimize costs if latency requirements are met. |
| High-Perf Inference | Use GPU (G4dn/G5) for real-time, unstructured data (images/video). |
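The decision table above can be sketched as a small helper function. The phase names, keyword arguments, and return strings here are illustrative study-guide shorthand, not an AWS API.

```python
def pick_compute(phase, deep_learning=False, realtime=False, neuron_compatible=False):
    """Toy rule-of-thumb picker mirroring the decision table above (illustrative only)."""
    if phase == "preprocessing":
        # Logic-heavy data cleaning stays on CPU.
        return "CPU (C5/M5)"
    if phase == "training":
        # Matrix-multiplication-heavy deep learning needs accelerators.
        return "GPU (P-series) or Trainium (Trn1)" if deep_learning else "CPU (C5/M5)"
    if phase == "inference":
        if realtime and deep_learning:
            return "GPU (G4dn/G5)"
        if neuron_compatible:
            # Inferentia gives better price-performance when the model compiles with Neuron.
            return "Inferentia (Inf2)"
        return "CPU (C5/M5)"
    raise ValueError(f"unknown phase: {phase}")

print(pick_compute("inference", deep_learning=True, realtime=True))  # GPU (G4dn/G5)
```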
Hierarchical Outline
- I. Processor Types
- A. CPU (Central Processing Unit)
- General purpose, complex logic, few cores.
- Best for Traditional ML (Random Forests, XGBoost) and Data Wrangling.
- B. GPU (Graphics Processing Unit)
- Specialized for math, thousands of small cores.
- Essential for Deep Learning and Computer Vision.
- C. AWS Specialized Silicon
- Trainium: Optimized for high-performance model training.
- Inferentia: Optimized for high-throughput, low-latency inference.
- II. AWS Instance Families
- A. General Purpose/Compute Optimized: T2 (Burstable), C5/C6g (High CPU).
- B. Accelerated Computing:
- G-Series: Cost-effective GPUs for inference (e.g., G4dn, G5).
- P-Series: High-performance GPUs for training (e.g., P4d/P5).
- Trn/Inf: Custom AWS chips for specific ML lifecycle stages.
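One way to internalize the family naming is to parse the leading letters of an instance type string (e.g. `p4d` from `p4d.24xlarge`). The prefix-to-role mapping below just restates the outline above; it is a study aid, not an official AWS lookup.

```python
import re

ROLES = {
    "t": "Burstable general purpose",
    "m": "General purpose",
    "c": "Compute optimized (CPU)",
    "g": "Accelerated: GPU, cost-effective inference",
    "p": "Accelerated: GPU, high-performance training",
    "trn": "Accelerated: AWS Trainium (training)",
    "inf": "Accelerated: AWS Inferentia (inference)",
}

def classify(instance_type):
    """Map an instance type like 'trn1.32xlarge' to its ML role (illustrative sketch)."""
    m = re.match(r"[a-z]+", instance_type)  # leading alphabetic run: 'p4d' -> 'p'
    if not m:
        return "Unknown family"
    return ROLES.get(m.group(), "Unknown family")

print(classify("p4d.24xlarge"))  # Accelerated: GPU, high-performance training
```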
Visual Anchors
Compute Selection Flowchart
CPU vs. GPU Architecture
\begin{tikzpicture}[scale=0.8]
% CPU
\draw[thick] (0,0) rectangle (3,3) node[midway, above=1.5cm] {\textbf{CPU}};
\foreach \x in {0.5, 1.75}
  \foreach \y in {0.5, 1.75}
    \draw[fill=blue!20] (\x,\y) rectangle (\x+0.75,\y+0.75) node[midway, scale=0.6] {ALU};
% GPU
\draw[thick] (5,0) rectangle (8,3) node[midway, above=1.5cm] {\textbf{GPU}};
\foreach \x in {5.1, 5.4, 5.7, 6.0, 6.3, 6.6, 6.9, 7.2, 7.5}
  \foreach \y in {0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9, 2.2, 2.5, 2.8}
    \draw[fill=green!20] (\x,\y) rectangle (\x+0.2,\y+0.2);
\node[below, text width=3cm, align=center] at (1.5,-0.5) {Few complex cores\\(Sequential)};
\node[below, text width=3cm, align=center] at (6.5,-0.5) {Thousands of simple cores\\(Parallel)};
\end{tikzpicture}
Definition-Example Pairs
- Burstable Performance: Instances that provide a baseline level of CPU with the ability to spike above that baseline.
- Example: A T3 instance used for a development environment where coding is intermittent, but testing requires a brief burst of power.
- Distributed Training: Spreading model weights and data across many GPUs to speed up convergence.
- Example: Training a Large Language Model (LLM) across a cluster of P5 instances connected by high-speed networking (EFA).
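A quick back-of-the-envelope model helps make the distributed-training example concrete. The sketch below estimates wall-clock time for data-parallel training, where a scaling efficiency below 1.0 stands in for inter-node communication overhead (high-bandwidth networking like EFA keeps this number high). The numbers are hypothetical.

```python
def training_time(single_gpu_hours, num_gpus, scaling_efficiency=0.9):
    """Estimated wall-clock hours for data-parallel training.

    scaling_efficiency < 1.0 models communication overhead between nodes;
    all inputs here are hypothetical, for illustration only.
    """
    return single_gpu_hours / (num_gpus * scaling_efficiency)

# A job needing 720 GPU-hours, spread across 64 GPUs at 90% scaling efficiency:
print(f"{training_time(720, 64, 0.9):.1f} hours")  # 12.5 hours
```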
Worked Examples
Case 1: Real-time Image Classification
Requirement: A mobile app needs to identify plants from photos in under 200ms.
- Analyze: This is a Deep Learning inference task requiring low latency and high parallel math.
- Selection: G4dn or G5 instances are ideal because they feature NVIDIA GPUs optimized for inference.
- Alternative: Inf2 (Inferentia) could provide better price-performance if the model is compatible with the Neuron SDK.
Case 2: Monthly Batch Billing Analysis
Requirement: An ML model runs once a month to predict customer churn based on CSV files.
- Analyze: This is a batch process on structured data where latency is not a priority.
- Selection: C5 (Compute Optimized) or M5 (General Purpose) CPUs. The structured nature of the data does not benefit significantly from a GPU.
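Case 2's cost reasoning can be made explicit with simple arithmetic: when latency does not matter, a slower but cheaper CPU run can beat a fast GPU run on total cost. The hourly rates and durations below are hypothetical placeholders, not current AWS pricing.

```python
def batch_cost(hours, hourly_rate):
    """Total cost of one batch run; inputs are hypothetical placeholders."""
    return hours * hourly_rate

# Hypothetical numbers: the CPU job takes longer but the instance is far cheaper.
cpu = batch_cost(4.0, 0.34)   # C5-class CPU instance, 4-hour run
gpu = batch_cost(1.0, 4.10)   # G5-class GPU instance, 1-hour run
print(f"CPU: ${cpu:.2f}  GPU: ${gpu:.2f}")
```

With these assumed numbers the CPU run is roughly a third of the GPU cost, which is why the worked example picks C5/M5 for monthly batch churn prediction.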
Checkpoint Questions
- Which processor type is best suited for feature engineering and data cleaning?
- Why would a developer choose an `Inf1` instance over a `p4d` instance for model hosting?
- What is the primary difference between `G` series and `P` series instances in AWS?
- For a small dataset using a Linear Regression model, is a GPU necessary? Why or why not?
Muddy Points & Cross-Refs
- GPU Underutilization: A common "muddy point" is deploying a small model on a massive GPU (like a P4), leading to high costs for idle hardware. Cross-ref: SageMaker Multi-Model Endpoints (MME) for better utilization.
- Neuron SDK: Using Trainium or Inferentia requires the AWS Neuron SDK, which differs from standard CUDA drivers used for NVIDIA GPUs. This adds a layer of software complexity.
Comparison Tables
| Feature | CPU (C5/M5) | GPU (G5/P4) | AWS Inferentia (Inf2) |
|---|---|---|---|
| Core Count | Low (8–128) | Very High (Thousands) | Specialized (Neuron Cores) |
| Best Phase | Preprocessing/Inference | Training/Inference | Inference Only |
| Cost | Low | High | Moderate |
| Flexibility | Highest (Any code) | High (CUDA/ML Frameworks) | Lower (Requires Neuron SDK) |
| Data Type | Structured/Tabular | Unstructured (Image/Video) | Unstructured/LLMs |