Optimizing AWS Infrastructure Costs: Purchasing Options for ML Workloads

This guide covers the strategic selection of AWS purchasing options to minimize the cost of Machine Learning (ML) infrastructure while maintaining performance and availability requirements.

Learning Objectives

After studying this guide, you should be able to:

Differentiate between the five primary AWS purchasing models (On-Demand, Spot, Reserved, Savings Plans, and Capacity Blocks).
Select the appropriate purchasing option based on workload predictability and fault tolerance.
Explain the specific benefits and coverage of Amazon SageMaker Savings Plans.
Identify the best tools for monitoring and forecasting ML-related infrastructure costs.

Key Terms & Glossary

On-Demand Instances: Pay-for-use compute capacity with no long-term commitment.
- Example: Launching an ml.p3.2xlarge instance for a quick two-hour experimentation session.
Spot Instances: Spare AWS capacity offered at discounts of up to 90% off On-Demand prices, which AWS can reclaim after a 2-minute warning.
- Example: Running a large-scale batch data-cleansing job that can restart if interrupted.
Savings Plans: A flexible pricing model that offers lower prices than On-Demand in exchange for a commitment to a consistent amount of usage (measured in $/hour) for a 1- or 3-year term.
Reserved Instances (RI): A commitment to a specific instance type in a specific region for a set term, providing significant discounts for steady-state workloads.
Capacity Blocks: A specialized option to reserve GPU instances for a specific duration to ensure availability for high-demand tasks like model fine-tuning.

The "Big Idea"

Cost optimization for ML on AWS is a balance between Elasticity (the ability to scale up or down instantly) and Commitment (trading flexibility for lower rates). On-Demand instances provide maximum flexibility but carry the highest hourly rate. By forecasting usage and identifying which workloads can tolerate interruptions, engineers can substantially reduce the Total Cost of Ownership (TCO) of their ML lifecycle.

Formula / Concept Box

Concept	Metric / Rule	Key Savings Estimate
Spot Savings	(On-Demand Price - Spot Price) / On-Demand Price	Up to 90%
Savings Plan Term	1 Year or 3 Year Commitment	Up to 64% (SageMaker)
Utilization Rate	(Actual Usage / Committed Usage) * 100	Aim for > 90% for RIs/SP
Payment Options	All Upfront, Partial Upfront, No Upfront	Higher upfront = Higher discount
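The two formulas in the table above can be sketched in Python as follows. The function names and the sample prices are illustrative assumptions, not real AWS quotes.

```python
def spot_savings(on_demand_price: float, spot_price: float) -> float:
    """Fractional saving of Spot relative to On-Demand (0.70 means 70%)."""
    if on_demand_price <= 0:
        raise ValueError("on_demand_price must be positive")
    return (on_demand_price - spot_price) / on_demand_price


def utilization_rate(actual_usage: float, committed_usage: float) -> float:
    """Percentage of a commitment (RI or Savings Plan) that was actually used."""
    if committed_usage <= 0:
        raise ValueError("committed_usage must be positive")
    return actual_usage / committed_usage * 100


# Hypothetical numbers: $3.00/hr On-Demand vs. $0.90/hr Spot,
# and 684 hours used out of 720 committed hours.
print(f"Spot savings: {spot_savings(3.00, 0.90):.0%}")
print(f"Utilization:  {utilization_rate(684, 720):.1f}%")
```

A utilization result below the 90% target in the table suggests the commitment was sized too large for the workload.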

Hierarchical Outline

I. Flexible / Unpredictable Workloads
- On-Demand Instances: Best for short-term, irregular, or experimental tasks.
- Spot Instances: Best for fault-tolerant and interruptible tasks (e.g., training with checkpoints).
II. Predictable / Steady-State Workloads
- Savings Plans: The modern standard for flexibility. Includes Compute and SageMaker specific plans.
- Reserved Instances: Legacy model, tied to specific instance families/regions.
III. Specialized GPU Requirements
- Capacity Blocks: Time-bound reservations for high-demand GPUs (e.g., H100s) to prevent "Insufficient Capacity" errors.
IV. Cost Management Tools
- AWS Cost Explorer: Visualizes historical spending and forecasts future costs.
- AWS Budgets: Sets alerts for when actual or forecasted spending exceeds thresholds.
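The outline above can be read as a simple decision rule. The sketch below is one possible encoding, not an official AWS decision tree: the function name, the parameter names, and the order in which the checks are applied are all assumptions made for illustration.

```python
def choose_purchasing_option(
    predictable: bool,
    fault_tolerant: bool,
    needs_reserved_gpu_window: bool = False,
) -> str:
    """Map workload traits to a purchasing option from the outline.

    Assumed priority order (illustrative only):
      1. A time-bound need for scarce GPUs -> Capacity Blocks
      2. Interruptible work                -> Spot Instances
      3. Steady, forecastable usage        -> Savings Plans
      4. Everything else                   -> On-Demand Instances
    """
    if needs_reserved_gpu_window:
        return "Capacity Blocks"
    if fault_tolerant:
        return "Spot Instances"
    if predictable:
        return "Savings Plans"
    return "On-Demand Instances"


# A checkpointed training job that can restart after interruption:
print(choose_purchasing_option(predictable=False, fault_tolerant=True))
# A 24/7 inference endpoint that cannot be interrupted:
print(choose_purchasing_option(predictable=True, fault_tolerant=False))
```

Real workloads often mix options, for example a Savings Plan for baseline inference with Spot for training bursts, so treat the single return value as a starting point.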

Visual Anchors

Purchasing Decision Flowchart

[Figure: purchasing decision flowchart; diagram not rendered in this copy]

Cost vs. Commitment Trade-off

[Figure: cost vs. commitment trade-off; diagram not rendered in this copy]

Definition-Example Pairs

SageMaker Savings Plans: A commitment to spend a specific dollar amount per hour on SageMaker services (Notebooks, Training, Inference).
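- Example: A company commits to $10/hour. They can use any instance type (CPU or GPU) in any region, and the discounted Savings Plan rate automatically applies to usage up to the $10/hour commitment. Usage beyond the commitment is billed at On-Demand rates, and the $10 is charged every hour even if usage is lower.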
Interruptible Workload: A task that can be paused and restarted without losing significant progress.
- Example: A SageMaker Training job that uses Managed Spot Training with periodic checkpoints saved to S3.
Capacity Blocks: Reserving a cluster of GPUs for a specific start and end time.
- Example: Reserving 8 p5.48xlarge instances for precisely 48 hours starting next Tuesday to perform a final LLM fine-tune.
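To make the $10/hour Savings Plan example concrete, here is a minimal sketch of how one hour might be billed. It assumes a single flat discount rate for all usage (real Savings Plan rates vary by instance type, region, term, and payment option), and `hourly_bill` is a hypothetical helper, not an AWS API.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Total charge for one hour under a Savings Plan (simplified model).

    on_demand_usage: that hour's usage priced at On-Demand rates, in USD
    commitment:      hourly commitment in USD, charged whether used or not
    discount:        assumed flat discount for covered usage, 0 <= discount < 1
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    discounted_usage = on_demand_usage * (1 - discount)
    if discounted_usage <= commitment:
        # All usage fits inside the commitment; the commitment is still owed in full.
        return commitment
    # On-Demand-equivalent usage that the commitment can absorb at the discounted rate.
    covered_on_demand = commitment / (1 - discount)
    # Everything beyond that overflows to On-Demand pricing.
    return commitment + (on_demand_usage - covered_on_demand)


# Hypothetical 20% discount with a $10/hour commitment:
print(f"{hourly_bill(5.00, 10.00, 0.20):.2f}")   # quiet hour
print(f"{hourly_bill(20.00, 10.00, 0.20):.2f}")  # busy hour
```

In the quiet hour the full $10 commitment is still charged, which is the underutilization risk. In the busy hour only the overflow is billed at the On-Demand rate.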

Worked Examples

Scenario: Comparing On-Demand vs. Savings Plan

Problem: A startup uses an ml.m5.2xlarge instance for real-time inference 24/7. The On-Demand price is $0.46/hour. A 1-year SageMaker Savings Plan offers a 20% discount.

Step 1: Calculate Monthly On-Demand Cost
$$0.46 \text{ USD/hr} \times 24 \text{ hrs/day} \times 30 \text{ days} = 331.20 \text{ USD}$$

Step 2: Calculate Monthly Savings Plan Cost
$$0.46 \times (1 - 0.20) = 0.368 \text{ USD/hr}$$
$$0.368 \times 24 \times 30 = 264.96 \text{ USD}$$

Step 3: Total Savings
$$331.20 - 264.96 = 66.24 \text{ USD per month}$$
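The same three steps can be reproduced in a few lines of Python, which makes it easy to rerun the comparison with a different rate or discount. The variable names are illustrative, and the 30-day month follows the scenario above.

```python
HOURS_PER_MONTH = 24 * 30  # the scenario assumes a 30-day month

on_demand_rate = 0.46  # USD/hr, from the problem statement
discount = 0.20        # 1-year SageMaker Savings Plan discount in the scenario

# Step 1: monthly On-Demand cost
on_demand_monthly = on_demand_rate * HOURS_PER_MONTH

# Step 2: monthly Savings Plan cost
savings_plan_rate = on_demand_rate * (1 - discount)
savings_plan_monthly = savings_plan_rate * HOURS_PER_MONTH

# Step 3: total monthly savings
monthly_savings = on_demand_monthly - savings_plan_monthly

print(f"On-Demand:    ${on_demand_monthly:.2f}")     # $331.20
print(f"Savings Plan: ${savings_plan_monthly:.2f}")  # $264.96
print(f"Savings:      ${monthly_savings:.2f}")       # $66.24
```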

Checkpoint Questions

Which purchasing option provides the highest potential discount (up to 90%) but carries the risk of the instance being terminated?
True or False: SageMaker Savings Plans apply to both SageMaker Training jobs and SageMaker Notebook instances.
What tool would you use to set an SMS alert if your ML training costs are predicted to exceed $500 this month?
Why are Capacity Blocks preferred over On-Demand for high-end GPU training tasks involving multiple nodes?

Muddy Points & Cross-Refs

Reserved Instances (RI) vs. Savings Plans (SP): RIs are older and generally less flexible (often tied to a specific instance type). SPs are the recommended modern choice for most ML engineers because they provide flexibility across instance families and regions.
Spot Interruption Handling: Remember that when using Spot for training, you must implement checkpointing. If you don't, you lose all progress when the instance is reclaimed.
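Deep Dive: See the AWS Cost Explorer documentation on "Rightsizing Recommendations" for identifying over-provisioned instances. Note that these recommendations cover Amazon EC2 instances, so they are most relevant to ML workloads that run directly on EC2.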
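The checkpointing point above can be illustrated with a toy resume loop. This is a local simulation only: it writes a JSON file to a temporary directory instead of S3, the `train` function and its `interrupt_at` parameter are hypothetical, and no SageMaker API is involved.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


def train(total_steps: int, checkpoint_path: str,
          interrupt_at: Optional[int] = None) -> Tuple[int, int]:
    """Toy training loop that resumes from the last checkpoint.

    Returns (step_reached, steps_executed_in_this_run).
    interrupt_at simulates a Spot reclamation at that step.
    """
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]  # resume instead of starting over
    start = step
    while step < total_steps:
        if interrupt_at is not None and step == interrupt_at:
            return step, step - start  # instance reclaimed mid-run
        step += 1  # stand-in for one unit of real training work
        if step % 10 == 0:  # checkpoint every 10 steps
            with open(checkpoint_path, "w") as f:
                json.dump({"step": step}, f)
    return step, step - start


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    print(train(50, ckpt, interrupt_at=25))  # (25, 25): interrupted run
    print(train(50, ckpt))                   # (50, 30): resumes from step 20
```

The second run repeats only the five steps done after the last checkpoint rather than all 25, which is the trade-off a checkpoint interval controls: more frequent checkpoints cost more I/O but lose less work on interruption.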

Comparison Tables

Option	Best Use Case	Risk	Discount Level
On-Demand	Spiky, unpredictable usage	None (Highest Cost)	0% (Baseline)
Spot	Training with checkpoints	2-minute interruption notice	~70-90%
Savings Plans	Steady-state production models	Underutilization if usage drops	~20-64%
Capacity Blocks	Short-term, intensive GPU tasks	Must pay for the whole block	Significant
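The "Underutilization if usage drops" risk in the Savings Plans row has a simple break-even point: a plan with discount d pays off only while more than (1 - d) of the committed capacity is in use. The sketch below assumes a single flat discount and a hypothetical helper name. It is a simplified model, not AWS billing logic.

```python
def cheaper_option(utilization: float, discount: float) -> str:
    """Compare a Savings Plan with On-Demand for the same capacity.

    utilization: fraction (0..1) of the committed capacity actually used
    discount:    assumed flat Savings Plan discount (0..1)

    Costs are expressed per $1 of On-Demand-equivalent committed capacity.
    """
    plan_cost = 1 - discount       # owed in full regardless of usage
    on_demand_cost = utilization   # pay only for what is used
    if plan_cost < on_demand_cost:
        return "Savings Plan"
    if plan_cost > on_demand_cost:
        return "On-Demand"
    return "Break-even"


# With a hypothetical 20% discount, the break-even utilization is 80%.
print(cheaper_option(utilization=0.90, discount=0.20))
print(cheaper_option(utilization=0.50, discount=0.20))
```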

[!TIP] Use Amazon SageMaker Inference Recommender before committing to a Savings Plan to ensure you are using the most cost-effective instance type for your model performance requirements.