Optimizing Compute Costs: Evaluating Spot Instances and Savings Plans
Evaluate workloads for EC2 Spot Instance and Savings Plans eligibility
This guide covers the evaluation of AWS workloads to determine the most cost-effective purchasing model, specifically focusing on the eligibility and trade-offs of Spot Instances and Savings Plans.
Learning Objectives
After studying this guide, you should be able to:
- Differentiate between On-Demand, Spot, and Savings Plans pricing models.
- Identify workloads eligible for Spot Instances based on fault tolerance.
- Determine when a Savings Plan commitment is superior to Reserved Instances.
- Evaluate steady-state vs. bursty workloads for financial optimization.
Key Terms & Glossary
- Spot Instance: An instance that uses spare AWS capacity at a discount of up to 90%, subject to reclamation by AWS with a 2-minute warning.
- Savings Plan: A flexible pricing model offering up to 72% savings in exchange for a commitment to a consistent amount of compute usage (measured in dollars per hour) for a 1- or 3-year term.
- Steady-State Workload: A workload with predictable, constant resource requirements over time.
- Fault-Tolerant: The ability of a system to continue operating properly in the event of the failure of some of its components (critical for Spot eligibility).
- Capacity Rebalancing: A feature that proactively manages Spot Instance interruptions by signaling Auto Scaling to replace instances before they are reclaimed.
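As a rough illustration of the 2-minute warning, the sketch below computes how much time a workload has left once it receives an interruption notice. It assumes the notice arrives as a small JSON document with `action` and `time` fields, which is how the EC2 instance metadata `spot/instance-action` item is generally described; the helper name `seconds_until_reclaim` and the sample values are hypothetical.

```python
import json
from datetime import datetime, timezone


def seconds_until_reclaim(notice_json: str, now: datetime) -> float:
    """Return the seconds left before the instance is reclaimed.

    Assumes `notice_json` looks like:
        {"action": "terminate", "time": "2024-01-01T12:02:00Z"}
    where `time` is a UTC timestamp.
    """
    notice = json.loads(notice_json)
    reclaim_time = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (reclaim_time - now).total_seconds()


if __name__ == "__main__":
    # Hypothetical notice and clock reading, for illustration only.
    sample = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
    current = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(seconds_until_reclaim(sample, current))
```

A workload would typically use the remaining time to save a checkpoint or drain in-flight requests before the instance disappears.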
The "Big Idea"
In AWS, you pay for the flexibility you require. On-Demand pricing is the "retail price" for maximum flexibility. To reduce costs, you must trade either persistence (by using Spot Instances, which can be interrupted) or liquidity (by using Savings Plans, which require a long-term financial commitment). Evaluating a workload means finding the largest trade the application can afford without compromising its core business function.
Formula / Concept Box
| Pricing Model | Max Savings | Commitment | Best For... |
|---|---|---|---|
| On-Demand | 0% (Base) | None | New, unpredictable, or non-interruptible workloads. |
| Savings Plans | ~72% | 1 or 3 Years | Steady-state, predictable usage (EC2, Fargate, Lambda). |
| Spot Instances | ~90% | None | Stateless, fault-tolerant, or time-flexible tasks. |
| Reserved (RI) | ~72% | 1 or 3 Years | Legacy environments (largely replaced by Savings Plans). |
[!IMPORTANT] Savings Plans do not apply to Spot Instance usage. You cannot "stack" these discounts.
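To make the table concrete, the sketch below converts each maximum discount into a best-case hourly rate. The $0.10/hour On-Demand rate is a made-up figure, and the discounts are the "up to" ceilings from the table, so real rates will usually be higher than what this prints.

```python
# Upper-bound discounts from the table above ("up to" figures, not guarantees).
MAX_DISCOUNT = {
    "on_demand": 0.00,
    "savings_plan": 0.72,
    "spot": 0.90,
    "reserved": 0.72,
}


def best_case_hourly_rate(on_demand_rate: float, model: str) -> float:
    """Return the lowest possible hourly rate for one pricing model.

    Each discount is applied to the On-Demand rate on its own. Discounts
    are never combined, mirroring the rule that Savings Plans do not
    cover Spot usage.
    """
    return round(on_demand_rate * (1 - MAX_DISCOUNT[model]), 4)


if __name__ == "__main__":
    rate = 0.10  # hypothetical On-Demand price in dollars per hour
    for model in MAX_DISCOUNT:
        print(f"{model:>12}: ${best_case_hourly_rate(rate, model):.4f}/hour")
```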
Hierarchical Outline
- On-Demand Instances
  - Baseline usage: Best for short-term, spiky, or testing workloads.
  - No interruption: Not subject to Spot-style reclamation; instances run until you stop them.
- Spot Instances (The "Spare Capacity" Model)
  - Cost: Lowest price point (up to 90% off).
  - Interruptibility: AWS can reclaim with a 2-minute notice.
  - Use Cases: Big data (worker nodes), CI/CD, batch processing, HPC.
- Savings Plans (The "Commitment" Model)
  - Compute Savings Plan: Most flexible (up to 66% off); applies to EC2, Fargate, and Lambda across Regions.
  - EC2 Instance Savings Plan: Up to 72% off; locked to a specific instance family in a Region.
  - SageMaker Savings Plan: Specific to machine learning workloads.
- Evaluation Criteria
  - Can it be paused? $\rightarrow$ Use Spot.
  - Is it running 24/7? $\rightarrow$ Use Savings Plans.
  - Is it a new proof-of-concept? $\rightarrow$ Use On-Demand.
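The three evaluation questions can be sketched as a small decision function. The `Workload` class and its field names are hypothetical, and checking interruptibility first is an assumption (it favors the deepest discount when a workload qualifies for more than one model).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    interruptible: bool  # can it be paused or restarted without harm?
    steady_state: bool   # does it run around the clock at a predictable level?


def recommend_pricing_model(workload: Workload) -> str:
    """Apply the evaluation questions in order and return a pricing model."""
    if workload.interruptible:
        return "Spot"
    if workload.steady_state:
        return "Savings Plans"
    # New, unpredictable, or non-interruptible work falls through to On-Demand.
    return "On-Demand"


if __name__ == "__main__":
    for w in (
        Workload("nightly-batch", interruptible=True, steady_state=False),
        Workload("hr-portal", interruptible=False, steady_state=True),
        Workload("new-poc", interruptible=False, steady_state=False),
    ):
        print(f"{w.name}: {recommend_pricing_model(w)}")
```

A real evaluation would weigh more factors (term length, instance flexibility, mixed fleets), so treat this as a first-pass filter only.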
Visual Anchors
Workload Eligibility Decision Tree
Cost vs. Flexibility Trade-off
\begin{tikzpicture}[scale=0.8]
\draw[thick,->] (0,0) -- (8,0) node[right] {Commitment/Risk};
\draw[thick,->] (0,0) -- (0,6) node[above] {Cost (Price/Hour)};
% On-Demand
\filldraw[blue!20, draw=blue!50] (0.5,5) circle (0.4cm) node[black, below=0.5cm] {\tiny On-Demand};
\node at (0.5, 5.2) {\tiny High Cost / Low Risk};
% Savings Plan
\filldraw[green!20, draw=green!50] (4,2.5) circle (0.4cm) node[black, below=0.5cm] {\tiny Savings Plan};
\node at (4, 2.7) {\tiny Med Cost / Long-term Risk};
% Spot
\filldraw[red!20, draw=red!50] (7,1) circle (0.4cm) node[black, below=0.5cm] {\tiny Spot};
\node at (7, 1.2) {\tiny Low Cost / Interruption Risk};
\draw[dashed, gray] (0.5,5) -- (7,1);
\end{tikzpicture}
Definition-Example Pairs
- Stateless Workload: A process that does not store data locally between requests.
  - Example: A fleet of web servers where any server can handle any request; if one is reclaimed (Spot), the user session is preserved in a central database or session store, and the load balancer routes requests to the remaining servers.
- Fault-Tolerant Application: An application designed to recover gracefully from infrastructure failure.
  - Example: A video transcoding job that processes files in small segments. If the instance is interrupted, the job simply restarts that specific 10-second segment on a new instance.
- Flexible Compute: A workload that can run on various instance types or sizes.
  - Example: A Hadoop worker node that can function equally well on m5.large or r5.large instances, making it an ideal candidate for Compute Savings Plans.
Worked Examples
Scenario 1: The Constant Web Portal
Workload: An internal HR portal running on 4 m5.large instances 24/7. It must always be available.
- Eligibility Evaluation:
  - Can it be interrupted? No (business critical) $\rightarrow$ Eliminate Spot.
  - Is it steady-state? Yes (24/7 usage) $\rightarrow$ Eligible for Savings Plans.
- Recommendation: Purchase a 3-year EC2 Instance Savings Plan for the m5 family in the specific Region to maximize the discount (up to ~72%).
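The arithmetic behind Scenario 1 can be sketched as follows. The $0.096/hour On-Demand rate for m5.large is an illustrative assumption (actual prices vary by Region and change over time), and 72% is the ceiling discount, so the result is a best case rather than a quote.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; the portal never shuts down


def savings_plan_estimate(instances: int, on_demand_rate: float, discount: float):
    """Return (hourly commitment, yearly On-Demand cost, yearly Savings Plan cost)."""
    on_demand_hourly = instances * on_demand_rate
    commitment_hourly = on_demand_hourly * (1 - discount)
    return (
        commitment_hourly,
        on_demand_hourly * HOURS_PER_YEAR,
        commitment_hourly * HOURS_PER_YEAR,
    )


if __name__ == "__main__":
    # 4 instances, assumed $0.096/hour each, assumed maximum 72% discount.
    commitment, od_year, sp_year = savings_plan_estimate(4, 0.096, 0.72)
    print(f"Hourly commitment: ${commitment:.3f}")
    print(f"Yearly On-Demand:  ${od_year:,.2f}")
    print(f"Yearly with plan:  ${sp_year:,.2f}")
```

With these assumed inputs the commitment works out to roughly $0.108 per hour, and the yearly bill falls from about $3,364 to about $942.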
Scenario 2: Large-Scale Genomic Sequencing
Workload: A research lab needs to process 1,000 TB of data. The job can take 48 hours to 1 week. If a server stops, the job can resume from a checkpoint.
- Eligibility Evaluation:
  - Can it be interrupted? Yes (checkpointing exists) $\rightarrow$ Eligible for Spot.
  - Is it cost-sensitive? Yes (massive scale).
- Recommendation: Use Spot Instances with a Spot Fleet configured for multiple instance families to improve the odds of finding spare capacity, saving up to 90%.
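The checkpoint-and-resume behavior that makes Scenario 2 Spot-eligible can be simulated in a few lines. This is a toy model, not the lab's actual pipeline: the function names are hypothetical, chunks stand in for units of sequencing work, and interruptions are injected from a set rather than coming from AWS.

```python
def process_until_interrupted(total_chunks, checkpoint, pending_interruptions):
    """Process chunks starting at `checkpoint` and return the new checkpoint.

    `pending_interruptions` is a set of chunk indexes at which the simulated
    instance is reclaimed. Each interruption fires once and is then removed.
    """
    for chunk in range(checkpoint, total_chunks):
        if chunk in pending_interruptions:
            pending_interruptions.discard(chunk)
            # Work on this chunk is lost; all earlier chunks are already saved.
            return chunk
        # Real work on `chunk` would happen here, then the checkpoint is persisted.
    return total_chunks


def run_job(total_chunks, interrupted_chunks):
    """Launch simulated instances until every chunk is done; return the launch count."""
    checkpoint = 0
    launches = 0
    pending = set(interrupted_chunks)  # copy so the caller's set is untouched
    while checkpoint < total_chunks:
        launches += 1
        checkpoint = process_until_interrupted(total_chunks, checkpoint, pending)
    return launches


if __name__ == "__main__":
    # Ten chunks with reclamations at chunks 3 and 7 need three launches.
    print(run_job(10, {3, 7}))
```

Each interruption costs only the chunk in flight, which is why a job that can take anywhere from 48 hours to a week tolerates Spot reclamation well.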
Checkpoint Questions
- Which Savings Plan type allows you to change instance families (e.g., move from C5 to M5) while keeping your discount?
  - Answer: Compute Savings Plans.
- How much notice does AWS provide before reclaiming a Spot Instance?
  - Answer: Two minutes.
- True or False: Savings Plans can apply to AWS Lambda and AWS Fargate usage.
  - Answer: True.
- A company has a workload that is unpredictable and cannot be interrupted. Which pricing model should they use?
  - Answer: On-Demand.
- What Spot feature helps Auto Scaling proactively replace instances at risk of interruption?
  - Answer: Capacity Rebalancing.