
Optimizing Compute Costs: Evaluating Spot Instances and Savings Plans

Evaluate workloads for EC2 Spot Instance and Savings Plans eligibility

This guide covers the evaluation of AWS workloads to determine the most cost-effective purchasing model, specifically focusing on the eligibility and trade-offs of Spot Instances and Savings Plans.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between On-Demand, Spot, and Savings Plans pricing models.
  • Identify workloads eligible for Spot Instances based on fault tolerance.
  • Determine when a Savings Plan commitment is superior to Reserved Instances.
  • Evaluate steady-state vs. bursty workloads for financial optimization.

Key Terms & Glossary

  • Spot Instance: An instance that uses spare AWS capacity at a discount of up to 90%, subject to reclamation by AWS with a 2-minute warning.
  • Savings Plan: A flexible pricing model offering up to 72% savings in exchange for a commitment to a consistent amount of compute usage (measured in $/hour) for a 1- or 3-year term.
  • Steady-State Workload: A workload with predictable, constant resource requirements over time.
  • Fault-Tolerant: The ability of a system to continue operating properly in the event of the failure of some of its components (critical for Spot eligibility).
  • Capacity Rebalancing: A feature that proactively manages Spot Instance interruptions by signaling Auto Scaling to replace instances before they are reclaimed.
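To make the 2-minute warning concrete: a workload typically learns about an upcoming reclamation by polling the instance metadata service (the `spot/instance-action` document), which returns a small JSON body with an `action` and a UTC `time`. The sketch below only parses such a body and computes the time remaining; it makes no network call, and the sample notice and the function name `seconds_until_interruption` are illustrative assumptions.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds left before the action named in a Spot interruption notice."""
    notice = json.loads(notice_json)
    # "time" is an ISO 8601 UTC timestamp, e.g. 2024-01-01T12:02:00Z.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical notice body, shaped like the instance-action document.
sample_notice = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
observed_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

remaining = seconds_until_interruption(sample_notice, observed_at)
print(f"Seconds left to drain and checkpoint: {remaining:.0f}")
```

A fault-tolerant workload would use this window to stop accepting new work, flush state, and deregister from its load balancer.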

The "Big Idea"

In AWS, you pay for the flexibility you require. On-Demand pricing is the "retail price" for maximum flexibility. To reduce costs, you must trade either persistence (by using Spot Instances, which can be interrupted) or liquidity (by using Savings Plans, which require a long-term financial commitment). Evaluating a workload involves finding the highest possible "trade" the application can afford without compromising its core business function.

Formula / Concept Box

| Pricing Model | Max Savings | Commitment | Best For... |
|---|---|---|---|
| On-Demand | 0% (Base) | None | New, unpredictable, or non-interruptible workloads. |
| Savings Plans | ~72% | 1 or 3 Years | Steady-state, predictable usage (EC2, Fargate, Lambda). |
| Spot Instances | ~90% | None | Stateless, fault-tolerant, or time-flexible tasks. |
| Reserved (RI) | ~72% | 1 or 3 Years | Legacy environments (replaced largely by Savings Plans). |

[!IMPORTANT] Savings Plans do not apply to Spot Instance usage. You cannot "stack" these discounts.
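As a rough illustration of what the maximum-savings figures mean in dollars, the sketch below prices one instance for a month under each model. The On-Demand rate of $0.10/hour is a hypothetical number chosen for easy arithmetic, the discounts are the upper bounds quoted in this guide (real discounts are usually lower and vary by instance type, Region, and term), and each discount is applied on its own because they do not stack.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months

# Upper-bound discounts quoted in this guide, not guaranteed rates.
MAX_DISCOUNT = {
    "On-Demand": 0.00,
    "Savings Plans": 0.72,
    "Spot Instances": 0.90,
}

ASSUMED_ON_DEMAND_RATE = 0.10  # hypothetical price in $/hour


def monthly_cost(on_demand_rate: float, discount: float,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running one instance for `hours` at a discounted hourly rate."""
    return round(on_demand_rate * (1 - discount) * hours, 2)


costs = {
    model: monthly_cost(ASSUMED_ON_DEMAND_RATE, discount)
    for model, discount in MAX_DISCOUNT.items()
}

for model, cost in costs.items():
    print(f"{model:<15} ${cost:>6.2f} per month")
```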

Hierarchical Outline

  1. On-Demand Instances
    • Baseline usage: Best for short-term, spiky, or testing workloads.
    • No Interruption: AWS does not reclaim them for capacity; they keep running until you stop or terminate them.
  2. Spot Instances (The "Spare Capacity" Model)
    • Cost: Lowest price point (up to 90% off).
    • Interruptibility: AWS can reclaim with 2-minute notice.
    • Use Cases: Big data (worker nodes), CI/CD, batch processing, HPC.
  3. Savings Plans (The "Commitment" Model)
    • Compute Savings Plan: Most flexible (up to 66% off); applies to EC2, Fargate, and Lambda usage regardless of instance family, size, or Region.
    • EC2 Instance Savings Plan: Up to 72% off; locked to a specific instance family in a region.
    • SageMaker Savings Plan: Specific to machine learning workloads.
  4. Evaluation Criteria
    • Can it be paused? → Use Spot.
    • Is it running 24/7? → Use Savings Plans.
    • Is it a new proof-of-concept? → Use On-Demand.

Visual Anchors

Workload Eligibility Decision Tree

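The evaluation criteria from the outline can be sketched as a small decision function. This is a minimal illustration, not an official AWS rule set: the `Workload` fields and the order of the checks (interruptibility first, because Spot carries the deepest discount) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    interruptible: bool  # can it be paused or restarted without harm?
    steady_state: bool   # does it run predictably, e.g. 24/7?


def recommend_pricing_model(workload: Workload) -> str:
    """Map workload traits to a purchasing model using the guide's criteria."""
    if workload.interruptible:
        return "Spot Instances"   # tolerates reclamation
    if workload.steady_state:
        return "Savings Plans"    # predictable usage justifies a commitment
    return "On-Demand"            # new, unpredictable, and non-interruptible


print(recommend_pricing_model(Workload(interruptible=False, steady_state=True)))
print(recommend_pricing_model(Workload(interruptible=True, steady_state=False)))
print(recommend_pricing_model(Workload(interruptible=False, steady_state=False)))
```

Real environments often mix models, for example a Savings Plan for the steady baseline with Spot or On-Demand capacity covering the peaks.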

Cost vs. Flexibility Trade-off

Figure: cost (price per hour, vertical axis) plotted against commitment/risk (horizontal axis). Three points fall along a dashed, downward-sloping line:

  • On-Demand: high cost, low risk.
  • Savings Plan: medium cost, long-term commitment risk.
  • Spot: low cost, interruption risk.

Definition-Example Pairs

  • Stateless Workload: A process that does not store data locally between requests.
    • Example: A fleet of web servers where any server can handle any request; if one is reclaimed (Spot), the load balancer routes traffic to the remaining servers and session state is kept in a central database.
  • Fault-Tolerant Application: An application designed to recover gracefully from infrastructure failure.
    • Example: A video transcoding job that processes files in small segments. If the instance is interrupted, the job simply restarts that specific 10-second segment on a new instance.
  • Flexible Compute: A workload that can run on various instance types or sizes.
    • Example: A Hadoop worker node that can function equally well on m5.large or r5.large instances, making it an ideal candidate for Compute Savings Plans.

Worked Examples

Scenario 1: The Constant Web Portal

Workload: An internal HR portal running on 4 m5.large instances 24/7. It must always be available.

  • Eligibility Evaluation:
    1. Can it be interrupted? No (Business critical) → Eliminate Spot.
    2. Is it steady-state? Yes (24/7 usage) → Eligible for Savings Plans.
  • Recommendation: Purchase a 3-year EC2 Instance Savings Plan for the m5 family in the specific region to target the largest available discount (up to ~72%).
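To size the commitment for this scenario, recall that a Savings Plan is expressed in $/hour of discounted spend and is charged whether or not the capacity is used. The sketch below assumes a hypothetical m5.large On-Demand rate of $0.096/hour and the guide's upper-bound 72% discount; actual rates depend on Region, term, and payment option. It also derives a break-even utilization: at discount d, the plan beats On-Demand only when the covered capacity runs more than (1 - d) of the time.

```python
HOURS_PER_YEAR = 8760

ASSUMED_ON_DEMAND_RATE = 0.096  # hypothetical m5.large price in $/hour
ASSUMED_DISCOUNT = 0.72         # upper bound quoted in this guide
INSTANCE_COUNT = 4


def hourly_commitment(instances: int, on_demand_rate: float, discount: float) -> float:
    """Discounted $/hour needed to fully cover `instances` running continuously."""
    return instances * on_demand_rate * (1 - discount)


def break_even_utilization(discount: float) -> float:
    """Fraction of hours the capacity must run for the plan to beat On-Demand."""
    return 1 - discount


commitment = hourly_commitment(INSTANCE_COUNT, ASSUMED_ON_DEMAND_RATE, ASSUMED_DISCOUNT)
on_demand_hourly = INSTANCE_COUNT * ASSUMED_ON_DEMAND_RATE
yearly_savings = (on_demand_hourly - commitment) * HOURS_PER_YEAR

print(f"Commitment:      ${commitment:.5f}/hour")
print(f"Yearly savings:  ${yearly_savings:.2f}")
print(f"Break-even use:  {break_even_utilization(ASSUMED_DISCOUNT):.0%} of hours")
```

Because the HR portal runs 24/7, its utilization is 100%, well above the break-even point, which is why a commitment makes sense here and would not for a workload that runs only a few hours a day.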

Scenario 2: Large-Scale Genomic Sequencing

Workload: A research lab needs to process 1,000 TB of data. The job can take 48 hours to 1 week. If a server stops, the job can resume from a checkpoint.

  • Eligibility Evaluation:
    1. Can it be interrupted? Yes (Check-pointing exists) → Eligible for Spot.
    2. Is it cost-sensitive? Yes (Massive scale).
  • Recommendation: Use Spot Instances with a Spot Fleet configured for multiple instance families, which widens the pool of spare capacity the job can draw from while saving up to 90%.
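The checkpoint-and-resume behavior that makes this job Spot-eligible can be sketched as follows. This is a local simulation under stated assumptions: `SpotInterruption`, `run_job`, and the in-memory `checkpoint` dictionary are hypothetical stand-ins, and a real job would persist its checkpoint to durable storage outside the instance.

```python
class SpotInterruption(Exception):
    """Stands in for the instance being reclaimed (simulation only)."""


def run_job(chunks, checkpoint, interrupt_at=None):
    """Process chunks from checkpoint['next'] onward, recording progress after each."""
    for index in range(checkpoint["next"], len(chunks)):
        if interrupt_at is not None and index == interrupt_at:
            raise SpotInterruption(f"reclaimed before chunk {index}")
        checkpoint["results"].append(chunks[index] * 2)  # placeholder for real work
        checkpoint["next"] = index + 1  # a real job would write this to durable storage


chunks = [1, 2, 3, 4, 5]
checkpoint = {"next": 0, "results": []}

try:
    run_job(chunks, checkpoint, interrupt_at=3)  # first instance is reclaimed
except SpotInterruption:
    pass  # a replacement instance is launched

run_job(chunks, checkpoint)  # replacement resumes at chunk 3, not chunk 0
print(checkpoint)
```

Only the chunk in flight at the time of reclamation is repeated, so the cost of an interruption stays small relative to the overall job.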

Checkpoint Questions

  1. Which Savings Plan type allows you to change instance families (e.g., move from C5 to M5) while keeping your discount?
    • Answer: Compute Savings Plans.
  2. How much notice does AWS provide before reclaiming a Spot Instance?
    • Answer: Two minutes.
  3. True or False: Savings Plans can apply to AWS Lambda and AWS Fargate usage.
    • Answer: True.
  4. A company has a workload that is unpredictable and cannot be interrupted. Which pricing model should they use?
    • Answer: On-Demand.
  5. What Spot feature helps Auto Scaling proactively replace instances at risk of interruption?
    • Answer: Capacity Rebalancing.
