
Optimizing Compute Utilization: Containers, Serverless, and Microservices

Exam objective: Optimization of compute utilization (for example, containers, serverless computing, microservices)


This guide explores how to maximize performance and cost-efficiency within AWS by selecting and tuning the right compute environments, ranging from traditional virtual machines to ultra-lightweight serverless functions.

Learning Objectives

After studying this guide, you should be able to:

  • Compare and contrast EC2, ECS, EKS, and AWS Lambda based on workload requirements.
  • Evaluate the benefits of AWS Fargate for container orchestration without managing underlying infrastructure.
  • Identify optimal scaling strategies (Horizontal vs. Vertical) and implementation methods like Target Tracking.
  • Utilize AWS Compute Optimizer to right-size resources based on historical utilization data.
  • Design decoupled, microservices-based architectures that allow independent scaling of components.

Key Terms & Glossary

  • Compute Utilization: The ratio of used compute resources (CPU, RAM) to the total capacity provisioned.
  • Serverless: A cloud execution model where the provider manages the server allocation and infrastructure, charging only for resources used during execution (e.g., AWS Lambda).
  • Container: A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
  • Orchestration: The automated arrangement, coordination, and management of complex computer systems and services (e.g., Amazon ECS or EKS).
  • Microservices: An architectural style that structures an application as a collection of small, autonomous services modeled around a business domain.
  • Cold Start: The latency experienced when a serverless function is triggered for the first time or after a period of inactivity, as the cloud provider provisions the environment.

The "Big Idea"

The fundamental goal of compute optimization is to maximize server density while minimizing idle waste. In traditional environments, servers often run at 10-20% capacity to handle occasional peaks. By moving toward containers and serverless, we shift from "buying a box" to "buying a process." This allows us to pack multiple workloads onto single instances or trigger compute only when an event occurs, ensuring every cent of spend translates directly into application performance.
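The 10-20% utilization figure above turns into a quick idle-waste calculation. A minimal sketch; the hourly rate is an illustrative assumption, not AWS pricing:

```python
# Quick idle-waste sketch for the 10-20% utilization scenario above.
# The $0.40/hour rate is an illustrative assumption, not AWS pricing.
def idle_waste(hourly_rate: float, utilization: float, hours: float = 730) -> float:
    """Monthly spend on provisioned capacity that sits unused."""
    return hourly_rate * hours * (1 - utilization)

# A server billed at $0.40/hour averaging 15% utilization:
wasted = idle_waste(0.40, 0.15)  # about $248 of a $292 monthly bill
```

At typical on-premises utilization, most of the bill pays for capacity that is never used, which is the waste containers and serverless are meant to eliminate.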

Formula / Concept Box

| Concept | Rule / Definition | Key Metric / Limit |
| --- | --- | --- |
| Vertical Scaling | "Scaling Up": adding more CPU/RAM to an existing instance. | Max instance size (e.g., u-24tb1.112xlarge) |
| Horizontal Scaling | "Scaling Out": adding more instances to a pool. | Auto Scaling Group (ASG) max size |
| Lambda Duration | Maximum execution time for a single function. | 15 minutes |
| Compute Optimizer | Look-back period for analyzing resource patterns. | 14 days |
| Target Tracking | Policy that adjusts capacity to hold a specific metric value. | e.g., keep average CPU at 65% |

Hierarchical Outline

  1. Foundational Compute (EC2)
    • Instance Families: Matching workloads to types (e.g., M5 for general purpose, C5 for compute-intensive, R5 for memory-intensive).
    • Purchasing Options: Using Spot Instances for stateless/interruptible tasks to save up to 90%, and Reserved Instances (RIs) for baseline 24/7 loads.
  2. Containerization (Docker on AWS)
    • Amazon ECS: AWS-native container orchestration; simpler integration with AWS services.
    • Amazon EKS: Managed Kubernetes; best for hybrid clouds or standardizing on K8s open-source tooling.
    • AWS Fargate: The "serverless" way to run containers; removes the need to manage EC2 instances for the cluster.
  3. Serverless Computing (AWS Lambda)
    • Event-Driven: Triggered by S3 uploads, DynamoDB changes, or API Gateway requests.
    • Resource Tuning: Performance is controlled by Memory allocation; CPU power scales proportionally with memory.
  4. Optimization Tools
    • AWS Compute Optimizer: Uses Machine Learning to recommend rightsizing for EC2, EBS, and Lambda.
    • Auto Scaling: Automates the "Horizontal Scaling" process to match demand.
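Because Lambda bills in GB-seconds, the memory setting described in the outline above drives both CPU share and cost. A rough cost sketch; the per-GB-second rate is an assumed figure, so check current AWS pricing for your region:

```python
# Sketch of Lambda's GB-second billing model. RATE_PER_GB_SECOND is an
# assumed figure for illustration; actual pricing varies by region and
# CPU architecture.
RATE_PER_GB_SECOND = 0.0000166667

def lambda_compute_cost(memory_mb: int, duration_s: float, invocations: int) -> float:
    """Compute charge for a batch of invocations, excluding request fees."""
    gb_seconds = (memory_mb / 1024) * duration_s * invocations
    return gb_seconds * RATE_PER_GB_SECOND

# One million 3-second invocations at 1024 MB:
cost = lambda_compute_cost(1024, 3, 1_000_000)
```

Doubling the memory doubles the GB-second cost per unit of time, but because CPU scales with memory, functions often finish faster, so the cheapest setting is found by testing, not by minimizing memory.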

Visual Anchors

Choosing the Right Compute


VM vs. Container Architecture

```latex
\begin{tikzpicture}[node distance=2cm, font=\small]
  % VM side
  \draw[thick, fill=blue!10] (0,0) rectangle (3,4)
    node[pos=.5, yshift=1.8cm] {Virtual Machine};
  \draw (0.2,0.5) rectangle (2.8,1.2) node[pos=.5] {Hardware};
  \draw (0.2,1.3) rectangle (2.8,2.0) node[pos=.5] {Hypervisor};
  \draw (0.2,2.1) rectangle (2.8,2.8) node[pos=.5] {Guest OS};
  \draw (0.2,2.9) rectangle (2.8,3.6) node[pos=.5] {App + Libs};

  % Container side
  \draw[thick, fill=green!10] (5,0) rectangle (8,4)
    node[pos=.5, yshift=1.8cm] {Container};
  \draw (5.2,0.5) rectangle (7.8,1.2) node[pos=.5] {Hardware};
  \draw (5.2,1.3) rectangle (7.8,2.0) node[pos=.5] {Host OS};
  \draw (5.2,2.1) rectangle (7.8,2.8) node[pos=.5] {Docker Engine};
  \draw (5.2,2.9) rectangle (7.8,3.6) node[pos=.5] {App + Libs};

  \node at (1.5,-0.5) {Heavier, slower boot};
  \node at (6.5,-0.5) {Lightweight, fast boot};
\end{tikzpicture}
```

Definition-Example Pairs

  • Decoupled Architecture: A design where components are independent, so a failure or scale-up in one doesn't mandate the same in others.
    • Example: Using Amazon SQS between a web tier and a processing tier. If the processing tier is slow, messages just wait in the queue without crashing the web tier.
  • Stateless Workload: A process that does not store client data locally on the server; every request contains all the information needed to complete it.
    • Example: A web server that stores user session data in Amazon ElastiCache (Redis) instead of on local disk.
  • Right-Sizing: The process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.
    • Example: Changing an m5.2xlarge (always at 5% CPU) to a t3.medium after seeing recommendations in AWS Compute Optimizer.
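The right-sizing pass in the last pair above can be reduced to a simple rule. A toy sketch of the decision Compute Optimizer automates; the fleet data and the 20% threshold are illustrative assumptions:

```python
# Flag instances whose average CPU over the look-back window falls below a
# threshold -- the core idea behind the Compute Optimizer example above.
# Real recommendations also weigh memory, network, and EBS metrics.
def rightsizing_candidates(avg_cpu_by_instance: dict, cpu_threshold: float = 20.0) -> list:
    """Return downsizing candidates, sorted by name."""
    return sorted(
        name for name, avg_cpu in avg_cpu_by_instance.items()
        if avg_cpu < cpu_threshold
    )

# Hypothetical 14-day CPU averages (percent):
fleet = {
    "web-01 (m5.2xlarge)": 5.0,
    "batch-01 (c5.xlarge)": 71.4,
    "cache-01 (r5.large)": 12.9,
}
```

Here `web-01` and `cache-01` would be flagged for a smaller instance type, while `batch-01` is already well utilized.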

Worked Examples

Example 1: Web Application Scaling

Problem: A marketing site experiences massive traffic spikes at 9 AM and very little traffic at night. Using a single large instance is expensive and risky.

Solution:

  1. Create an Amazon Machine Image (AMI) of the web server.
  2. Set up an Auto Scaling Group (ASG) with a Minimum of 2 (for High Availability) and a Maximum of 10.
  3. Configure a Target Tracking Policy set to a Target Value of 70% CPU Utilization.
  4. Result: At 9 AM, ASG detects CPU rising and adds instances. At night, it removes them, saving money.
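Step 3 above can be expressed as the request body boto3's `put_scaling_policy` expects. The ASG and policy names are placeholders, and the call itself is left commented out so the sketch runs without AWS credentials:

```python
# Target-tracking policy from the worked example, shaped for the
# Auto Scaling put_scaling_policy API. Names are placeholders.
policy_request = {
    "AutoScalingGroupName": "marketing-site-asg",  # placeholder
    "PolicyName": "keep-cpu-at-70",                # placeholder
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 70.0,
    },
}

# import boto3
# boto3.client("autoscaling").put_scaling_policy(**policy_request)
```

With a target-tracking policy, the ASG computes the scaling adjustments itself; you only declare the metric and the value to hold.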

Example 2: Serverless Image Processing

Problem: Users upload high-resolution photos to S3. These need to be resized into thumbnails.

Solution:

  1. Create an AWS Lambda function with code to resize images.
  2. Set an S3 Event Trigger on the "uploads/" bucket for ObjectCreated events.
  3. When a file arrives, S3 invokes the Lambda. The Lambda runs for 3 seconds, resizes the image, and exits.
  4. Result: You pay only for the 3 seconds of execution. There is zero cost when no images are being uploaded.
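A minimal handler sketch for the flow above. The actual resize step (e.g. with Pillow) is stubbed out so the S3 event shape stays the focus; the bucket and key names are hypothetical:

```python
# Parse the standard S3 ObjectCreated event and return the thumbnail keys
# this function would write. The resize itself is stubbed out.
def handler(event, context=None):
    thumbnails = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real code would download s3://{bucket}/{key}, resize the image,
        # and upload the result; here we just report the target key.
        thumbnails.append(f"{bucket}/thumbnails/{key}")
    return thumbnails

# Invoked locally with a sample event (hypothetical bucket/key):
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads-bucket"},
                "object": {"key": "uploads/photo.jpg"}}}
    ]
}
```

Testing the handler locally with a hand-built event like this is a common way to validate the parsing logic before wiring up the real S3 trigger.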

Checkpoint Questions

  1. Which AWS service would you use to find out if your EC2 instances are over-provisioned based on the last two weeks of data?
  2. If an application requires a sub-second boot time to handle unpredictable bursts of traffic, should you choose EC2 or AWS Lambda?
  3. What is the main difference between horizontal scaling and vertical scaling?
  4. True or False: AWS Fargate requires you to manage the underlying EC2 instances and patch the operating system.
  5. How long can an AWS Lambda function run before it times out?

[!TIP] Answer Key:

  1. AWS Compute Optimizer.
  2. AWS Lambda (much faster cold start than an EC2 boot).
  3. Horizontal adds more instances; Vertical makes existing instances bigger.
  4. False (Fargate is serverless; AWS manages the underlying infrastructure).
  5. 15 Minutes.
