AWS Purpose-Built Databases: Architectural Selection and Modernization

This study guide focuses on the strategic selection and implementation of AWS purpose-built databases, a critical skill for the SAP-C02 exam. Modernizing applications often requires moving away from traditional "one-size-fits-all" relational databases to specialized engines optimized for specific data models and access patterns.

Learning Objectives

Identify the characteristics and ideal use cases for AWS purpose-built database services.
Differentiate between relational, key-value, document, graph, and in-memory data models.
Apply architectural patterns to select the right database based on performance, scalability, and cost requirements.
Evaluate modernization strategies, such as refactoring from proprietary engines to Aurora Serverless or NoSQL options.

Key Terms & Glossary

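ACID Compliance: Atomicity, Consistency, Isolation, Durability. The set of transaction guarantees relational databases provide to ensure data integrity.
BASE (Basically Available, Soft state, Eventual consistency): The consistency model often used by NoSQL databases to favor availability. DynamoDB, for example, defaults to eventually consistent reads but also offers strongly consistent reads.
Hot Partition: A performance bottleneck in NoSQL stores that occurs when a disproportionate share of reads or writes targets a single partition key, exceeding that partition's throughput.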
Polyglot Persistence: The architectural practice of using different database technologies for different data storage needs within a single application.
Serverless Database: A database where the provider manages underlying capacity and scaling (e.g., Aurora Serverless, DynamoDB).
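A common mitigation for a write-heavy hot partition is write sharding: appending a calculated suffix to the partition key so writes spread across several key values. The sketch below is a minimal illustration under stated assumptions: the `<base>#<shard>` key format, the shard count of 10, and the helper names are hypothetical, not part of any AWS SDK.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 10  # assumption: tune to the key's expected write rate

def sharded_key(base_key: str, item_id: str, shards: int = SHARD_COUNT) -> str:
    """Derive a deterministic suffix from the item ID (hypothetical helper),
    so the same item always maps to the same partition key value."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shards}"

def all_shard_keys(base_key: str, shards: int = SHARD_COUNT) -> list[str]:
    """Readers must query every suffix and merge the results (scatter-gather)."""
    return [f"{base_key}#{i}" for i in range(shards)]

# Simulate 10,000 writes that would otherwise all target the key "2024-06-01".
counts = Counter(sharded_key("2024-06-01", f"order-{n}") for n in range(10_000))
print(len(counts))           # distinct partition key values now in use
print(max(counts.values()))  # the busiest key carries roughly a tenth of the writes
```

The trade-off is on the read side: a query for one logical key becomes one query per shard, so this pattern suits write-heavy keys more than read-heavy ones (for hot reads, a cache such as DAX is the usual answer).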

The "Big Idea"

The central philosophy of AWS database architecture is "The Right Tool for the Right Job." Instead of forcing complex relationships into relational tables or attempting full-text search on a key-value store, architects should decompose workloads by data model and access pattern. Purpose-built databases can deliver consistent low latency (for example, single-digit milliseconds in DynamoDB) at scales that are difficult and costly to reach with a traditional RDBMS, while managed and serverless offerings reduce operational overhead.

Formula / Concept Box

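Selection Metric	Relational (RDS/Aurora)	NoSQL (DynamoDB)
Data Structure	Highly structured / complex joins	Semi-structured / denormalized, no joins
Scaling	Mostly vertical (plus read replicas)	Horizontal (automatic partitioning)
Transactions	Complex multi-table ACID	Single-item atomic operations; limited multi-item ACID (TransactWriteItems / TransactGetItems)
Latency	Depends on query complexity and instance size	Consistent single-digit milliseconds
Schema	Schema-on-write (fixed)	Schema-on-read (flexible)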
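The schema row can be made concrete with a small contrast. In this sketch, Python's built-in `sqlite3` stands in for a relational engine and a plain `dict` stands in for a DynamoDB table. Both are illustrative stand-ins, not AWS services, and the `product` table and attributes are hypothetical. (A real DynamoDB table still enforces its key attributes; only the other attributes are free-form.)

```python
import sqlite3

# Schema-on-write: the relational engine validates every row when it is written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)")
conn.execute("INSERT INTO product VALUES (?, ?, ?)", ("A1", "Lamp", 19.99))

def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the write is accepted, False if the schema rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False

# Missing a NOT NULL column: rejected at write time (IntegrityError).
missing_price_ok = try_insert("INSERT INTO product (sku, name) VALUES (?, ?)", ("A2", "Desk"))
# Unknown column: rejected at write time (OperationalError).
extra_col_ok = try_insert(
    "INSERT INTO product (sku, name, price, color) VALUES (?, ?, ?, ?)", ("A3", "Chair", 49.0, "red"))

# Schema-on-read: items of any shape are accepted; the reader interprets them.
items = {
    "A1": {"sku": "A1", "name": "Lamp", "price": 19.99},
    "A2": {"sku": "A2", "name": "Desk"},                              # no price
    "A3": {"sku": "A3", "name": "Chair", "price": 49.0, "color": "red"},
}

def read_price(sku: str) -> float:
    return items[sku].get("price", 0.0)  # the reader supplies a default

print(missing_price_ok, extra_col_ok, read_price("A2"))
```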

Hierarchical Outline

Relational Databases
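- Amazon RDS: Managed relational service for MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Db2 (Aurora clusters are also provisioned through RDS).
- Amazon Aurora: Cloud-native, MySQL- and PostgreSQL-compatible relational DB; AWS cites up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL.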
- Aurora Serverless: On-demand auto-scaling configuration for Aurora.
Key-Value & Document
- Amazon DynamoDB: Serverless, single-digit millisecond performance at any scale.
- Amazon DocumentDB: Managed MongoDB-compatible service for JSON workloads.
In-Memory & Performance
- Amazon ElastiCache: Managed Redis/Memcached for sub-millisecond caching.
- DynamoDB Accelerator (DAX): Dedicated in-memory cache for DynamoDB.
Specialized Engines
- Amazon Neptune: Graph database for highly connected data (social networks).
- Amazon OpenSearch Service: Search and log analytics database.
- Amazon Timestream: Time-series database for IoT and operational telemetry.

Visual Anchors

[Diagram: Database Selection Logic]
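Database selection can be approximated as an ordered set of rules. This is a minimal sketch: the `Workload` flags, the rule order, and the function name are hypothetical, distilled from this guide's outline and checkpoint answers rather than taken from an official AWS decision tree.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    # Hypothetical requirement flags summarizing this guide's selection criteria.
    highly_connected: bool = False    # many-to-many relationships, multi-hop traversals
    full_text_search: bool = False    # relevance-ranked text or log search
    time_series: bool = False         # timestamped metrics and telemetry
    sub_ms_cache: bool = False        # sub-millisecond reads of hot data
    mongodb_compatible: bool = False  # existing MongoDB drivers and code
    complex_joins: bool = False       # multi-table joins, cross-table ACID
    variable_load: bool = False       # spiky or unpredictable traffic

def select_database(w: Workload) -> str:
    """Return one candidate service; the first matching rule wins."""
    if w.highly_connected:
        return "Amazon Neptune"
    if w.full_text_search:
        return "Amazon OpenSearch Service"
    if w.time_series:
        return "Amazon Timestream"
    if w.sub_ms_cache:
        return "Amazon ElastiCache"
    if w.mongodb_compatible:
        return "Amazon DocumentDB"
    if w.complex_joins:
        return "Aurora Serverless v2" if w.variable_load else "Amazon Aurora"
    return "Amazon DynamoDB"  # default: key-value access at any scale

print(select_database(Workload(complex_joins=True, variable_load=True)))
```

Real applications frequently match several rules at once, which is exactly the polyglot persistence case; a fuller version would return a set of services per workload component rather than a single answer.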

[Diagram: Polyglot Persistence Architecture]

Definition-Example Pairs

Graph Database (Amazon Neptune): A database that uses graph structures for queries with nodes, edges, and properties.
- Example: A social media "friend-of-friend" recommendation engine that needs to traverse millions of relationships instantly.
In-Memory Database (Amazon ElastiCache): A data store that resides primarily in the main memory (RAM) to provide ultra-fast access.
- Example: Storing a real-time gaming leaderboard where scores change every second and thousands of players read the top 10 list.
Time-Series Database (Amazon Timestream): Optimized for tracking data points that change over time.
- Example: Monitoring CPU utilization and memory metrics across 10,000 EC2 instances for DevOps analysis.
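The friend-of-friend recommendation is a two-hop traversal. In Neptune it would be written as a graph query (Gremlin, openCypher, or SPARQL); the sketch below shows only the traversal logic in plain Python over an adjacency list. The graph data and the `recommend_friends` helper are hypothetical and do not use any Neptune API.

```python
from collections import Counter

# Hypothetical undirected "friend" graph stored as an adjacency list.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave", "frank"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
    "frank": {"carol"},
}

def recommend_friends(graph: dict[str, set[str]], user: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Rank second-degree connections by their number of mutual friends."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:                            # hop 1: user -> friend
        for candidate in graph.get(friend, set()):   # hop 2: friend -> friend-of-friend
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Highest mutual-friend count first; ties broken by name for a stable order.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

print(recommend_friends(FRIENDS, "alice"))  # [('dave', 2), ('erin', 1), ('frank', 1)]
```

In a relational schema the same query needs a self-join on a friendships table for every hop, which becomes expensive as depth and data volume grow; graph engines are built to evaluate these multi-hop traversals directly.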

Worked Examples

Case 1: Migrating a Massive JSON Catalog

Scenario: A retail company uses an on-premises MongoDB cluster to store product metadata. They struggle with scaling during Black Friday. Solution:

Select Amazon DocumentDB: It is MongoDB-compatible, allowing developers to use existing drivers and code.
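Modernization: Use AWS Database Migration Service (AWS DMS) with a full load followed by ongoing change data capture (CDC) to move data with minimal downtime.
Result: DocumentDB decouples storage from compute, so storage grows automatically while compute and read capacity scale independently (larger instances, additional replicas) ahead of peaks such as Black Friday.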
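Because DocumentDB is MongoDB-compatible, the application-side change is often mostly the connection string. The sketch below builds a URI with the options AWS documents for DocumentDB clusters: TLS with the AWS-provided CA bundle, `replicaSet=rs0`, `readPreference=secondaryPreferred`, and `retryWrites=false` (DocumentDB does not support retryable writes). The user, password, endpoint, and CA file path are placeholders, and the `build_docdb_uri` helper is hypothetical; check the options against current DocumentDB documentation before use.

```python
from urllib.parse import quote_plus, urlencode

def build_docdb_uri(user: str, password: str, cluster_endpoint: str,
                    port: int = 27017, ca_file: str = "global-bundle.pem") -> str:
    """Build a MongoDB-driver connection URI for a DocumentDB cluster (hypothetical helper)."""
    options = {
        "tls": "true",
        "tlsCAFile": ca_file,                    # CA bundle downloaded from AWS
        "replicaSet": "rs0",                     # DocumentDB clusters expose replica set rs0
        "readPreference": "secondaryPreferred",  # route reads to replicas when available
        "retryWrites": "false",                  # retryable writes are not supported
    }
    creds = f"{quote_plus(user)}:{quote_plus(password)}"  # escape special characters
    return f"mongodb://{creds}@{cluster_endpoint}:{port}/?{urlencode(options)}"

uri = build_docdb_uri("appuser", "p@ss:word", "my-cluster.cluster-example.us-east-1.docdb.amazonaws.com")
print(uri)
# With pymongo installed and network access to the cluster, existing code can then use:
# client = pymongo.MongoClient(uri)
```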

Case 2: Optimizing DynamoDB Costs

Scenario: A mobile app uses DynamoDB. During peak hours, costs spike due to high Read Capacity Unit (RCU) consumption on a single frequently-accessed item (hot key). Solution:

Implement DAX (DynamoDB Accelerator): Add a DAX cluster in front of the DynamoDB table.
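Result: Frequent reads are served from DAX's in-memory item cache, so cache hits consume no RCUs on the base table and read latency drops to microseconds. Note that DAX caches eventually consistent reads only; strongly consistent reads pass through to DynamoDB.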
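To see why a hot key stops consuming base-table RCUs, the read-through behavior can be modeled in a few lines. This is a simplified sketch, not the DAX client: `BaseTable`, `ReadThroughCache`, the item key, and the 300-second TTL are hypothetical stand-ins (DAX's item-cache TTL is configurable), and write-through handling is omitted.

```python
import time

class BaseTable:
    """Hypothetical stand-in for a DynamoDB table; counts reads as an RCU proxy."""
    def __init__(self, items: dict):
        self.items = items
        self.reads = 0

    def get_item(self, key):
        self.reads += 1
        return self.items.get(key)

class ReadThroughCache:
    """Serve repeated reads from memory; fall through to the table on a miss or expiry."""
    def __init__(self, table: BaseTable, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.table = table
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_item(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # cache hit: no base-table read
        value = self.table.get_item(key)       # cache miss or expired: read the table
        self._entries[key] = (value, now + self.ttl)
        return value

table = BaseTable({"config#featured": {"banner": "spring-sale"}})
cache = ReadThroughCache(table)
for _ in range(10_000):                        # 10,000 reads of one hot key
    cache.get_item("config#featured")
print(table.reads)  # 1
```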

Checkpoint Questions

Which database should be used to find patterns in highly connected datasets like social network links?
True or False: Aurora Serverless v2 is better suited for stable, predictable workloads than Aurora Provisioned.
What is the primary difference between ElastiCache Redis and Memcached regarding data persistence?
When would you choose Amazon OpenSearch over Amazon DynamoDB?

[!NOTE] Answers: 1. Amazon Neptune. 2. False (v2 is for variable/unpredictable workloads). 3. Redis supports snapshots and data persistence; Memcached is strictly non-persistent. 4. Use OpenSearch for complex full-text search and log analytics; use DynamoDB for structured key-value lookups.
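Answer 4 ultimately comes down to data structures: DynamoDB locates items by key, while a search engine such as OpenSearch (built on Apache Lucene) maintains an inverted index from terms to documents. The toy index below is a hypothetical illustration only; it omits the analysis, relevance scoring, and sharding a real search engine performs.

```python
import re
from collections import defaultdict

# Hypothetical product descriptions keyed by ID (the key-value view of the data).
DOCS = {
    "p1": "Waterproof hiking boots with steel toe",
    "p2": "Lightweight running shoes",
    "p3": "Hiking backpack, waterproof, 40 liters",
}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    term_sets = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()

index = build_inverted_index(DOCS)
print(DOCS["p2"])                                  # key lookup: requires the exact key
print(sorted(search(index, "waterproof hiking")))  # ['p1', 'p3']
```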

Muddy Points & Cross-Refs

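Aurora Serverless v1 vs. v2: v1 scales to zero (good for dev/test) but scales in coarse steps by doubling or halving capacity; v2 scales in fine-grained 0.5 ACU increments, much faster, and supports production features such as Multi-AZ and read replicas. Newer v2 versions can also pause at 0 ACUs.
RDS vs. Aurora: RDS runs the standard community and commercial engines (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server) as a managed service; Aurora is AWS-built, MySQL- and PostgreSQL-compatible, and optimized for the cloud with a distributed storage layer.
Storage Limits: DynamoDB's maximum item size is 400 KB, counting attribute names as well as values. For larger objects, store the metadata in DynamoDB and the object in Amazon S3.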
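The metadata-in-DynamoDB, object-in-S3 pattern can be sketched as a function that inlines small payloads and offloads large ones. This is an assumption-laden sketch: the size estimate is simplified (DynamoDB's exact accounting depends on attribute types), and the `s3_key` attribute, the `documents/` key prefix, and the `put_object` callback are hypothetical; in a real application the callback would wrap an S3 upload.

```python
from typing import Callable

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB maximum item size (400 KB)

def estimate_item_size(item: dict) -> int:
    """Rough estimate: UTF-8 length of each attribute name plus its value."""
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        total += len(value) if isinstance(value, bytes) else len(str(value).encode("utf-8"))
    return total

def build_item(doc_id: str, title: str, body: bytes,
               put_object: Callable[[str, bytes], None]) -> dict:
    """Inline the body if the item fits; otherwise offload it and store a pointer."""
    item = {"doc_id": doc_id, "title": title, "body": body}
    if estimate_item_size(item) <= MAX_ITEM_BYTES:
        return item
    s3_key = f"documents/{doc_id}"  # hypothetical object key layout
    put_object(s3_key, body)        # hypothetical upload callback
    return {"doc_id": doc_id, "title": title, "s3_key": s3_key, "size": str(len(body))}

uploaded = {}  # a dict stands in for the bucket in this sketch
item = build_item("report-42", "Q3 report", b"\x00" * 600_000, uploaded.__setitem__)
print(item["s3_key"], len(uploaded))  # documents/report-42 1
```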

Comparison Tables

Feature	DynamoDB	Amazon Aurora	Amazon Neptune
Model	Key-Value / Document	Relational	Graph
Scaling	Horizontal (automatic partitioning)	Vertical / up to 15 read replicas	Vertical / up to 15 read replicas
Primary Metric	Throughput (WCU/RCU)	Compute (ACU/Instances)	Instances (or NCUs for Neptune Serverless)
Query Language	PartiQL / API	SQL	Gremlin / openCypher / SPARQL
Best For	IoT, Mobile, Web	ERP, CRM, Finance	Social, Fraud Detection
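DynamoDB's throughput metric reduces to simple arithmetic in provisioned capacity mode: one RCU covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half, transactional reads double), and one WCU covers one write per second of up to 1 KB (transactional writes double). Item sizes round up to the next 4 KB or 1 KB. The helper names below are hypothetical; on-demand mode bills per request instead.

```python
import math

def read_capacity_units(item_kb: float, reads_per_sec: int, mode: str = "strong") -> int:
    """Provisioned RCUs needed for a steady read rate."""
    units_per_read = max(1, math.ceil(item_kb / 4))  # round up to 4 KB blocks
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[mode]
    return math.ceil(units_per_read * factor * reads_per_sec)

def write_capacity_units(item_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    """Provisioned WCUs needed for a steady write rate."""
    units_per_write = max(1, math.ceil(item_kb))     # round up to 1 KB blocks
    return units_per_write * writes_per_sec * (2 if transactional else 1)

print(read_capacity_units(6, 100))              # 6 KB -> 2 blocks x 100 reads = 200
print(read_capacity_units(6, 100, "eventual"))  # half of that = 100
print(write_capacity_units(2.5, 50))            # 2.5 KB -> 3 blocks x 50 writes = 150
```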