AWS Purpose-Built Databases: Architectural Selection and Modernization
Purpose-built databases (for example, DynamoDB, Amazon Aurora Serverless, Amazon ElastiCache)
This study guide focuses on the strategic selection and implementation of AWS purpose-built databases, a critical skill for the SAP-C02 exam. Modernizing applications often requires moving away from traditional "one-size-fits-all" relational databases to specialized engines optimized for specific data models and access patterns.
Learning Objectives
- Identify the characteristics and ideal use cases for AWS purpose-built database services.
- Differentiate between relational, key-value, document, graph, and in-memory data models.
- Apply architectural patterns to select the right database based on performance, scalability, and cost requirements.
- Evaluate modernization strategies, such as refactoring from proprietary engines to Aurora Serverless or NoSQL options.
Key Terms & Glossary
- ACID Compliance: Atomicity, Consistency, Isolation, Durability. Standard for relational databases to ensure data integrity.
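- BASE (Basically Available, Soft state, Eventual consistency): The consistency model many NoSQL databases adopt to favor availability. DynamoDB reads are eventually consistent by default, with strongly consistent reads available on request.
- Hot Partition: A performance bottleneck in NoSQL stores that occurs when traffic or data concentrates on a few partition key values, so one partition is throttled while the others sit idle.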
- Polyglot Persistence: The architectural practice of using different database technologies for different data storage needs within a single application.
- Serverless Database: A database where the provider manages underlying capacity and scaling (e.g., Aurora Serverless, DynamoDB).
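The hot partition entry above can be made concrete with a small simulation. This is a minimal sketch of write sharding (appending a suffix to a hot partition key), assuming a hypothetical 8-partition store and a stand-in hash function; the real service's partitioning is internal and not modeled here.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8   # hypothetical number of storage partitions
NUM_SHARDS = 10      # hypothetical number of suffixes used for a hot key


def partition_for(key: str) -> int:
    """Map a partition key to a partition with a stable hash (stand-in for the service's internal hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def sharded_key(base_key: str, request_id: int) -> str:
    """Append a deterministic suffix so requests for one logical key use several physical keys."""
    return f"{base_key}#{request_id % NUM_SHARDS}"


def load_per_partition(keys) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k) for k in keys)


# 1,000 requests against one key all hit the same partition.
hot = load_per_partition("product-123" for _ in range(1000))
# The same 1,000 requests spread over suffixed keys.
spread = load_per_partition(sharded_key("product-123", i) for i in range(1000))

print("unsharded:", dict(hot))
print("sharded:  ", dict(spread))
```

The trade-off is on the read side: to reassemble the logical item, a reader has to query every suffix and merge the results, so this pattern fits write-heavy hot keys better than read-heavy ones.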
The "Big Idea"
The central philosophy of AWS database architecture is "The Right Tool for the Right Job." Instead of forcing complex relationships into a relational table or trying to perform full-text searches on a key-value store, architects should decompose workloads by data model and access pattern. By utilizing purpose-built databases, you can sustain millisecond latency at a scale that a single traditional RDBMS struggles to support, while reducing operational overhead through managed and serverless offerings.
Formula / Concept Box
| Selection Metric | Priority for Relational (RDS/Aurora) | Priority for NoSQL (DynamoDB) |
|---|---|---|
| Data Structure | Highly Structured / Complex Joins | Semi-structured / Flat |
| Scaling | Vertical (mostly) | Horizontal (seamless) |
| Transactions | Complex multi-table ACID | Single-item atomic operations; ACID transactions limited to a bounded set of items |
| Schema | Schema-on-write (Fixed) | Schema-on-read (Flexible) |
Hierarchical Outline
- Relational Databases
- Amazon RDS: Managed relational service supporting several engines (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and others).
- Amazon Aurora: Cloud-native relational DB, compatible with MySQL and PostgreSQL; AWS cites up to 5x the throughput of standard MySQL.
- Aurora Serverless: On-demand auto-scaling configuration for Aurora.
- Key-Value & Document
- Amazon DynamoDB: Serverless, single-digit millisecond performance at any scale.
- Amazon DocumentDB: Managed MongoDB-compatible service for JSON workloads.
- In-Memory & Performance
- Amazon ElastiCache: Managed Redis/Memcached for sub-millisecond caching.
- DynamoDB Accelerator (DAX): Dedicated in-memory cache for DynamoDB.
- Specialized Engines
- Amazon Neptune: Graph database for highly connected data (social networks).
- Amazon OpenSearch Service: Managed search and log analytics engine.
- Amazon Timestream: Time-series database for IoT and operational telemetry.
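The outline above can be read as a lookup from data model to service. The following is a minimal sketch of that mapping, not an official decision procedure; the `Workload` fields and the `suggest_service` function are hypothetical names introduced for illustration, and real selection also weighs cost, team skills, and migration effort.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # One of: "relational", "key-value", "document", "graph", "search", "time-series"
    data_model: str
    needs_microsecond_reads: bool = False  # hypothetical flag: is a cache tier required?
    variable_traffic: bool = False         # hypothetical flag: spiky or unpredictable load?


# Data models that map to a single service in the outline.
_DIRECT = {
    "document": "Amazon DocumentDB",
    "graph": "Amazon Neptune",
    "search": "Amazon OpenSearch Service",
    "time-series": "Amazon Timestream",
}


def suggest_service(w: Workload) -> str:
    """Return a first-pass service suggestion based on the outline's categories."""
    if w.data_model == "relational":
        return "Aurora Serverless" if w.variable_traffic else "Amazon Aurora or RDS"
    if w.data_model == "key-value":
        return "DynamoDB with DAX" if w.needs_microsecond_reads else "DynamoDB"
    if w.data_model in _DIRECT:
        return _DIRECT[w.data_model]
    raise ValueError(f"unknown data model: {w.data_model!r}")


print(suggest_service(Workload("graph")))
print(suggest_service(Workload("relational", variable_traffic=True)))
```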
Visual Anchors
Database Selection Logic
Polyglot Persistence Architecture
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, fill=blue!10, text width=2.5cm, align=center, rounded corners}]
\node (app) {Application Layer (Lambda/ECS)};
\node (cache) [right of=app, xshift=3cm, fill=orange!20] {ElastiCache\\(Session/Results)};
\node (db1) [below left of=app, yshift=-1cm, fill=green!10] {Aurora\\(Order History)};
\node (db2) [below of=app, yshift=-1cm, fill=yellow!10] {DynamoDB\\(Product Catalog)};
\node (db3) [below right of=app, yshift=-1cm, fill=purple!10] {Neptune\\(Recommendations)};
\draw[->, thick] (app) -- (cache);
\draw[->, thick] (app) -- (db1);
\draw[->, thick] (app) -- (db2);
\draw[->, thick] (app) -- (db3);
\end{tikzpicture}
Definition-Example Pairs
- Graph Database (Amazon Neptune): A database that uses graph structures for queries with nodes, edges, and properties.
- Example: A social media "friend-of-friend" recommendation engine that needs to traverse millions of relationships instantly.
- In-Memory Database (Amazon ElastiCache): A data store that resides primarily in the main memory (RAM) to provide ultra-fast access.
- Example: Storing a real-time gaming leaderboard where scores change every second and thousands of players read the top 10 list.
- Time-Series Database (Amazon Timestream): Optimized for tracking data points that change over time.
- Example: Monitoring CPU utilization and memory metrics across 10,000 EC2 instances for DevOps analysis.
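To illustrate why the "friend-of-friend" example suits a graph model, here is a minimal sketch of a two-hop traversal over an in-memory adjacency map. It is a plain-Python stand-in for what a graph query would express, not Neptune code; the sample names and the `friends_of_friends` helper are hypothetical.

```python
def friends_of_friends(graph: dict[str, set[str]], person: str) -> set[str]:
    """Return people exactly two hops away: friends of friends who are not already friends."""
    direct = graph.get(person, set())
    candidates: set[str] = set()
    for friend in direct:
        # Second hop: collect each friend's own connections.
        candidates |= graph.get(friend, set())
    # Exclude the person themselves and anyone already directly connected.
    return candidates - direct - {person}


# Hypothetical sample data: undirected friendships stored as adjacency sets.
social = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

print(sorted(friends_of_friends(social, "alice")))  # ['dave', 'erin']
```

In a relational schema the same question needs a self-join per hop, which is why traversal depth is a common signal for choosing a graph engine.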
Worked Examples
Case 1: Migrating a Massive JSON Catalog
Scenario: A retail company uses an on-premises MongoDB cluster to store product metadata. They struggle with scaling during Black Friday.
Solution:
- Select Amazon DocumentDB: It is MongoDB-compatible, allowing developers to use existing drivers and code.
- Modernization: Use the AWS Database Migration Service (DMS) to move data with minimal downtime.
- Result: DocumentDB decouples storage from compute, so storage grows automatically while read capacity can be added through replica instances ahead of peak events.
Case 2: Optimizing DynamoDB Costs
Scenario: A mobile app uses DynamoDB. During peak hours, costs spike due to high Read Capacity Unit (RCU) consumption on a single frequently accessed item (hot key).
Solution:
- Implement DAX (DynamoDB Accelerator): Add a DAX cluster in front of the DynamoDB table.
- Result: Frequent reads are served from the in-memory cache, reducing the RCUs required on the base table and lowering latency to microseconds.
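The effect described in Case 2 can be sketched with a read-through cache in front of a table that counts its reads. This is a simplified simulation under stated assumptions, not the DAX client: `CountingTable` and `ReadThroughCache` are hypothetical classes, and the TTL value is an arbitrary illustration.

```python
import time


class CountingTable:
    """Stand-in for a DynamoDB table that counts how many reads reach it."""

    def __init__(self, items):
        self._items = dict(items)
        self.reads = 0

    def get_item(self, key):
        self.reads += 1
        return self._items.get(key)


class ReadThroughCache:
    """Serve repeated reads from memory; fall through to the table on a miss or expiry."""

    def __init__(self, table, ttl_seconds=300.0, clock=time.monotonic):
        self._table = table
        self._ttl = ttl_seconds   # hypothetical TTL chosen for illustration
        self._clock = clock
        self._entries = {}        # key -> (expires_at, value)

    def get_item(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]       # cache hit: the table is not touched
        value = self._table.get_item(key)
        self._entries[key] = (now + self._ttl, value)
        return value


table = CountingTable({"promo-banner": {"title": "Sale"}})
cache = ReadThroughCache(table)
for _ in range(10_000):
    cache.get_item("promo-banner")

print("reads that reached the table:", table.reads)
```

Because cached values can lag behind the table until the TTL expires, this pattern suits reads that tolerate eventual consistency.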
Checkpoint Questions
1. Which database should be used to find patterns in highly connected datasets like social network links?
2. True or False: Aurora Serverless v2 is better suited for stable, predictable workloads than Aurora Provisioned.
3. What is the primary difference between ElastiCache Redis and Memcached regarding data persistence?
4. When would you choose Amazon OpenSearch over Amazon DynamoDB?
[!NOTE] Answers: 1. Amazon Neptune. 2. False (v2 is for variable/unpredictable workloads). 3. Redis supports snapshots and data persistence; Memcached is strictly non-persistent. 4. Use OpenSearch for complex full-text search and log analytics; use DynamoDB for structured key-value lookups.
Muddy Points & Cross-Refs
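- Aurora Serverless v1 vs. v2: v1 can pause to zero capacity when idle, which suits dev/test; v2 scales in finer increments and reacts far faster, making it suitable for production. Newer v2 engine versions can also auto-pause to 0 ACUs, so check current documentation before treating scale-to-zero as a v1-only feature.
- RDS vs. Aurora: Aurora is an AWS-built engine (MySQL- and PostgreSQL-compatible) with storage optimized for the cloud; RDS is a managed service for standard community and commercial engines.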
- Storage Limits: Remember that DynamoDB has a 400 KB item size limit. For larger objects, store the metadata in DynamoDB and the object itself in Amazon S3.
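The storage-limit note above describes a pointer pattern, which can be sketched as a planning function. This is a minimal illustration with no AWS calls: `plan_storage`, the bucket name, and the key layout are hypothetical, and the size check uses JSON length as an approximation rather than DynamoDB's exact item-size accounting.

```python
import json

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB item size limit


def approx_item_size(item: dict) -> int:
    """Rough estimate: UTF-8 length of the compact JSON encoding (an approximation only)."""
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


def plan_storage(item_id: str, metadata: dict, payload: str, bucket: str = "example-bucket"):
    """Return (dynamodb_item, s3_object); s3_object is None when the payload fits inline."""
    inline_item = {"id": item_id, **metadata, "payload": payload}
    if approx_item_size(inline_item) <= MAX_ITEM_BYTES:
        return inline_item, None

    # Too large: keep only metadata plus a pointer in the table, and put the body in S3.
    s3_key = f"objects/{item_id}"  # hypothetical key layout
    pointer_item = {"id": item_id, **metadata, "s3_bucket": bucket, "s3_key": s3_key}
    s3_object = {"Bucket": bucket, "Key": s3_key, "Body": payload.encode("utf-8")}
    return pointer_item, s3_object


small_item, small_obj = plan_storage("doc-1", {"type": "note"}, "hello")
large_item, large_obj = plan_storage("doc-2", {"type": "scan"}, "x" * 500_000)

print(small_obj is None)        # True: stored inline
print(large_item["s3_key"])     # objects/doc-2
```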
Comparison Tables
| Feature | DynamoDB | Amazon Aurora | Amazon Neptune |
|---|---|---|---|
| Model | Key-Value / Document | Relational | Graph |
| Scaling | Highly Horizontal | Vertical/Read Replicas | Read Replicas |
| Primary Metric | Throughput (WCU/RCU) | Compute (ACU/Instances) | Instances |
| Query Language | PartiQL / API | SQL | Gremlin / openCypher / SPARQL |
| Best For | IoT, Mobile, Web | ERP, CRM, Finance | Social, Fraud Detection |
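Since the table lists WCU/RCU as DynamoDB's primary metric, a short worked calculation may help. The sketch below applies the standard provisioned-capacity rules: one RCU covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one WCU covers one write per second of an item up to 1 KB. The function names are hypothetical, and transactional operations, which cost more, are not modeled.

```python
import math


def read_capacity_units(item_kb: float, reads_per_second: int, strongly_consistent: bool = True) -> int:
    """RCUs needed: item size is rounded up to 4 KB blocks; eventually consistent reads cost half."""
    units_per_read = math.ceil(item_kb / 4)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_kb: float, writes_per_second: int) -> int:
    """WCUs needed: item size is rounded up to 1 KB blocks."""
    return math.ceil(item_kb) * writes_per_second


# Example: 6 KB items, 100 reads/s and 50 writes/s.
print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(6, 50))                             # 300
```

The rounding explains why trimming an item from just over 4 KB to just under it halves the read cost for that item.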