Comprehensive Study Guide: Purpose-Built AWS Databases
Purpose-built databases
Comprehensive Study Guide: Purpose-Built AWS Databases
This guide covers the identification and selection of purpose-built AWS databases, a critical domain for the AWS Certified Solutions Architect - Professional (SAP-C02) exam. We explore moving from monolithic RDBMS to specialized engines optimized for specific workloads.
Learning Objectives
- Identify opportunities for refactoring and rearchitecting monolithic databases.
- Evaluate the trade-offs between Amazon Aurora, DynamoDB, DocumentDB, and Redshift.
- Determine the appropriate migration tools (DMS, SCT, Babelfish) for various database types.
- Understand the performance and cost benefits of moving away from proprietary licensing models.
Key Terms & Glossary
- Purpose-Built Database: A database designed for a specific data model or access pattern rather than a general-purpose "one-size-fits-all" approach.
- Refactoring: A heterogeneous migration that replaces a proprietary engine with a cloud-native or open-source equivalent while maintaining similar logic (e.g., SQL Server to Amazon Aurora).
- Rearchitecting: A more invasive migration involving changing the underlying data model (e.g., Relational to NoSQL) and modifying application code.
- Columnar Storage: A data storage format where data is stored by columns instead of rows, optimized for analytical queries (OLAP).
- Babelfish: A capability for Amazon Aurora PostgreSQL that enables it to understand commands from applications written for Microsoft SQL Server.
The "Big Idea"
Modern application architecture favors decoupling and specialization. Instead of forcing all data—relational, document, and analytical—into a single SQL Server or Oracle instance, architects select the engine that provides the best performance and cost-efficiency for a specific task. This shift reduces "licensing gravity" and enables massive scaling that monolithic systems cannot achieve.
Formula / Concept Box
| Migration Type | Strategy | Application Changes | Target Examples |
|---|---|---|---|
| Refactoring | Heterogeneous SQL to SQL | Minimal to none | Amazon Aurora, Amazon RDS |
| Rearchitecting | SQL to NoSQL / Document | Extensive (CRUD overhaul) | DynamoDB, DocumentDB |
| Modernizing | SQL to Data Warehouse | Significant (Schema redesign) | Amazon Redshift |
Hierarchical Outline
- I. Relational Modernization (Refactoring)
- Amazon Aurora: MySQL/PostgreSQL compatible, cloud-native performance.
- Open Source Options: RDS MySQL, PostgreSQL, MariaDB for license savings.
- Migration Tools: AWS DMS + Schema Conversion Tool (SCT).
- II. NoSQL & Document Specialization (Rearchitecting)
- Amazon DynamoDB: Key-value pairs, millisecond latency at scale.
- Amazon DocumentDB: JSON document storage, MongoDB compatibility.
- III. Analytical & Specialized Engines
- Amazon Redshift: Columnar data warehousing for OLAP workloads.
- Amazon ElastiCache: In-memory caching (Redis/Memcached).
- Specialty Types: Graph (Neptune), Time-series (Timestream), Search (OpenSearch).
Visual Anchors
Database Selection Decision Tree
Row vs. Columnar Storage (OLTP vs. OLAP)
Definition-Example Pairs
- Key-Value Store: A database that manages data as a collection of unique keys associated with values.
- Example: Using DynamoDB to store user session data or shopping carts in an e-commerce app.
- Document Store: A database that stores data in JSON-like formats, allowing for nested structures and schema flexibility.
- Example: Using DocumentDB to manage a content management system where different articles have varying metadata fields.
- Data Warehouse: A centralized repository designed for query and analysis rather than transaction processing.
- Example: Using Redshift to aggregate years of sales data from multiple regions to identify long-term buying trends.
Worked Examples
Example 1: Refactoring for Cost
Scenario: A company runs a legacy SQL Server on EC2 and pays $10k/month in licenses. They want to move to a managed service with no code changes. Solution: Use Amazon Aurora PostgreSQL with Babelfish.
- Use SCT to analyze schema compatibility.
- Deploy Aurora with Babelfish enabled.
- Redirect application connection strings to the Aurora endpoint using the T-SQL port. Result: Licensing costs eliminated; management overhead reduced.
Example 2: Rearchitecting for Scale
Scenario: A mobile game's leaderboard table is growing too fast for a traditional RDS instance to handle the write volume. Solution: Rearchitect to Amazon DynamoDB.
- Map the SQL table (User_ID, Score, Timestamp) to a DynamoDB table.
- Modify the application's Data Access Layer (DAL) to use the AWS SDK for
PutItemandQueryoperations. - Use DMS to migrate existing data. Result: Unlimited horizontal scaling and predictable performance.
Checkpoint Questions
- Which migration strategy requires a complete overhaul of the application's CRUD operations?
- What service would you use to migrate a database while automatically converting the schema from Oracle to PostgreSQL?
- How does Amazon Aurora differ from standard RDS in terms of capacity management?
- Which AWS database is specifically optimized for complex join-heavy analytical queries on massive datasets?
Muddy Points & Cross-Refs
- DMS vs. SCT: Remember that DMS (Database Migration Service) moves the data, while SCT (Schema Conversion Tool) converts the structure (tables, views, stored procedures) when moving between different database engines.
- Aurora Serverless vs. Provisioned: Use Serverless for unpredictable or intermittent workloads; use Provisioned for steady-state production workloads where you want fine-grained control over instance sizes.
- Cross-Reference: See "Identity and Access Management" for securing these databases with IAM-based authentication (available for RDS and Aurora).
Comparison Tables
| Feature | Amazon Aurora | Amazon DynamoDB | Amazon Redshift |
|---|---|---|---|
| Model | Relational (SQL) | Key-Value / Doc (NoSQL) | Columnar (OLAP) |
| Scaling | Vertical / Read Replicas | Horizontal (Partitioning) | Horizontal (Clustering) |
| Consistency | Strong Consistency | Eventual (Strong optional) | Strong (ACID for transactions) |
| Best For | High-perf Web Apps | High-scale / Low-latency | Complex BI / Analytics |
| Migration Tool | DMS / Babelfish | DMS | DMS / SCT / Data Extractors |