Mastering AWS Database Architectures: SAP-C02 Study Guide
Databases (for example, Amazon DynamoDB, Amazon OpenSearch Service, Amazon RDS, self-managed databases on Amazon EC2)
This guide covers the critical database selection, design, and migration skills required for the AWS Certified Solutions Architect - Professional (SAP-C02) exam. We focus on choosing the right tool for the right job, ensuring high availability, and optimizing performance.
Learning Objectives
- Evaluate business requirements to select the appropriate AWS database service (RDS, DynamoDB, Aurora, etc.).
- Design high-availability and disaster recovery architectures for relational and non-relational workloads.
- Implement caching and performance optimization strategies using ElastiCache and DAX.
- Differentiate between managed services and self-managed databases on Amazon EC2.
- Plan database migrations using AWS Database Migration Service (DMS) and the AWS Schema Conversion Tool (SCT).
Key Terms & Glossary
- ACID Compliance: Atomicity, Consistency, Isolation, Durability. Standard for relational databases (RDS/Aurora) to ensure reliable transactions.
- Multi-AZ Deployment: A high-availability feature that provides synchronous data replication to a standby instance in a different Availability Zone.
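- Read Replica: Asynchronous replication used to scale read-heavy workloads. Available in RDS and Aurora. An RDS read replica is not an automatic failover target (it can only be promoted manually), whereas Aurora Replicas also serve as automatic failover targets.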
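- Partition Key: The primary attribute in DynamoDB used to distribute data across physical storage partitions. DynamoDB hashes the key value to choose the partition that stores each item, so a high-cardinality key spreads load evenly.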
- GSI (Global Secondary Index): An index whose partition key (and optional sort key) can differ from those of the base DynamoDB table.
- DAX (DynamoDB Accelerator): An in-memory cache for DynamoDB that reduces response times from milliseconds to microseconds.
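The partition key idea above can be illustrated with a small simulation. This is a sketch under stated assumptions, not DynamoDB's real algorithm. SHA-256 modulo N stands in for DynamoDB's internal hash function, which AWS does not document. The partition count of 8 and the key names are hypothetical. The simulation contrasts a high-cardinality key with a low-cardinality one that creates a hot partition.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key value to a partition index.

    Stand-in only: SHA-256 modulo N replaces DynamoDB's internal hash function.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys, num_partitions):
    """Count how many requests land on each partition index."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# High-cardinality key: one distinct value per user spreads traffic out.
user_keys = [f"user#{i}" for i in range(10_000)]
# Low-cardinality key: a status flag funnels traffic to at most two partitions.
status_keys = ["ACTIVE"] * 9_000 + ["INACTIVE"] * 1_000

spread = load_per_partition(user_keys, 8)
hot = load_per_partition(status_keys, 8)
print("user keys -> partitions used:", len(spread), "max load:", max(spread.values()))
print("status keys -> partitions used:", len(hot), "max load:", max(hot.values()))
```

The per-user key spreads requests across all eight simulated partitions. The status flag concentrates 90% of requests on a single partition, which is the "hot partition" pattern to avoid.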
The "Big Idea"
The core philosophy of AWS databases is "Purpose-Built." Instead of forcing every workload into a traditional relational database, architects must decompose applications into components that use the storage engine best suited for their access patterns (e.g., Key-Value for sessions, Relational for ERP, Graph for social links).
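The decomposition step can be sketched as a lookup from each component's access pattern to a purpose-built service. The table and component names below are hypothetical. They only restate the pairings this guide uses. A real selection also weighs consistency, scale, cost, and operational requirements.

```python
# Hypothetical mapping from access pattern to the purpose-built service
# this guide associates with it.
PURPOSE_BUILT = {
    "key-value": "Amazon DynamoDB",
    "relational": "Amazon RDS / Amazon Aurora",
    "document": "Amazon DocumentDB",
    "graph": "Amazon Neptune",
    "search": "Amazon OpenSearch Service",
    "time-series": "Amazon Timestream",
    "in-memory cache": "Amazon ElastiCache",
}


def choose_store(access_pattern: str) -> str:
    """Return the purpose-built service for one component's access pattern."""
    try:
        return PURPOSE_BUILT[access_pattern]
    except KeyError:
        raise ValueError(f"no mapping for access pattern {access_pattern!r}") from None


# Decompose one application into components, each with its own access pattern.
components = {"sessions": "key-value", "erp": "relational", "social_links": "graph"}
plan = {name: choose_store(pattern) for name, pattern in components.items()}
for name, service in plan.items():
    print(f"{name}: {service}")
```

Each component gets its own store. The application does not force every component into one relational engine.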
Formula / Concept Box
| Feature | Amazon RDS / Aurora | Amazon DynamoDB | Self-Managed (EC2) |
|---|---|---|---|
| Scaling | Vertical (Instance size) + Read Replicas | Horizontal (Partitioning) | Manual / Vertical |
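| HA/DR | Multi-AZ (Automatic Failover); cross-Region read replicas or Aurora Global Database for DR | Replicated across multiple AZs by default; Global Tables for multi-Region Active-Active | Manual Configuration |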
| Schema | Fixed / Structured | Flexible / NoSQL | Full Control |
| Management | Fully Managed (Patching/Backups) | Serverless | Customer Managed |
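In Global Tables' default multi-Region eventually consistent mode, DynamoDB reconciles concurrent updates to the same item with a "last writer wins" rule. The sketch below models only that rule. The `ItemVersion` fields are hypothetical, and the Region-name tie-break is an assumption added for determinism. It is not a documented DynamoDB behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVersion:
    value: dict
    written_at: float  # write timestamp in seconds (hypothetical field)
    region: str        # Region that accepted the write (hypothetical field)


def last_writer_wins(a: ItemVersion, b: ItemVersion) -> ItemVersion:
    """Keep the version with the later timestamp; the loser is discarded, not merged.

    Breaking exact ties by Region name is an assumption of this sketch,
    added only to make the result deterministic.
    """
    return max(a, b, key=lambda v: (v.written_at, v.region))


us = ItemVersion({"cart": ["book"]}, 1_700_000_000.120, "us-east-1")
eu = ItemVersion({"cart": ["book", "lamp"]}, 1_700_000_000.450, "eu-west-1")
winner = last_writer_wins(us, eu)
print(winner.region, winner.value)
```

The losing version is dropped, not merged. Any change that existed only in that version is lost.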
Hierarchical Outline
- Relational Databases (SQL)
  - Amazon RDS: Managed service for MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server.
  - Amazon Aurora: Cloud-native relational database (MySQL- and PostgreSQL-compatible); up to 5x the throughput of standard MySQL (3x for PostgreSQL); up to 15 Aurora Replicas; self-healing storage.
- Non-Relational Databases (NoSQL)
  - Amazon DynamoDB: Key-value and document store; single-digit millisecond latency at any scale; serverless.
  - Amazon DocumentDB: MongoDB-compatible JSON document store; decoupled compute and storage.
- Specialized Data Stores
  - Amazon Neptune: Graph database for highly connected datasets (social networks, fraud detection).
  - Amazon OpenSearch Service: Search and log analytics; successor to Amazon Elasticsearch Service.
  - Amazon ElastiCache: In-memory caching (Redis/Memcached) for sub-millisecond data retrieval.
- Self-Managed Databases
  - Amazon EC2: Required when RDS does not support a DB engine, or when OS-level access or specific plugins are mandatory.
Visual Anchors
[Figure: Database Selection Flowchart]
[Figure: RDS Multi-AZ vs. Read Replica]
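The Multi-AZ vs. Read Replica contrast comes down to when each copy receives a write. The toy model below is a hypothetical class, not RDS internals. It updates the standby before `write` returns, which models synchronous replication. It queues the change for the replica until a later "ship" step, which models asynchronous replication and the replica lag it causes.

```python
class Primary:
    """Toy primary with one synchronous standby and one asynchronous read replica."""

    def __init__(self):
        self.data = {}
        self.standby = {}    # Multi-AZ standby: updated before write() returns
        self.replica = {}    # Read replica: updated later, so it can lag
        self._pending = []   # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.standby[key] = value            # synchronous: part of the write itself
        self._pending.append((key, value))   # asynchronous: queued for later

    def ship_to_replica(self):
        """Apply queued changes to the replica, in order."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


db = Primary()
db.write("order#1", "PAID")
print(db.standby.get("order#1"))  # standby already has the write
print(db.replica.get("order#1"))  # replica has not caught up yet (lag)
db.ship_to_replica()
print(db.replica.get("order#1"))  # replica now consistent
```

This is why the standby is a safe automatic failover target. A read replica may be missing the most recent writes at the moment the primary fails.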
Definition-Example Pairs
- Amazon Neptune: A graph database designed for complex relationships. Example: A social media platform tracking "friends of friends" or a recommendation engine suggesting products based on shared interests.
- Amazon OpenSearch Service: A search and analytics engine. Example: A retail website allowing customers to filter and search through millions of product descriptions and logs in near real time.
- Amazon Timestream: A time-series database. Example: An IoT sensor grid recording temperature readings every second for historical trend analysis.
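The Neptune "friends of friends" example is a two-hop graph traversal. In Neptune you would express it as a Gremlin or openCypher query. The plain-Python sketch below uses a hypothetical in-memory adjacency map only to show the traversal logic.

```python
def friends_of_friends(graph, person):
    """Return people exactly two hops from `person`, excluding direct friends and self."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


# Hypothetical undirected social graph stored as an adjacency map.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}
print(sorted(friends_of_friends(graph, "alice")))  # ['dave', 'erin']
```

A graph database stores these adjacency links natively, so multi-hop queries like this do not need the repeated self-joins a relational schema would require.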
Worked Examples
Example 1: Global E-Commerce Shopping Cart
Scenario: A retailer needs a globally distributed shopping cart system that handles millions of requests during Black Friday. Solution: Use Amazon DynamoDB with Global Tables.
- Enable Global Tables to replicate data across regions with sub-second latency.
- Use On-Demand Capacity Mode to handle the unpredictable spike in traffic without manual scaling.
- Add DAX so that repeated reads of hot items are served from cache in microseconds even at peak load. DAX caches eventually consistent reads; strongly consistent reads pass through to DynamoDB.
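To see why On-Demand suits an unpredictable spike, consider what Provisioned mode would require you to size in advance. The unit definitions below are DynamoDB's documented ones:

- 1 RCU is one strongly consistent read per second of an item up to 4 KB. Eventually consistent reads cost half, and transactional reads cost double.
- 1 WCU is one write per second of an item up to 1 KB. Transactional writes cost double.

The item size and peak rates in the example are hypothetical.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int, mode: str = "strong") -> int:
    """RCUs needed for a steady read rate (item size rounded up to 4 KB blocks)."""
    units_per_read = math.ceil(item_kb / 4)
    if mode == "strong":
        return units_per_read * reads_per_sec
    if mode == "eventual":
        return math.ceil(units_per_read * reads_per_sec / 2)
    if mode == "transactional":
        return 2 * units_per_read * reads_per_sec
    raise ValueError(f"unknown read mode: {mode}")


def write_capacity_units(item_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    """WCUs needed for a steady write rate (item size rounded up to 1 KB blocks)."""
    units_per_write = math.ceil(item_kb / 1)
    factor = 2 if transactional else 1
    return factor * units_per_write * writes_per_sec


# Hypothetical Black Friday peak for a 3 KB cart item.
print(read_capacity_units(3, 20_000, "eventual"))  # 10000
print(write_capacity_units(3, 5_000))              # 15000
```

With Provisioned mode, capacity must be set (or auto-scaled) ahead of numbers like these. On-Demand bills per request and absorbs the spike without capacity planning.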
Example 2: Migrating a Legacy SQL Server to AWS
Scenario: An enterprise wants to migrate a SQL Server database but requires OS-level access for a custom third-party auditing plugin. Solution: Deploy SQL Server on Amazon EC2.
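- RDS is usually preferred, but standard RDS does not allow OS-level access, so the plugin requirement rules it out. Amazon RDS Custom for SQL Server does grant OS-level access and is worth evaluating; this example uses the fully self-managed EC2 approach.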
- Use Amazon EBS Provisioned IOPS (io2) for high-performance storage.
- Manually configure SQL Server Always On availability groups across instances in different Availability Zones for high availability.
Checkpoint Questions
- Which database service is best for storing and querying complex JSON documents with MongoDB compatibility?
- In Amazon RDS, does a Read Replica provide automatic failover for high availability?
- When should an architect choose DynamoDB On-Demand capacity over Provisioned capacity?
- What is the main difference between synchronous and asynchronous replication in the context of RDS Multi-AZ deployments and Read Replicas?
Muddy Points & Cross-Refs
- Aurora Serverless vs. DynamoDB: Both are serverless, but Aurora Serverless is relational (SQL) while DynamoDB is NoSQL. Choose Aurora Serverless if you need complex joins or a fixed schema; choose DynamoDB for virtually unlimited horizontal scale with key-based access patterns.
- DMS vs. SCT: The Schema Conversion Tool (SCT) is used before a heterogeneous migration to convert a source schema (e.g., Oracle) to a target schema (e.g., PostgreSQL). Database Migration Service (DMS) is the engine that actually moves the data, including ongoing replication through change data capture (CDC).
- ElastiCache vs. DAX: Use ElastiCache for RDS or general application caching. Use DAX exclusively for DynamoDB.
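With ElastiCache, the application owns the caching logic. A common strategy is lazy loading (cache-aside) with a TTL. With DAX, the DAX client handles caching transparently, so no such code is needed.

The sketch below is a minimal cache-aside implementation. A plain dict stands in for ElastiCache, and `load_product` is a hypothetical database read.

```python
import time


class CacheAside:
    """Lazy-loading cache sketch; a dict stands in for ElastiCache."""

    def __init__(self, load_from_db, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self._load(key)                       # miss: read from the database
        self._store[key] = (value, now + self._ttl)   # populate the cache with a TTL
        return value

    def invalidate(self, key):
        """Drop a key, e.g. after the underlying row is updated."""
        self._store.pop(key, None)


db_calls = []


def load_product(sku):
    """Hypothetical database read; records each call so cache hits are visible."""
    db_calls.append(sku)
    return {"sku": sku, "price": 19.99}


cache = CacheAside(load_product, ttl_seconds=60)
cache.get("sku-1")  # miss: loads from the database
cache.get("sku-1")  # hit: served from the cache
print(cache.hits, cache.misses, db_calls)  # 1 1 ['sku-1']
```

The TTL bounds how stale a cached value can get. `invalidate` lets writes evict entries sooner.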
Comparison Tables
RDS vs. Aurora
| Feature | Amazon RDS | Amazon Aurora |
|---|---|---|
| Storage | Provisioned allocation (increase manually or enable storage autoscaling) | Auto-scaling (up to 128 TiB) |
| Replication | 5 to 15 read replicas (engine dependent) | Up to 15 Aurora Replicas (shared storage, low lag) |
| Self-Healing | No (requires snapshots/backups) | Yes (storage is distributed and self-healing) |
| Performance | Standard engine performance | Up to 5x (MySQL) / 3x (PostgreSQL) throughput improvement |