AWS Database Services: Relational, NoSQL, and In-Memory Solutions
Database types and services (for example, serverless, relational compared with non-relational, in-memory)
This study guide covers the core database paradigms and AWS-managed services required for the AWS Certified Solutions Architect - Associate (SAA-C03) exam. It focuses on selecting the right database based on access patterns, scalability needs, and data structure.
Learning Objectives
- Distinguish between relational (SQL) and non-relational (NoSQL) database architectures.
- Identify appropriate use cases for Amazon RDS, Aurora, DynamoDB, and ElastiCache.
- Understand the performance benefits of in-memory caching and columnar storage.
- Evaluate serverless vs. provisioned database capacity models.
- Differentiate between High Availability (Multi-AZ) and Scalability (Read Replicas).
Key Terms & Glossary
- ACID Compliance: Atomicity, Consistency, Isolation, Durability. Standard properties for relational databases ensuring reliable transactions.
- Schema: The fixed structure of a relational database table (predefined columns and data types).
- OLTP (Online Transaction Processing): Optimized for frequent, small read/write transactions (e.g., RDS).
- OLAP (Online Analytical Processing): Optimized for complex queries and data analysis (e.g., Redshift).
- Key-Value Store: A NoSQL type where data is stored as a collection of unique keys and associated values (e.g., DynamoDB).
- Latency: The time it takes for a data request to be completed. In-memory databases provide sub-millisecond latency.
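To make the atomicity part of ACID concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a local stand-in for a relational engine such as those RDS hosts. The `accounts` table, the `transfer` helper, and the sample balances are illustrative assumptions, not part of any AWS API.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit above was rolled back.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second transfer credits `bob` first and then fails on the debit, so the rollback is what keeps `bob` from retaining money that never left `alice`.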
The "Big Idea"
In the AWS ecosystem, the guiding principle is "Use the Right Tool for the Job": no single database fits every scenario. Relational databases excel at complex joins and strict data integrity, while NoSQL databases scale horizontally to very large workloads and accommodate semi-structured data with flexible schemas. Modern cloud architectures often use a "polyglot persistence" approach, in which different services (e.g., RDS for orders, DynamoDB for session state, ElastiCache for low-latency caching) work together.
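One common way a cache and a database cooperate in a polyglot setup is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below is a minimal illustration under stated assumptions: a plain dictionary stands in for ElastiCache, and `slow_db_lookup`, the `CacheAside` class, and the 30-second TTL are hypothetical names and values chosen for the example.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at); stand-in for Redis/Memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: query the database and cache the result.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read fetches fresh data."""
        self._store.pop(key, None)


db_calls = []


def slow_db_lookup(product_id):
    # Hypothetical database query; records each call so the offload is visible.
    db_calls.append(product_id)
    return {"id": product_id, "name": f"product-{product_id}"}


cache = CacheAside(slow_db_lookup, ttl_seconds=30.0)
first = cache.get("p1")   # miss: goes to the database
second = cache.get("p1")  # hit: served from memory
cache.invalidate("p1")
third = cache.get("p1")   # miss again after invalidation
```

Only two of the three reads reach the database here. The TTL and explicit invalidation are the two usual levers for limiting how stale a cached value can become.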
Formula / Concept Box
| Feature | Relational (RDS/Aurora) | Non-Relational (DynamoDB) | In-Memory (ElastiCache) |
|---|---|---|---|
| Data Structure | Structured (Rows/Columns) | Semi-structured (JSON/Key-Value) | Key-Value / Simple structures |
| Scaling | Vertical (Larger Instance); reads scale out with Read Replicas | Horizontal (More Partitions) | Vertical & Horizontal |
| Primary Use | Complex Joins, Transactions | High-speed, High-scale apps | Caching, Session store |
| Consistency | Strong (ACID) on the primary; Read Replicas can lag | Eventual (default) or Strong | N/A (Transient data) |
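The "Eventual (default) or Strong" entry for DynamoDB corresponds to the `ConsistentRead` parameter of the `GetItem` operation. The sketch below only builds the request parameters, so it runs without AWS credentials. The `Sessions` table, the `session_id` key, and the `get_item_params` helper are hypothetical names used for illustration.

```python
def get_item_params(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem request."""
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) requests an eventually consistent read;
        # True requests a strongly consistent read.
        "ConsistentRead": strongly_consistent,
    }


session_key = {"session_id": {"S": "abc123"}}
default_read = get_item_params("Sessions", session_key)
strong_read = get_item_params("Sessions", session_key, strongly_consistent=True)

# With boto3 installed and credentials configured, the dictionaries could be
# passed to the low-level client, for example:
#   boto3.client("dynamodb").get_item(**strong_read)
```

A strongly consistent read consumes twice the read capacity of an eventually consistent read of the same item, so the default is usually kept unless the application must see the latest write.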
Hierarchical Outline
- Relational Databases (RDS)
  - Managed Service: AWS handles patching, backups, and setup.
  - Engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server.
  - Amazon Aurora: Cloud-native, high-performance relational engine; up to 15 Read Replicas.
  - High Availability: Multi-AZ for synchronous replication to a standby instance.
  - Scaling: Read Replicas for asynchronous replication to offload read traffic.
- Non-Relational Databases (NoSQL)
  - Amazon DynamoDB: Fully managed, serverless, key-value and document database.
    - Performance: Consistent single-digit millisecond latency at any scale.
    - Scaling: Hashes the Partition Key to distribute items across partitions; an optional Sort Key orders items that share a Partition Key.
- In-Memory Databases
  - Amazon ElastiCache: Managed Redis or Memcached.
    - Purpose: Offload database pressure and reduce latency for frequently read data.
- Specialized Databases
  - Amazon Redshift: Columnar storage for data warehousing/analytics.
  - Amazon Neptune: Graph database for highly connected data.
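The difference between synchronous (Multi-AZ) and asynchronous (Read Replica) replication in the outline can be sketched with a toy in-memory model. This is a conceptual simulation only, not how RDS is implemented; `PrimaryInstance` and its methods are hypothetical names.

```python
from collections import deque


class PrimaryInstance:
    """Toy model: a synchronous standby plus an asynchronous read replica."""

    def __init__(self):
        self.data = {}
        self.standby = {}        # Multi-AZ standby, updated as part of each write
        self.read_replica = {}   # read replica, updated later
        self._pending = deque()  # changes not yet applied to the read replica

    def write(self, key, value):
        self.data[key] = value
        # Synchronous: the write is not considered done until the standby has it.
        self.standby[key] = value
        # Asynchronous: the change is queued and shipped to the replica afterwards.
        self._pending.append((key, value))

    def apply_replica_backlog(self):
        """Simulate the replica catching up on queued changes."""
        while self._pending:
            key, value = self._pending.popleft()
            self.read_replica[key] = value


db = PrimaryInstance()
db.write("order-1", "PAID")

standby_before = db.standby.get("order-1")       # already present
replica_before = db.read_replica.get("order-1")  # not there yet (replica lag)

db.apply_replica_backlog()
replica_after = db.read_replica.get("order-1")   # present once the replica catches up
```

The standby never falls behind the primary, which is what makes it safe to fail over to. The replica may briefly return stale data, which is the trade-off accepted in exchange for offloading reads.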
Visual Anchors
Database Selection Decision Tree
Data Structure: Relational vs. NoSQL
\begin{tikzpicture}[node distance=2cm]
  % Relational Box
  \draw (0,0) rectangle (3,2);
  \node at (1.5,2.3) {\textbf{Relational (Fixed)}};
  \draw (0,1.5) -- (3,1.5);
  \draw (1,0) -- (1,1.5);
  \draw (2,0) -- (2,1.5);
  \node at (0.5,1.7) {ID};
  \node at (1.5,1.7) {Name};
  \node at (2.5,1.7) {Age};
  \node at (0.5,1.2) {1};
  \node at (1.5,1.2) {Alice};
  \node at (2.5,1.2) {30};

  % NoSQL Box
  \draw (5,0) rectangle (8,2);
  \node at (6.5,2.3) {\textbf{NoSQL (Flexible)}};
  \node[text width=2.5cm, font=\small] at (6.5,1) {
    \{ \\
    \space\space "ID": 1, \\
    \space\space "Name": "Alice", \\
    \space\space "Hobby": "Golf" \\
    \}
  };
\end{tikzpicture}
Definition-Example Pairs
- Read Replica:
  - Definition: A read-only copy of the primary database, kept up to date through asynchronous replication and used to reduce the read load on the primary instance.
  - Example: A news website where millions of users read the same articles (read-heavy), while only a few editors write content (write-light).
- Multi-AZ Deployment:
  - Definition: A high-availability feature that maintains a synchronously replicated standby instance in a different Availability Zone.
  - Example: A banking application that must stay online even if an entire data center loses power.
- Partition Key:
  - Definition: A value that DynamoDB hashes to determine which physical partition an item is stored on.
  - Example: In a user table, the `user_id` attribute acts as the partition key so that user data is spread evenly across partitions.
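The effect of partition key choice can be illustrated with a small hashing sketch. DynamoDB uses its own internal hash function and manages partitions automatically, so the SHA-256 hash, the modulo step, and the fixed count of four partitions below are stand-in assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index by hashing it (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


user_ids = [f"user-{n}" for n in range(10_000)]

# High-cardinality key (user_id): items land on all partitions in similar numbers.
spread = Counter(partition_for(uid, 4) for uid in user_ids)

# Low-cardinality key (the same date for every item): everything lands on one partition.
hot = Counter(partition_for("2024-11-29", 4) for _ in user_ids)
```

The same key always hashes to the same partition, which is why a key with many distinct values (such as `user_id`) spreads load well, while a key shared by most items concentrates traffic on a single "hot" partition.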
Worked Examples
Example 1: Selecting a Database for E-commerce
Scenario: A company is building a shopping cart for a global e-commerce site. The data needs to scale horizontally to handle millions of concurrent users during Black Friday, and the schema may change as new product types are added.
- Solution: Amazon DynamoDB.
- Reasoning: It is serverless (no management overhead), scales horizontally automatically, and has a flexible schema (NoSQL) to handle various product attributes.
Example 2: Migrating a Legacy CRM
Scenario: A business wants to migrate its on-premises PostgreSQL database to AWS. They require high availability and the ability to scale reads globally for their international offices.
- Solution: Amazon Aurora with Global Database.
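- Reasoning: Aurora offers a PostgreSQL-compatible edition, so the migration needs few application changes. It replicates storage across three Availability Zones for high availability, and Global Database adds read-only secondary Regions that keep read latency low for international offices.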
Checkpoint Questions
- Which AWS database service should you choose for a data warehousing workload that requires complex analytical queries on petabytes of data?
- What is the main difference between RDS Multi-AZ and RDS Read Replicas in terms of data replication (Synchronous vs. Asynchronous)?
- If an application requires sub-millisecond response times for a session state store, which service is most appropriate?
- True or False: You must define all column names and data types before inserting data into a DynamoDB table.
[!TIP] Remember: RDS is for SQL, DynamoDB is for NoSQL/Serverless, and ElastiCache is for Speed (Caching).
[!IMPORTANT] For the exam, know that Amazon Aurora is AWS's proprietary, MySQL- and PostgreSQL-compatible relational engine. AWS claims up to five times the throughput of standard MySQL and up to three times that of standard PostgreSQL. Its self-healing storage system keeps six copies of the data across 3 AZs by default.