AWS Specialized Data Stores & Access Patterns
Use specialized data stores based on access patterns (for example, Amazon OpenSearch Service)
This guide focuses on the AWS philosophy of "purpose-built" databases. Instead of forcing every workload into a traditional relational database, AWS provides specialized data stores optimized for specific access patterns, such as full-text search, graph relationships, or in-memory caching.
Learning Objectives
- Identify the appropriate AWS data store based on specific application access patterns.
- Explain the use cases for Amazon OpenSearch Service in search and analytics.
- Differentiate between Relational (OLTP), Non-Relational (NoSQL), and Data Warehousing (OLAP) workloads.
- Select the correct storage class or database to optimize for cost and latency.
Key Terms & Glossary
- Access Pattern: The specific way an application reads or writes data (e.g., random lookups, range scans, full-text search).
- Full-Text Search: A technique to search for documents or data that match a search string within a large body of text, often involving ranking and relevance.
- OLTP (Online Transactional Processing): Characterized by a large number of short online transactions (e.g., RDS).
- OLAP (Online Analytical Processing): Characterized by a relatively low volume of transactions that involve very complex queries (e.g., Redshift).
- High Cardinality: Describes an attribute with many unique values. A high-cardinality partition key lets systems like DynamoDB spread items and requests evenly across partitions.
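To make the high-cardinality term concrete, here is a minimal sketch that hashes keys onto a fixed number of partitions and counts where they land. The partition count, the key names, and the use of SHA-256 are illustrative assumptions; this is not how DynamoDB hashes keys internally.

```python
import hashlib
from collections import Counter

def partition_for(key: str, partitions: int = 4) -> int:
    """Map a key to a partition by hashing it (illustrative hash, an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

def load_per_partition(keys, partitions: int = 4) -> Counter:
    """Count how many items land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)

# High cardinality: 1,000 distinct (hypothetical) user IDs.
user_ids = [f"user-{i}" for i in range(1000)]
# Low cardinality: 1,000 items that share only two distinct status values.
statuses = ["ACTIVE" if i % 2 else "INACTIVE" for i in range(1000)]

high = load_per_partition(user_ids)
low = load_per_partition(statuses)
print("high-cardinality key:", dict(high))
print("low-cardinality key: ", dict(low))
```

With only two distinct key values, at most two partitions can ever receive traffic, no matter how many partitions exist. The distinct user IDs spread across all of them.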
The "Big Idea"
Modern application architecture has shifted from the "One Size Fits All" approach (where everything lives in a monolithic SQL database) to "Purpose-Built Databases." By matching the data store to the access pattern, developers can get lower latency and easier scaling for that pattern than a general-purpose relational database typically offers. For example, while you could perform a text search in SQL using LIKE '%query%', the leading wildcard usually forces a scan of every row; Amazon OpenSearch Service maintains an inverted index designed for exactly that pattern.
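The difference between scanning and indexing can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the sample documents and function names are made up, and a real search engine adds text analysis, relevance ranking, and distributed storage on top of the same basic idea.

```python
from collections import defaultdict

# Hypothetical product descriptions keyed by document ID.
documents = {
    1: "Blue running shoes for trail runs",
    2: "Red leather wallet",
    3: "Running socks, blue and white",
}

def scan_search(term: str) -> list[int]:
    """Substring-scan style: inspect every document on every query."""
    return [doc_id for doc_id, text in documents.items()
            if term.lower() in text.lower()]

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of documents containing it (built at write time)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().replace(",", " ").split():
            index[token].add(doc_id)
    return index

index = build_inverted_index(documents)

def index_search(*terms: str) -> list[int]:
    """Intersect the posting sets of all terms; no document text is re-read."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(scan_search("running"))           # [1, 3]
print(index_search("blue", "running"))  # [1, 3]
```

The scan touches all documents for every query, while the index lookup only touches the posting sets for the requested terms. That is the trade: more work at write time in exchange for cheaper reads.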
Formula / Concept Box
| Access Pattern | Optimal AWS Service | Primary Characteristic |
|---|---|---|
| Relational / ACID | Amazon RDS / Aurora | Complex joins, strict schema, transactions |
| Key-Value / Document | Amazon DynamoDB | Single-digit ms latency at any scale |
| Full-Text Search | Amazon OpenSearch | Complex indexing, ranking, log analytics |
| In-Memory / Sub-ms | Amazon ElastiCache | Caching, session stores, real-time gaming |
| Graph / Relationships | Amazon Neptune | Highly connected data, fraud detection |
| Analytical / SQL | Amazon Redshift | Petabyte-scale data warehousing |
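The table can be read as a simple lookup from access pattern to service. The sketch below encodes it that way; the pattern keys and the function name are hypothetical labels chosen for illustration, and a real selection would also weigh cost, team skills, and consistency requirements.

```python
# Access pattern -> service, transcribed from the table above.
SERVICE_BY_PATTERN = {
    "relational": "Amazon RDS / Aurora",
    "key-value": "Amazon DynamoDB",
    "document": "Amazon DynamoDB",
    "full-text-search": "Amazon OpenSearch",
    "in-memory": "Amazon ElastiCache",
    "graph": "Amazon Neptune",
    "analytical": "Amazon Redshift",
}

def recommend_store(access_pattern: str) -> str:
    """Return the service the table lists for a given access pattern."""
    key = access_pattern.strip().lower()
    if key not in SERVICE_BY_PATTERN:
        known = ", ".join(sorted(SERVICE_BY_PATTERN))
        raise ValueError(f"Unknown access pattern {access_pattern!r}; expected one of: {known}")
    return SERVICE_BY_PATTERN[key]

print(recommend_store("graph"))             # Amazon Neptune
print(recommend_store("Full-Text-Search"))  # Amazon OpenSearch
```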
Hierarchical Outline
- I. Relational vs. Non-Relational
  - Amazon RDS: Managed SQL (MySQL, PostgreSQL, etc.). Use for structured data and complex transactions.
  - Amazon DynamoDB: Serverless NoSQL. Use for high-scale and predictable performance.
- II. Search and Analytics
  - Amazon OpenSearch Service: Managed service for OpenSearch, the open-source fork of Elasticsearch. Optimized for full-text search, log analytics, and real-time application monitoring.
  - Amazon Athena: Serverless query service to analyze data in S3 using standard SQL.
- III. Performance Optimization
  - Amazon ElastiCache: Supports Redis and Memcached. Used to reduce database load and improve read latency.
  - MemoryDB for Redis: An in-memory, Redis-compatible, durable database service.
- IV. Specialized Connections
  - Amazon Neptune: Graph database for data with complex relationships (e.g., social networks).
  - Amazon DocumentDB: MongoDB-compatible document store for JSON-like workloads.
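To show why relationship-heavy data suits a graph model, here is a minimal sketch of a two-hop "people you may know" traversal over an in-memory adjacency map. The names and the friendship data are made up, and this is plain Python rather than a Neptune query; in Neptune the same traversal would be written in a graph query language such as Gremlin or openCypher.

```python
from collections import Counter

# Hypothetical undirected friendship graph as an adjacency map.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def suggest_friends(user: str) -> list[tuple[str, int]]:
    """Rank non-friends by the number of mutual connections (two-hop traversal)."""
    mutual = Counter()
    for friend in friends[user]:
        for candidate in friends[friend]:
            if candidate != user and candidate not in friends[user]:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))

print(suggest_friends("alice"))  # [('dave', 2), ('erin', 1)]
```

In a relational schema each hop becomes another self-join on a friendships table, which is the cost a graph store is designed to avoid.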
Visual Anchors
[Figure: Database Selection Decision Tree]
Data Structure Comparisons
\begin{tikzpicture}[node distance=2cm]
% Key Value
\draw[thick] (0,0) rectangle (2,1.5);
\node at (1,1.7) {\textbf{Key-Value}};
\node at (1,1) {id: 101};
\node at (1,0.5) {val: "Apple"};
% Relational
\draw[thick] (4,0) grid (6,1.5);
\node at (5,1.7) {\textbf{Relational}};
\node at (4.5,1.25) {ID};
\node at (5.5,1.25) {Name};
% Graph
\node at (9,1.7) {\textbf{Graph}};
\draw[fill=blue!20] (8.5,0.5) circle (0.3cm) node {A};
\draw[fill=blue!20] (9.5,0.5) circle (0.3cm) node {B};
\draw[->, thick] (8.8,0.5) -- (9.2,0.5);
\end{tikzpicture}
Definition-Example Pairs
- Full-Text Search: Searching for keywords across millions of documents with partial matches.
- Example: A retail website's search bar that suggests "blue running shoes" even if the user only types "blu run."
- Graph Relationships: Storing data where the links between entities are as important as the entities themselves.
- Example: A recommendation engine that suggests friends based on mutual connections and shared interests.
- In-Memory Caching: Storing frequently accessed data in RAM rather than on disk.
- Example: Storing the results of a heavy SQL query for the top 10 trending news articles so the database isn't hit for every page load.
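The caching example follows the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses an in-process dictionary as a stand-in for ElastiCache; the class, the cache key, the 60-second TTL, and the fake query function are all assumptions made for illustration.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for a Redis-style cache)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # entry is stale; treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

db_calls = 0

def query_trending_articles():
    """Hypothetical stand-in for the heavy SQL query."""
    global db_calls
    db_calls += 1
    return [f"article-{i}" for i in range(1, 11)]

cache = TTLCache(ttl_seconds=60)

def get_trending():
    """Cache-aside read: serve from cache, hit the database only on a miss."""
    cached = cache.get("trending:top10")
    if cached is not None:
        return cached
    result = query_trending_articles()
    cache.set("trending:top10", result)
    return result

for _ in range(5):
    get_trending()
print(db_calls)  # 1
```

Five page loads cost one database query. The TTL bounds how stale the list can get, so it should be chosen to match how quickly "trending" actually changes.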
Worked Examples
Scenario: Migrating a Search Feature
Problem: A company uses Amazon RDS (PostgreSQL) to store product catalogs. Users complain that the keyword search is slow and doesn't handle typos well.
Solution Step-by-Step:
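- Identify the bottleneck: SQL LIKE queries with a leading wildcard ('%query%') cannot use a standard B-tree index and fall back to full table scans.
- Introduce Amazon OpenSearch: Provision an Amazon OpenSearch Service domain (cluster).
- Sync Data: Index product data into OpenSearch as it changes. With RDS for PostgreSQL as the source, one option is AWS DMS with change data capture; a Lambda function triggered by DynamoDB Streams is the equivalent pattern when the source table lives in DynamoDB.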
- Update Application: Change the search API to query the OpenSearch endpoint instead of the database.
- Result: Search results now return in milliseconds with support for "fuzzy matching" (handling typos).
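The sync and query steps above can be sketched as two small helpers that build request bodies. This is a minimal sketch under stated assumptions: the index name "products", the field name "name", and the helper names are hypothetical, and no network call is made. The bodies follow the OpenSearch match query (with the fuzziness option) and the bulk API's action-then-document layout.

```python
import json

def build_fuzzy_query(user_input: str, field: str = "name", size: int = 10) -> dict:
    """Build a match query that tolerates typos via fuzziness."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": user_input,
                    "fuzziness": "AUTO",  # allowed edit distance scales with term length
                }
            }
        },
    }

def to_bulk_lines(products: list[dict], index: str = "products") -> str:
    """Render changed rows as newline-delimited JSON for the bulk API."""
    lines = []
    for product in products:
        # Action line: which index and document ID to write.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(product["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(product))
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline

query = build_fuzzy_query("runing shoes")
bulk_body = to_bulk_lines([{"id": 101, "name": "Blue running shoes"}])
print(json.dumps(query, indent=2))
print(bulk_body)
```

The application's search API would send the query body to the domain's search endpoint, and the sync process would post the bulk body whenever it captures changed rows. Client setup and request signing are omitted here.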
Checkpoint Questions
- Which service should you use if your application requires a NoSQL database with sub-10ms latency at massive scale?
- You need to perform complex analytical queries on multi-petabyte datasets stored in S3. Which service is best suited for this (Athena or Redshift Spectrum)?
- What is the primary use case for Amazon Neptune?
- If your access pattern requires fast lookups of JSON documents, which service is most compatible with MongoDB drivers?
Answers
- Amazon DynamoDB.
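- Either can query data in S3 with SQL. Redshift Spectrum fits best when the queries are complex and join against tables already in a Redshift warehouse; Athena fits ad hoc, serverless querying with no cluster to manage.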
- Highly connected data/Graph workloads (e.g., social networks, fraud detection).
- Amazon DocumentDB.