AWS Specialized Data Stores & Access Patterns

This guide focuses on the AWS philosophy of "purpose-built" databases. Instead of forcing every workload into a traditional relational database, AWS provides specialized data stores optimized for specific access patterns, such as full-text search, graph relationships, or in-memory caching.

Learning Objectives

Identify the appropriate AWS data store based on specific application access patterns.
Explain the use cases for Amazon OpenSearch Service in search and analytics.
Differentiate between Relational (OLTP), Non-Relational (NoSQL), and Data Warehousing (OLAP) workloads.
Select the correct storage class or database to optimize for cost and latency.

Key Terms & Glossary

Access Pattern: The specific way an application reads or writes data (e.g., random lookups, range scans, full-text search).
Full-Text Search: A technique to search for documents or data that match a search string within a large body of text, often involving ranking and relevance.
OLTP (Online Transaction Processing): Characterized by a large number of short, concurrent read/write transactions (e.g., RDS).
OLAP (Online Analytical Processing): Characterized by a relatively low volume of complex, aggregation-heavy queries over large datasets (e.g., Redshift).
High Cardinality: Describes an attribute with many unique values. High-cardinality partition keys spread reads and writes evenly across partitions in systems like DynamoDB.
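To see why cardinality matters for partitioning, here is a simplified model. It assumes items are placed by hashing the partition key modulo a fixed partition count. DynamoDB does hash the partition key, but it uses its own internal hash and manages partitions automatically, so MD5 and `NUM_PARTITIONS = 8` are illustrative assumptions only.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical fixed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Simplified placement: hash the partition key, then take it modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partition_spread(keys: list) -> Counter:
    """Count how many items land on each partition."""
    return Counter(partition_for(k) for k in keys)


# Low cardinality: 1,000 orders keyed by status (only 3 distinct values).
statuses = ["PENDING", "SHIPPED", "DELIVERED"]
low = [statuses[i % 3] for i in range(1000)]

# High cardinality: 1,000 orders keyed by a unique order ID.
high = [f"order-{i}" for i in range(1000)]

# A status key can never use more than 3 partitions, so those partitions run hot.
print("status key uses", len(partition_spread(low)), "partitions")
print("order-id key uses", len(partition_spread(high)), "partitions")
```

The low-cardinality key concentrates all 1,000 items on at most three partitions. The unique order ID spreads them across every partition, which is the behavior a DynamoDB partition key should aim for.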

The "Big Idea"

Modern application architecture has shifted from the "One Size Fits All" approach (where everything lives in a monolithic SQL database) to "Purpose-Built Databases." By matching the data store to the access pattern, developers can reach latency and scale that are hard to achieve with a single general-purpose relational database, down to sub-millisecond reads for in-memory stores. For example, you could perform a text search in SQL using LIKE '%query%', but the leading wildcard forces the database to examine every row. Amazon OpenSearch Service instead maintains an inverted index (a map from each term to the documents that contain it), so it handles that exact pattern efficiently.
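The contrast between a LIKE scan and an inverted index can be sketched in a few lines. This is a toy model: the tokenizer is a hypothetical lowercase-and-split rule, and `search` does only AND matching with no relevance ranking. OpenSearch adds analyzers, scoring, and distribution on top of the same core idea.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase and split on non-alphanumeric characters (hypothetical simple analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def like_scan(docs: dict, needle: str) -> list:
    """Emulates WHERE body LIKE '%needle%': every document is inspected."""
    return [doc_id for doc_id, body in docs.items() if needle.lower() in body.lower()]


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of document IDs containing it (built once, at index time)."""
    index = defaultdict(set)
    for doc_id, body in docs.items():
        for term in tokenize(body):
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return IDs of documents that contain every query term, via set intersection."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Blue running shoes with foam sole",
    2: "Red running jacket",
    3: "Blue denim jacket",
}
index = build_inverted_index(docs)
print(like_scan(docs, "running"))             # [1, 2]
print(sorted(search(index, "blue jacket")))   # [3]
```

The scan's cost grows with the number of documents on every query. The index lookup touches only the posting sets for the query terms, which is why it stays fast as the catalog grows.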

Formula / Concept Box

Access Pattern	Optimal AWS Service	Primary Characteristic
Relational / ACID	Amazon RDS / Aurora	Complex joins, strict schema, transactions
Key-Value / Document	Amazon DynamoDB	Single-digit ms latency at any scale
Full-Text Search	Amazon OpenSearch	Complex indexing, ranking, log analytics
In-Memory / Sub-ms	Amazon ElastiCache	Caching, session stores, real-time gaming
Graph / Relationships	Amazon Neptune	Highly connected data, fraud detection
Analytical / SQL	Amazon Redshift	Petabyte-scale data warehousing
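One reason the relational (OLTP) and analytical (OLAP) rows map to different services is physical layout: Redshift stores data by column, while typical OLTP engines store whole rows together. The toy model below just counts how many stored values each operation touches. The data and the "values touched" metric are illustrative assumptions, not measurements of either service.

```python
# Row layout (OLTP-style): each order is stored as one contiguous record.
orders = [
    {"order_id": 1, "customer": "ana", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "ben", "region": "US", "amount": 80.0},
    {"order_id": 3, "customer": "cai", "region": "EU", "amount": 45.5},
    {"order_id": 4, "customer": "dee", "region": "AP", "amount": 200.0},
]


def to_columns(rows: list) -> dict:
    """Pivot row-oriented records into one list per attribute (column layout)."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def sum_amount_row_store(rows: list) -> tuple:
    """Analytical scan on rows: every whole record is read to reach one field."""
    values_touched = sum(len(row) for row in rows)
    return sum(row["amount"] for row in rows), values_touched


def sum_amount_column_store(columns: dict) -> tuple:
    """The same scan on columns: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


def insert_row_store(rows: list, record: dict) -> int:
    """Transactional write on rows: one contiguous record is appended."""
    rows.append(record)
    return 1


def insert_column_store(columns: dict, record: dict) -> int:
    """The same write on columns: one append per attribute."""
    for key, value in record.items():
        columns[key].append(value)
    return len(record)


columns = to_columns(orders)
print(sum_amount_row_store(orders))      # (445.5, 16)
print(sum_amount_column_store(columns))  # (445.5, 4)
```

Aggregations favor the column layout, while single-record writes favor the row layout. That trade-off is why a warehouse like Redshift and a transactional engine like RDS coexist.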

Hierarchical Outline

I. Relational vs. Non-Relational
- Amazon RDS: Managed SQL (MySQL, PostgreSQL, etc.). Use for structured data and complex transactions.
- Amazon DynamoDB: Serverless NoSQL key-value and document database. Use for high-scale workloads that need predictable performance.
II. Search and Analytics
- Amazon OpenSearch Service: Managed OpenSearch (an open-source fork of Elasticsearch); legacy Elasticsearch versions are also supported. Optimized for full-text search, log analytics, and real-time application monitoring.
- Amazon Athena: Serverless query service to analyze data in S3 using standard SQL.
III. Performance Optimization
- Amazon ElastiCache: Supports Redis and Memcached. Used to reduce database load and improve read latency.
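- Amazon MemoryDB for Redis: An in-memory, Redis-compatible database that, unlike a cache, stores data durably (using a Multi-AZ transaction log) and can serve as a primary database.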
IV. Specialized Connections
- Amazon Neptune: Graph database for data with complex relationships (e.g., social networks).
- Amazon DocumentDB: MongoDB-compatible document store for JSON-like workloads.
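The "reduce database load" role of ElastiCache is usually implemented as cache-aside (lazy loading): check the cache first, and on a miss read from the database and store the result with an expiry. The sketch below uses an in-process dict as a stand-in for ElastiCache. `CacheAside` and `query_top_trending` are hypothetical names, and the 30-second TTL is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside with a per-entry TTL; the dict stands in for an ElastiCache cluster."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader          # the expensive database read
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self.loader_calls = 0          # how often the database was actually hit

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]            # hit: served from memory, database untouched
        self.loader_calls += 1
        value = self._loader(key)      # miss or expired: read from the database
        self._entries[key] = (now + self._ttl, value)
        return value


def query_top_trending(key: str) -> list:
    """Hypothetical stand-in for the heavy SQL query behind the top-10 list."""
    return [f"article-{i}" for i in range(1, 11)]


cache = CacheAside(query_top_trending, ttl_seconds=30)
for _ in range(100):                   # 100 page loads within the TTL window
    cache.get("trending:top10")
print(cache.loader_calls)              # 1
```

Against a real ElastiCache (Redis OSS) endpoint, the dict operations would become a GET and a SET with an expiry (for example, `SET key value EX 30`). The TTL bounds how stale a cached result can be.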

Visual Anchors

[Diagram: Database Selection Decision Tree]
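The selection logic from the Concept Box and the outline can be written as a small lookup function. The enum names and the durability flag that separates ElastiCache from MemoryDB are hypothetical choices for this sketch, not an official AWS decision procedure.

```python
from enum import Enum, auto


class AccessPattern(Enum):
    RELATIONAL_ACID = auto()        # joins, strict schema, multi-row transactions
    KEY_VALUE = auto()              # lookups by key at any scale
    MONGODB_DOCUMENTS = auto()      # JSON documents accessed through MongoDB drivers
    FULL_TEXT_SEARCH = auto()       # relevance-ranked keyword search, log analytics
    IN_MEMORY = auto()              # sub-millisecond reads
    GRAPH = auto()                  # highly connected data
    WAREHOUSE_ANALYTICS = auto()    # complex queries over petabytes
    AD_HOC_SQL_ON_S3 = auto()       # serverless SQL over files in S3


_SERVICE = {
    AccessPattern.RELATIONAL_ACID: "Amazon RDS / Aurora",
    AccessPattern.KEY_VALUE: "Amazon DynamoDB",
    AccessPattern.MONGODB_DOCUMENTS: "Amazon DocumentDB",
    AccessPattern.FULL_TEXT_SEARCH: "Amazon OpenSearch Service",
    AccessPattern.GRAPH: "Amazon Neptune",
    AccessPattern.WAREHOUSE_ANALYTICS: "Amazon Redshift",
    AccessPattern.AD_HOC_SQL_ON_S3: "Amazon Athena",
}


def choose_data_store(pattern: AccessPattern, needs_durability: bool = False) -> str:
    """Return the purpose-built service for an access pattern."""
    if pattern is AccessPattern.IN_MEMORY:
        # A cache may lose data on failure; MemoryDB keeps a durable copy.
        return "Amazon MemoryDB for Redis" if needs_durability else "Amazon ElastiCache"
    return _SERVICE[pattern]


print(choose_data_store(AccessPattern.GRAPH))                         # Amazon Neptune
print(choose_data_store(AccessPattern.IN_MEMORY, needs_durability=True))  # Amazon MemoryDB for Redis
```

Real decisions usually weigh several patterns at once (for example, RDS as the system of record plus OpenSearch for search). Treat the mapping as a starting point rather than a single answer.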

[Diagram: Data Structure Comparisons]

Definition-Example Pairs

Full-Text Search: Searching for keywords across millions of documents with partial matches.
- Example: A retail website's search bar that suggests "blue running shoes" even if the user only types "blu run."
Graph Relationships: Storing data where the links between entities are as important as the entities themselves.
- Example: A recommendation engine that suggests friends based on mutual connections and shared interests.
In-Memory Caching: Storing frequently accessed data in RAM rather than on disk.
- Example: Storing the results of a heavy SQL query for the top 10 trending news articles so the database isn't hit for every page load.
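The friend-recommendation example can be sketched as a two-hop traversal that ranks friends-of-friends by mutual connections. This uses a plain in-memory adjacency map with hypothetical users. In Neptune, the same traversal would be written in one of its supported query languages (Gremlin, openCypher, or SPARQL) and run against the stored graph.

```python
from collections import Counter


def recommend_friends(graph: dict, user: str, limit: int = 3) -> list:
    """Suggest friends-of-friends, ranked by the number of mutual connections."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:                        # hop 1: the user's friends
        for candidate in graph.get(friend, set()):  # hop 2: their friends
            if candidate != user and candidate not in direct:
                counts[candidate] += 1           # one more mutual connection
    return counts.most_common(limit)


# Hypothetical undirected social graph stored as adjacency sets.
graph = {
    "ana": {"ben", "cai"},
    "ben": {"ana", "dee", "eli"},
    "cai": {"ana", "dee"},
    "dee": {"ben", "cai"},
    "eli": {"ben"},
}
print(recommend_friends(graph, "ana"))  # [('dee', 2), ('eli', 1)]
```

In a relational database, each hop becomes another self-join on a friendship table. Graph databases store the edges directly, so multi-hop queries like this stay cheap as the network grows.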

Worked Examples

Scenario: Migrating a Search Feature

Problem: A company uses Amazon RDS (PostgreSQL) to store product catalogs. Users complain that the keyword search is slow and doesn't handle typos well.

Solution Step-by-Step:

Identify the bottleneck: A LIKE '%query%' condition with a leading wildcard cannot use a standard B-tree index, so the database falls back to a full table scan. It also has no notion of typos or relevance ranking.
Introduce Amazon OpenSearch: Provision an OpenSearch cluster.
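Sync Data: Keep the OpenSearch index up to date as products change. Because the catalog lives in PostgreSQL, use change data capture from the database (for example, AWS DMS reading changes through PostgreSQL logical replication, with OpenSearch as the target), or have the application invoke a Lambda function that indexes each changed product. DynamoDB Streams and MySQL binlogs play this role for those engines; PostgreSQL has neither.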
Update Application: Change the search API to query the OpenSearch endpoint instead of the database.
Result: Search results now return in milliseconds with support for "fuzzy matching" (handling typos).
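The typo tolerance in the result comes from fuzzy matching. An OpenSearch `match` query accepts a `fuzziness` parameter, and with `"AUTO"` the allowed edit distance depends on term length: exact match for 1 to 2 characters, one edit for 3 to 5, and two edits for longer terms. The sketch builds such a query body and reproduces the AUTO rule locally. The field name `name` is a hypothetical assumption. This sketch also uses plain Levenshtein distance, whereas OpenSearch by default also counts a transposition of adjacent characters as a single edit.

```python
def build_fuzzy_query(field: str, text: str) -> dict:
    """Request body for an OpenSearch match query with automatic fuzziness."""
    return {"query": {"match": {field: {"query": text, "fuzziness": "AUTO"}}}}


def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edit distance allowed by fuzziness AUTO with its default thresholds (3 and 6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def fuzzy_term_match(query_term: str, indexed_term: str) -> bool:
    """True if the indexed term is within the AUTO edit budget of the query term."""
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


print(build_fuzzy_query("name", "blue runing shoos"))
print(fuzzy_term_match("shoos", "shoes"))     # True: 1 edit, 1 allowed
print(fuzzy_term_match("runing", "running"))  # True: 1 edit, 2 allowed
print(fuzzy_term_match("run", "running"))     # False: 4 edits, 1 allowed
```

Scaling the allowance with term length keeps short terms from matching almost anything while still forgiving typos in longer words.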

Checkpoint Questions

Which service should you use if your application requires a NoSQL database with sub-10ms latency at massive scale?
You need to perform complex analytical queries on multi-petabyte datasets stored in S3. Which service is best suited for this (Athena or Redshift Spectrum)?
What is the primary use case for Amazon Neptune?
If your access pattern requires fast lookups of JSON documents, which service is most compatible with MongoDB drivers?

Answers

Amazon DynamoDB.
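Either can work: Amazon Athena for serverless, pay-per-query SQL directly on S3, or Amazon Redshift Spectrum when you already run a Redshift cluster and want to join the S3 data with warehouse tables.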
Highly connected data/Graph workloads (e.g., social networks, fraud detection).
Amazon DocumentDB.