Amazon DynamoDB: Master Keys and Indexing
Define Amazon DynamoDB keys and indexing
This guide covers the fundamental architecture of Amazon DynamoDB data modeling, focusing on how primary keys and secondary indexes enable high-performance, distributed data access for modern applications.
Learning Objectives
By the end of this guide, you will be able to:
- Differentiate between Simple Primary Keys and Composite Primary Keys.
- Explain how the Partition Key determines physical data storage via hashing.
- Compare and contrast Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI).
- Identify the constraints and limits for items and indexes in DynamoDB.
- Design optimal keys based on specific application access patterns.
Key Terms & Glossary
- Partition Key (PK): Also known as a "Hash Key," it is an attribute used as input to an internal hash function to determine the physical partition where data is stored.
- Sort Key (SK): Also known as a "Range Key," it is used to group items with the same Partition Key in a sorted order.
- Composite Primary Key: A primary key consisting of both a Partition Key and a Sort Key.
- Projection: The set of attributes that are copied from a base table into a secondary index.
- GSI (Global Secondary Index): An index whose partition key and optional sort key can both differ from those of the base table.
- LSI (Local Secondary Index): An index that has the same partition key as the base table but a different sort key.
The "Big Idea"
In traditional SQL databases, indexes are often added as an afterthought to speed up queries. In DynamoDB, keys and indexes are the architecture. Because DynamoDB is a distributed NoSQL database, the Partition Key is the mechanism that allows the system to scale horizontally to any amount of data. If you choose a poor partition key (low cardinality), you create "hot partitions" that bottleneck performance. Indexes (GSI/LSI) are not just "helpers"; they are separate data structures that allow you to support multiple Access Patterns without performing expensive full-table scans.
Formula / Concept Box
| Feature | Local Secondary Index (LSI) | Global Secondary Index (GSI) |
|---|---|---|
| Partition Key | Must be the SAME as the base table | Can be DIFFERENT from the base table |
| Sort Key | Must be DIFFERENT from the base table | Can be DIFFERENT from the base table |
| Creation Time | Only at Table Creation | Anytime (Creation or Update) |
| Consistency | Strong or Eventual | Eventual Consistency only |
| Limits | Max 5 per table | Max 20 per table (default quota) |
| Capacity | Shares throughput with base table | Has its own provisioned throughput |
[!IMPORTANT] Item Size Limit: A single item in DynamoDB (including attribute names) cannot exceed 400 KB.
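The rules in the comparison table above can be read as a small decision procedure. The sketch below is only an illustration of that reading: the function name `choose_index` and its three boolean parameters are hypothetical, and it defaults to a GSI whenever strongly consistent reads are not required, which is a design preference rather than a DynamoDB rule.

```python
def choose_index(same_partition_key: bool, table_exists: bool,
                 needs_strong_consistency: bool) -> str:
    """Return 'LSI' or 'GSI' based on the rows of the comparison table."""
    # An LSI needs the base table's partition key and must be defined
    # when the table is created.
    lsi_possible = same_partition_key and not table_exists

    if needs_strong_consistency:
        # Only an LSI supports strongly consistent reads.
        if lsi_possible:
            return "LSI"
        raise ValueError("no index type satisfies these requirements")

    # A GSI can use any key attributes and can be added at any time.
    return "GSI"


print(choose_index(same_partition_key=True, table_exists=False,
                   needs_strong_consistency=True))
print(choose_index(same_partition_key=False, table_exists=True,
                   needs_strong_consistency=False))
```

When the function raises, the access pattern usually has to be served some other way, for example by reading from the base table itself.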
Hierarchical Outline
- Primary Keys
- Simple Primary Key: Only a Partition Key (e.g., UserID). Unique identifier.
- Composite Primary Key: Partition Key + Sort Key (e.g., Artist + SongTitle). Allows multiple items with the same PK.
- Secondary Indexes
- LSI: Extends the sort capabilities within a single partition key value. Each item collection (all items sharing one partition key value, in the table and its LSIs) is limited to 10 GB.
- GSI: Allows querying across the entire table using different attributes. Highly flexible.
- Data Types
- Scalars: String, Number, Binary, Boolean, Null.
- Documents: List, Map (JSON-like structures).
- Sets: String Set, Number Set, Binary Set.
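As a rough illustration of the outline above, the key schemas and data types can be written in the shape the low-level DynamoDB API uses. This is a sketch only: every attribute name other than UserID, Artist, and SongTitle, and all of the values, are made up for the example, and the helper `type_codes` is hypothetical.

```python
# Key schemas in the shape used by the CreateTable API.
simple_key_schema = [
    {"AttributeName": "UserID", "KeyType": "HASH"},      # partition key only
]
composite_key_schema = [
    {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
    {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
]

# One item in DynamoDB's typed JSON format, one attribute per data type.
song_item = {
    "Artist": {"S": "Example Artist"},                   # String
    "SongTitle": {"S": "Example Song"},
    "ReleaseYear": {"N": "2023"},                        # Number, sent as a string
    "CoverArt": {"B": b"\x89PNG"},                       # Binary
    "Explicit": {"BOOL": False},                         # Boolean
    "Remastered": {"NULL": True},                        # Null
    "Credits": {"L": [{"S": "Producer A"}, {"S": "Engineer B"}]},  # List
    "Label": {"M": {"Name": {"S": "Example Records"}}},  # Map
    "Genres": {"SS": ["Jazz", "Fusion"]},                # String Set
    "ChartPositions": {"NS": ["3", "17"]},               # Number Set
    "Fingerprints": {"BS": [b"\x01", b"\x02"]},          # Binary Set
}


def type_codes(item: dict) -> set:
    """Collect the top-level type descriptors used by an item."""
    return {code for value in item.values() for code in value}


print(sorted(type_codes(song_item)))
```

Only the key attributes need to follow a fixed schema; the remaining attributes can vary from item to item.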
Visual Anchors
Data Distribution Logic
This diagram illustrates how the Partition Key maps to physical storage nodes.
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, minimum width=2.5cm, minimum height=1cm, align=center}]
  \node (input) {Input PK\\(e.g., ``User123'')};
  \node (hash) [right=of input, circle, fill=blue!10] {Hash\\Function};
  \node (p1) [right=of hash, yshift=1.5cm, fill=green!10] {Partition 1};
  \node (p2) [right=of hash, fill=green!10] {Partition 2};
  \node (p3) [right=of hash, yshift=-1.5cm, fill=green!10] {Partition 3};
  \draw[->, thick] (input) -- (hash);
  \draw[->, dashed] (hash) -- (p1) node[midway, above, sloped] {00-3F};
  \draw[->, thick] (hash) -- (p2) node[midway, above] {40-7F};
  \draw[->, dashed] (hash) -- (p3) node[midway, below, sloped] {80-FF};
  \node[below=0.5cm of p3, draw=none, font=\itshape] {Partitioning based on Hash Range};
\end{tikzpicture}
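The mapping in the diagram can be imitated with a few lines of code. The sketch below is a toy model, not DynamoDB's implementation: the real hash function is internal to the service, so SHA-256 stands in for it here, and the three byte ranges are simply copied from the diagram. The names `partition_for` and `distribution` are hypothetical.

```python
import hashlib
from collections import Counter

# Byte ranges copied from the diagram: 00-3F, 40-7F, 80-FF.
PARTITION_RANGES = [
    ("Partition 1", 0x00, 0x3F),
    ("Partition 2", 0x40, 0x7F),
    ("Partition 3", 0x80, 0xFF),
]


def partition_for(partition_key: str) -> str:
    """Hash the key and return the partition whose range holds the first byte."""
    first_byte = hashlib.sha256(partition_key.encode("utf-8")).digest()[0]
    for name, low, high in PARTITION_RANGES:
        if low <= first_byte <= high:
            return name
    raise AssertionError("the ranges cover every possible byte value")


def distribution(keys) -> Counter:
    """Count how many writes land on each partition."""
    return Counter(partition_for(key) for key in keys)


# High cardinality: 1,000 distinct device IDs.
device_writes = distribution(f"Device{i}" for i in range(1000))
# Low cardinality: the same number of writes, but only two distinct key values.
status_writes = distribution(["Active", "Inactive"] * 500)

print("DeviceID:", dict(device_writes))
print("Status:  ", dict(status_writes))
```

Because a given key always hashes to the same partition, 1,000 writes keyed on two distinct values can reach at most two partitions, which is the hot-partition effect in miniature.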
Definition-Example Pairs
- Partition Key (High Cardinality)
- Definition: A key with many unique values that distributes workload evenly across partitions.
- Example: Using DeviceID for IoT sensors is better than using Status (which only has 'Active' or 'Inactive') to avoid hot partitions.
- Sort Key (Range Query)
- Definition: An attribute that allows for sorting and using comparison operators (<, >, between, begins_with).
- Example: In a Transactions table, use CustomerID as PK and Timestamp as SK to find all purchases made in the last 30 days.
- Attribute Projection
- Definition: Choosing which specific attributes from the base table are available in an index.
- Example: A GSI for a Users table might only project Email and PhoneNumber to save on storage and throughput costs while still supporting lookups.
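The sort-key example above (CustomerID as PK, Timestamp as SK) could translate into a Query request along the following lines. This is a sketch under assumptions: timestamps are taken to be stored as ISO 8601 strings in UTC so that string order matches time order, the function name `recent_purchases_query` and the customer ID are made up, and Timestamp is aliased as `#ts` in case it collides with a DynamoDB reserved word.

```python
from datetime import datetime, timedelta, timezone


def recent_purchases_query(customer_id: str, now: datetime, days: int = 30) -> dict:
    """Build Query parameters for one customer's purchases in the last `days` days."""
    start = now - timedelta(days=days)
    return {
        "TableName": "Transactions",
        # Equality on the partition key, range condition on the sort key.
        "KeyConditionExpression": "CustomerID = :cid AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#ts": "Timestamp"},
        "ExpressionAttributeValues": {
            ":cid": {"S": customer_id},
            ":start": {"S": start.isoformat()},
            ":end": {"S": now.isoformat()},
        },
    }


params = recent_purchases_query("C-1001", datetime(2024, 1, 31, tzinfo=timezone.utc))
print(params["KeyConditionExpression"])

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   boto3.client("dynamodb").query(**params)
```

The partition key condition must be an equality test; only the sort key accepts range operators such as BETWEEN.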
Worked Examples
Scenario: The Music Database
You are designing a table for a music streaming app. Users need to find songs by an Artist, but sometimes they want to search by Genre across all artists.
1. Base Table Configuration:
- PK: Artist (String)
- SK: SongTitle (String)
- Query Support: GetItem (specific song) or Query (all songs by one artist).
2. The Problem: A user wants to find all "Jazz" songs released in 2023. Searching the base table would require a Scan (expensive and slow).
3. The Solution: GSI
- Create a GSI named GenreYearIndex.
- GSI PK: Genre (String)
- GSI SK: ReleaseYear (Number)
- Projection: INCLUDE (SongTitle, Price).
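Taken together, the base table from step 1 and the index from step 3 might be expressed as CreateTable parameters like these. Treat it as a sketch: on-demand billing (`PAY_PER_REQUEST`) is an assumption made to keep the example short, the helper `mismatched_key_attributes` is hypothetical, and only Price is listed under `NonKeyAttributes` because the table's own key attributes (Artist and SongTitle) are projected into every index automatically.

```python
create_table_params = {
    "TableName": "Music",
    # Only attributes used in a key schema are declared up front.
    "AttributeDefinitions": [
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
        {"AttributeName": "Genre", "AttributeType": "S"},
        {"AttributeName": "ReleaseYear", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "GenreYearIndex",
            "KeySchema": [
                {"AttributeName": "Genre", "KeyType": "HASH"},
                {"AttributeName": "ReleaseYear", "KeyType": "RANGE"},
            ],
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["Price"],
            },
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
}


def mismatched_key_attributes(params: dict) -> set:
    """Return attributes that are declared but unused as keys, or used but undeclared."""
    declared = {d["AttributeName"] for d in params["AttributeDefinitions"]}
    used = {k["AttributeName"] for k in params["KeySchema"]}
    for index in params.get("GlobalSecondaryIndexes", []):
        used |= {k["AttributeName"] for k in index["KeySchema"]}
    return declared ^ used


print(mismatched_key_attributes(create_table_params))

# With boto3 installed and AWS credentials configured, the table could be created with:
#   boto3.client("dynamodb").create_table(**create_table_params)
```

The consistency check is worth running before sending the request, since the declared attributes and the attributes used in key schemas are expected to match.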
4. Resulting Query:
-- Conceptually via PartiQL
SELECT SongTitle, Price
FROM "Music"."GenreYearIndex"
WHERE Genre = 'Jazz' AND ReleaseYear = 2023
Checkpoint Questions
- True or False: You can add a Local Secondary Index (LSI) to an existing DynamoDB table.
- Which key type is used as input to the internal hash function to determine physical storage?
- What is the maximum size allowed for a single item in DynamoDB?
- If you need to query a table using a completely different Partition Key than the one defined on the table, which index should you use?
- Explain the difference between a Scan and a Query operation in terms of efficiency.
Answers
- False. LSIs must be created at the time of table creation.
- Partition Key (Hash Key).
- 400 KB.
- Global Secondary Index (GSI).
- A Query finds items based on primary key values and is highly efficient. A Scan reads every single item in the table, consuming significant read capacity and slowing down as the table grows.
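The difference in the last answer can be made concrete with a toy in-memory model. This is an analogy rather than DynamoDB's storage engine: a dictionary keyed by Artist stands in for partition lookup, the data is synthetic, and the "items read" count is a stand-in for consumed read capacity.

```python
from collections import defaultdict

# Synthetic table: 100 artists with 10 songs each (1,000 items).
songs = [
    {"Artist": f"Artist{a}", "SongTitle": f"Song{s}"}
    for a in range(100)
    for s in range(10)
]

# Group items by partition key, as a stand-in for physical partitions.
by_artist = defaultdict(list)
for song in songs:
    by_artist[song["Artist"]].append(song)


def query(artist: str):
    """Read only the items stored under one partition key value."""
    matches = by_artist.get(artist, [])
    return matches, len(matches)          # (results, items read)


def scan(artist: str):
    """Read every item in the table, then filter."""
    matches = [song for song in songs if song["Artist"] == artist]
    return matches, len(songs)            # (results, items read)


query_results, query_reads = query("Artist7")
scan_results, scan_reads = scan("Artist7")
print(len(query_results), query_reads)    # 10 10
print(len(scan_results), scan_reads)      # 10 1000
```

Both calls return the same ten songs, but the scan touches all 1,000 items to find them, and that gap widens as the table grows.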