Amazon DynamoDB: Master Keys and Indexing

Define Amazon DynamoDB keys and indexing

This guide covers the fundamental architecture of Amazon DynamoDB data modeling, focusing on how primary keys and secondary indexes enable high-performance, distributed data access for modern applications.

Learning Objectives

By the end of this guide, you will be able to:

  • Differentiate between Simple Primary Keys and Composite Primary Keys.
  • Explain how the Partition Key determines physical data storage via hashing.
  • Compare and contrast Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI).
  • Identify the constraints and limits for items and indexes in DynamoDB.
  • Design optimal keys based on specific application access patterns.

Key Terms & Glossary

  • Partition Key (PK): Also known as a "Hash Key," it is an attribute used as input to an internal hash function to determine the physical partition where data is stored.
  • Sort Key (SK): Also known as a "Range Key," it is used to group items with the same Partition Key in a sorted order.
  • Composite Primary Key: A primary key consisting of both a Partition Key and a Sort Key.
  • Projection: The set of attributes that are copied from a base table into a secondary index.
  • GSI (Global Secondary Index): An index whose partition key and optional sort key can both differ from those of the base table.
  • LSI (Local Secondary Index): An index that has the same partition key as the base table but a different sort key.

The "Big Idea"

In traditional SQL databases, indexes are often added as an afterthought to speed up queries. In DynamoDB, keys and indexes are the architecture. Because DynamoDB is a distributed NoSQL database, the Partition Key is the mechanism that lets the system scale horizontally as data grows. If you choose a poor partition key (low cardinality), most requests hash to the same few partitions, creating "hot partitions" that bottleneck performance. Indexes (GSI/LSI) are not just "helpers"; they are separate data structures that let you support multiple access patterns without performing expensive full-table scans.
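The hot-partition effect can be illustrated with a small simulation. This is only a sketch: DynamoDB's real hash function and partition count are internal, so SHA-256 and `NUM_PARTITIONS = 4` are stand-in assumptions, and the `device-N` key values are hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumption: arbitrary partition count for the demo


def partition_for(key: str) -> int:
    """Map a partition key value to a partition by hashing it (SHA-256 stand-in)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def load_per_partition(keys) -> Counter:
    """Count how many writes land on each partition."""
    return Counter(partition_for(k) for k in keys)


# 10,000 writes keyed by a high-cardinality attribute (one value per device).
device_load = load_per_partition(f"device-{i}" for i in range(10_000))

# 10,000 writes keyed by a low-cardinality attribute (only two possible values).
status_load = load_per_partition(
    "Active" if i % 2 == 0 else "Inactive" for i in range(10_000)
)

print("DeviceID:", sorted(device_load.items()))
print("Status:  ", sorted(status_load.items()))
```

With only two distinct key values, the `Status` workload can reach at most two partitions no matter how many exist, while the `DeviceID` workload spreads across all of them.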

Formula / Concept Box

| Feature | Local Secondary Index (LSI) | Global Secondary Index (GSI) |
| --- | --- | --- |
| Partition Key | Must be the SAME as the base table | Can be DIFFERENT from the base table |
| Sort Key | Must be DIFFERENT from the base table | Can be DIFFERENT from the base table |
| Creation Time | Only at Table Creation | Anytime (Creation or Update) |
| Consistency | Strong or Eventual | Eventual Consistency only |
| Limits | Max 5 per table | Max 20 per table |
| Capacity | Shares throughput with base table | Has its own provisioned throughput |

[!IMPORTANT] Item Size Limit: A single item in DynamoDB (including attribute names) cannot exceed 400 KB.
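To see why attribute names count toward the limit, here is a rough size estimator. It is a simplified sketch that handles only string and binary values (sizing rules for numbers, sets, and documents are more involved and are left out), and it assumes 1 KB = 1024 bytes. The function names are hypothetical.

```python
MAX_ITEM_BYTES = 400 * 1024  # 400 KB, assuming 1 KB = 1024 bytes


def estimate_item_size(item: dict) -> int:
    """Approximate item size: UTF-8 bytes of each attribute name plus its value.

    Only str and bytes values are supported in this sketch.
    """
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))  # attribute names count too
        if isinstance(value, str):
            total += len(value.encode("utf-8"))
        elif isinstance(value, (bytes, bytearray)):
            total += len(value)
        else:
            raise TypeError(f"unsupported value type for {name!r}: {type(value).__name__}")
    return total


def fits_in_item_limit(item: dict) -> bool:
    """Return True if the estimated size is within the 400 KB limit."""
    return estimate_item_size(item) <= MAX_ITEM_BYTES


print(estimate_item_size({"UserID": "User123"}))
print(fits_in_item_limit({"Body": "x" * MAX_ITEM_BYTES}))
```

The second call is over the limit by exactly the length of the attribute name, which is why short attribute names are often recommended for large items.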

Hierarchical Outline

  1. Primary Keys
    • Simple Primary Key: Only a Partition Key (e.g., UserID). Unique identifier.
    • Composite Primary Key: Partition Key + Sort Key (e.g., Artist + SongTitle). Allows multiple items with the same PK.
  2. Secondary Indexes
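    • LSI: Extends the sort capabilities within a single partition key value. Tables with an LSI are restricted to 10 GB per item collection (all items, including index entries, that share one partition key value).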
    • GSI: Allows querying across the entire table using different attributes. Highly flexible.
  3. Data Types
    • Scalars: String, Number, Binary, Boolean, Null.
    • Documents: List, Map (JSON-like structures).
    • Sets: String Set, Number Set, Binary Set.
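The data types above map onto the type descriptors used by DynamoDB's low-level API (`S`, `N`, `B`, `BOOL`, `NULL`, `L`, `M`, `SS`, `NS`, `BS`). The converter below is a minimal sketch of that mapping, not a replacement for an SDK serializer; the function name `to_attribute_value` is hypothetical.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value into a DynamoDB-style typed attribute value."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are represented as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("sets must not be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("set members must all be str, number, or bytes")
    raise TypeError(f"unsupported type: {type(value).__name__}")


song = {"Artist": "A", "Year": 2023, "Live": False, "Tags": {"jazz"}}
print(to_attribute_value(song))
```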

Visual Anchors

Data Distribution Logic

This diagram illustrates how the Partition Key maps to physical storage nodes.

\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, minimum width=2.5cm, minimum height=1cm, align=center}]
  \node (input) {Input PK\\(e.g., "User123")};
  \node (hash) [right=of input, circle, fill=blue!10] {Hash\\Function};
  \node (p1) [right=of hash, yshift=1.5cm, fill=green!10] {Partition 1};
  \node (p2) [right=of hash, fill=green!10] {Partition 2};
  \node (p3) [right=of hash, yshift=-1.5cm, fill=green!10] {Partition 3};

  \draw[->, thick] (input) -- (hash);
  \draw[->, dashed] (hash) -- (p1) node[midway, above, sloped] {00-3F};
  \draw[->, thick] (hash) -- (p2) node[midway, above] {40-7F};
  \draw[->, dashed] (hash) -- (p3) node[midway, below, sloped] {80-FF};

  \node[below=0.5cm of p3, draw=none, font=\itshape] {Partitioning based on Hash Range};
\end{tikzpicture}
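The range lookup in the diagram can be sketched in a few lines. The hash ranges (00-3F, 40-7F, 80-FF) come from the diagram; using the first byte of a SHA-256 digest is an assumption made for illustration, since DynamoDB's actual hash function is internal, so the partition chosen for "User123" here need not match the diagram.

```python
import bisect
import hashlib

# Inclusive upper bound of each hash range shown in the diagram.
RANGE_UPPER_BOUNDS = [0x3F, 0x7F, 0xFF]
PARTITION_NAMES = ["Partition 1", "Partition 2", "Partition 3"]


def partition_for_byte(hash_byte: int) -> str:
    """Find the partition whose hash range contains the given byte."""
    if not 0 <= hash_byte <= 0xFF:
        raise ValueError("hash_byte must be in the range 0..255")
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, hash_byte)
    return PARTITION_NAMES[index]


def partition_for(partition_key: str) -> str:
    """Hash the key (SHA-256 stand-in) and route it by its first byte."""
    first_byte = hashlib.sha256(partition_key.encode("utf-8")).digest()[0]
    return partition_for_byte(first_byte)


print(partition_for("User123"))
```

Because the mapping depends only on the key's hash, the same partition key value always routes to the same partition, which is what makes single-key lookups fast.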

Key Hierarchy

[Diagram: Key Hierarchy]

Definition-Example Pairs

  • Partition Key (High Cardinality)

    • Definition: A key with many unique values that distributes workload evenly across partitions.
    • Example: Using DeviceID for IoT sensors is better than using Status (which only has 'Active' or 'Inactive') to avoid hot partitions.
  • Sort Key (Range Query)

    • Definition: An attribute that keeps items sorted within a partition and supports range conditions in queries (>, <, BETWEEN, begins_with).
    • Example: In a Transactions table, use CustomerID as PK and Timestamp as SK to find all purchases made in the last 30 days.
  • Attribute Projection

    • Definition: Choosing which specific attributes from the base table are available in an index.
    • Example: A GSI for a Users table might only project Email and PhoneNumber to save on storage and throughput costs while still supporting lookups.
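The sort-key range query from the Transactions example could be expressed as Query parameters along these lines. This sketch assumes `Timestamp` is stored as an ISO 8601 UTC string (so lexicographic order matches chronological order) and that the table is named `Transactions`; the helper name `last_30_days_query` is hypothetical. The expression uses `#pk`/`#sk` placeholders so that attribute names never collide with DynamoDB reserved words.

```python
from datetime import datetime, timedelta, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumption: timestamps stored as UTC ISO 8601 strings


def last_30_days_query(customer_id: str, now: datetime) -> dict:
    """Build Query parameters for one customer's purchases in the last 30 days."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    end = now.astimezone(timezone.utc)
    start = end - timedelta(days=30)
    return {
        "TableName": "Transactions",
        # Equality on the partition key, range condition on the sort key.
        "KeyConditionExpression": "#pk = :customer AND #sk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": "CustomerID", "#sk": "Timestamp"},
        "ExpressionAttributeValues": {
            ":customer": {"S": customer_id},
            ":start": {"S": start.strftime(ISO_FORMAT)},
            ":end": {"S": end.strftime(ISO_FORMAT)},
        },
    }


params = last_30_days_query("C-42", datetime(2024, 3, 31, 12, 0, tzinfo=timezone.utc))
print(params["ExpressionAttributeValues"])
```

A dictionary of this shape is what an SDK client's Query call would typically take as keyword arguments; only the partition for `CustomerID` is read, and only the matching slice of sort keys within it.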

Worked Examples

Scenario: The Music Database

You are designing a table for a music streaming app. Users need to find songs by an Artist, but sometimes they want to search by Genre across all artists.

1. Base Table Configuration:

  • PK: Artist (String)
  • SK: SongTitle (String)
  • Query Support: GetItem (specific song) or Query (all songs by one artist).

2. The Problem: A user wants to find all "Jazz" songs released in 2023. Searching the base table would require a Scan (expensive and slow).

3. The Solution: GSI

  • Create a GSI named GenreYearIndex.
  • GSI PK: Genre (String)
  • GSI SK: ReleaseYear (Number)
  • Projection: INCLUDE (SongTitle, Price).
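As a rough sketch, the index definition above could be written as UpdateTable-style parameters. The table name `Music`, the helper name, and the throughput values are assumptions for illustration. `SongTitle` is left out of `NonKeyAttributes` because base-table key attributes (`Artist`, `SongTitle`) are always projected into an index, so only `Price` needs to be listed.

```python
def genre_year_index_update(read_units: int = 5, write_units: int = 5) -> dict:
    """Build parameters that add GenreYearIndex to an existing table.

    The capacity values are illustrative only; they apply to tables
    using provisioned capacity.
    """
    return {
        "TableName": "Music",  # assumption: name of the base table
        # Every attribute used in an index key schema must be declared.
        "AttributeDefinitions": [
            {"AttributeName": "Genre", "AttributeType": "S"},
            {"AttributeName": "ReleaseYear", "AttributeType": "N"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "GenreYearIndex",
                    "KeySchema": [
                        {"AttributeName": "Genre", "KeyType": "HASH"},
                        {"AttributeName": "ReleaseYear", "KeyType": "RANGE"},
                    ],
                    "Projection": {
                        "ProjectionType": "INCLUDE",
                        "NonKeyAttributes": ["Price"],
                    },
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


index_params = genre_year_index_update()
print(index_params["GlobalSecondaryIndexUpdates"][0]["Create"]["KeySchema"])
```

Because a GSI can be added after table creation, this is an update to the existing table rather than part of the original table definition.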

4. Resulting Query:

    -- Conceptually, via PartiQL
    SELECT SongTitle, Price
    FROM "Music"."GenreYearIndex"
    WHERE Genre = 'Jazz' AND ReleaseYear = 2023

Checkpoint Questions

  1. True or False: You can add a Local Secondary Index (LSI) to an existing DynamoDB table.
  2. Which key type is used as input to the internal hash function to determine physical storage?
  3. What is the maximum size allowed for a single item in DynamoDB?
  4. If you need to query a table using a completely different Partition Key than the one defined on the table, which index should you use?
  5. Explain the difference between a Scan and a Query operation in terms of efficiency.
Answers
  1. False. LSIs must be created at the time of table creation.
  2. Partition Key (Hash Key).
  3. 400 KB.
  4. Global Secondary Index (GSI).
  5. A Query finds items based on primary key values and is highly efficient. A Scan reads every single item in the table, consuming significant read capacity and slowing down as the table grows.
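The efficiency gap in answer 5 can be made concrete with a toy in-memory model that counts how many items each operation has to read. This is an illustrative sketch only: `TinyTable` is a hypothetical class, not DynamoDB's storage engine, and the artist and song values are made up.

```python
class TinyTable:
    """Toy table: items grouped by partition key, keyed by sort key within each group."""

    def __init__(self):
        self._partitions = {}  # partition key -> {sort key: item}

    def put(self, pk, sk, item):
        self._partitions.setdefault(pk, {})[sk] = item

    def query(self, pk):
        """Read only the items under one partition key, in sort-key order."""
        partition = self._partitions.get(pk, {})
        items = [partition[sk] for sk in sorted(partition)]
        return items, len(items)  # (results, items read)

    def scan(self, predicate):
        """Read every item in the table, then filter."""
        items_read = 0
        matches = []
        for partition in self._partitions.values():
            for item in partition.values():
                items_read += 1
                if predicate(item):
                    matches.append(item)
        return matches, items_read


table = TinyTable()
for a in range(100):  # 100 artists
    for s in range(20):  # 20 songs each
        table.put(f"artist-{a}", f"song-{s:02d}", {"Artist": f"artist-{a}", "SongTitle": f"song-{s:02d}"})

query_items, query_read = table.query("artist-7")
scan_items, scan_read = table.scan(lambda item: item["Artist"] == "artist-7")

print("Query read", query_read, "items; Scan read", scan_read, "items")
```

Both calls return the same 20 songs, but the scan touches all 2,000 items to find them, and that cost grows with the table rather than with the result.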
