High-Cardinality Partition Keys and Balanced Partition Access in DynamoDB

Describe high-cardinality partition keys for balanced partition access

This guide explores the architectural necessity of high-cardinality partition keys in Amazon DynamoDB to ensure scalable performance and prevent "hot partitions."

Learning Objectives

After studying this guide, you should be able to:

  • Explain how DynamoDB uses hash functions to distribute data across physical partitions.
  • Define cardinality and identify high-cardinality versus low-cardinality attributes.
  • Diagnose the causes of "hot partitions" and ProvisionedThroughputExceededException errors.
  • Design partition keys that achieve a uniform workload distribution for both read and write operations.

Key Terms & Glossary

  • Partition Key (Hash Key): An attribute used as input to an internal hash function that determines the physical storage location of an item.
  • Cardinality: The measure of the number of unique values in a particular column or attribute of a database. High cardinality means many unique values; low cardinality means few.
  • Hot Partition: A condition where a single partition receives a disproportionately high volume of traffic, leading to throttling even if total table throughput is within limits.
  • Adaptive Capacity: A DynamoDB feature that automatically increases throughput for imbalanced workloads, though it is not a substitute for good schema design.
  • Composite Key: A primary key composed of a partition key and a sort key.

The "Big Idea"

DynamoDB is designed to scale horizontally with virtually no upper limit. It does this by splitting data across multiple physical storage units called partitions. For this to work, requests must be spread evenly across those partitions. If your partition key has many unique values (high cardinality), traffic is spread across many partitions. If it has few values (low cardinality), traffic concentrates on one or a few partitions, creating a bottleneck that can limit the performance of your entire application even when the table as a whole has spare capacity.

Formula / Concept Box

| Concept | Description | Impact on Performance |
|---|---|---|
| High Cardinality | Large number of distinct values (e.g., user_id, order_id). | Excellent: Uniform distribution and high scalability. |
| Low Cardinality | Small number of distinct values (e.g., status, gender). | Poor: Causes "hot keys" and throughput throttling. |
| The Hash Rule | $\text{Hash}(\text{PartitionKey}) \rightarrow \text{PhysicalLocation}$ | Determines which server handles the request. |
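
The hash rule can be illustrated with a small simulation. This is only a sketch: DynamoDB's actual hash function is internal, so MD5 and the modulo step below are stand-ins chosen for illustration, and the names `partition_for` and `partitions_used` are hypothetical.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key value to a partition index (illustrative stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce modulo the partition count.
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_used(keys, num_partitions: int) -> int:
    """Count how many distinct partitions a collection of key values lands on."""
    return len({partition_for(k, num_partitions) for k in keys})


high_cardinality = [f"user-{i}" for i in range(10_000)]  # many distinct values
low_cardinality = ["ACTIVE", "INACTIVE"]                 # only two distinct values

print(partitions_used(high_cardinality, 16))
print(partitions_used(low_cardinality, 16))
```

However many items share the two status values, they can never occupy more than two partitions, while the 10,000 distinct user IDs reach all of them.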

Hierarchical Outline

  • I. DynamoDB Partitioning Mechanics
    • Data Distribution: Data is stored across partitions, each holding up to about 10 GB.
    • Hashing: The partition key is hashed to pick a partition.
  • II. Cardinality and Workload Balance
    • High-Cardinality Keys: Examples include GUIDs, timestamps, or unique IDs.
    • Balanced Access: Evenly distributed RCU (Read Capacity Units) and WCU (Write Capacity Units).
  • III. Anti-Patterns (Low Cardinality)
    • Hot Keys: Too much traffic on one key (e.g., a "Daily Deal" item).
    • Throttling: Receiving ProvisionedThroughputExceededException despite having unused total capacity.
  • IV. Remediation Strategies
    • Write Sharding: Adding a random suffix to a partition key to split a hot key into multiple keys.
    • DAX (DynamoDB Accelerator): Using an in-memory cache for hot read items.
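
Write sharding from the outline above can be sketched as follows. The shard count, the `#` separator, and the function names are assumptions made for illustration, not a prescribed DynamoDB convention.

```python
import random

SHARD_COUNT = 10  # hypothetical; size it to the write rate of the hot key


def sharded_key(base_key: str, shard_count: int = SHARD_COUNT, rng=random) -> str:
    """Return the base key with a random shard suffix, e.g. 'daily-deal#7'."""
    return f"{base_key}#{rng.randrange(shard_count)}"


def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list[str]:
    """List every sharded key; a reader must query each one and merge the results."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


print(sharded_key("daily-deal"))
print(all_shard_keys("daily-deal"))
```

The trade-off is on the read side: writes spread over several key values, but reading the full data set for the original key means one query per shard.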

Visual Anchors

Data Distribution Flow

[Diagram: Data Distribution Flow]

Cardinality Comparison

\begin{tikzpicture}[node distance=1cm]
  \draw[thick,->] (0,0) -- (8,0) node[anchor=north] {Cardinality};
  \draw[fill=red!20] (1,0.5) rectangle (2,2) node[midway, align=center] {Low\\(Hot)};
  \draw[fill=green!20] (6,0.5) rectangle (7,2) node[midway, align=center] {High\\(Balanced)};
  \node at (1.5, -0.5) {Status (Active/Inactive)};
  \node at (6.5, -0.5) {DeviceID (UUID)};
\end{tikzpicture}

Definition-Example Pairs

  • Cardinality: The uniqueness of data values in a column.
    • Example: In a table of 1 million users, country has low cardinality (~200 values), while email_address has high cardinality (1 million values).
  • Hot Key: A partition key that is accessed much more frequently than others.
    • Example: On a social media site, a celebrity's user_id might be requested 10,000 times per second, while a typical user's is requested only about once in the same interval. The celebrity's ID is a "hot key."
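
Both definitions above can be checked against sample data. The sketch below uses a scaled-down, made-up data set (1,000 users, 3 countries) rather than the 1 million users in the example, and the helper names `cardinality` and `hottest_key` are hypothetical.

```python
from collections import Counter


def cardinality(items, attribute: str) -> int:
    """Number of distinct values that an attribute takes across the items."""
    return len({item[attribute] for item in items})


def hottest_key(request_keys):
    """Return (key, share of all requests) for the most frequently requested key."""
    counts = Counter(request_keys)
    key, hits = counts.most_common(1)[0]
    return key, hits / len(request_keys)


# Made-up sample: every user has a unique email but shares one of three countries.
users = [
    {"email_address": f"user{i}@example.com", "country": ["US", "DE", "JP"][i % 3]}
    for i in range(1_000)
]

# Made-up request log: one celebrity ID dominates the traffic.
requests = ["celebrity"] * 10_000 + [f"user{i}" for i in range(1_000)]

print(cardinality(users, "country"), cardinality(users, "email_address"))
print(hottest_key(requests))
```

A high-cardinality attribute can still contain a hot key: cardinality describes how many distinct values exist, while hotness describes how traffic is spread across them.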

Worked Examples

Example 1: The Logistics App

Scenario: You are designing a table to track package deliveries.

  • Option A: Use delivery_status (Ordered, Shipped, Delivered) as the partition key.
  • Option B: Use tracking_number (Unique alphanumeric) as the partition key.

Analysis:

  • Option A is a low-cardinality key with only three values. If you have 10 million packages, 90% might be in the "Delivered" state. Every read and write for a delivered package would target the same partition key value, concentrating traffic on one partition and causing a massive bottleneck.
  • Option B is a high-cardinality key. Every package has a unique tracking number, so requests are spread close to uniformly across all available partitions.

Result: Use Option B for balanced access.
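
As a sketch of what Option B might look like in practice, the function below builds the parameters for a DynamoDB `create_table` call with tracking_number as the partition key. The table name and the choice of on-demand billing are assumptions made for illustration; the source scenario does not specify them.

```python
def tracking_table_params(table_name: str = "PackageDeliveries") -> dict:
    """Build create_table parameters keyed on tracking_number (Option B)."""
    return {
        "TableName": table_name,  # hypothetical table name
        "AttributeDefinitions": [
            # "S" declares the attribute as a string.
            {"AttributeName": "tracking_number", "AttributeType": "S"},
        ],
        "KeySchema": [
            # KeyType HASH marks the partition key.
            {"AttributeName": "tracking_number", "KeyType": "HASH"},
        ],
        # Assumption: on-demand billing, so no RCU/WCU figures are invented here.
        "BillingMode": "PAY_PER_REQUEST",
    }


params = tracking_table_params()
print(params["KeySchema"])
# With boto3 installed and AWS credentials configured, these could be passed as:
#   boto3.client("dynamodb").create_table(**params)
```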

Checkpoint Questions

  1. What is the primary benefit of high-cardinality partition keys in DynamoDB?
    • Answer: They ensure a uniform distribution of data and traffic across physical partitions, preventing performance bottlenecks.
  2. True or False: If your total table RCU is 10,000, you can use 10,000 RCU on a single partition key.
    • Answer: False. Individual partitions have hard throughput limits (typically 3,000 RCU / 1,000 WCU).
  3. Which error message indicates that you have exceeded your partition's throughput?
    • Answer: ProvisionedThroughputExceededException.
  4. If you MUST use a low-cardinality key (like a Date), how can you improve its distribution?
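    • Answer: Make the partition key value itself more distinct, either by combining the date with a unique ID in a single key value (Date + UniqueID) or by write sharding (adding a random suffix to the key). Adding a sort key alone does not change distribution, because only the partition key is hashed.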
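
The arithmetic behind questions 2 and 4 can be sketched in a few lines, using the per-partition limits quoted above (3,000 RCU / 1,000 WCU). The function names are hypothetical, and `shards_needed` assumes the best case in which every sharded key lands on a different partition, which hashing does not guarantee.

```python
PARTITION_RCU_LIMIT = 3_000  # per-partition read limit quoted in this guide
PARTITION_WCU_LIMIT = 1_000  # per-partition write limit quoted in this guide


def single_key_read_ceiling(table_rcu: int) -> int:
    """Read throughput one partition key can reach, whatever the table total."""
    return min(table_rcu, PARTITION_RCU_LIMIT)


def shards_needed(target_wcu: int) -> int:
    """Minimum number of sharded keys so each stays within one partition's write limit."""
    return -(-target_wcu // PARTITION_WCU_LIMIT)  # ceiling division


print(single_key_read_ceiling(10_000))
print(shards_needed(2_500))
```

For a table provisioned at 10,000 RCU, a single key is still capped at the 3,000 RCU partition limit, and a hot key that needs 2,500 WCU requires at least three shards.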
