DVA-C02 Study Guide: Use Data Stores in Application Development

This guide covers Domain 1, Task 3 of the AWS Certified Developer - Associate (DVA-C02) exam: selecting, configuring, and optimizing AWS storage and database services for modern application architectures.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between Relational (RDS/Aurora) and NoSQL (DynamoDB) use cases.
  • Optimize DynamoDB performance using high-cardinality partition keys and appropriate indexing (GSI/LSI).
  • Select the correct S3 storage class based on access patterns and cost-efficiency.
  • Implement caching strategies using ElastiCache to reduce database load.
  • Describe database consistency models and their impact on application behavior.

Key Terms & Glossary

  • ACID Compliance: Atomic, Consistent, Isolated, Durable. Standard for relational databases (RDS) to ensure reliability.
  • BASE Consistency: Basically Available, Soft state, Eventual consistency. The philosophy behind NoSQL stores like DynamoDB.
  • Partition Key (PK): The attribute DynamoDB hashes to decide which physical partition (shard) stores an item. Used on its own, it forms a simple primary key.
  • Sort Key (SK): Used in conjunction with a PK to create a composite primary key, allowing for range queries.
  • GSI (Global Secondary Index): An index with a PK and SK that can be different from those on the base table; can be created/deleted anytime.
  • LSI (Local Secondary Index): An index that has the same PK as the table but a different SK; must be created at table creation time.
  • TTL (Time to Live): A per-item expiry timestamp (Unix epoch time in seconds) after which DynamoDB automatically deletes the item in the background, used to manage data lifecycles.
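
To make the TTL entry concrete, here is a minimal sketch of building a session item with an expiry attribute. The attribute names `session_id` and `expires_at` and the helper `build_session_item` are hypothetical, chosen for illustration only. The one firm requirement is that the TTL attribute holds a Unix epoch timestamp in seconds.

```python
import time
from typing import Optional


def build_session_item(session_id: str, lifetime_seconds: int,
                       now: Optional[float] = None) -> dict:
    """Return an item whose `expires_at` attribute holds the expiry time
    as a Unix epoch timestamp in seconds, the format DynamoDB TTL expects."""
    current = time.time() if now is None else now
    return {
        "session_id": session_id,                       # partition key (hypothetical name)
        "expires_at": int(current) + lifetime_seconds,  # TTL attribute (hypothetical name)
    }


# A fixed `now` keeps the example reproducible.
item = build_session_item("abc123", lifetime_seconds=3600, now=1_700_000_000)
print(item)  # {'session_id': 'abc123', 'expires_at': 1700003600}
```

TTL also has to be enabled on the table and pointed at the chosen attribute before any items expire.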

The "Big Idea"

In cloud-native development, persistence is decoupled. Instead of a single "God Database," developers must use a "Polyglot Persistence" approach. This means using Amazon S3 for massive, unstructured files, Amazon DynamoDB for high-scale session state or metadata, and Amazon RDS for complex, transactional relationships. Mastering the Developer Associate exam requires knowing how to minimize latency and cost by choosing the specific tool that matches the data's access pattern.

Formula / Concept Box

DynamoDB Throughput Calculations

| Operation | Unit Definition | Formula |
|---|---|---|
| Write Capacity Unit (WCU) | 1 KB per second | $\text{Items per sec} \times \text{Size (rounded up to nearest KB)}$ |
| Read Capacity Unit (RCU) | 4 KB per second (Strong) | $\text{Items per sec} \times \text{Size (rounded up to 4 KB)} / 4$ |
| RCU (Eventual) | 2 Reads per unit | $\text{Strong RCU} / 2$ |

[!IMPORTANT] Always round up the item size to the nearest 1 KB (for writes) or 4 KB (for reads) before multiplying by the number of items per second.
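
The rounding rule above can be sketched as two small helper functions. The function names and sample workloads below are illustrative assumptions, not part of any AWS SDK; the arithmetic follows the formulas in the table.

```python
import math


def write_capacity_units(items_per_sec: int, item_size_kb: float) -> int:
    """WCUs needed: each write consumes 1 WCU per 1 KB, rounded up per item."""
    return items_per_sec * math.ceil(item_size_kb)


def read_capacity_units(items_per_sec: int, item_size_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: each strongly consistent read consumes 1 RCU per 4 KB,
    rounded up per item; eventually consistent reads need half as many."""
    strong = items_per_sec * math.ceil(item_size_kb / 4)
    if strongly_consistent:
        return strong
    return math.ceil(strong / 2)


# Hypothetical workloads:
print(write_capacity_units(20, 2.5))                          # 20 * 3 = 60
print(read_capacity_units(50, 6))                             # 50 * 2 = 100
print(read_capacity_units(50, 6, strongly_consistent=False))  # 100 / 2 = 50
```

Note that the rounding is applied to each item's size before multiplying, which is why a 2.5 KB write costs 3 WCUs rather than 2.5.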

Hierarchical Outline

  1. Relational Data Stores (SQL)
    • Amazon RDS: Managed SQL (MySQL, PostgreSQL, Oracle). Best for complex joins and multi-row transactions.
    • Amazon Aurora: Cloud-native relational DB, compatible with MySQL and PostgreSQL. AWS cites up to 5x the throughput of standard MySQL; features auto-scaling storage and serverless options.
  2. NoSQL Data Stores (Key-Value/Document)
    • Amazon DynamoDB: Serverless, single-digit millisecond latency at any scale.
      • Indexing: GSIs vs. LSIs.
      • Consistency: Eventually consistent reads (default) vs. strongly consistent reads (optional; consume twice the RCUs).
  3. Object Storage
    • Amazon S3: High-durability (99.999999999%) storage for images, videos, and logs.
      • Storage Classes: Standard, IA (Infrequent Access), Glacier (Archival).
  4. Specialized Stores & Caching
    • Amazon ElastiCache: In-memory caching (Redis/Memcached) for sub-millisecond data retrieval.
    • Amazon OpenSearch Service: For full-text search and log analytics.
    • Amazon Neptune: Graph database for highly connected social or fraud data.

Visual Anchors

Database Selection Flowchart

[Figure: database selection flowchart; diagram not recovered in this copy.]

DynamoDB Item Structure

\begin{tikzpicture}[node distance=1cm, every node/.style={draw, fill=blue!10, rounded corners}]
  \node (item) [minimum width=6cm, minimum height=3cm] {\textbf{DynamoDB Item}};
  \node (pk) [below left=0.5cm and -2.5cm of item.north, fill=orange!20] {\textbf{Partition Key (Hash)}};
  \node (sk) [below right=0.5cm and -2.5cm of item.north, fill=green!20] {\textbf{Sort Key (Range)}};
  \node (attr) [below=1.5cm of item.north, fill=white] {Attributes (JSON-like data)};
  \draw[->, thick] (pk) -- (attr);
  \draw[->, thick] (sk) -- (attr);
  \node[draw=none, fill=none, right=3.5cm of item] (desc) {\begin{tabular}{l} \textbf{Best Practice:}\\ High cardinality PKs\\ distribute load evenly. \end{tabular}};
\end{tikzpicture}

Definition-Example Pairs

  • Hot Partition: A situation where a single partition receives a disproportionate amount of traffic, leading to throttling.
    • Example: A social media app using "Date" as a Partition Key. On a specific holiday, all traffic hits one partition (the current date), while others sit idle.
  • S3 Lifecycle Policy: Rules that automatically move objects to cheaper storage classes or delete them.
    • Example: A company stores server logs in S3 Standard for 30 days, moves them to S3 Glacier for 7 years to meet compliance, then automatically deletes them.
  • Write-Through Cache: A caching strategy where data is written to the cache and the database simultaneously.
    • Example: A profile update service that updates the Redis cache and the RDS database at the same time to ensure the cache never has stale data.
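
The write-through pattern defined above can be sketched in a few lines. This is a simplified illustration, not a production client: plain dictionaries stand in for the Redis cache and the RDS table, and the class name `WriteThroughCache` is hypothetical.

```python
class WriteThroughCache:
    """Minimal write-through sketch: dicts stand in for Redis and RDS."""

    def __init__(self) -> None:
        self.cache: dict = {}      # stand-in for ElastiCache (Redis)
        self.database: dict = {}   # stand-in for the RDS table

    def write(self, key: str, value: dict) -> None:
        # Persist to the database, then update the cache as part of the
        # same operation, so a cached entry never lags a committed write.
        self.database[key] = value
        self.cache[key] = value

    def read(self, key: str):
        # Serve from the cache when possible; fall back to the database
        # and populate the cache on a miss.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = WriteThroughCache()
store.write("user:42", {"name": "Ada"})
print(store.read("user:42"))  # {'name': 'Ada'}
```

The trade-off is that every write pays for two updates, including writes to data that may never be read back from the cache.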

Worked Examples

Scenario: Optimizing DynamoDB Costs

Problem: A developer is building a high-traffic leaderboard. They need to read 100 items per second. Each item is 10 KB. They want to minimize costs while maintaining data accuracy.

Step-by-Step Calculation:

  1. Analyze Item Size: 10 KB items. Since RCU units are 4 KB, we round up: $10 \text{ KB} \rightarrow 12 \text{ KB}$ ($3 \times 4 \text{ KB}$ units per item).
  2. Calculate Strong Consistency: $100 \text{ items/sec} \times 3 \text{ units/item} = 300 \text{ RCUs}$.
  3. Calculate Eventual Consistency: $300 / 2 = 150 \text{ RCUs}$.
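  4. Decision: Eventually consistent reads may briefly return stale data (replicas usually converge within about a second). If the leaderboard can tolerate that lag, choosing Eventually Consistent Reads halves the required read throughput, from 300 to 150 RCUs.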
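
In application code, the consistency choice from the decision step comes down to the `ConsistentRead` parameter of a read request. The sketch below only builds the request arguments, so it runs without AWS access. The table name `Leaderboard`, the key attribute `player_id`, and the helper `build_get_item_request` are hypothetical.

```python
def build_get_item_request(table_name: str, key: dict,
                           strongly_consistent: bool = False) -> dict:
    """Build keyword arguments for a DynamoDB GetItem call.

    ConsistentRead=False (the default) requests an eventually consistent
    read, which consumes half the read capacity of a strong read.
    """
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


request = build_get_item_request(
    "Leaderboard",                        # hypothetical table name
    {"player_id": {"S": "player-123"}},   # hypothetical key attribute
)
print(request["ConsistentRead"])  # False

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").get_item(**request)
```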

Checkpoint Questions

  1. What is the main difference between a Global Secondary Index (GSI) and a Local Secondary Index (LSI) regarding the Partition Key?
  2. Which S3 storage class is best for data that is recreated easily (like thumbnails) and accessed infrequently?
  3. Why should you avoid using a "Status" field (e.g., Active/Inactive) as a Partition Key in DynamoDB?
  4. When would you choose Amazon Aurora over standard Amazon RDS MySQL?
  5. What mechanism does DynamoDB use to automatically expire old session data?

Answers

  1. An LSI must use the same Partition Key as the base table. A GSI can have a completely different Partition Key.
  2. S3 One Zone-IA (Infrequent Access). It's cheaper because it doesn't replicate across multiple AZs.
  3. It leads to a Hot Partition. With only two possible values, all traffic would be bottlenecked into two physical shards, regardless of how much you scale throughput.
  4. When you need high availability (Aurora replicates 6 ways across 3 AZs) or performance (it is significantly faster for read/write heavy workloads).
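  5. TTL (Time to Live). You designate an attribute that holds an expiry timestamp (Unix epoch seconds), and DynamoDB deletes the item in the background after that time passes. Deletion is not instantaneous, so recently expired items can still appear in reads until they are removed.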
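
One common mitigation for the low-cardinality key problem in answer 3 is write sharding: appending a calculated suffix to the partition key so that items spread across more key values. The guide itself does not prescribe this technique, so treat the following as an illustrative sketch. The function name, the `#` separator, and the shard count of 10 are assumptions.

```python
import hashlib


def sharded_partition_key(base_key: str, item_id: str, shard_count: int = 10) -> str:
    """Append a deterministic shard suffix derived from the item ID."""
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shard_count
    return f"{base_key}#{shard}"


# Spread 1,000 hypothetical "ACTIVE" orders across up to 10 key values.
keys = {sharded_partition_key("ACTIVE", f"order-{i}") for i in range(1000)}
print(len(keys))  # number of distinct partition key values, at most 10
```

Because the suffix is derived from the item ID, a single item can still be located directly. Reading every "ACTIVE" item, however, now means querying each suffix and merging the results.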
