Study Guide: Managing and Maintaining AWS Data Stores

This guide covers the essential strategies for selecting, configuring, and optimizing data storage solutions on AWS, specifically tailored for the AWS Certified Developer - Associate (DVA-C02) exam. We focus on Amazon DynamoDB, S3, and RDS to ensure high performance, security, and cost-efficiency.

Learning Objectives

By the end of this study guide, you should be able to:

Differentiate between various AWS data stores (RDS, DynamoDB, S3) based on use cases.
Optimize DynamoDB performance using high-cardinality partition keys and appropriate indexing (GSI/LSI).
Implement data consistency models (Strong vs. Eventual) for specific application requirements.
Manage data lifecycles using S3 Lifecycle policies and DynamoDB Time to Live (TTL).
Evaluate Query vs. Scan operations in DynamoDB for efficiency and cost.

Key Terms & Glossary

High-Cardinality: A property of a data field with many unique values (e.g., UserID). A high-cardinality partition key helps spread data and request traffic evenly across DynamoDB's partitions.
Eventual Consistency: A consistency model where a read might not reflect the results of a recently completed write, but all replicas will eventually converge (usually within a second).
Strong Consistency: A read that returns a response reflecting all prior successful writes.
Partition Key (PK): The primary key attribute used as input to an internal hash function to determine the physical storage partition in DynamoDB.
Global Secondary Index (GSI): An index whose partition key (and optional sort key) can differ from those of the base table. A GSI can be added after the table is created, and reads from it are always eventually consistent.
Object Storage: A data storage architecture that manages data as objects (used by Amazon S3), as opposed to file systems or block storage.
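To make the Partition Key and GSI definitions concrete, the sketch below builds the parameters for creating a table with one GSI. It assumes boto3's DynamoDB client, whose `create_table` call accepts these keyword arguments. The Orders table and CustomerID key echo the Query vs. Scan scenario later in this guide; OrderID, ProductID, and ProductIndex are hypothetical names chosen for illustration.

```python
# Sketch: keyword arguments for boto3's DynamoDB client create_table().
# Hypothetical names (not from the guide): OrderID, ProductID, ProductIndex.


def orders_table_params() -> dict:
    """Return create_table parameters for an Orders table with one GSI."""
    return {
        "TableName": "Orders",
        # Exactly the attributes used in table or index keys are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "CustomerID", "AttributeType": "S"},
            {"AttributeName": "OrderID", "AttributeType": "S"},
            {"AttributeName": "ProductID", "AttributeType": "S"},
        ],
        # Base table: CustomerID is the partition key (HASH), OrderID the sort key (RANGE).
        "KeySchema": [
            {"AttributeName": "CustomerID", "KeyType": "HASH"},
            {"AttributeName": "OrderID", "KeyType": "RANGE"},
        ],
        # On-demand mode, so no ProvisionedThroughput is needed for the table or GSI.
        "BillingMode": "PAY_PER_REQUEST",
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "ProductIndex",
                # The GSI uses a different partition key than the base table.
                "KeySchema": [
                    {"AttributeName": "ProductID", "KeyType": "HASH"},
                    {"AttributeName": "OrderID", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }
```

With boto3 installed and credentials configured, this would be passed as `boto3.client("dynamodb").create_table(**orders_table_params())`; the function itself makes no AWS call. A GSI like this one could also be added later through `update_table`, whereas an LSI would have to be part of the initial `create_table` request.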

The "Big Idea"

The central challenge of data management in AWS is matching the storage technology to the access pattern. Developers must choose between the strict schemas of Relational Databases (RDS) for complex joins, the massive scale and low latency of NoSQL (DynamoDB) for simple key-value lookups, and the virtually unlimited scalability of Object Storage (S3) for unstructured files. Success is measured by balancing performance, cost, and durability.

Formula / Concept Box

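| Concept | Rule / Behavior |
|---|---|
| DynamoDB Query | Returns items that match a partition key value (optionally narrowed by a sort key condition); highly efficient because it reads only the relevant partition. |
| DynamoDB Scan | Reads every item in the table (any filter is applied after the read); slow and expensive for large datasets. |
| S3 Standard | 99.99% availability; 11 9s (99.999999999%) durability; no retrieval fees. |
| S3 Standard-IA | Lower storage cost; per-GB retrieval fee applies; 30-day minimum storage duration. |
| DynamoDB TTL | Automatically deletes expired items in the background; no extra cost; does not consume WCU. Deletion can lag the expiry time (typically within a few days). |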
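TTL only works if each item carries an expiry time in the format DynamoDB expects: a Number attribute holding a Unix epoch timestamp in seconds. The sketch below computes one. The attribute name expires_at, the SessionID key, and the 7-day lifetime are hypothetical choices; the TTL attribute name is whatever you configure when enabling TTL on the table.

```python
# Sketch: computing a DynamoDB TTL value (Unix epoch seconds).
# Hypothetical names: expires_at (TTL attribute), SessionID (partition key).
from datetime import datetime, timedelta, timezone


def ttl_epoch_seconds(now: datetime, keep_for: timedelta) -> int:
    """Return the epoch-seconds timestamp at which an item should expire."""
    if now.tzinfo is None:
        # A naive datetime would be interpreted as local time.
        raise ValueError("pass a timezone-aware datetime")
    return int((now + keep_for).timestamp())


def session_item(session_id: str, now: datetime) -> dict:
    """Build an item in low-level DynamoDB JSON (numbers travel as strings)."""
    return {
        "SessionID": {"S": session_id},
        "expires_at": {"N": str(ttl_epoch_seconds(now, timedelta(days=7)))},
    }


if __name__ == "__main__":
    print(session_item("abc123", datetime.now(timezone.utc)))
```

Storing milliseconds instead of seconds is a common mistake: the value then reads as a date far in the future, so the item is never expired.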

Hierarchical Outline

Amazon DynamoDB (NoSQL)
- Data Modeling: Use high-cardinality keys to avoid "hot partitions."
- Indexing: GSIs (can be created later) vs. LSIs (must be created at table creation).
- Consistency: Default is Eventual; Strong can be requested but doubles the RCU cost.
Amazon S3 (Object Storage)
- Storage Classes: Standard, Standard-IA (Infrequent Access), Intelligent-Tiering, and the S3 Glacier classes (Instant Retrieval, Flexible Retrieval, Deep Archive).
- Lifecycle Management: Automating transitions from Standard to Glacier based on age.
Amazon RDS (Relational)
- Engines: Aurora (Cloud-native), MySQL, PostgreSQL, etc.
- Security: Use AWS Secrets Manager to store and automatically rotate database credentials instead of hardcoding them in code or environment variables.
Data Caching
- ElastiCache: Redis (complex data types, persistence) or Memcached (simple, multithreaded).
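The Secrets Manager point above can be sketched as a function that fetches database credentials at runtime. It assumes the secret's SecretString is JSON with username and password keys (the layout RDS-managed secrets use); the function name and the secret ID in the usage note are hypothetical. The client is passed in so the same code accepts `boto3.client("secretsmanager")` in production and a stub in tests.

```python
# Sketch: reading RDS credentials from AWS Secrets Manager at runtime.
# Assumption: SecretString is JSON containing "username" and "password".
import json
from typing import Any, Protocol


class SecretsClient(Protocol):
    """The one method this sketch needs from boto3's Secrets Manager client."""

    def get_secret_value(self, *, SecretId: str) -> dict[str, Any]: ...


def get_db_credentials(client: SecretsClient, secret_id: str) -> tuple[str, str]:
    # Fetching on demand (instead of baking the password into an environment
    # variable) means a rotated password is picked up without a redeploy.
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

In production this might be called as `get_db_credentials(boto3.client("secretsmanager"), "prod/orders-db")`. Fetching at connection time, or caching the result only briefly, is what lets automatic rotation take effect.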

Visual Anchors

[Diagram: Data Store Selection Flow]

[Diagram: S3 Lifecycle Transition]
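The age-based transition described in the outline (Standard to Glacier) is expressed as a lifecycle rule on the bucket. This sketch assumes boto3's S3 client, whose `put_bucket_lifecycle_configuration` call takes a configuration dict of this shape; the logs/ prefix, the rule ID, and the 30/90/365-day thresholds are illustrative, not prescribed.

```python
# Sketch: an S3 lifecycle rule that tiers objects down by age, then deletes them.
# Illustrative values: "logs/" prefix, rule ID, and the day thresholds.


def log_lifecycle_config(prefix: str = "logs/") -> dict:
    """Return a LifecycleConfiguration dict for put_bucket_lifecycle_configuration."""
    return {
        "Rules": [
            {
                "ID": "tier-down-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # S3 will not move objects to Standard-IA until they are 30 days old.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # GLACIER is the API name for S3 Glacier Flexible Retrieval.
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Delete objects one year after creation.
                "Expiration": {"Days": 365},
            }
        ]
    }
```

Applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=log_lifecycle_config())`, where the bucket name is hypothetical. Ordering the transitions by age keeps each object in Standard-IA for 60 days, comfortably past its 30-day minimum storage duration.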

Definition-Example Pairs

Eventually Consistent Read:
- Definition: A read operation that may return stale data if performed immediately after a write.
- Example: A social media "like" count where seeing 99 likes instead of 100 for a few milliseconds is acceptable.
Strongly Consistent Read:
- Definition: A read operation that guarantees the most recent version of the data is returned.
- Example: An inventory management system checking if the last item in stock is actually available for purchase.
Hot Partition:
- Definition: A performance bottleneck caused by disproportionate traffic directed to a single partition key.
- Example: Using a "Status" field (Active/Inactive) as a Partition Key in a table with millions of items; almost all traffic hits the "Active" partition.
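The Hot Partition example can be simulated to show why key cardinality matters. DynamoDB's real hash function is internal and not published, so this sketch uses MD5 from Python's hashlib purely as a stand-in, with an arbitrary count of 8 partitions and an illustrative 95/5 split between Active and Inactive requests. The only property it relies on is that identical key values always hash to the same partition.

```python
# Sketch: low- vs. high-cardinality partition keys under hashing.
# MD5 stands in for DynamoDB's internal hash; 8 partitions is arbitrary.
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key value to a partition index via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_by_partition(keys: list[str], partitions: int = 8) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


if __name__ == "__main__":
    # 10,000 illustrative requests keyed by Status vs. by a unique user ID.
    status_keys = ["Active"] * 9_500 + ["Inactive"] * 500
    user_keys = [f"user-{i}" for i in range(10_000)]
    print("Status as PK:", sorted(load_by_partition(status_keys).values(), reverse=True))
    print("UserID as PK:", sorted(load_by_partition(user_keys).values(), reverse=True))
```

With Status as the key, at most two partitions receive any traffic and one of them absorbs at least 9,500 requests; with unique user IDs, every partition receives roughly 1,250.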

Worked Examples

Calculating Read Capacity Units (RCU)

Scenario: An application needs to perform 10 strongly consistent reads per second. Each item is 6 KB in size.

Item Size Calculation: Round the item size up to the next 4 KB increment: $6\ \text{KB} \rightarrow 8\ \text{KB}$.
Units per Read: For strong consistency, 1 RCU covers one read per second of an item up to 4 KB, so each read costs $8\ \text{KB} \div 4\ \text{KB} = 2$ RCUs.
Total RCU: $10\ \text{reads/sec} \times 2\ \text{RCUs/read} = 20\ \text{RCUs}$. Note: With eventual consistency, the answer would be half (10 RCUs).
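The three steps above fit in a small helper. This is a sketch of the provisioned-capacity arithmetic only: the function name is hypothetical, and transactional reads (which cost twice a strongly consistent read) are deliberately left out.

```python
# Sketch: provisioned RCUs for DynamoDB GetItem-style reads.
import math


def read_capacity_units(reads_per_sec: int, item_size_kb: float, strongly_consistent: bool) -> int:
    """Return whole RCUs needed for the given read rate and item size."""
    units_per_read = math.ceil(item_size_kb / 4)  # round size up to the next 4 KB
    total = reads_per_sec * units_per_read        # cost at strong consistency
    if strongly_consistent:
        return total
    # Eventually consistent reads cost half; round up to whole RCUs.
    return math.ceil(total / 2)
```

For the scenario above, `read_capacity_units(10, 6, True)` gives 20 and `read_capacity_units(10, 6, False)` gives 10, matching the hand calculation.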

Query vs. Scan Choice

Scenario: You have an Orders table with a Partition Key of CustomerID. You need to find all orders for Customer_123.

Method A (Scan): The database reads every order in the table and only then discards those with a different CustomerID; a FilterExpression does not reduce the RCUs consumed. Result: High cost, high latency.
Method B (Query): The database goes directly to the Customer_123 partition. Result: Low cost, millisecond latency.
Decision: Always use Query when the Partition Key is known.
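The contrast between the two methods shows up directly in the request parameters. This sketch assumes boto3's low-level DynamoDB client, whose `query` and `scan` calls accept these keyword arguments; the in-memory table is a toy model (hypothetical, not a DynamoDB API) used only to count how many items each approach reads.

```python
# Sketch: Query vs. Scan request parameters, plus a toy count of items read.


def query_params(customer_id: str) -> dict:
    # Query: the key condition lets DynamoDB read only that partition's items.
    return {
        "TableName": "Orders",
        "KeyConditionExpression": "CustomerID = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


def scan_params(customer_id: str) -> dict:
    # Scan: every item is read first; the filter is applied afterwards,
    # so discarded items still consume read capacity.
    return {
        "TableName": "Orders",
        "FilterExpression": "CustomerID = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


def items_read(table: dict[str, list[dict]], customer_id: str, use_query: bool) -> int:
    """Toy model: `table` maps partition key value -> list of items."""
    if use_query:
        return len(table.get(customer_id, []))
    return sum(len(items) for items in table.values())
```

Both requests return the same orders for Customer_123, but in a toy table holding 3 of that customer's orders among 503 total, the Query reads 3 items while the Scan reads all 503.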

Checkpoint Questions

Which DynamoDB operation consumes more RCU: a Query or a Scan when retrieving the same set of 10 items? Why?
A developer needs to store log files that are rarely accessed but must be retrievable within milliseconds. Which S3 storage class is best?
True or False: A Global Secondary Index (GSI) can have a different Partition Key than the base table.
What happens to the cost of a DynamoDB read if you change the ConsistentRead parameter from false to true?
How can AWS Secrets Manager improve the security of an RDS-connected application compared to using static environment variables?

Answers

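Scan, because it reads every item in the table to find the matches, whereas Query reads only the items under the specified partition key. A Scan's FilterExpression is applied after the read, so the items it discards still consume RCUs.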
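S3 Standard-IA (Infrequent Access), which provides millisecond retrieval at a lower storage price than S3 Standard. (S3 Glacier Instant Retrieval also offers millisecond access and is cheaper for data read about once a quarter, but it has a 90-day minimum storage duration.)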
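True. A GSI can use a different partition key (and sort key) from the base table, whereas a Local Secondary Index (LSI) must share the base table's partition key.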
The cost doubles (Strongly consistent reads consume twice as many RCUs as eventually consistent reads).
It allows for automated credential rotation and removes sensitive passwords from plain-text configuration files or environment variables.