Lab: Building a Scalable Data Store with Amazon DynamoDB and S3
Use data stores in application development
This lab provides hands-on experience in implementing data stores for application development, a core requirement for the AWS Certified Developer - Associate (DVA-C02) exam. You will configure a DynamoDB table, explore the performance differences between Query and Scan operations, and utilize S3 for object storage.
[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges. While these services are Free Tier eligible, costs can accrue if resources are left running.
Prerequisites
- An active AWS Account.
- AWS CLI installed and configured with appropriate permissions (AdministratorAccess recommended for lab environments).
- Basic familiarity with JSON and command-line interfaces.
- <YOUR_REGION>: Use a consistent region throughout (e.g., us-east-1).
Learning Objectives
- Create and configure an Amazon DynamoDB table with optimized Partition and Sort keys.
- Differentiate between Query and Scan operations in a live environment.
- Implement Amazon S3 for static data storage and retrieval.
- Understand the impact of consistency models (Eventually vs. Strongly Consistent).
Architecture Overview
Step-by-Step Instructions
Step 1: Create an Amazon S3 Bucket for Application Assets
In this step, you will create an S3 bucket to store application metadata or static assets, much as a serverless frontend would in a real deployment.
# Replace <UNIQUE_SUFFIX> with a unique lowercase string (bucket names must be globally unique and cannot contain uppercase letters)
aws s3 mb s3://brainybee-lab-assets-<UNIQUE_SUFFIX> --region <YOUR_REGION>
Console alternative
- Navigate to S3 in the AWS Console.
- Click Create bucket.
- Enter a unique name: brainybee-lab-assets-<UNIQUE_SUFFIX>.
- Select your preferred Region and click Create bucket (keep default settings).
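The bucket is created empty, so the storage and retrieval objective can be exercised with a quick round trip. The sketch below is not part of the original lab: the helper name s3_roundtrip and the file names lab-note.txt and lab-note-copy.txt are hypothetical, and it assumes the bucket from this step already exists.

```shell
# Hypothetical helper (not from the original lab): upload a small file,
# list the bucket, then download the object again.
# Usage: s3_roundtrip <bucket-name> <region>
s3_roundtrip() {
  bucket=$1
  region=$2

  # Create a tiny local file to use as the test object
  printf 'hello from the lab\n' > lab-note.txt

  # Upload the file; the object key is lab-note.txt
  aws s3 cp lab-note.txt "s3://$bucket/lab-note.txt" --region "$region" || return 1

  # List the bucket contents to confirm the object is there
  aws s3 ls "s3://$bucket/" --region "$region" || return 1

  # Download the object under a different local name
  aws s3 cp "s3://$bucket/lab-note.txt" lab-note-copy.txt --region "$region"
}
```

Call it as s3_roundtrip brainybee-lab-assets-<UNIQUE_SUFFIX> <YOUR_REGION>. The uploaded object is removed again by the recursive delete in the Teardown section.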
Step 2: Create a DynamoDB Table for "Todo" Tasks
We will create a table for a Todo application. To get a high-cardinality key and efficient access, we will use UserId as the Partition Key and TaskId as the Sort Key.
aws dynamodb create-table \
--table-name TodoTable \
--attribute-definitions \
AttributeName=UserId,AttributeType=S \
AttributeName=TaskId,AttributeType=S \
--key-schema \
AttributeName=UserId,KeyType=HASH \
AttributeName=TaskId,KeyType=RANGE \
--provisioned-throughput \
ReadCapacityUnits=5,WriteCapacityUnits=5 \
--region <YOUR_REGION>
[!TIP] Choosing a high-cardinality Partition Key (like UserId) helps distribute data evenly across multiple physical partitions, which reduces the risk of "hot partitions."
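Table creation is asynchronous, so a put-item issued immediately afterwards can fail while the table is still being created. A minimal sketch of waiting for the table, assuming the TodoTable name used above (the wrapper name wait_for_table is hypothetical):

```shell
# Hypothetical helper: block until the table exists, then print its status.
# Usage: wait_for_table <table-name> <region>
wait_for_table() {
  table=$1
  region=$2

  # The built-in waiter polls DescribeTable and returns non-zero on timeout
  aws dynamodb wait table-exists --table-name "$table" --region "$region" || return 1

  # Extract only the TableStatus field as plain text
  aws dynamodb describe-table --table-name "$table" --region "$region" \
    --query 'Table.TableStatus' --output text
}
```

Running wait_for_table TodoTable <YOUR_REGION> should print ACTIVE once the table is ready to accept writes.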
Step 3: Populate the Table (Data Serialization)
We will insert a few items into the table using the put-item command. Note how we specify the data types (S for String).
aws dynamodb put-item \
--table-name TodoTable \
--item '{"UserId": {"S": "user_123"}, "TaskId": {"S": "T-001"}, "TaskName": {"S": "Complete AWS Lab"}, "Status": {"S": "In-Progress"}}' \
--region <YOUR_REGION>
aws dynamodb put-item \
--table-name TodoTable \
--item '{"UserId": {"S": "user_123"}, "TaskId": {"S": "T-002"}, "TaskName": {"S": "Prepare for DVA-C02"}, "Status": {"S": "Pending"}}' \
--region <YOUR_REGION>
Step 4: Compare Query vs. Scan
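A Query requires a partition key value (and optionally a sort key condition) and reads only the items stored under that key. A Scan reads every item in the table and consumes read capacity for all of them, even when a filter discards most of the results.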
Perform a Query (Efficient):
aws dynamodb query \
--table-name TodoTable \
--key-condition-expression "UserId = :v1" \
--expression-attribute-values '{":v1": {"S": "user_123"}}' \
--region <YOUR_REGION>
Perform a Scan (Expensive):
aws dynamodb scan \
--table-name TodoTable \
--region <YOUR_REGION>
Checkpoints
- S3 Check: Run aws s3 ls. Do you see your bucket listed?
- DynamoDB Check: Run aws dynamodb describe-table --table-name TodoTable --region <YOUR_REGION>. Is the TableStatus marked as ACTIVE?
- Efficiency Check: In your Query output, compare ScannedCount with Count. For a Query, ScannedCount covers only the items stored under the requested UserId, so both values stay low. For a Scan of a table this small, ScannedCount equals the total number of items in the table.
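The difference between Count and ScannedCount is easier to see when a filter is involved. The sketch below assumes the TodoTable and the items from Step 3. The helper names count_query and count_filtered_scan are hypothetical. Status is a DynamoDB reserved word, so the filter refers to it through the placeholder #s.

```shell
# Hypothetical helpers: print only the counters and consumed capacity.
# Usage: count_query <region>
count_query() {
  aws dynamodb query \
    --table-name TodoTable \
    --key-condition-expression "UserId = :u" \
    --expression-attribute-values '{":u": {"S": "user_123"}}' \
    --return-consumed-capacity TOTAL \
    --query '{Count: Count, ScannedCount: ScannedCount, Capacity: ConsumedCapacity.CapacityUnits}' \
    --region "$1"
}

# Usage: count_filtered_scan <region>
# The filter is applied after the items are read, so every item is still scanned.
count_filtered_scan() {
  aws dynamodb scan \
    --table-name TodoTable \
    --filter-expression "#s = :s" \
    --expression-attribute-names '{"#s": "Status"}' \
    --expression-attribute-values '{":s": {"S": "Pending"}}' \
    --return-consumed-capacity TOTAL \
    --query '{Count: Count, ScannedCount: ScannedCount, Capacity: ConsumedCapacity.CapacityUnits}' \
    --region "$1"
}
```

With only the two items from Step 3 in the table, the filtered Scan should report a Count of 1 and a ScannedCount of 2: both items were read, but only one passed the filter.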
Teardown
To avoid costs, delete all resources created during this lab.
# 1. Delete the DynamoDB Table
aws dynamodb delete-table --table-name TodoTable --region <YOUR_REGION>
# 2. Empty and Delete the S3 Bucket
aws s3 rm s3://brainybee-lab-assets-<UNIQUE_SUFFIX> --recursive
aws s3 rb s3://brainybee-lab-assets-<UNIQUE_SUFFIX>
Troubleshooting
| Error | Likely Cause | Fix |
|---|---|---|
| ResourceNotFoundException | Table is in a different region or has not been created yet. | Add --region <YOUR_REGION> explicitly to the command. |
| AccessDenied | IAM User lacks DynamoDB/S3 permissions. | Attach AmazonDynamoDBFullAccess and AmazonS3FullAccess, or check IAM policies. |
| ValidationException | The item passed to --item is malformed (bad quoting or a missing key attribute). | Ensure quotes are escaped correctly or pass the item from a JSON file (--item file://item.json). |
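Passing the item from a file avoids most shell quoting problems. A minimal sketch, assuming the TodoTable from Step 2: the file name item.json, the T-003 item, and the wrapper name put_item_from_file are illustrative and not part of the original lab.

```shell
# Write the item to a file; the quoted heredoc keeps the JSON exactly as typed.
cat > item.json <<'EOF'
{
  "UserId":   {"S": "user_123"},
  "TaskId":   {"S": "T-003"},
  "TaskName": {"S": "Review IAM policies"},
  "Status":   {"S": "Pending"}
}
EOF

# Hypothetical wrapper: load the item from a local file via the file:// prefix.
# Usage: put_item_from_file <json-file> <region>
put_item_from_file() {
  aws dynamodb put-item \
    --table-name TodoTable \
    --item "file://$1" \
    --region "$2"
}
```

Run put_item_from_file item.json <YOUR_REGION> to insert the item. Both key attributes (UserId and TaskId) must be present in the file, or DynamoDB rejects the request.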
Stretch Challenge
Objective: Implement a Global Secondary Index (GSI).
Currently, you can only efficiently search by UserId. Add a GSI to the TodoTable that allows you to search for tasks by Status.
Hint: Use the update-table command with --attribute-definitions and --global-secondary-index-updates. A GSI lets you run Query operations keyed on an attribute that is not part of the table's primary key.
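One possible solution sketch follows. The index name StatusIndex, the helper names, and the 5/5 provisioned throughput are assumptions chosen for illustration. Status is a DynamoDB reserved word, so the query refers to it through the placeholder #s.

```shell
# Hypothetical helper: add a GSI keyed on Status to the existing table.
# Usage: create_status_index <region>
create_status_index() {
  aws dynamodb update-table \
    --table-name TodoTable \
    --attribute-definitions AttributeName=Status,AttributeType=S \
    --global-secondary-index-updates '[
      {
        "Create": {
          "IndexName": "StatusIndex",
          "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
          "Projection": {"ProjectionType": "ALL"},
          "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
        }
      }
    ]' \
    --region "$1"
}

# Hypothetical helper: query the index for tasks with a given status.
# Usage: query_by_status <status> <region>
query_by_status() {
  aws dynamodb query \
    --table-name TodoTable \
    --index-name StatusIndex \
    --key-condition-expression "#s = :s" \
    --expression-attribute-names '{"#s": "Status"}' \
    --expression-attribute-values "{\":s\": {\"S\": \"$1\"}}" \
    --region "$2"
}
```

The index is built in the background, so wait until describe-table shows its IndexStatus as ACTIVE before calling query_by_status Pending <YOUR_REGION>. Status has only a few distinct values, which makes it a low-cardinality key. That is acceptable for this exercise but is worth reconsidering for a large table.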
Cost Estimate
- Amazon S3: Standard storage is about $0.023 per GB per month in us-east-1 (first 5 GB/month free under the Free Tier). Lab usage: $0.00.
- Amazon DynamoDB: 25 GB of storage plus 25 provisioned WCUs and 25 provisioned RCUs are free. Lab usage: $0.00.
- Total Estimated Lab Cost: $0.00 (within Free Tier limits).
Concept Review
Data Store Comparison Table
| Feature | Amazon S3 | Amazon DynamoDB | Amazon RDS |
|---|---|---|---|
| Type | Object Storage | NoSQL (Key-Value/Document) | Relational (SQL) |
| Best Use Case | Static files, backups, logs | High-speed, high-scale apps | Complex joins, transactions |
| Scalability | Virtually unlimited | Provisioned or On-Demand | Vertical & Horizontal (Read Replicas) |
| Consistency | Strong (since late 2020) | Eventual (Default) / Strong | Strong |
Key DVA-C02 Concept: Consistency Models
\begin{tikzpicture}[node distance=2cm]
\node (start) [draw, rectangle] {Read Request};
\node (eventual) [draw, rounded corners, right of=start, xshift=3cm] {\textbf{Eventual Consistency}};
\node (strong) [draw, rounded corners, below of=eventual, yshift=-1cm] {\textbf{Strong Consistency}};
\draw [->] (start) -- node[anchor=south] {Default} (eventual);
\draw [->] (start) |- node[anchor=west, yshift=0.5cm] {ConsistentRead=true} (strong);
\node [below of=eventual, xshift=1.5cm, yshift=1.2cm, text width=4cm] {\tiny Potential stale data, lower cost.};
\node [below of=strong, xshift=1.5cm, yshift=1.2cm, text width=4cm] {\tiny Most recent data, double RCU cost.};
\end{tikzpicture}
By default, DynamoDB uses Eventually Consistent Reads. If your application requires the absolute latest data immediately after a write, you must set ConsistentRead to true, which consumes twice the Read Capacity Units (RCUs).
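The "twice the RCUs" rule can be checked with a little arithmetic. The sketch below models a single item read under the standard provisioned-capacity rules: reads are billed in 4 KB blocks, rounded up, at 1 RCU per block for a strongly consistent read and 0.5 RCU for an eventually consistent one. The function name rcu_for_read is hypothetical.

```shell
# Hypothetical helper: estimate the RCUs consumed by one item read.
# Usage: rcu_for_read <item-size-in-bytes> <true|false for ConsistentRead>
rcu_for_read() {
  size_bytes=$1
  consistent=$2

  # Reads are billed in 4 KB (4096-byte) blocks, rounded up
  blocks=$(( (size_bytes + 4095) / 4096 ))

  # Work in half-RCU units so integer arithmetic is sufficient
  if [ "$consistent" = "true" ]; then
    half_units=$(( blocks * 2 ))   # strongly consistent: 1 RCU per block
  else
    half_units=$blocks             # eventually consistent: 0.5 RCU per block
  fi

  # Print the result with one decimal place
  echo "$(( half_units / 2 )).$(( half_units % 2 * 5 ))"
}

rcu_for_read 1024 false   # 1 KB item, eventually consistent: prints 0.5
rcu_for_read 1024 true    # 1 KB item, strongly consistent:   prints 1.0
rcu_for_read 6144 true    # 6 KB item, strongly consistent:   prints 2.0
```

On the CLI, ConsistentRead=true corresponds to the --consistent-read flag on get-item, query, and scan.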