Lab: Building a Scalable Data Store with Amazon DynamoDB and S3

This lab provides hands-on experience in implementing data stores for application development, a core requirement for the AWS Certified Developer - Associate (DVA-C02) exam. You will configure a DynamoDB table, explore the performance differences between Query and Scan operations, and utilize S3 for object storage.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges. While these services are Free Tier eligible, costs can accrue if resources are left running.

Prerequisites

An active AWS Account.
AWS CLI installed and configured with appropriate permissions (AdministratorAccess recommended for lab environments).
Basic familiarity with JSON and command-line interfaces.
<YOUR_REGION>: Use a consistent region throughout (e.g., us-east-1).

Learning Objectives

Create and configure an Amazon DynamoDB table with optimized Partition and Sort keys.
Differentiate between Query and Scan operations in a live environment.
Implement Amazon S3 for static data storage and retrieval.
Understand the impact of consistency models (Eventually vs. Strongly Consistent).

Architecture Overview

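Diagram not available. The lab uses two resources managed from the AWS CLI: an S3 bucket (brainybee-lab-assets-<UNIQUE_SUFFIX>) for static assets and a DynamoDB table (TodoTable) for task items.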

Step-by-Step Instructions

Step 1: Create an Amazon S3 Bucket for Application Assets

In this step, you will create an S3 bucket for application metadata or static assets, similar to the storage behind a serverless frontend.

```bash
# Replace <UNIQUE_SUFFIX> with a lowercase name or random string
# (bucket names allow only lowercase letters, digits, hyphens, and dots)
aws s3 mb s3://brainybee-lab-assets-<UNIQUE_SUFFIX> --region <YOUR_REGION>
```

Console alternative

Navigate to S3 in the AWS Console.
Click Create bucket.
Enter a unique name: brainybee-lab-assets-<UNIQUE_SUFFIX>.
Select your preferred Region and click Create bucket (keep default settings).
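S3 bucket names must be 3 to 63 characters long, use only lowercase letters, digits, hyphens, and dots, and start and end with a letter or digit, so a capitalized suffix is rejected. As a minimal sketch, the helpers below check a candidate name locally before calling aws s3 mb, then upload a file and read it back. The function names and the sample file are illustrative assumptions, not part of the AWS CLI or the lab, and the check does not cover every S3 naming rule (for example, names formatted like IP addresses).

```shell
# Return 0 if the name passes the basic S3 naming rules, 1 otherwise.
is_valid_bucket_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Create the lab bucket only if the name passes the local check.
# $1 = unique suffix, $2 = region
make_lab_bucket() {
    local bucket="brainybee-lab-assets-$1"
    if ! is_valid_bucket_name "$bucket"; then
        echo "Invalid bucket name: $bucket" >&2
        return 1
    fi
    aws s3 mb "s3://$bucket" --region "$2"
}

# Upload a small file, then stream it back to stdout to confirm retrieval.
# $1 = bucket name, $2 = local file (hypothetical, e.g. sample-asset.json)
s3_round_trip() {
    local key
    key=$(basename "$2")
    aws s3 cp "$2" "s3://$1/$key" &&
    aws s3 cp "s3://$1/$key" -
}
```

For example, make_lab_bucket abc123 <YOUR_REGION> creates brainybee-lab-assets-abc123. Any object uploaded with s3_round_trip is removed by the aws s3 rm --recursive command in the Teardown section.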

Step 2: Create a DynamoDB Table for "Todo" Tasks

We will create a table for a Todo application. UserId is the Partition Key: it has high cardinality and groups each user's tasks together for efficient access. TaskId is the Sort Key, which makes each task unique within a user.

```bash
aws dynamodb create-table \
    --table-name TodoTable \
    --attribute-definitions \
        AttributeName=UserId,AttributeType=S \
        AttributeName=TaskId,AttributeType=S \
    --key-schema \
        AttributeName=UserId,KeyType=HASH \
        AttributeName=TaskId,KeyType=RANGE \
    --provisioned-throughput \
        ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --region <YOUR_REGION>
```

[!TIP] Choosing a high-cardinality Partition Key (like UserId) helps DynamoDB spread data and request traffic across multiple physical partitions, which reduces the risk of "hot partitions."
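Table creation is asynchronous: create-table returns while the table is still in the CREATING state, and writes fail until it becomes ACTIVE. A minimal sketch of waiting for that transition uses the CLI's built-in table-exists waiter. The function name wait_for_table is an illustrative assumption.

```shell
# Block until the table reaches ACTIVE, then print its status.
# $1 = table name, $2 = region
wait_for_table() {
    aws dynamodb wait table-exists --table-name "$1" --region "$2" &&
    aws dynamodb describe-table --table-name "$1" --region "$2" \
        --query 'Table.TableStatus' --output text
}
```

Run wait_for_table TodoTable <YOUR_REGION> before moving on to Step 3.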

Step 3: Populate the Table (Data Serialization)

We will insert two items into the table using the put-item command. Each attribute value is wrapped in a type descriptor; here every value uses S (String).

```bash
aws dynamodb put-item \
    --table-name TodoTable \
    --item '{"UserId": {"S": "user_123"}, "TaskId": {"S": "T-001"}, "TaskName": {"S": "Complete AWS Lab"}, "Status": {"S": "In-Progress"}}' \
    --region <YOUR_REGION>

aws dynamodb put-item \
    --table-name TodoTable \
    --item '{"UserId": {"S": "user_123"}, "TaskId": {"S": "T-002"}, "TaskName": {"S": "Prepare for DVA-C02"}, "Status": {"S": "Pending"}}' \
    --region <YOUR_REGION>
```
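Type descriptors other than S follow the same pattern: numbers are passed as strings under N, and booleans as bare true or false under BOOL. The sketch below writes a third item to a file and loads it with file://, which also sidesteps shell quoting problems. The Priority and Done attributes, the T-003 item, and both function names are illustrative assumptions, not part of the lab's schema.

```shell
# Write a sample item in DynamoDB JSON to the given path.
# Priority and Done are illustrative attributes, not part of the lab's schema.
write_sample_item() {
    cat > "$1" <<'EOF'
{
    "UserId":   {"S": "user_123"},
    "TaskId":   {"S": "T-003"},
    "TaskName": {"S": "Review consistency models"},
    "Status":   {"S": "Pending"},
    "Priority": {"N": "2"},
    "Done":     {"BOOL": false}
}
EOF
}

# Load an item file into the table.
# $1 = item file, $2 = region
put_item_from_file() {
    aws dynamodb put-item --table-name TodoTable --item "file://$1" --region "$2"
}
```

For example, write_sample_item item.json followed by put_item_from_file item.json <YOUR_REGION>. Only the key attributes (UserId and TaskId) were declared at table creation; the other attributes need no schema change.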

Step 4: Compare Query vs. Scan

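A Query retrieves items that share a partition key value, optionally narrowed by a sort key condition, so it reads only the matching item collection. A Scan reads every item in the table and applies any filter only after the items have been read.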

Perform a Query (Efficient):

```bash
aws dynamodb query \
    --table-name TodoTable \
    --key-condition-expression "UserId = :v1" \
    --expression-attribute-values '{":v1": {"S": "user_123"}}' \
    --region <YOUR_REGION>
```

Perform a Scan (Expensive):

```bash
aws dynamodb scan \
    --table-name TodoTable \
    --region <YOUR_REGION>
```
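On a table this small, the Query and Scan above return the same items, so the difference is easier to see in the counters each operation reports. As a sketch, the first helper narrows a Query with a sort key condition and asks for consumed capacity; the second runs a Scan with a filter on Status and prints only Count and ScannedCount. Status is aliased as #s because it appears on DynamoDB's reserved-word list. The function names are illustrative assumptions.

```shell
# Query one user's tasks whose TaskId starts with a prefix.
# $1 = user id, $2 = TaskId prefix, $3 = region
query_tasks_by_prefix() {
    aws dynamodb query \
        --table-name TodoTable \
        --key-condition-expression "UserId = :u AND begins_with(TaskId, :p)" \
        --expression-attribute-values "{\":u\": {\"S\": \"$1\"}, \":p\": {\"S\": \"$2\"}}" \
        --return-consumed-capacity TOTAL \
        --region "$3"
}

# Scan the whole table, keeping only items with the given Status.
# Prints Count (items returned) and ScannedCount (items read).
# $1 = status value, $2 = region
scan_by_status() {
    aws dynamodb scan \
        --table-name TodoTable \
        --filter-expression "#s = :s" \
        --expression-attribute-names '{"#s": "Status"}' \
        --expression-attribute-values "{\":s\": {\"S\": \"$1\"}}" \
        --query '{Count: Count, ScannedCount: ScannedCount}' \
        --region "$2"
}
```

Running scan_by_status Pending <YOUR_REGION> should show a Count that covers only the Pending items and a ScannedCount that covers every item read, because the filter runs after the read and does not reduce the capacity consumed.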

Checkpoints

S3 Check: Run aws s3 ls. Do you see your bucket listed?
DynamoDB Check: Run aws dynamodb describe-table --table-name TodoTable. Is the TableStatus marked as ACTIVE?
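Efficiency Check: Compare ScannedCount and Count in the Query and Scan output. A Query reads only the items stored under user_123, so both values stay small no matter how large the table grows. A Scan reads every item, so ScannedCount equals the total number of items in the table (as long as the table fits within the 1 MB returned per request), even when a filter reduces Count. With only one user's items in the table, both operations report the same numbers; the gap appears once other users' items are added.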

Teardown

To avoid costs, delete all resources created during this lab.

```bash
# 1. Delete the DynamoDB Table
aws dynamodb delete-table --table-name TodoTable --region <YOUR_REGION>

# 2. Empty and Delete the S3 Bucket
aws s3 rm s3://brainybee-lab-assets-<UNIQUE_SUFFIX> --recursive
aws s3 rb s3://brainybee-lab-assets-<UNIQUE_SUFFIX>
```
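Table deletion is also asynchronous, so it is worth confirming that both resources are gone before closing the lab. A minimal sketch, assuming the table name used in this lab: the table-not-exists waiter returns once DynamoDB no longer reports the table, and head-bucket fails once the bucket has been removed. The function name verify_teardown is an illustrative assumption.

```shell
# Confirm the lab resources are gone.
# $1 = bucket name, $2 = region
verify_teardown() {
    # Returns once DynamoDB no longer reports the table.
    aws dynamodb wait table-not-exists --table-name TodoTable --region "$2" || return 1
    # head-bucket succeeds only while the bucket still exists and is accessible.
    if aws s3api head-bucket --bucket "$1" 2>/dev/null; then
        echo "Bucket $1 still exists" >&2
        return 1
    fi
    echo "Teardown complete"
}
```

Call it as verify_teardown brainybee-lab-assets-<UNIQUE_SUFFIX> <YOUR_REGION> after the delete commands have run.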

Troubleshooting

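| Error | Likely Cause | Fix |
| --- | --- | --- |
| `ResourceNotFoundException` | Table is in a different Region or is not yet `ACTIVE` (a missing bucket surfaces as `NoSuchBucket`). | Add `--region <YOUR_REGION>` explicitly to the command and confirm `TableStatus` with `describe-table`. |
| `AccessDenied` / `AccessDeniedException` | IAM user lacks S3 or DynamoDB permissions. | Attach `AmazonS3FullAccess` or `AmazonDynamoDBFullAccess`, or check IAM policies. |
| `ValidationException` | The `--item` value is missing a key attribute or uses the wrong type descriptor. | Include `UserId` and `TaskId` as `S` values. If the CLI rejects the JSON itself, check that quotes are escaped correctly or use a JSON file. |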

Stretch Challenge

Objective: Implement a Global Secondary Index (GSI).

Currently, you can only efficiently search by UserId. Add a GSI to the TodoTable that allows you to search for tasks by Status.

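Show Hint

Use the update-table command with --attribute-definitions and --global-secondary-index-updates. Because TodoTable uses provisioned capacity, the new index needs its own ProvisionedThroughput. Once the index is ACTIVE, you can run Query operations against Status, an attribute that is not part of the table's primary key.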
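One possible solution is sketched below. The index name StatusIndex, the ALL projection, the 5/5 capacity settings, and both function names are assumptions chosen for illustration; Status is aliased as #s in the query because it appears on DynamoDB's reserved-word list.

```shell
# Add a GSI keyed on Status. StatusIndex is a hypothetical index name.
# $1 = region
create_status_index() {
    aws dynamodb update-table \
        --table-name TodoTable \
        --attribute-definitions AttributeName=Status,AttributeType=S \
        --global-secondary-index-updates '[{"Create": {"IndexName": "StatusIndex", "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}], "Projection": {"ProjectionType": "ALL"}, "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}}}]' \
        --region "$1"
}

# Query the index once it is ACTIVE.
# $1 = status value, $2 = region
query_by_status() {
    aws dynamodb query \
        --table-name TodoTable \
        --index-name StatusIndex \
        --key-condition-expression "#s = :s" \
        --expression-attribute-names '{"#s": "Status"}' \
        --expression-attribute-values "{\":s\": {\"S\": \"$1\"}}" \
        --region "$2"
}
```

Status has only a handful of distinct values, so as an index partition key it concentrates items in a few partitions. That is acceptable for a lab but worth reconsidering for a production table. Queries against a GSI are always eventually consistent.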

Cost Estimate

Amazon S3: Standard storage costs about $0.023 per GB-month in us-east-1 (the first 5 GB are covered by the Free Tier). Lab usage: $0.00.
Amazon DynamoDB: 25 GB of storage plus 25 WCUs and 25 RCUs of provisioned capacity are free. Lab usage: $0.00.
Total Estimated Lab Cost: $0.00 (within Free Tier limits).

Concept Review

Data Store Comparison Table

| Feature | Amazon S3 | Amazon DynamoDB | Amazon RDS |
| --- | --- | --- | --- |
| Type | Object Storage | NoSQL (Key-Value/Document) | Relational (SQL) |
| Best Use Case | Static files, backups, logs | High-speed, high-scale apps | Complex joins, transactions |
| Scalability | Virtually unlimited | Provisioned or On-Demand | Vertical & Horizontal (Read Replicas) |
| Consistency | Strong (since late 2020) | Eventual (Default) / Strong | Strong |

Key DVA-C02 Concept: Consistency Models

By default, DynamoDB reads are eventually consistent, so a read issued right after a write may return stale data. If your application needs the latest committed value, set ConsistentRead to true (--consistent-read in the AWS CLI). A strongly consistent read consumes twice the Read Capacity Units (RCUs) of an eventually consistent one.
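The arithmetic behind that cost can be sketched in a few lines. One RCU covers one strongly consistent read per second of an item up to 4 KB; larger items are rounded up to the next 4 KB block, and an eventually consistent read costs half as much. The helper rcu_per_read models that rule for a single read, and get_task_strong issues a strongly consistent read against the lab table so the reported capacity can be compared. Both function names are illustrative assumptions.

```shell
# Print the RCUs consumed by one read of an item.
# $1 = item size in bytes, $2 = "strong" or "eventual"
rcu_per_read() {
    local units=$(( ($1 + 4095) / 4096 ))   # round up to 4 KB blocks
    if [ "$2" = "strong" ]; then
        echo "$units"
    elif [ $(( units % 2 )) -eq 0 ]; then
        echo $(( units / 2 ))               # eventual reads cost half
    else
        echo "$(( units / 2 )).5"
    fi
}

# Strongly consistent read of one task, reporting consumed capacity.
# $1 = user id, $2 = task id, $3 = region
get_task_strong() {
    aws dynamodb get-item \
        --table-name TodoTable \
        --key "{\"UserId\": {\"S\": \"$1\"}, \"TaskId\": {\"S\": \"$2\"}}" \
        --consistent-read \
        --return-consumed-capacity TOTAL \
        --region "$3"
}
```

For a 9,000-byte item, rcu_per_read 9000 strong prints 3 and rcu_per_read 9000 eventual prints 1.5. The lab's items are far smaller than 4 KB, so get_task_strong user_123 T-001 <YOUR_REGION> should report 1.0 capacity units, and the same call without --consistent-read should report 0.5.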
