Lab: Implementing Optimal Data Store Strategies on AWS
This hands-on lab guides you through selecting and configuring AWS storage services based on specific access patterns, performance requirements, and cost optimization goals. You will implement an S3-based data lake with lifecycle policies and a DynamoDB table for high-performance "hot" data access.
Prerequisites
Before starting this lab, ensure you have the following:
- An AWS Account with AdministratorAccess or equivalent IAM permissions.
- AWS CLI installed and configured with your credentials.
- Basic familiarity with JSON syntax for policy definitions.
- Region Selection: We recommend using `us-east-1` (N. Virginia) for this lab.
Learning Objectives
By the end of this lab, you will be able to:
- Differentiate between Hot and Cold data storage requirements.
- Implement S3 Lifecycle Policies to automate cost optimization.
- Provision and query an Amazon DynamoDB table for millisecond-latency workloads.
- Compare use cases for Object (S3) and NoSQL (DynamoDB) storage.
Architecture Overview
The lab builds two storage paths: an S3 bucket acting as a data lake, with lifecycle rules that transition objects under the `logs/` prefix to STANDARD_IA after 30 days and to Glacier after 90 days, and a DynamoDB table (`brainybee-hot-data`) that serves low-latency "hot" reads and writes keyed on `OrderID`.
Step-by-Step Instructions
Step 1: Create the Data Lake (S3)
You will create an S3 bucket to serve as your data lake. This represents a landing zone for large-scale object storage.
```shell
# Generate a unique bucket name (timestamp suffix keeps it globally unique)
BUCKET_NAME=brainybee-lab-$(date +%s)

# Create the bucket
aws s3 mb s3://$BUCKET_NAME --region us-east-1
```

Console alternative:
- Log in to the AWS Management Console.
- Navigate to S3 > Create bucket.
- Enter a unique name (e.g., `brainybee-lab-unique-id`).
- Leave the default settings and click Create bucket.
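To give the lifecycle rule in the next step something to act on, you can upload a sample object under the `logs/` prefix. The file name and contents below are illustrative, and the upload runs only when AWS credentials are configured:

```shell
# Create a small sample log file (contents are illustrative)
echo '{"event":"login","ts":"2024-01-01T00:00:00Z"}' > sample-log.json

# Upload it under the logs/ prefix so the lifecycle rule will match it
# (skipped automatically when AWS credentials are not available)
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws s3 cp sample-log.json "s3://$BUCKET_NAME/logs/sample-log.json"
fi
```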
Step 2: Configure S3 Lifecycle Policies
To optimize costs for "Cold" data, you will create a policy that transitions objects to cheaper storage classes over time.
- Create a file named `lifecycle.json` with the following content:
```json
{
  "Rules": [
    {
      "ID": "MoveToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

- Apply the policy to your bucket:
```shell
aws s3api put-bucket-lifecycle-configuration \
  --bucket $BUCKET_NAME \
  --lifecycle-configuration file://lifecycle.json
```

Step 3: Provision Hot Storage (DynamoDB)
For sub-millisecond access to structured data (e.g., user sessions or real-time orders), you will provision a DynamoDB table.
```shell
aws dynamodb create-table \
  --table-name brainybee-hot-data \
  --attribute-definitions AttributeName=OrderID,AttributeType=S \
  --key-schema AttributeName=OrderID,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
  --region us-east-1
```

[!TIP] DynamoDB is optimal for "hot" data because it provides single-digit millisecond latency, whereas S3 is better suited to high-throughput batch analysis.
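As a quick smoke test, assuming the table has reached ACTIVE and credentials are configured, you can write one item and read it back. The `OrderID` value and `Status` attribute below are made up for illustration:

```shell
# Sample item in DynamoDB's typed JSON format (values are illustrative)
cat > item.json <<'EOF'
{"OrderID": {"S": "order-1001"}, "Status": {"S": "PENDING"}}
EOF

# Write, then read back the item (skipped when no AWS credentials are available)
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws dynamodb put-item --table-name brainybee-hot-data \
    --item file://item.json --region us-east-1
  aws dynamodb get-item --table-name brainybee-hot-data \
    --key '{"OrderID": {"S": "order-1001"}}' --region us-east-1
fi
```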
Checkpoints
- S3 Verification: Run `aws s3api get-bucket-lifecycle-configuration --bucket <YOUR_BUCKET_NAME>`. You should see the `MoveToArchive` rule in the JSON output.
- DynamoDB Verification: Run `aws dynamodb describe-table --table-name brainybee-hot-data`. Ensure the `TableStatus` is `ACTIVE`.
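Because `create-table` returns before the table is ready, the CLI's built-in waiter is a convenient alternative to polling `describe-table` by hand. The sketch below runs the wait only when credentials are available:

```shell
# Block until the table status reaches ACTIVE (polls describe-table internally);
# skipped when no AWS credentials are configured
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws dynamodb wait table-exists \
    --table-name brainybee-hot-data --region us-east-1
  WAIT_RESULT="active"
else
  WAIT_RESULT="skipped-no-credentials"
fi
echo "$WAIT_RESULT"
```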
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `BucketAlreadyExists` | S3 bucket names must be globally unique. | Change the bucket name prefix or add a random suffix. |
| `ResourceNotFound` | Table creation is asynchronous. | Wait ~30 seconds for the DynamoDB table status to reach `ACTIVE`. |
| `AccessDenied` | The IAM user lacks permissions. | Ensure your user has AmazonS3FullAccess and AmazonDynamoDBFullAccess. |
Cost Estimate
- Amazon S3: Free tier covers 5GB and 20,000 GET requests. Transitioning to IA/Glacier incurs minimal storage costs (approx. $0.004/GB in Glacier).
- Amazon DynamoDB: Free tier includes 25GB of storage and 25 WCU/RCU, which is enough for this lab.
- Total Est. Cost: $0.00 (if within Free Tier limits).
Clean-Up / Teardown
[!WARNING] Failure to delete resources may result in unexpected AWS charges.
```shell
# 1. Empty and delete the S3 bucket
aws s3 rm s3://$BUCKET_NAME --recursive
aws s3 rb s3://$BUCKET_NAME

# 2. Delete the DynamoDB table
aws dynamodb delete-table --table-name brainybee-hot-data
```

Stretch Challenge
Scenario: Your data lake now receives data that is accessed unpredictably.
Task: Modify the S3 bucket to use Intelligent-Tiering instead of a fixed lifecycle policy. Intelligent-Tiering automatically moves objects between Frequent, Infrequent, and Archive tiers based on actual access patterns.
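One possible sketch of the stretch challenge. The `Tierings` block enables the optional Archive tiers; the Frequent/Infrequent transitions happen automatically for any object stored in the INTELLIGENT_TIERING storage class. The `Id` value and day thresholds below are assumptions you can adjust:

```shell
# Archive-tier configuration for Intelligent-Tiering (thresholds are illustrative)
cat > tiering.json <<'EOF'
{
  "Id": "ArchiveConfig",
  "Status": "Enabled",
  "Tierings": [
    { "Days": 90,  "AccessTier": "ARCHIVE_ACCESS" },
    { "Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS" }
  ]
}
EOF

# Attach the configuration to the bucket (skipped without AWS credentials);
# new uploads then opt in with: aws s3 cp <file> s3://$BUCKET_NAME/... \
#   --storage-class INTELLIGENT_TIERING
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws s3api put-bucket-intelligent-tiering-configuration \
    --bucket "$BUCKET_NAME" --id ArchiveConfig \
    --intelligent-tiering-configuration file://tiering.json
fi
```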
Concept Review
| Feature | Amazon S3 | Amazon DynamoDB | Amazon RDS |
|---|---|---|---|
| Storage Type | Object | NoSQL Key-Value | Relational |
| Best For | Data Lakes, Cold Archive | Real-time apps, Hot data | Complex SQL, Transactions |
| Latency | Milliseconds/Seconds | Single-digit ms | Variable |
| Scalability | Virtually Unlimited | Seamless/Auto-scaling | Vertical/Read Replicas |