Lab: Implementing Optimal Data Store Strategies on AWS
This hands-on lab guides you through selecting and configuring AWS storage services based on specific access patterns, performance requirements, and cost optimization goals. You will implement an S3-based data lake with lifecycle policies and a DynamoDB table for high-performance "hot" data access.
Prerequisites
Before starting this lab, ensure you have the following:
- An AWS Account with AdministratorAccess or equivalent IAM permissions.
- AWS CLI installed and configured with your credentials.
- Basic familiarity with JSON syntax for policy definitions.
- Region Selection: We recommend using `us-east-1` (N. Virginia) for this lab.
Learning Objectives
By the end of this lab, you will be able to:
- Differentiate between Hot and Cold data storage requirements.
- Implement S3 Lifecycle Policies to automate cost optimization.
- Provision and query an Amazon DynamoDB table for millisecond-latency workloads.
- Compare use cases for Object (S3) and NoSQL (DynamoDB) storage.
Architecture Overview
The lab builds two storage paths: an S3 bucket acting as a data lake, with lifecycle rules that transition objects under the `logs/` prefix to STANDARD_IA after 30 days and to Glacier after 90 days, and a DynamoDB table (`brainybee-hot-data`) that serves low-latency "hot" reads and writes keyed on `OrderID`.
Step-by-Step Instructions
Step 1: Create the Data Lake (S3)
You will create an S3 bucket to serve as your data lake. This represents a landing zone for large-scale object storage.
```shell
# Generate a unique bucket name (timestamp suffix keeps it globally unique)
BUCKET_NAME=brainybee-lab-$(date +%s)

# Create the bucket
aws s3 mb s3://$BUCKET_NAME --region us-east-1
```

Console alternative:
- Log in to the AWS Management Console.
- Navigate to S3 > Create bucket.
- Enter a unique name (e.g., `brainybee-lab-unique-id`).
- Leave the default settings and click Create bucket.
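To give the lifecycle rule in the next step something to act on, you can upload a sample object under the `logs/` prefix. The file name and contents below are illustrative, and the upload runs only when AWS credentials are configured:

```shell
# Create a small sample log file (contents are illustrative)
echo '{"event":"login","ts":"2024-01-01T00:00:00Z"}' > sample-log.json

# Upload it under the logs/ prefix so the lifecycle rule will match it
# (skipped automatically when AWS credentials are not available)
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws s3 cp sample-log.json "s3://$BUCKET_NAME/logs/sample-log.json"
fi
```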
Step 2: Configure S3 Lifecycle Policies
To optimize costs for "Cold" data, you will create a policy that transitions objects to cheaper storage classes over time.
- Create a file named `lifecycle.json` with the following content:
```json
{
  "Rules": [
    {
      "ID": "MoveToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```

- Apply the policy to your bucket:
```shell
aws s3api put-bucket-lifecycle-configuration \
  --bucket $BUCKET_NAME \
  --lifecycle-configuration file://lifecycle.json
```

Step 3: Provision Hot Storage (DynamoDB)
For sub-millisecond access to structured data (e.g., user sessions or real-time orders), you will provision a DynamoDB table.
```shell
aws dynamodb create-table \
  --table-name brainybee-hot-data \
  --attribute-definitions AttributeName=OrderID,AttributeType=S \
  --key-schema AttributeName=OrderID,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
  --region us-east-1
```

[!TIP] DynamoDB is optimal for "hot" data because it provides single-digit millisecond latency, whereas S3 is better suited to high-throughput batch analysis.
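As a quick smoke test, assuming the table has reached ACTIVE and credentials are configured, you can write one item and read it back. The `OrderID` value and `Status` attribute below are made up for illustration:

```shell
# Sample item in DynamoDB's typed JSON format (values are illustrative)
cat > item.json <<'EOF'
{"OrderID": {"S": "order-1001"}, "Status": {"S": "PENDING"}}
EOF

# Write, then read back the item (skipped when no AWS credentials are available)
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws dynamodb put-item --table-name brainybee-hot-data \
    --item file://item.json --region us-east-1
  aws dynamodb get-item --table-name brainybee-hot-data \
    --key '{"OrderID": {"S": "order-1001"}}' --region us-east-1
fi
```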
Checkpoints
- S3 Verification: Run `aws s3api get-bucket-lifecycle-configuration --bucket <YOUR_BUCKET_NAME>`. You should see the `MoveToArchive` rule in the JSON output.
- DynamoDB Verification: Run `aws dynamodb describe-table --table-name brainybee-hot-data`. Ensure the `TableStatus` is `ACTIVE`.
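Because `create-table` returns before the table is ready, the CLI's built-in waiter is a convenient alternative to polling `describe-table` by hand. The sketch below runs the wait only when credentials are available:

```shell
# Block until the table status reaches ACTIVE (polls describe-table internally);
# skipped when no AWS credentials are configured
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws dynamodb wait table-exists \
    --table-name brainybee-hot-data --region us-east-1
  WAIT_RESULT="active"
else
  WAIT_RESULT="skipped-no-credentials"
fi
echo "$WAIT_RESULT"
```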
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `BucketAlreadyExists` | S3 bucket names must be globally unique. | Change the bucket name prefix or add a random suffix. |
| `ResourceNotFound` | Table creation is asynchronous. | Wait ~30 seconds for the DynamoDB table status to reach `ACTIVE`. |
| `AccessDenied` | The IAM user lacks permissions. | Ensure your user has AmazonS3FullAccess and AmazonDynamoDBFullAccess. |
Cost Estimate
- Amazon S3: Free tier covers 5GB and 20,000 GET requests. Transitioning to IA/Glacier incurs minimal storage costs (approx. $0.004/GB in Glacier).
- Amazon DynamoDB: Free tier includes 25GB of storage and 25 WCU/RCU, which is enough for this lab.
- Total Est. Cost: $0.00 (if within Free Tier limits).
Clean-Up / Teardown
[!WARNING] Failure to delete resources may result in unexpected AWS charges.
```shell
# 1. Empty and delete the S3 bucket
aws s3 rm s3://$BUCKET_NAME --recursive
aws s3 rb s3://$BUCKET_NAME

# 2. Delete the DynamoDB table
aws dynamodb delete-table --table-name brainybee-hot-data
```

Stretch Challenge
Scenario: Your data lake now receives data that is accessed unpredictably.
Task: Modify the S3 bucket to use Intelligent-Tiering instead of a fixed lifecycle policy. Intelligent-Tiering automatically moves objects between Frequent, Infrequent, and Archive tiers based on actual access patterns.
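One possible sketch of the stretch challenge. The `Tierings` block enables the optional Archive tiers; the Frequent/Infrequent transitions happen automatically for any object stored in the INTELLIGENT_TIERING storage class. The `Id` value and day thresholds below are assumptions you can adjust:

```shell
# Archive-tier configuration for Intelligent-Tiering (thresholds are illustrative)
cat > tiering.json <<'EOF'
{
  "Id": "ArchiveConfig",
  "Status": "Enabled",
  "Tierings": [
    { "Days": 90,  "AccessTier": "ARCHIVE_ACCESS" },
    { "Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS" }
  ]
}
EOF

# Attach the configuration to the bucket (skipped without AWS credentials);
# new uploads then opt in with: aws s3 cp <file> s3://$BUCKET_NAME/... \
#   --storage-class INTELLIGENT_TIERING
if aws sts get-caller-identity >/dev/null 2>&1; then
  aws s3api put-bucket-intelligent-tiering-configuration \
    --bucket "$BUCKET_NAME" --id ArchiveConfig \
    --intelligent-tiering-configuration file://tiering.json
fi
```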
Concept Review
| Feature | Amazon S3 | Amazon DynamoDB | Amazon RDS |
|---|---|---|---|
| Storage Type | Object | NoSQL Key-Value | Relational |
| Best For | Data Lakes, Cold Archive | Real-time apps, Hot data | Complex SQL, Transactions |
| Latency | Milliseconds/Seconds | Single-digit ms | Variable |
| Scalability | Virtually Unlimited | Seamless/Auto-scaling | Vertical/Read Replicas |