Lab: Implementing Optimal Data Store Strategies on AWS

This hands-on lab guides you through selecting and configuring AWS storage services based on specific access patterns, performance requirements, and cost optimization goals. You will implement an S3-based data lake with lifecycle policies and a DynamoDB table for high-performance "hot" data access.

Prerequisites

Before starting this lab, ensure you have the following:

  • An AWS Account with AdministratorAccess or equivalent IAM permissions.
  • AWS CLI installed and configured with your credentials.
  • Basic familiarity with JSON syntax for policy definitions.
  • Region Selection: We recommend using us-east-1 (N. Virginia) for this lab.

Learning Objectives

By the end of this lab, you will be able to:

  • Differentiate between Hot and Cold data storage requirements.
  • Implement S3 Lifecycle Policies to automate cost optimization.
  • Provision and query an Amazon DynamoDB table for millisecond-latency workloads.
  • Compare use cases for Object (S3) and NoSQL (DynamoDB) storage.

Architecture Overview

The lab architecture has two storage paths: cold data lands in an S3 bucket whose lifecycle policy transitions objects to Standard-IA after 30 days and Glacier after 90 days, while hot data is served from a DynamoDB table keyed on OrderID.

Step-by-Step Instructions

Step 1: Create the Data Lake (S3)

You will create an S3 bucket to serve as your data lake. This represents a landing zone for large-scale object storage.

```bash
# Generate a unique bucket name
BUCKET_NAME=brainybee-lab-$(date +%s)

# Create the bucket
aws s3 mb s3://$BUCKET_NAME --region us-east-1
```
Console alternative
  1. Log in to the AWS Management Console.
  2. Navigate to S3 > Create bucket.
  3. Enter a unique name (e.g., brainybee-lab-unique-id).
  4. Leave default settings and click Create bucket.
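Because bucket names are globally unique and must follow S3 naming rules (3-63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit), a quick local sanity check can catch a bad name before the `mb` call fails. This is an optional sketch, not part of the official lab steps, and the regex is a simplification of the full rules:

```shell
# Generate a candidate bucket name, as in the lab step above.
BUCKET_NAME=brainybee-lab-$(date +%s)

# Simplified check of the S3 naming rules (does not catch every edge
# case, e.g. consecutive dots or IP-address-style names).
if echo "$BUCKET_NAME" | grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'; then
  echo "Bucket name OK: $BUCKET_NAME"
else
  echo "Invalid bucket name: $BUCKET_NAME" >&2
  exit 1
fi
```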

Step 2: Configure S3 Lifecycle Policies

To optimize costs for "Cold" data, you will create a policy that transitions objects to cheaper storage classes over time.

  1. Create a file named lifecycle.json with the following content:
```json
{
  "Rules": [
    {
      "ID": "MoveToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```
  2. Apply the policy to your bucket:

```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket $BUCKET_NAME \
  --lifecycle-configuration file://lifecycle.json
```
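The two sub-steps above can also be scripted end-to-end from the shell: write lifecycle.json with a heredoc, then sanity-check it locally before applying it. This is an optional sketch; the validation step simply confirms the file parses and the transition days increase:

```shell
# Write the lifecycle policy from Step 2 to disk.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "MoveToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

# Local sanity check: valid JSON, and IA comes before Glacier.
python3 - <<'PY'
import json
rule = json.load(open("lifecycle.json"))["Rules"][0]
days = [t["Days"] for t in rule["Transitions"]]
assert days == sorted(days), "transitions out of order"
print("lifecycle.json OK:", rule["ID"])
PY
```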

Step 3: Provision Hot Storage (DynamoDB)

For single-digit millisecond access to structured data (e.g., user sessions or real-time orders), you will provision a DynamoDB table.

```bash
aws dynamodb create-table \
  --table-name brainybee-hot-data \
  --attribute-definitions AttributeName=OrderID,AttributeType=S \
  --key-schema AttributeName=OrderID,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
  --region us-east-1
```
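Once the table is ACTIVE, you can exercise it with a write and a key lookup. This sketch assumes the table from the step above; the `Status` attribute and the `order-001` value are illustrative, not part of the table schema (DynamoDB only requires the key attribute):

```shell
# Block until table creation finishes (creation is asynchronous).
aws dynamodb wait table-exists --table-name brainybee-hot-data

# Write one item. Non-key attributes like Status need no schema.
aws dynamodb put-item \
  --table-name brainybee-hot-data \
  --item '{"OrderID": {"S": "order-001"}, "Status": {"S": "PENDING"}}'

# Read it back by its partition key.
aws dynamodb get-item \
  --table-name brainybee-hot-data \
  --key '{"OrderID": {"S": "order-001"}}'
```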

[!TIP] Using DynamoDB is optimal for "hot" data because it provides single-digit millisecond latency, whereas S3 is better for high-throughput batch analysis.

Checkpoints

  • S3 Verification: Run aws s3api get-bucket-lifecycle-configuration --bucket <YOUR_BUCKET_NAME>. You should see the MoveToArchive rule in JSON output.
  • DynamoDB Verification: Run aws dynamodb describe-table --table-name brainybee-hot-data. Ensure the TableStatus is ACTIVE.

Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| BucketAlreadyExists | S3 bucket names must be globally unique. | Change the bucket name prefix or add a random suffix. |
| ResourceNotFound | Table creation is asynchronous. | Wait ~30 seconds for the DynamoDB table status to reach ACTIVE. |
| AccessDenied | IAM user lacks permissions. | Ensure your user has AmazonS3FullAccess and AmazonDynamoDBFullAccess. |

Cost Estimate

  • Amazon S3: Free tier covers 5GB and 20,000 GET requests. Transitioning to IA/Glacier incurs minimal storage costs (approx. $0.004/GB in Glacier).
  • Amazon DynamoDB: Free tier includes 25GB of storage and 25 WCU/RCU, which is enough for this lab.
  • Total Est. Cost: $0.00 (if within Free Tier limits).

Clean-Up / Teardown

[!WARNING] Failure to delete resources may result in unexpected AWS charges.

```bash
# 1. Empty and delete the S3 bucket
aws s3 rm s3://$BUCKET_NAME --recursive
aws s3 rb s3://$BUCKET_NAME

# 2. Delete the DynamoDB table
aws dynamodb delete-table --table-name brainybee-hot-data
```

Stretch Challenge

Scenario: Your data lake now receives data that is accessed unpredictably.

Task: Modify the S3 bucket to use Intelligent-Tiering instead of a fixed lifecycle policy. Intelligent-Tiering automatically moves objects between Frequent, Infrequent, and Archive tiers based on actual access patterns.
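One possible approach (a sketch, not the only solution) is to replace the fixed transitions with a lifecycle rule that moves objects into the INTELLIGENT_TIERING storage class immediately, letting S3 tier them by observed access. The rule ID and file name below are arbitrary choices; new objects could alternatively be uploaded with `--storage-class INTELLIGENT_TIERING` directly:

```shell
# Lifecycle rule moving logs/ objects straight into Intelligent-Tiering.
cat > intelligent-tiering.json <<'EOF'
{
  "Rules": [
    {
      "ID": "UseIntelligentTiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 0, "StorageClass": "INTELLIGENT_TIERING" }
      ]
    }
  ]
}
EOF

# Replaces the bucket's existing lifecycle configuration.
aws s3api put-bucket-lifecycle-configuration \
  --bucket "$BUCKET_NAME" \
  --lifecycle-configuration file://intelligent-tiering.json
```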

Concept Review

| Feature | Amazon S3 | Amazon DynamoDB | Amazon RDS |
| --- | --- | --- | --- |
| Storage Type | Object | NoSQL Key-Value | Relational |
| Best For | Data lakes, cold archive | Real-time apps, hot data | Complex SQL, transactions |
| Latency | Milliseconds to seconds | Single-digit ms | Variable |
| Scalability | Virtually unlimited | Seamless/auto-scaling | Vertical/read replicas |
