Hands-On Lab: Implementing Automated Data Lifecycle Management on AWS
In this lab, you will act as a Data Engineer implementing a Data Lifecycle Management (DLM) strategy to balance cost-optimization with regulatory compliance. You will configure Amazon S3 to automatically transition data across storage tiers and implement Time-to-Live (TTL) on Amazon DynamoDB to purge stale records.
[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for provisioned resources.
Prerequisites
- AWS Account: Access to an AWS account with AdministratorAccess or equivalent permissions.
- AWS CLI: Installed and configured with aws configure using your credentials.
- Region: We will use us-east-1 (N. Virginia) for this lab.
- Knowledge: Basic understanding of S3 buckets and NoSQL databases.
Learning Objectives
- Configure S3 Versioning to protect against accidental deletions.
- Create S3 Lifecycle Policies to automate transitions from Standard to Standard-IA and Glacier Instant Retrieval, followed by expiration.
- Implement DynamoDB TTL to manage the lifecycle of high-velocity transactional data.
- Verify lifecycle transitions using the AWS CLI and Management Console.
Architecture Overview
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, minimum width=3cm, minimum height=1cm, align=center}] \draw[thick, ->] (0,0) -- (10,0) node[right] {Time (Days)}; \foreach \x in {0, 3, 6, 9} \draw (\x, 0.1) -- (\x, -0.1) node[below] {\x 0}; \node[fill=orange!20] at (1.5, 1) {S3 Standard}; \node[fill=blue!20] at (4.5, 1) {S3 IA}; \node[fill=gray!20] at (7.5, 1) {S3 Glacier}; \node[fill=red!20] at (10, 1) {Deleted}; \end{tikzpicture}
Step-by-Step Instructions
Step 1: Initialize the S3 Storage Environment
First, we create a bucket that will act as our primary data store.
# Generate a unique suffix for your bucket
RANDOM_ID=$RANDOM
BUCKET_NAME="brainybee-dlm-lab-$RANDOM_ID"
# Create the bucket
aws s3api create-bucket --bucket "$BUCKET_NAME" --region us-east-1
Console alternative
In the S3 console, create a bucket. Name it brainybee-dlm-lab-[unique-id] and keep all other settings at default.
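Because $RANDOM only yields values from 0 to 32767, two people running this lab can plausibly pick the same bucket name. As a minimal sketch, assuming a system that provides /dev/urandom, a wider suffix could be generated like this. The helper name make_bucket_name is hypothetical and not part of the AWS CLI.

```shell
# Hypothetical helper: build a bucket name ending in 8 random hex characters.
make_bucket_name() {
  local prefix="${1:-brainybee-dlm-lab}"
  local suffix
  # Read 4 random bytes and print them as hex, then strip spaces and newlines.
  suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
  printf '%s-%s\n' "$prefix" "$suffix"
}

BUCKET_NAME=$(make_bucket_name)
echo "$BUCKET_NAME"
```

The result stays lowercase and well under the 63-character limit for S3 bucket names, so it can be passed to create-bucket unchanged.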
Step 2: Enable S3 Versioning
Versioning is a prerequisite for robust DLM, allowing you to recover from accidental overwrites or deletes.
aws s3api put-bucket-versioning --bucket "$BUCKET_NAME" --versioning-configuration Status=Enabled
Step 3: Define and Apply Lifecycle Rules
We will create a JSON configuration that defines the transitions shown in our architecture diagram.
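One caveat follows from Step 2: in a versioned bucket, an Expiration action does not remove data. It places a delete marker on the current version, and the older (noncurrent) versions keep accruing storage charges. A possible variant of the lab's rule that also removes noncurrent versions is sketched here. The file name lifecycle-versioned.json and the 30-day window are illustrative assumptions, not values prescribed by the lab.

```shell
# Assumed variant of the lab's rule: same transitions, plus cleanup of
# noncurrent versions 30 days after they stop being the current version.
cat > lifecycle-versioned.json <<'EOF'
{
  "Rules": [
    {
      "ID": "MoveOldDataToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" }
      ],
      "Expiration": { "Days": 365 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
```

If you prefer this variant, pass file://lifecycle-versioned.json to put-bucket-lifecycle-configuration instead of the basic file.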
- Save the following content as lifecycle.json:
{
  "Rules": [
    {
      "ID": "MoveOldDataToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER_IR"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
- Apply the policy to your bucket:
aws s3api put-bucket-lifecycle-configuration --bucket "$BUCKET_NAME" --lifecycle-configuration file://lifecycle.json
Step 4: Implement DynamoDB TTL
For high-velocity data that loses value quickly, we use DynamoDB TTL to expire items based on a timestamp.
- Create a table:
aws dynamodb create-table \
  --table-name LogData \
  --attribute-definitions AttributeName=LogID,AttributeType=S \
  --key-schema AttributeName=LogID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
- Enable TTL on the ExpiryTime attribute:
aws dynamodb update-time-to-live \
  --table-name LogData \
  --time-to-live-specification "Enabled=true, AttributeName=ExpiryTime"
Checkpoints
| Checkpoint | Command / Action | Expected Result |
|---|---|---|
| S3 Configuration | aws s3api get-bucket-lifecycle-configuration --bucket <YOUR_BUCKET> | JSON output showing transition to GLACIER_IR after 90 days. |
| Versioning | aws s3api get-bucket-versioning --bucket <YOUR_BUCKET> | Status should be Enabled. |
| DynamoDB TTL | aws dynamodb describe-time-to-live --table-name LogData | TimeToLiveStatus should be ENABLED or ENABLING. |
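These checkpoints confirm configuration only. For TTL to act on an item, the item needs an ExpiryTime attribute of type Number holding a Unix epoch timestamp in seconds. The sketch below shows one way to build such an item. The function names ttl_item_json and put_expiring_item are hypothetical helpers, and the one-hour lifetime is an arbitrary choice.

```shell
# Build a DynamoDB item whose ExpiryTime is <now> + <lifetime> seconds.
# Arguments: log id, current epoch seconds, lifetime in seconds.
# Sketch only: the log id is inserted as-is, so it must not contain quotes.
ttl_item_json() {
  local log_id="$1" now="$2" lifetime="$3"
  printf '{"LogID":{"S":"%s"},"ExpiryTime":{"N":"%s"}}\n' \
    "$log_id" "$(( now + lifetime ))"
}

# Hypothetical wrapper: write an item to LogData that expires in one hour.
put_expiring_item() {
  aws dynamodb put-item --table-name LogData \
    --item "$(ttl_item_json "$1" "$(date +%s)" 3600)"
}

# Preview the JSON without calling AWS.
ttl_item_json demo-1 1700000000 3600
```

Calling put_expiring_item demo-1 writes the item. A timestamp stored as a String, or in milliseconds, would not be treated as a valid expiry.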
Troubleshooting
| Issue | Possible Cause | Fix |
|---|---|---|
| BucketAlreadyExists | S3 bucket names are globally unique. | Change the $RANDOM_ID in Step 1 to a different value. |
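| TTL not deleting data | TTL deletion is a background process and is not instantaneous. | Wait: DynamoDB typically removes expired items within a few days of expiration. Until then they still appear in reads, so filter on ExpiryTime if stale items matter. |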
| Lifecycle rule didn't trigger | Minimum storage duration. | Some transitions have minimum storage durations (e.g., S3 IA requires 30 days in Standard). |
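Since expired items can linger until the background TTL process removes them, reads may need to hide them explicitly. A minimal sketch, assuming the LogData table and ExpiryTime attribute from Step 4, is a scan with a filter expression. The helper names now_values_json and scan_live_items are hypothetical.

```shell
# Build the expression attribute values for the ":now" placeholder.
# Argument: current epoch seconds.
now_values_json() {
  printf '{":now":{"N":"%s"}}\n' "$1"
}

# Hypothetical wrapper: return only items that have no expiry
# or whose expiry is still in the future.
scan_live_items() {
  aws dynamodb scan --table-name LogData \
    --filter-expression "attribute_not_exists(ExpiryTime) OR ExpiryTime > :now" \
    --expression-attribute-values "$(now_values_json "$(date +%s)")"
}
```

The filter is applied after items are read, so it hides stale items from the result but does not reduce the read capacity the scan consumes.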
Clean-Up / Teardown
[!IMPORTANT] To avoid unexpected costs, delete these resources immediately after finishing.
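If you uploaded any objects during the lab, note that versioning changes what "empty" means: aws s3 rm only adds delete markers, and the bucket cannot be deleted while object versions or delete markers remain. One way to purge them is sketched below. The function purge_bucket_versions is a hypothetical helper, and it assumes object keys contain no tab characters.

```shell
# Hypothetical helper: permanently delete every object version and
# delete marker in a versioned bucket. Argument: bucket name.
purge_bucket_versions() {
  local bucket="$1" section key version_id
  for section in Versions DeleteMarkers; do
    aws s3api list-object-versions --bucket "$bucket" --output text \
      --query "${section}[].[Key,VersionId]" |
    while IFS="$(printf '\t')" read -r key version_id; do
      # An empty section prints a single "None" with no version id; skip it.
      [ -n "$version_id" ] || continue
      aws s3api delete-object --bucket "$bucket" \
        --key "$key" --version-id "$version_id" >/dev/null
    done
  done
}
```

Run purge_bucket_versions "$BUCKET_NAME" before the delete-bucket command. Deleting by version ID is permanent, so double-check the bucket name first.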
# 1. Empty the S3 bucket (required before deletion)
# Note: In a versioned bucket this command only adds delete markers. If you
# uploaded objects, also delete every object version and delete marker,
# otherwise step 2 fails with BucketNotEmpty.
aws s3 rm "s3://$BUCKET_NAME" --recursive
# 2. Delete the bucket
aws s3api delete-bucket --bucket "$BUCKET_NAME"
# 3. Delete the DynamoDB table
aws dynamodb delete-table --table-name LogData
Cost Estimate
- S3 Standard: $0.023 per GB per month (first 50 TB).
- S3 Glacier Instant Retrieval: $0.004 per GB per month (roughly 80% cheaper than Standard).
- DynamoDB TTL: FREE. DynamoDB does not charge for the deletion of items via TTL.
- Overall Lab Cost: If using <1GB of data, this lab stays within the AWS Free Tier.
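To make the per-GB figures concrete, the sketch below compares monthly storage cost for a given volume using the two prices listed above. It is a rough estimate only: it ignores request, retrieval, and minimum-storage-duration charges, and storage_cost is a hypothetical helper name.

```shell
# Monthly storage cost for a given number of GB, using the per-GB prices
# quoted in this section. Argument: volume in GB.
storage_cost() {
  awk -v gb="$1" 'BEGIN {
    standard = gb * 0.023      # S3 Standard, USD per GB per month
    glacier_ir = gb * 0.004    # S3 Glacier Instant Retrieval
    printf "standard=%.2f glacier_ir=%.2f savings=%.1f%%\n",
      standard, glacier_ir, (1 - glacier_ir / standard) * 100
  }'
}

storage_cost 100
# prints: standard=2.30 glacier_ir=0.40 savings=82.6%
```

The savings percentage does not depend on the volume, which is where the "roughly 80% cheaper" figure comes from.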
Stretch Challenge
Scenario: A healthcare client requires that logs in the legal/ prefix cannot be deleted for 7 years due to HIPAA compliance, even by an administrator.
Task: Research and implement S3 Object Lock in Compliance Mode for a specific prefix. How does this differ from standard lifecycle expiration?
Solution Hint
S3 Object Lock uses a "Write Once, Read Many" (WORM) model. In Compliance mode, a protected object version cannot be overwritten or deleted by any user, including the root user, until the retention period expires.
Concept Review
| Feature | Best For... | Key Advantage |
|---|---|---|
| S3 Lifecycle | Large-scale object storage | Cost optimization by moving data to cheaper tiers automatically. |
| DynamoDB TTL | Session data, temporary logs | Automatic cleanup without consuming Write Capacity Units (WCU). |
| Object Lock | Compliance/Legal (HIPAA, GDPR) | Ensures data immutability and prevents accidental or malicious deletion. |
| Versioning | Recovery & Audit | Allows restoration of previous states and protection against Delete calls. |