
Hands-On Lab: Implementing Automated Data Lifecycle Management on AWS


In this lab, you will act as a Data Engineer implementing a Data Lifecycle Management (DLM) strategy to balance cost-optimization with regulatory compliance. You will configure Amazon S3 to automatically transition data across storage tiers and implement Time-to-Live (TTL) on Amazon DynamoDB to purge stale records.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for provisioned resources.

Prerequisites

  • AWS Account: Access to an AWS account with AdministratorAccess or equivalent permissions.
  • AWS CLI: Installed and configured with aws configure using your credentials.
  • Region: We will use us-east-1 (N. Virginia) for this lab.
  • Knowledge: Basic understanding of S3 buckets and NoSQL databases.

Learning Objectives

  • Configure S3 Versioning to protect against accidental deletions.
  • Create S3 Lifecycle Policies to automate transitions from Standard to Standard-IA and Glacier.
  • Implement DynamoDB TTL to manage the lifecycle of high-velocity transactional data.
  • Verify lifecycle transitions using the AWS CLI and Management Console.

Architecture Overview

Lifecycle timeline (TikZ source): objects move from S3 Standard to S3 IA, then to S3 Glacier, and are finally deleted.

```latex
\begin{tikzpicture}[node distance=2cm, every node/.style={rectangle, draw, minimum width=3cm, minimum height=1cm, align=center}]
  \draw[thick, ->] (0,0) -- (10,0) node[right] {Time (Days)};
  \foreach \x in {0, 3, 6, 9}
    \draw (\x, 0.1) -- (\x, -0.1) node[below] {\x 0};
  \node[fill=orange!20] at (1.5, 1) {S3 Standard};
  \node[fill=blue!20]   at (4.5, 1) {S3 IA};
  \node[fill=gray!20]   at (7.5, 1) {S3 Glacier};
  \node[fill=red!20]    at (10, 1)  {Deleted};
\end{tikzpicture}
```

Step-by-Step Instructions

Step 1: Initialize the S3 Storage Environment

First, we create a bucket that will act as our primary data store.

```bash
# Generate a unique suffix for your bucket
RANDOM_ID=$RANDOM
BUCKET_NAME="brainybee-dlm-lab-$RANDOM_ID"

# Create the bucket
aws s3api create-bucket --bucket $BUCKET_NAME --region us-east-1
```
Console alternative

Navigate to S3 → Create bucket. Name it brainybee-dlm-lab-[unique-id] and keep all other settings at default.

Step 2: Enable S3 Versioning

Versioning is a prerequisite for robust DLM, allowing you to recover from accidental overwrites or deletes.

```bash
aws s3api put-bucket-versioning --bucket $BUCKET_NAME --versioning-configuration Status=Enabled
```
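To see what versioning buys you, here is a quick sketch (it assumes the `$BUCKET_NAME` variable from Step 1; `note.txt` is a made-up file name): write the same key twice, then list both retained versions.

```shell
# Sketch (assumes $BUCKET_NAME from Step 1): write the same key twice
echo "version one" > note.txt
aws s3 cp note.txt s3://$BUCKET_NAME/note.txt

echo "version two" > note.txt
aws s3 cp note.txt s3://$BUCKET_NAME/note.txt

# With versioning enabled, both writes are retained, each with its own VersionId
aws s3api list-object-versions --bucket $BUCKET_NAME --prefix note.txt
```

The second upload does not overwrite the first; it becomes the new "current" version while the original remains recoverable by its VersionId.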

Step 3: Define and Apply Lifecycle Rules

We will create a JSON configuration that defines the transitions shown in our architecture diagram.

  1. Save the following content as lifecycle.json:
```json
{
  "Rules": [
    {
      "ID": "MoveOldDataToArchive",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```
  2. Apply the policy to your bucket:

```bash
aws s3api put-bucket-lifecycle-configuration --bucket $BUCKET_NAME --lifecycle-configuration file://lifecycle.json
```
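The rule only affects keys matching its `Filter.Prefix`. A sketch to make that concrete (assumes `$BUCKET_NAME` from Step 1; `sample.log` is a made-up file name):

```shell
# Sketch: only keys under the logs/ prefix match the rule's filter
echo "2024-01-01 INFO app started" > sample.log
aws s3 cp sample.log s3://$BUCKET_NAME/logs/sample.log   # matched by the rule
aws s3 cp sample.log s3://$BUCKET_NAME/other/sample.log  # NOT matched

# Confirm the object landed under the filtered prefix
aws s3 ls s3://$BUCKET_NAME/logs/
```

Only `logs/sample.log` will transition to STANDARD_IA at day 30 and GLACIER_IR at day 90; the copy under `other/` stays in S3 Standard indefinitely.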

Step 4: Implement DynamoDB TTL

For high-velocity data that loses value quickly, we use DynamoDB TTL to expire items based on a timestamp.

  1. Create a table:
```bash
aws dynamodb create-table \
  --table-name LogData \
  --attribute-definitions AttributeName=LogID,AttributeType=S \
  --key-schema AttributeName=LogID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```
  2. Enable TTL on the ExpiryTime attribute:

```bash
aws dynamodb update-time-to-live \
  --table-name LogData \
  --time-to-live-specification "Enabled=true, AttributeName=ExpiryTime"
```
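TTL compares the ExpiryTime attribute, a Number holding an epoch-seconds timestamp, against the current time. A sketch of writing an item that becomes eligible for deletion one hour from now (`log-001` is a made-up ID):

```shell
# Compute an expiry timestamp one hour in the future (epoch seconds)
EXPIRY=$(( $(date +%s) + 3600 ))

# Write an item; TTL deletes it some time after ExpiryTime passes
aws dynamodb put-item \
  --table-name LogData \
  --item "{\"LogID\": {\"S\": \"log-001\"}, \"ExpiryTime\": {\"N\": \"$EXPIRY\"}}"
```

Note that the timestamp must be in epoch seconds, not milliseconds; a milliseconds value lands thousands of years in the future and the item is never deleted.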

Checkpoints

| Checkpoint | Command / Action | Expected Result |
|---|---|---|
| S3 Configuration | `aws s3api get-bucket-lifecycle-configuration --bucket <YOUR_BUCKET>` | JSON output showing transition to GLACIER_IR after 90 days. |
| Versioning | `aws s3api get-bucket-versioning --bucket <YOUR_BUCKET>` | Status should be Enabled. |
| DynamoDB TTL | `aws dynamodb describe-time-to-live --table-name LogData` | TimeToLiveStatus should be ENABLED or ENABLING. |

Troubleshooting

| Issue | Possible Cause | Fix |
|---|---|---|
| BucketAlreadyExists | S3 bucket names are globally unique. | Change the $RANDOM_ID in Step 1 to a different value. |
| TTL not deleting data | TTL is not instantaneous. | Wait; AWS typically deletes expired items within 48 hours of expiration. |
| Lifecycle rule didn't trigger | Minimum storage duration. | Some transitions have minimum storage durations (e.g., S3 Standard-IA requires 30 days in Standard first). |

Clean-Up / Teardown

[!IMPORTANT] To avoid unexpected costs, delete these resources immediately after finishing.

```bash
# 1. Empty the S3 bucket (required before deletion)
# Note: because versioning is enabled, this only adds delete markers;
# the object versions themselves must also be removed before the
# bucket can be deleted
aws s3 rm s3://$BUCKET_NAME --recursive

# 2. Delete the bucket
aws s3api delete-bucket --bucket $BUCKET_NAME

# 3. Delete the DynamoDB table
aws dynamodb delete-table --table-name LogData
```
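On a versioned bucket, every object version and delete marker must be removed individually before `delete-bucket` succeeds. One way to purge them from the CLI, a sketch (the `--query` expression, which flattens Versions and DeleteMarkers into tab-separated Key/VersionId pairs, is an assumption about the output shape; the console's "Empty" action is a simpler alternative):

```shell
# Sketch: delete every object version and delete marker individually
aws s3api list-object-versions --bucket $BUCKET_NAME \
  --query '[Versions,DeleteMarkers][][].[Key,VersionId]' --output text |
while read -r KEY VERSION_ID; do
  [ "$KEY" = "None" ] && continue   # skip empty result sets
  aws s3api delete-object --bucket $BUCKET_NAME --key "$KEY" --version-id "$VERSION_ID"
done
```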

Cost Estimate

  • S3 Standard: $0.023 per GB-month (first 50 TB).
  • S3 Glacier Instant Retrieval: $0.004 per GB-month (approx. 80% cheaper than Standard).
  • DynamoDB TTL: FREE. DynamoDB does not charge for the deletion of items via TTL.
  • Overall Lab Cost: If using <1GB of data, this lab stays within the AWS Free Tier.

Stretch Challenge

Scenario: A healthcare client requires that logs in the legal/ prefix cannot be deleted for 7 years due to HIPAA compliance, even by an administrator.

Task: Research and implement S3 Object Lock in Compliance Mode for a specific prefix. How does this differ from standard lifecycle expiration?

Solution Hint

S3 Object Lock uses a "Write Once, Read Many" (WORM) model. In Compliance mode, a protected object version cannot be overwritten or deleted by any user, including the root user, until the retention period expires.
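A hedged sketch of the moving parts (the bucket name, file name, and retention date are placeholders). Two caveats worth noting: Object Lock can only be enabled when a bucket is created, and retention is applied per object version rather than per prefix, so each upload under `legal/` gets its own retention setting:

```shell
# Object Lock must be enabled at bucket creation time
aws s3api create-bucket --bucket brainybee-lock-demo-12345 \
  --region us-east-1 --object-lock-enabled-for-bucket

# Upload a record under the legal/ prefix, then lock that object version
aws s3 cp record.pdf s3://brainybee-lock-demo-12345/legal/record.pdf
aws s3api put-object-retention \
  --bucket brainybee-lock-demo-12345 \
  --key legal/record.pdf \
  --retention '{"Mode":"COMPLIANCE","RetainUntilDate":"2032-01-01T00:00:00Z"}'
```

Unlike lifecycle Expiration, which actively deletes data after a period, Compliance-mode retention blocks all deletion and overwriting, even by the root user, until RetainUntilDate passes.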

Concept Review

| Feature | Best For... | Key Advantage |
|---|---|---|
| S3 Lifecycle | Large-scale object storage | Cost optimization by moving data to cheaper tiers automatically. |
| DynamoDB TTL | Session data, temporary logs | Automatic cleanup without consuming Write Capacity Units (WCU). |
| Object Lock | Compliance/Legal (HIPAA, GDPR) | Ensures data immutability and prevents accidental or malicious deletion. |
| Versioning | Recovery & Audit | Allows restoration of previous states and protection against Delete calls. |
