Hands-On Lab: Implementing Data Encryption and PII Masking on AWS
This lab provides a practical walkthrough of securing data using AWS Key Management Service (KMS) for encryption at rest and AWS Glue DataBrew for PII (Personally Identifiable Information) masking. You will learn how to manage encryption keys and apply transformation recipes to protect sensitive data.
Prerequisites
- AWS Account: Access to an AWS account with `AdministratorAccess` or equivalent permissions.
- AWS CLI: Installed and configured on your local machine with `aws configure`.
- Basic S3 Knowledge: Familiarity with creating buckets and uploading objects.
- Region: We will use `<YOUR_REGION>` (e.g., `us-east-1`) throughout the lab.
Learning Objectives
- Create and manage a Symmetric Customer Managed Key (CMK) in AWS KMS.
- Configure Server-Side Encryption (SSE-KMS) for an Amazon S3 bucket.
- Implement PII Masking using AWS Glue DataBrew to anonymize sensitive datasets.
- Differentiate between Deterministic Encryption and standard masking techniques.
Architecture Overview
Step-by-Step Instructions
Step 1: Create a Customer Managed Key (CMK)
We will create a symmetric key to be used for S3 encryption.
```bash
aws kms create-key --description "Lab Key for S3 Encryption"
```
[!IMPORTANT]
Note the `KeyId` or `Arn` from the output. You will need it in Step 2.
Console Alternative
- Navigate to KMS in the AWS Console.
- Click Customer managed keys > Create key.
- Keep Symmetric selected and click Next.
- Add Alias: `lab-s3-key` and click Next.
- Define Key Administrators and Usage Permissions (select your current IAM user/role) and click Finish.
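For the CLI path, a small helper can capture the key ID in a shell variable and attach the same `lab-s3-key` alias that the console steps create. This is a minimal sketch: the function name `create_lab_key` and the variable `KMS_KEY_ID` are illustrative choices rather than part of the lab, and it assumes your credentials are allowed to call `kms:CreateKey` and `kms:CreateAlias`.

```bash
# Hypothetical helper: creates the lab key, stores its ID in KMS_KEY_ID,
# and attaches the alias used in the console steps.
create_lab_key() {
  KMS_KEY_ID=$(aws kms create-key \
    --description "Lab Key for S3 Encryption" \
    --query 'KeyMetadata.KeyId' \
    --output text) || return 1

  # Alias names must start with "alias/".
  aws kms create-alias \
    --alias-name alias/lab-s3-key \
    --target-key-id "$KMS_KEY_ID" || return 1

  echo "Created key: $KMS_KEY_ID"
}
```

Calling `create_lab_key` once leaves `$KMS_KEY_ID` set for Step 2 in the same shell session. If the alias already exists (for example, because you followed the console steps), the alias call fails and the function returns early.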
Step 2: Create an Encrypted S3 Bucket
Create a bucket and enforce encryption using the KMS key created in Step 1.
```bash
# Replace <BUCKET_NAME> with a unique name like brainybee-lab-data-<ID>
# In us-east-1, omit --create-bucket-configuration; every other Region requires it
aws s3api create-bucket --bucket <BUCKET_NAME> --region <YOUR_REGION> \
  --create-bucket-configuration LocationConstraint=<YOUR_REGION>

# Apply SSE-KMS encryption
aws s3api put-bucket-encryption \
  --bucket <BUCKET_NAME> \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "<YOUR_KMS_KEY_ID>"
      }
    }]
  }'
```
Console Alternative
- Go to S3 > Create bucket.
- Enter Bucket Name: `brainybee-lab-data-<ID>`.
- Under Default encryption, select Enable.
- Encryption type: AWS Key Management Service key (SSE-KMS).
- Select Choose from your KMS keys and pick your `lab-s3-key`.
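The default-encryption rule also accepts an optional `BucketKeyEnabled` flag, which lets S3 reuse a bucket-level key and reduce the number of KMS requests. The sketch below builds the configuration from a key ID and reads it back for verification. Both helper names are hypothetical, and the lab does not require Bucket Keys.

```bash
# Hypothetical helper: prints an SSE-KMS default-encryption rule for a key ID.
build_sse_kms_config() {
  cat <<EOF
{
  "Rules": [{
    "ApplyServerSideEncryptionByDefault": {
      "SSEAlgorithm": "aws:kms",
      "KMSMasterKeyID": "$1"
    },
    "BucketKeyEnabled": true
  }]
}
EOF
}

# Hypothetical helper: applies the rule, then reads it back.
# Usage: apply_bucket_encryption <BUCKET_NAME> <YOUR_KMS_KEY_ID>
apply_bucket_encryption() {
  aws s3api put-bucket-encryption \
    --bucket "$1" \
    --server-side-encryption-configuration "$(build_sse_kms_config "$2")" || return 1
  aws s3api get-bucket-encryption --bucket "$1"
}
```

The read-back from `get-bucket-encryption` should echo the same rule, which is a quick way to confirm the key ID was substituted correctly.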
Step 3: Masking PII with Glue DataBrew
We will now use DataBrew to mask a sample dataset containing "Customer Names" and "Email Addresses".
- Upload Sample Data: Create a local file `customers.csv` with columns: `id`, `name`, `email`. Upload it to your S3 bucket.
- Create Dataset: Open AWS Glue DataBrew, click Datasets > Connect new dataset. Link it to your S3 file.
- Create Project: Click Projects > Create project. Assign it a name and select your dataset.
- Apply Masking Recipe:
  - Select the `email` column.
  - In the toolbar, click TRANSFORM > PII > Masking.
  - Choose Mask custom to replace characters with `*`.
- Deterministic Encryption:
  - Select the `name` column.
  - Click TRANSFORM > PII > Deterministic Encryption.
  - Provide your KMS key from Step 1.
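To have something to upload, a three-column sample file can be generated locally. The `awk` helper below is only a local preview of what character masking produces. It is not how DataBrew implements the transform, and the sample rows and the function name `mask_email_column` are made up for illustration.

```bash
# Sample data with the three columns the lab expects (rows are fictional).
cat > customers.csv <<'EOF'
id,name,email
1,Alice Example,alice@example.com
2,Bob Sample,bob@example.org
EOF

# Hypothetical local preview: replace every character of the email column
# except "@" and "." with "*", leaving the header row untouched.
mask_email_column() {
  awk -F',' 'BEGIN { OFS = "," }
       NR == 1 { print; next }
       { gsub(/[^@.]/, "*", $3); print }' "$1"
}

mask_email_column customers.csv
# id,name,email
# 1,Alice Example,*****@*******.***
# 2,Bob Sample,***@*******.***

# Upload for the lab: aws s3 cp customers.csv s3://<BUCKET_NAME>/
```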
[!TIP]
Deterministic Encryption ensures the same input always yields the same ciphertext, allowing for joins on encrypted columns without exposing the raw data.
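The difference between deterministic and randomized encryption can be seen locally with `openssl`. This sketch uses AES-256-CBC with a constant IV as a stand-in for a deterministic scheme; it is not the algorithm DataBrew uses, a constant IV is unsafe for general-purpose encryption, and the key and function names are illustrative only.

```bash
# Throwaway local key; in the lab the key material is managed by AWS.
demo_key=$(openssl rand -hex 32)
fixed_iv=00000000000000000000000000000000   # constant IV: deterministic output

encrypt_deterministic() {
  printf '%s' "$1" | openssl enc -aes-256-cbc -K "$demo_key" -iv "$fixed_iv" -base64 -A
}

encrypt_randomized() {
  # A fresh random IV per call makes equal inputs encrypt differently.
  printf '%s' "$1" | openssl enc -aes-256-cbc -K "$demo_key" -iv "$(openssl rand -hex 16)" -base64 -A
}

first=$(encrypt_deterministic "Alice Example")
second=$(encrypt_deterministic "Alice Example")
[ "$first" = "$second" ] && echo "deterministic: ciphertexts match, so joins work"

random_a=$(encrypt_randomized "Alice Example")
random_b=$(encrypt_randomized "Alice Example")
[ "$random_a" != "$random_b" ] && echo "randomized: ciphertexts differ"
```

The trade-off is that matching ciphertexts also reveal which rows share a value, so deterministic encryption suits join keys better than low-cardinality fields.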
Checkpoints
| Checkpoint | Action | Expected Result |
|---|---|---|
| KMS Key State | `aws kms describe-key --key-id <ID>` | `KeyMetadata.KeyState` should be `Enabled`. |
| S3 Encryption | Upload a test file and check properties in S3 console. | "Server-side encryption" should show SSE-KMS. |
| DataBrew Job | Run the DataBrew project job. | A new file appears in S3 with masked/encrypted values. |
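The first two checkpoints can also be scripted. The helpers below are a sketch with hypothetical names; they rely on the `KeyMetadata.KeyState` field returned by `describe-key` and the `ServerSideEncryption` field returned by `head-object`, and they assume you have already uploaded a test object.

```bash
# Hypothetical helper: succeeds only if the key state is "Enabled".
# Usage: check_key_enabled <YOUR_KMS_KEY_ID>
check_key_enabled() {
  state=$(aws kms describe-key --key-id "$1" \
    --query 'KeyMetadata.KeyState' --output text) || return 1
  echo "Key state: $state"
  [ "$state" = "Enabled" ]
}

# Hypothetical helper: succeeds only if the object was stored with SSE-KMS.
# Usage: check_object_sse_kms <BUCKET_NAME> <OBJECT_KEY>
check_object_sse_kms() {
  sse=$(aws s3api head-object --bucket "$1" --key "$2" \
    --query 'ServerSideEncryption' --output text) || return 1
  echo "Server-side encryption: $sse"
  [ "$sse" = "aws:kms" ]
}
```

Each function returns a non-zero exit status when the check fails, so the two can be chained with `&&` in a script.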
Concept Review: Envelope Encryption
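When encrypting large datasets in S3, AWS uses Envelope Encryption: a data key encrypts your data, and a KMS key (the root key) encrypts that data key. Only the encrypted data key is stored with the data, and the root key never leaves KMS.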
Figure (envelope encryption): AWS KMS (root key) generates a plaintext data key together with an encrypted copy of that key. The plaintext data key encrypts the raw data into ciphertext, and the encrypted data key is stored alongside the ciphertext.
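The flow can be walked through locally with `openssl`. This is a sketch under loud assumptions: a random local key stands in for the KMS root key, AES-256-CBC stands in for whatever cipher the service uses, and all variable names are illustrative. With SSE-KMS, S3 and KMS perform these steps for you; on the CLI, `aws kms generate-data-key` is the call that returns a data key in both plaintext and encrypted form.

```bash
# Stand-in for the KMS root key. In AWS this key never leaves KMS.
root_key=$(openssl rand -hex 32)
root_iv=$(openssl rand -hex 16)

# 1. Generate a data key (KMS would return it in plaintext and encrypted form).
data_key=$(openssl rand -hex 32)
data_iv=$(openssl rand -hex 16)

# 2. Encrypt the raw data with the plaintext data key.
ciphertext=$(printf '%s' "1,Alice Example,alice@example.com" |
  openssl enc -aes-256-cbc -K "$data_key" -iv "$data_iv" -base64 -A)

# 3. Encrypt the data key with the root key, then discard the plaintext copy.
encrypted_data_key=$(printf '%s' "$data_key" |
  openssl enc -aes-256-cbc -K "$root_key" -iv "$root_iv" -base64 -A)
unset data_key

# 4. To read the data: decrypt the data key first, then the data.
recovered_key=$(printf '%s' "$encrypted_data_key" |
  openssl enc -d -aes-256-cbc -K "$root_key" -iv "$root_iv" -base64 -A)
plaintext=$(printf '%s' "$ciphertext" |
  openssl enc -d -aes-256-cbc -K "$recovered_key" -iv "$data_iv" -base64 -A)
echo "$plaintext"
```

Because only the small data key passes through the root key, large objects never have to be sent to KMS, which is the practical reason for the extra layer.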
Troubleshooting
| Error | Possible Cause | Fix |
|---|---|---|
| `AccessDenied` | IAM role lacks `kms:GenerateDataKey` or `kms:Decrypt`. | Update the IAM Policy or KMS Key Policy to allow access. |
| KMS InvalidState | The key is pending deletion or disabled. | Ensure the key state is Enabled in the KMS console. |
| DataBrew Job Failed | S3 bucket permissions. | Ensure the DataBrew IAM role has `s3:GetObject` and `s3:PutObject` access. |
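The permissions named in the table can be collected into one policy document for the DataBrew role. The helper below is a sketch: its name is hypothetical, and the `s3:ListBucket` statement is an assumption added because reading a dataset typically involves listing the bucket. Adjust the actions to what your job actually needs.

```bash
# Hypothetical helper: prints a policy covering the permissions in the table.
# Usage: build_databrew_policy <BUCKET_NAME> <KMS_KEY_ARN>
build_databrew_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::$1/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::$1"
    },
    {
      "Effect": "Allow",
      "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
      "Resource": "$2"
    }
  ]
}
EOF
}

# Attach it as an inline policy (role and policy names are placeholders):
# aws iam put-role-policy --role-name <DATABREW_ROLE_NAME> \
#   --policy-name lab-databrew-access \
#   --policy-document "$(build_databrew_policy <BUCKET_NAME> <KMS_KEY_ARN>)"
```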
Cost Estimate
- AWS KMS: $1.00 per month per Customer Managed Key (prorated) + $0.03 per 10,000 API calls.
- Amazon S3: Free tier eligible (first 5GB); standard rates apply otherwise ($0.023/GB).
- Glue DataBrew: $1.00 per interactive session (30-minute increments), plus $0.48 per node-hour for jobs. Confirm current rates on the DataBrew pricing page.
[!WARNING]
KMS keys have a monthly fee. A key cannot be deleted instantly, so schedule its deletion as soon as you finish the lab (see Teardown); keys pending deletion do not accrue charges.
Challenge
Task: Implement Credential Masking. Use AWS Secrets Manager to store a database password. Then, write a simple Lambda function (conceptually) that retrieves this secret using the SDK. How does this approach differ from hardcoding credentials in your DataBrew recipes?
Teardown
To avoid ongoing charges, execute these commands:
- Empty and Delete S3 Bucket:
```bash
aws s3 rm s3://<BUCKET_NAME> --recursive
aws s3api delete-bucket --bucket <BUCKET_NAME>
```
- Schedule KMS Key Deletion (Minimum 7-day waiting period):
```bash
aws kms schedule-key-deletion --key-id <YOUR_KMS_KEY_ID> --pending-window-in-days 7
```
- Delete DataBrew Resources: Delete the DataBrew project, recipe, and dataset in the DataBrew console.