Hands-On Lab: Implementing Data Encryption and PII Masking on AWS

This lab provides a practical walkthrough of securing data using AWS Key Management Service (KMS) for encryption at rest and AWS Glue DataBrew for PII (Personally Identifiable Information) masking. You will learn how to manage encryption keys and apply transformation recipes to protect sensitive data.

Prerequisites

  • AWS Account: Access to an AWS account with AdministratorAccess or equivalent permissions.
  • AWS CLI: Installed and configured on your local machine with aws configure.
  • Basic S3 Knowledge: Familiarity with creating buckets and uploading objects.
  • Region: We will use <YOUR_REGION> (e.g., us-east-1) throughout the lab.

Learning Objectives

  • Create and manage a Symmetric Customer Managed Key (CMK) in AWS KMS.
  • Configure Server-Side Encryption (SSE-KMS) for an Amazon S3 bucket.
  • Implement PII Masking using AWS Glue DataBrew to anonymize sensitive datasets.
  • Differentiate between Deterministic Encryption and standard masking techniques.

Architecture Overview

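[Architecture diagram: customers.csv -> S3 bucket (SSE-KMS with the customer managed key) -> Glue DataBrew recipe -> masked output in S3]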

Step-by-Step Instructions

Step 1: Create a Customer Managed Key (CMK)

We will create a symmetric key to be used for S3 encryption.

bash
aws kms create-key --description "Lab Key for S3 Encryption"

[!IMPORTANT]
Note the KeyId or Arn from the output. You will need it in Step 2.

Console Alternative
  1. Navigate to KMS in the AWS Console.
  2. Click Customer managed keys > Create key.
  3. Keep Symmetric selected and click Next.
  4. Add Alias: lab-s3-key and click Next.
  5. Define Key Administrators and Usage Permissions (select your current IAM user/role) and click Finish.
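
The console path above also assigns the alias lab-s3-key, which the single CLI command does not. A minimal sketch of the CLI equivalent follows; the function names are illustrative, and the `--query` expression simply extracts the `KeyId` field from the `create-key` response.

```shell
# Create the key and print only its KeyId (JMESPath query on the JSON response).
create_lab_key() {
  aws kms create-key \
    --description "Lab Key for S3 Encryption" \
    --query 'KeyMetadata.KeyId' --output text
}

# Attach the alias used in the console walkthrough to an existing key.
alias_lab_key() {
  aws kms create-alias --alias-name alias/lab-s3-key --target-key-id "$1"
}

# Usage (requires AWS credentials):
#   KEY_ID=$(create_lab_key) && alias_lab_key "$KEY_ID" && echo "$KEY_ID"
```

Capturing the ID in a variable avoids copying it by hand into Step 2.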

Step 2: Create an Encrypted S3 Bucket

Create a bucket and set SSE-KMS, using the key created in Step 1, as its default encryption.

bash
# Replace <BUCKET_NAME> with a globally unique name such as brainybee-lab-data-<ID>.
# In any region other than us-east-1, also pass:
#   --create-bucket-configuration LocationConstraint=<YOUR_REGION>
aws s3api create-bucket --bucket <BUCKET_NAME> --region <YOUR_REGION>

# Apply SSE-KMS as the bucket's default encryption
aws s3api put-bucket-encryption \
  --bucket <BUCKET_NAME> \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "<YOUR_KMS_KEY_ID>"
      }
    }]
  }'

Console Alternative
  1. Go to S3 > Create bucket.
  2. Enter Bucket Name: brainybee-lab-data-<ID>.
  3. Under Default encryption, select Enable.
  4. Encryption type: AWS Key Management Service key (SSE-KMS).
  5. Select Choose from your KMS keys and pick your lab-s3-key.
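
Whichever path you take, it is worth reading the setting back. The sketch below builds the same encryption configuration from a key ID and then queries the bucket default; the function names are illustrative and not part of the lab.

```shell
# Print the default-encryption configuration for a given KMS key ID or ARN.
sse_kms_config() {
  printf '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"%s"}}]}\n' "$1"
}

# Apply the configuration, then read it back to confirm the bucket default.
apply_and_verify_encryption() {
  local bucket=$1 key_id=$2
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration "$(sse_kms_config "$key_id")"
  aws s3api get-bucket-encryption --bucket "$bucket" \
    --query 'ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault'
}

# Usage (requires AWS credentials):
#   apply_and_verify_encryption <BUCKET_NAME> <YOUR_KMS_KEY_ID>
```

The read-back should list `aws:kms` as the algorithm together with the key you supplied.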

Step 3: Masking PII with Glue DataBrew

We will now use DataBrew to mask a sample dataset containing "Customer Names" and "Email Addresses".

  1. Upload Sample Data: Create a local file customers.csv with columns: id, name, email. Upload it to your S3 bucket.
  2. Create Dataset: Open AWS Glue DataBrew, click Datasets > Connect new dataset. Link it to your S3 file.
  3. Create Project: Click Projects > Create project. Assign it a name and select your dataset.
  4. Apply Masking Recipe:
    • Select the email column.
    • In the toolbar, click TRANSFORM > PII > Masking.
    • Choose Mask custom to replace characters with *.
  5. Deterministic Encryption:
    • Select the name column.
    • Click TRANSFORM > PII > Deterministic Encryption.
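    • Provide the encryption key material when prompted. Note that DataBrew's deterministic encryption transform takes its key from an AWS Secrets Manager secret, whereas its probabilistic encryption transform takes a KMS key such as the one from Step 1.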

[!TIP]
Deterministic Encryption ensures the same input always yields the same ciphertext, allowing for joins on encrypted columns without exposing the raw data.
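
Step 3 starts from a local customers.csv. The sketch below creates one with made-up placeholder rows and previews what character masking does to the email column. The awk preview only imitates the effect locally; it is not how DataBrew implements the transform, and the function name is illustrative.

```shell
# Create a small sample file with the three columns the lab expects.
# The rows are invented placeholder data.
cat > customers.csv <<'EOF'
id,name,email
1,Ada Example,ada@example.com
2,Grace Sample,grace@example.org
EOF

# Local preview of character masking: replace every character of the
# email column (field 3) with '*', leaving the header row untouched.
mask_email_preview() {
  awk 'BEGIN { FS = OFS = "," }
       NR > 1 { gsub(/./, "*", $3) }
       { print }' "$1"
}

mask_email_preview customers.csv

# Upload the unmasked file for DataBrew to read (requires AWS credentials):
#   aws s3 cp customers.csv s3://<BUCKET_NAME>/customers.csv
```

Masked values of the same length become indistinguishable, so a masked column cannot be used as a join key. That is the practical difference from the deterministic encryption applied to the name column.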

Checkpoints

| Checkpoint | Action | Expected Result |
| --- | --- | --- |
| KMS Key State | aws kms describe-key --key-id <ID> | KeyState should be Enabled. |
| S3 Encryption | Upload a test file and check its properties in the S3 console. | "Server-side encryption" should show SSE-KMS. |
| DataBrew Job | Run the DataBrew project job. | A new file appears in S3 with masked/encrypted values. |
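
The first two checkpoints can also be run from the CLI. This is a minimal sketch with illustrative function names; the object key you pass is whatever test file you uploaded.

```shell
# Print the key state; an active key prints "Enabled".
key_state() {
  aws kms describe-key --key-id "$1" \
    --query 'KeyMetadata.KeyState' --output text
}

# Print the encryption algorithm and KMS key recorded on an uploaded object.
object_sse() {
  aws s3api head-object --bucket "$1" --key "$2" \
    --query '[ServerSideEncryption,SSEKMSKeyId]' --output text
}

# Usage (requires AWS credentials):
#   key_state <YOUR_KMS_KEY_ID>
#   object_sse <BUCKET_NAME> customers.csv
```

The CLI reports the algorithm as `aws:kms`, which is the same setting the console labels SSE-KMS.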

Concept Review: Envelope Encryption

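When S3 encrypts objects with SSE-KMS, it uses envelope encryption: KMS generates a unique data key, the plaintext data key encrypts the object, and a copy of the data key encrypted under your KMS key is stored alongside the ciphertext. The KMS key therefore protects the data key, and the data key protects your data.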

\begin{tikzpicture}[node distance=2cm, every node/.style={draw, fill=blue!10, rounded corners}]
  \node (kms) {AWS KMS (Root Key)};
  \node (dk) [below of=kms] {Data Key (Plaintext)};
  \node (edk) [right of=dk, xshift=2cm] {Encrypted Data Key};
  \node (data) [below of=dk] {Raw Data};
  \node (cipher) [below of=edk] {Ciphertext};

  \draw[->] (kms) -- node[right, draw=none, fill=none] {Generates} (dk);
  \draw[->] (kms) -- (edk);
  \draw[->] (dk) -- node[left, draw=none, fill=none] {Encrypts} (data);
  \draw[->] (data) -- (cipher);
  \draw[->] (edk) -- (cipher);
\end{tikzpicture}
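
S3 performs these steps for you, but the same pattern can be driven by hand, which makes the two forms of the data key visible. The following is a sketch under that assumption; the function names and the file name data_key.enc are hypothetical.

```shell
# Ask KMS for a fresh 256-bit data key under the given KMS key.
# Prints two base64 fields separated by a tab: plaintext key, encrypted key.
new_data_key() {
  aws kms generate-data-key --key-id "$1" --key-spec AES_256 \
    --query '[Plaintext,CiphertextBlob]' --output text
}

# Recover the plaintext data key (base64) from a binary encrypted-key file.
unwrap_data_key() {
  aws kms decrypt --ciphertext-blob "fileb://$1" \
    --query Plaintext --output text
}

# Usage (requires AWS credentials):
#   new_data_key <YOUR_KMS_KEY_ID> | {
#     read -r plaintext_b64 encrypted_b64
#     printf '%s' "$encrypted_b64" | base64 --decode > data_key.enc
#     # ...encrypt data locally with the plaintext key, then discard it...
#   }
#   unwrap_data_key data_key.enc
```

Only the encrypted copy of the data key is kept at rest. Reading the data later requires a `kms:Decrypt` call, which is where the key policy and IAM permissions take effect.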

Troubleshooting

| Error | Possible Cause | Fix |
| --- | --- | --- |
| AccessDenied | IAM role lacks kms:GenerateDataKey or kms:Decrypt. | Update the IAM policy or KMS key policy to allow access. |
| KMSInvalidStateException | The key is pending deletion or disabled. | Ensure the key state is Enabled in the KMS console. |
| DataBrew Job Failed | S3 bucket permissions. | Ensure the DataBrew IAM role has s3:GetObject and s3:PutObject access. |
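
For the AccessDenied case, the missing permissions can be granted with a small inline policy. This is a sketch only: the function name, the role name placeholder, and the policy name lab-kms-usage are assumptions, and the key policy must also allow the role to use the key.

```shell
# Print an IAM policy document that allows the two KMS actions named in the
# table above on a single key. $1 is the key ARN from Step 1.
kms_usage_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
    "Resource": "$1"
  }]
}
EOF
}

# Usage (requires AWS credentials; <ROLE_NAME> is the role DataBrew assumes):
#   aws iam put-role-policy --role-name <ROLE_NAME> \
#     --policy-name lab-kms-usage \
#     --policy-document "$(kms_usage_policy <YOUR_KMS_KEY_ARN>)"
```

Scoping `Resource` to the one key ARN keeps the grant narrower than the AdministratorAccess used for the rest of the lab.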

Cost Estimate

  • AWS KMS: $1.00 per month per Customer Managed Key (prorated) + $0.03 per 10,000 API calls.
  • Amazon S3: Free tier eligible (first 5GB); standard rates apply otherwise ($0.023/GB).
  • Glue DataBrew: $1.00 per interactive session (billed in 30-minute increments); DataBrew jobs are billed separately at $0.48 per node-hour.
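
As a worked example of the KMS line, the sketch below estimates the charge for one key from the two rates quoted above. It assumes a 30-day month for proration and ignores any free-tier allowance; the function name is illustrative.

```shell
# Estimate the KMS charge in USD for one customer managed key.
# $1 = days the key existed, $2 = number of KMS API calls.
# Assumes a 30-day month for proration.
kms_cost() {
  awk -v days="$1" -v calls="$2" \
    'BEGIN { printf "%.4f\n", days / 30 * 1.00 + calls / 10000 * 0.03 }'
}

kms_cost 1 200   # one day and 200 API calls: prints 0.0339
```

For a lab torn down the same day, the key accounts for a few cents at most.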

[!WARNING]
KMS keys have a monthly fee and cannot be deleted instantly. Schedule their deletion as soon as you finish the lab (see Teardown) to avoid further charges.

Challenge

Task: Implement Credential Masking. Use AWS Secrets Manager to store a database password. Then, write a simple Lambda function (conceptually) that retrieves this secret using the SDK. How does this approach differ from hardcoding credentials in your DataBrew recipes?
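
As a starting point, the store and retrieve calls can be tried from the CLI before moving them into a Lambda function. This is a sketch with illustrative function and secret names; a Lambda would make the equivalent GetSecretValue call through the SDK.

```shell
# Store a password as a Secrets Manager secret.
# Passing the value on the command line can leave it in shell history,
# so use a throwaway value for the lab.
store_db_password() {
  aws secretsmanager create-secret --name "$1" --secret-string "$2"
}

# Retrieve the current secret value as plain text.
read_db_password() {
  aws secretsmanager get-secret-value --secret-id "$1" \
    --query SecretString --output text
}

# Usage (requires AWS credentials; lab/db-password is a hypothetical name):
#   store_db_password lab/db-password 'example-only'
#   read_db_password lab/db-password
```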

Teardown

To avoid ongoing charges, execute these commands:

  1. Empty and Delete S3 Bucket:
    bash
    aws s3 rm s3://<BUCKET_NAME> --recursive
    aws s3api delete-bucket --bucket <BUCKET_NAME>
  2. Schedule KMS Key Deletion (Minimum 7-day waiting period):
    bash
    aws kms schedule-key-deletion --key-id <YOUR_KMS_KEY_ID> --pending-window-in-days 7
  3. Delete DataBrew Resources: Delete the project, recipe, dataset, and any jobs in the DataBrew console, or use the corresponding aws databrew delete commands.
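
DataBrew resources can also be removed from the CLI. A minimal sketch follows, assuming you know the names you gave the job, project, and dataset; the function name is illustrative.

```shell
# Delete the DataBrew job, project, and dataset created in Step 3.
# $1 = job name, $2 = project name, $3 = dataset name.
teardown_databrew() {
  local job=$1 project=$2 dataset=$3
  aws databrew delete-job --name "$job"
  aws databrew delete-project --name "$project"
  aws databrew delete-dataset --name "$dataset"
}

# Usage (requires AWS credentials):
#   teardown_databrew <JOB_NAME> <PROJECT_NAME> <DATASET_NAME>
```

The job and project are removed before the dataset because they reference it.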
