Lab: Implementing Data Privacy and Governance on AWS

This hands-on lab guides you through implementing data privacy controls, PII identification, and fine-grained access control using AWS Lake Formation, AWS Glue, and Amazon S3. These skills are critical for the AWS Certified Data Engineer - Associate (DEA-C01) exam.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges. Estimated cost is < $0.50 if within the Free Tier.

Prerequisites

  • An active AWS Account.
  • AWS CLI installed and configured with Administrator access.
  • Basic knowledge of IAM (Identity and Access Management).
  • A text editor to create a sample CSV file.
  • Placeholder: Replace <YOUR_ACCOUNT_ID> and <YOUR_REGION> with your actual details.

Learning Objectives

  • Create a secure data lake storage structure in Amazon S3.
  • Catalog datasets using AWS Glue Crawlers.
  • Implement column-level Fine-Grained Access Control (FGAC) using AWS Lake Formation.
  • Understand the workflow for PII Identification and data masking.

Architecture Overview

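Diagram not available. Data flow: users.csv is uploaded to the raw/ prefix of an S3 bucket, an AWS Glue crawler catalogs it into the lab_privacy_db database, and Lake Formation applies a data cells filter to the resulting table.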

Step-by-Step Instructions

Step 1: Prepare Sample Data and S3 Bucket

First, we create a bucket and a mock dataset containing PII (Personally Identifiable Information).

  1. Create a file named users.csv with the following content:

    csv
    user_id,name,email,phone,credit_card,zipcode
    1,John Doe,john@example.com,555-0101,1234-5678-9012,90210
    2,Jane Smith,jane@example.com,555-0102,9876-5432-1098,10001

  2. Run the following CLI commands:

bash
# Create a unique bucket name
BUCKET_NAME="brainybee-lab-privacy-<YOUR_ACCOUNT_ID>"

# Create the bucket
aws s3 mb "s3://$BUCKET_NAME" --region <YOUR_REGION>

# Upload the data to a 'raw' prefix
aws s3 cp users.csv "s3://$BUCKET_NAME/raw/users.csv"

Console alternative
  1. Navigate to S3 > Create bucket.
  2. Name it brainybee-lab-privacy-<YOUR_ACCOUNT_ID>.
  3. Click Create.
  4. Upload the users.csv file into a new folder named raw/.
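
If you prefer not to use a text editor, the sample file can also be written from the shell. The following is a minimal sketch; `make_sample_csv` and `csv_field_counts` are illustrative helper names, not part of any AWS tooling. It writes the three lines of sample data and checks that every line has the same number of fields before you upload it.

```bash
# Hypothetical helper: write the lab's sample dataset to the given path.
make_sample_csv() {
  cat > "$1" <<'EOF'
user_id,name,email,phone,credit_card,zipcode
1,John Doe,john@example.com,555-0101,1234-5678-9012,90210
2,Jane Smith,jane@example.com,555-0102,9876-5432-1098,10001
EOF
}

# Hypothetical helper: print the distinct field counts found in the file.
# A well-formed file yields a single number (6 for this dataset).
csv_field_counts() {
  awk -F',' '{ print NF }' "$1" | sort -u
}

make_sample_csv users.csv
csv_field_counts users.csv
```

More than one number in the output would mean at least one row has a missing or extra comma.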

Step 2: Set Up Lake Formation Permissions

Before Lake Formation can manage the data, the IAM role/user running this lab must be a Data Lake Administrator.

bash
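# Make a principal a Data Lake Administrator.
# Note: this call replaces the existing data lake settings, including the current admin list.
# The ARN below names the account root; to grant your own IAM user or role, substitute its ARN.
aws lakeformation put-data-lake-settings \
  --data-lake-settings '{
    "DataLakeAdmins": [
      {"DataLakePrincipalIdentifier": "arn:aws:iam::<YOUR_ACCOUNT_ID>:root"}
    ]
  }'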
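
To name a specific IAM user or role as the administrator, the settings payload can be built from that principal's ARN. This is a minimal sketch: `admin_settings_json` is a hypothetical helper and the ARN shown is a placeholder, not a value from this lab.

```bash
# Hypothetical helper: print a DataLakeSettings JSON document that names
# one principal ARN as the data lake administrator.
admin_settings_json() {
  printf '{"DataLakeAdmins": [{"DataLakePrincipalIdentifier": "%s"}]}\n' "$1"
}

# Placeholder ARN for illustration; substitute your own IAM user or role.
admin_settings_json "arn:aws:iam::111122223333:user/lab-admin"
```

The printed JSON can be passed as the value of `--data-lake-settings`. If you look up your identity with `aws sts get-caller-identity`, note that for an assumed role it reports an STS session ARN rather than the IAM role ARN.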

Step 3: Register S3 Location & Catalog Data

We will now register the S3 bucket as a managed location and run a crawler to populate the metadata.

bash
# Register the S3 location with Lake Formation, using its service-linked role
aws lakeformation register-resource \
  --resource-arn "arn:aws:s3:::$BUCKET_NAME" \
  --use-service-linked-role

# Create a Glue database
aws glue create-database --database-input '{"Name": "lab_privacy_db"}'

# Create and run a crawler (assumes an IAM role named AWSGlueServiceRole-Lab already exists)
aws glue create-crawler \
  --name privacy-crawler \
  --role AWSGlueServiceRole-Lab \
  --database-name lab_privacy_db \
  --targets '{"S3Targets": [{"Path": "s3://'"$BUCKET_NAME"'/raw/"}]}'

aws glue start-crawler --name privacy-crawler
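
The crawler command relies on an IAM role named AWSGlueServiceRole-Lab. If that role does not exist yet, one way to create it is sketched below. The function names and the inline policy name `LabRawRead` are illustrative assumptions; `AWSGlueServiceRole` is the AWS managed policy for Glue. Depending on your Lake Formation settings, the role may also need Lake Formation grants on the database and data location, which this sketch does not cover.

```bash
# Trust policy that lets the AWS Glue service assume the role.
glue_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "glue.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Inline policy granting read access to the raw/ prefix of one bucket.
# $1 = bucket name
raw_read_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::$1" },
    { "Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::$1/raw/*" }
  ]
}
EOF
}

# Create the role and attach both policies. $1 = role name, $2 = bucket name
create_crawler_role() {
  aws iam create-role --role-name "$1" \
    --assume-role-policy-document "$(glue_trust_policy)" &&
  aws iam attach-role-policy --role-name "$1" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole &&
  aws iam put-role-policy --role-name "$1" --policy-name LabRawRead \
    --policy-document "$(raw_read_policy "$2")"
}
```

Calling `create_crawler_role AWSGlueServiceRole-Lab "$BUCKET_NAME"` before creating the crawler would set the role up. A role created this way is not removed by the teardown commands in this lab, so delete it separately when you finish.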

[!TIP] It takes about 1-2 minutes for the crawler to finish. You can check status with aws glue get-crawler --name privacy-crawler.
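
Rather than re-running the status command by hand, the check can be wrapped in a polling loop. This is a minimal sketch; `wait_for_crawler` is a hypothetical helper, and the default interval and attempt count are arbitrary choices.

```bash
# Poll the crawler state until it returns to READY.
# $1 = crawler name, $2 = seconds between polls (default 15),
# $3 = maximum number of polls (default 40)
wait_for_crawler() {
  interval="${2:-15}"
  attempts="${3:-40}"
  while [ "$attempts" -gt 0 ]; do
    state="$(aws glue get-crawler --name "$1" \
      --query 'Crawler.State' --output text)" || return 1
    echo "crawler $1 state: $state"
    [ "$state" = "READY" ] && return 0
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "timed out waiting for crawler $1" >&2
  return 1
}
```

Run `wait_for_crawler privacy-crawler` after starting the crawler. READY only means the crawler is idle again; it does not by itself confirm that the run succeeded, so still verify the table in the Checkpoints section.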

Step 4: Implement Column-Level Security

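In this step, we create a data cells filter that hides the credit_card and phone columns. The filter restricts what an IAM principal can see once that principal is granted SELECT through the filter rather than on the full table.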

bash
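# Create a data cells filter (the Lake Formation mechanism for FGAC).
# ColumnNames and ColumnWildcard are mutually exclusive, so only the exclusion list is used here.
# AllRowsWildcard keeps every row; only columns are restricted.
aws lakeformation create-data-cells-filter \
  --table-data '{
    "TableCatalogId": "<YOUR_ACCOUNT_ID>",
    "DatabaseName": "lab_privacy_db",
    "TableName": "raw",
    "Name": "mask_pii_filter",
    "RowFilter": {"AllRowsWildcard": {}},
    "ColumnWildcard": {"ExcludedColumnNames": ["credit_card", "phone"]}
  }'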
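
A filter has no effect until a principal is granted SELECT through it. The grant might look like the following sketch, which assumes the filter, database, and table names used in this step; both helper functions are hypothetical.

```bash
# Print the Lake Formation resource JSON that identifies a data cells filter.
# $1 = catalog (account) ID, $2 = database, $3 = table, $4 = filter name
filter_resource_json() {
  printf '{"DataCellsFilter": {"TableCatalogId": "%s", "DatabaseName": "%s", "TableName": "%s", "Name": "%s"}}\n' \
    "$1" "$2" "$3" "$4"
}

# Grant SELECT through the filter to one principal.
# $1 = principal ARN, remaining arguments as for filter_resource_json
grant_select_via_filter() {
  principal="$1"
  shift
  aws lakeformation grant-permissions \
    --principal "DataLakePrincipalIdentifier=$principal" \
    --permissions SELECT \
    --resource "$(filter_resource_json "$@")"
}
```

For example, `grant_select_via_filter <ANALYST_ARN> <YOUR_ACCOUNT_ID> lab_privacy_db raw mask_pii_filter`, where `<ANALYST_ARN>` is a placeholder for the IAM user or role that should see only the filtered columns.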

Checkpoints

  1. Glue Catalog Check: Run aws glue get-table --database-name lab_privacy_db --name raw. You should see the schema including credit_card.
  2. Lake Formation Check: Open the Lake Formation Console > Data filters. You should see mask_pii_filter listed.
  3. Permission Verification: Ensure your principal does not have SELECT on the full table but only through the filter.
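
The first checkpoint can also be scripted. The sketch below uses a hypothetical helper, `table_has_column`, that asks the Glue Data Catalog for the table's column names and succeeds only if the given column is present.

```bash
# Succeed if the named column appears in the Glue table schema.
# $1 = database, $2 = table, $3 = column
table_has_column() {
  aws glue get-table --database-name "$1" --name "$2" \
    --query 'Table.StorageDescriptor.Columns[].Name' --output text |
    tr '\t' '\n' | grep -qx "$3"
}
```

For instance, `table_has_column lab_privacy_db raw credit_card && echo "schema OK"` prints the message only when the crawler has cataloged the credit_card column.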

Troubleshooting

| Error | Possible Cause | Fix |
| --- | --- | --- |
| Insufficient Lake Formation Permissions | You aren't a Data Lake Admin. | Repeat Step 2 or add your IAM ARN to the Admin list in the LF Console. |
| Crawler failed: Access Denied | S3 bucket policy or Glue Role issues. | Ensure the IAM Role used by the crawler has s3:GetObject on the bucket. |
| Table not found | Crawler hasn't finished. | Wait for the crawler status to return to READY. |

Clean-Up / Teardown

Execute these commands to remove all resources and stop charges:

bash
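# 1. Delete the data cells filter, Glue crawler, and database
aws lakeformation delete-data-cells-filter \
  --table-catalog-id <YOUR_ACCOUNT_ID> \
  --database-name lab_privacy_db \
  --table-name raw \
  --name mask_pii_filter
aws glue delete-crawler --name privacy-crawler
aws glue delete-database --name lab_privacy_db

# 2. Deregister the S3 location
aws lakeformation deregister-resource --resource-arn "arn:aws:s3:::$BUCKET_NAME"

# 3. Delete the S3 bucket and its objects
aws s3 rb "s3://$BUCKET_NAME" --force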

Cost Estimate

| Service | Estimated Cost |
| --- | --- |
| S3 | Free Tier ($0.023 per GB-month if exceeded) |
| AWS Glue | ~$0.44 per DPU-Hour (Crawler typically uses 2 DPUs for < 2 mins) |
| Lake Formation | $0.00 (Governance is free) |
| Total | ~$0.10 - $0.25 |
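
The Glue line can be checked with a little arithmetic. The sketch below uses a hypothetical helper, `crawler_cost`, and compares the raw run time against a 10-minute minimum billing duration per crawler run. The minimum is an assumption here that you should confirm on the current AWS Glue pricing page.

```bash
# Crawler cost in USD = DPUs x billed minutes / 60 x rate per DPU-hour.
# $1 = DPUs, $2 = billed minutes, $3 = USD per DPU-hour
crawler_cost() {
  awk -v dpu="$1" -v min="$2" -v rate="$3" \
    'BEGIN { printf "%.3f\n", dpu * min / 60 * rate }'
}

crawler_cost 2 2 0.44    # 2 DPUs for 2 minutes of run time: prints 0.029
crawler_cost 2 10 0.44   # same run billed at a 10-minute minimum: prints 0.147
```

Under that assumption a single crawler run costs roughly $0.15, which falls inside the total shown in the table.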

Stretch Challenge

Row-Level Filtering: Modify the Data Cell Filter in Step 4 to only show users where zipcode = '90210'. This demonstrates how to handle data residency requirements (e.g., ensuring local analysts only see local data).
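
As a starting point for the challenge, the sketch below prints a filter payload with a row predicate in place of the all-rows wildcard. `row_filter_json` and the filter name `local_zip_filter` are hypothetical, the account ID is a placeholder, and the payload keeps all columns for simplicity. The crawler may have inferred zipcode as a numeric type, in which case the predicate would need an unquoted value such as `zipcode = 90210`.

```bash
# Print a data cells filter payload that keeps all columns but only the
# rows matching a predicate.
# $1 = catalog (account) ID, $2 = filter name, $3 = row predicate
# The predicate must not contain double quotes, since it is embedded in JSON.
row_filter_json() {
  printf '{"TableCatalogId": "%s", "DatabaseName": "lab_privacy_db", "TableName": "raw", "Name": "%s", "RowFilter": {"FilterExpression": "%s"}, "ColumnWildcard": {}}\n' \
    "$1" "$2" "$3"
}

row_filter_json 111122223333 local_zip_filter "zipcode = '90210'"
```

To restrict columns as well, add an `ExcludedColumnNames` list inside `ColumnWildcard`, as in Step 4.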

Concept Review

Data Privacy Pillars on AWS

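  • Data Masking: Replacing sensitive data with functional aliases. In this lab, Lake Formation approximates masking by hiding columns or rows through data cells filters; the stored values themselves are not rewritten.
  • PII Identification: Finding sensitive strings such as SSNs and email addresses. Amazon Macie automates this for data in S3 using machine learning and pattern matching.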
  • Principle of Least Privilege: Granting users only the specific columns and rows they need to perform their jobs.
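
As a rough local illustration of PII identification, a simple pattern scan can flag lines that look like they contain emails or card numbers. This is not how Amazon Macie works internally; it is a sketch with hypothetical helper names and deliberately simple patterns that will miss many real-world formats.

```bash
# Count lines in a file that contain an email-like string.
count_email_lines() {
  grep -Ec '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$1"
}

# Count lines that contain a card-like token: three or four groups of
# four digits separated by hyphens (the shape used in the lab's sample data).
count_card_lines() {
  grep -Ec '[0-9]{4}(-[0-9]{4}){2,3}' "$1"
}

# Example (assumes users.csv from Step 1 is in the current directory):
#   count_email_lines users.csv
#   count_card_lines users.csv
```

Columns that produce matches are candidates for exclusion in a data cells filter.
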

| Feature | IAM Policy | Lake Formation |
| --- | --- | --- |
| Granularity | Resource/API level | Row, Column, Cell level |
| Ease of Use | Complex JSON policies | Visual Grant/Revoke |
| Cross-Account | Manual Role Assumption | Automated Data Sharing |
