# Hands-On Lab: Implementing Security, Governance, and Privacy for AI Workloads

*Methods to secure AI systems*

## Prerequisites

Before you begin this lab, ensure you have the following ready:

- **Cloud Account:** An active AWS account with administrator access.
- **CLI Tools:** AWS CLI (`aws`) installed and configured with your credentials.
- **Prior Knowledge:** Basic understanding of Amazon S3, IAM, and command-line navigation.
- **Region:** We will use `us-east-1` for consistency across services.
> [!WARNING]
> **Cost Estimate:** This lab utilizes AWS Key Management Service (KMS), Amazon S3, and Amazon Macie. These services are generally covered under the AWS Free Tier for new accounts, or will cost less than $1.50 to run for an hour. Remember to run the teardown commands at the end of the lab to avoid ongoing charges.
## Learning Objectives

Upon completing this 30-minute guided lab, you will be able to:

- **Encrypt AI Training Data:** Provision a Customer Managed Key (CMK) via AWS KMS to encrypt data at rest.
- **Establish Secure Infrastructure:** Create an isolated, encrypted S3 bucket for storing sensitive AI training sets.
- **Detect Sensitive Data:** Use Amazon Macie to scan for and identify Personally Identifiable Information (PII) before it is used for AI model training.
- **Enforce Least Privilege:** Construct an IAM role with fine-grained permissions tailored for AI/ML workloads (such as Amazon SageMaker).
## Architecture Overview

This architecture ensures that AI training data is encrypted at rest, tightly access-controlled, and audited for sensitive privacy information before any model ingests it.

## Step-by-Step Instructions
### Step 1: Create a Customer Managed KMS Key

To protect your AI data against unauthorized access, create a dedicated AWS KMS key for encryption at rest.

```bash
# Create the KMS key and store the key ID in a variable
KMS_KEY_ID=$(aws kms create-key \
  --description "Encryption Key for AI Training Data" \
  --query 'KeyMetadata.KeyId' --output text)

# Create an alias so the key is easy to identify
aws kms create-alias --alias-name alias/ai-lab-key --target-key-id "$KMS_KEY_ID"

echo "Your KMS Key ID is: $KMS_KEY_ID"
```

> [!TIP]
> Always use Customer Managed Keys (CMKs) when you need fine-grained control over who can decrypt your AI datasets.
<details>
<summary>Console alternative</summary>

- Navigate to **KMS > Customer managed keys > Create key**.
- Choose **Symmetric** and click **Next**.
- Set the **Alias** to `ai-lab-key` and click **Next**.
- Assign your current user as the **Key Administrator**.
- Click **Finish** to create the key.

📸 *Screenshot: The KMS console showing the newly created `ai-lab-key` with status "Enabled".*

</details>
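As an optional sanity check (not part of the original lab steps), you can confirm that the alias resolves and the key is active before moving on:

```shell
# Optional check: resolve the alias and confirm the key state is "Enabled".
# Assumes the alias alias/ai-lab-key was created in the step above.
aws kms describe-key --key-id alias/ai-lab-key \
  --query 'KeyMetadata.{KeyId: KeyId, State: KeyState}' --output table
```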
### Step 2: Create a Secure S3 Bucket for AI Training Data

Create an S3 bucket to store the dataset and enforce the new KMS key as the default encryption mechanism.

```bash
# Define a globally unique bucket name
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
BUCKET_NAME="brainybee-ai-data-$ACCOUNT_ID"

# Create the S3 bucket (us-east-1 needs no LocationConstraint)
aws s3api create-bucket --bucket "$BUCKET_NAME" --region us-east-1

# Apply default KMS encryption to the bucket
aws s3api put-bucket-encryption \
  --bucket "$BUCKET_NAME" \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "'"$KMS_KEY_ID"'"
      },
      "BucketKeyEnabled": true
    }]
  }'
```

<details>
<summary>Console alternative</summary>

- Navigate to **S3 > Create bucket**.
- Name the bucket `brainybee-ai-data-<YOUR_ACCOUNT_ID>`.
- Scroll down to **Default encryption**.
- Select **Server-side encryption with AWS Key Management Service keys (SSE-KMS)**.
- Choose the `ai-lab-key` from the dropdown list and click **Create bucket**.

📸 *Screenshot: S3 bucket properties tab highlighting the SSE-KMS encryption configuration.*

</details>
### Step 3: Upload Sample "Sensitive" Training Data

AI models trained on raw user data can inadvertently memorize and leak PII, a data-exfiltration risk that attackers can amplify through prompt injection. Let's simulate a dataset containing sensitive PII.

```bash
# Create a sample CSV file with dummy PII
echo "Name,Email,SSN,CreditScore
John Doe,john@example.com,000-11-2222,750
Jane Smith,jane.smith@email.com,999-88-7777,810" > raw_training_data.csv

# Upload to the encrypted S3 bucket
aws s3 cp raw_training_data.csv s3://$BUCKET_NAME/
```

<details>
<summary>Console alternative</summary>

- Create a local text file named `raw_training_data.csv` with the content above.
- Navigate to your new bucket in the S3 console.
- Click **Upload**, select the file, and click **Upload**.

</details>
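As an optional check (not in the original lab), you can confirm the uploaded object inherited the bucket's default KMS encryption:

```shell
# Optional check: the object should report SSE-KMS encryption inherited
# from the bucket default (assumes BUCKET_NAME is still set in your shell).
aws s3api head-object \
  --bucket "$BUCKET_NAME" \
  --key raw_training_data.csv \
  --query '{Encryption: ServerSideEncryption, KmsKey: SSEKMSKeyId}'
```

If the output shows `"Encryption": "aws:kms"` with your key's ARN, the default encryption rule from Step 2 is working.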
### Step 4: Enable Amazon Macie to Scan for PII

Before allowing an AI model to ingest this dataset, use Amazon Macie to automatically discover and classify the sensitive data.

```bash
# Enable Amazon Macie in the current account/region
aws macie2 enable-macie

# Create a classification job to scan the AI bucket
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "AI-Dataset-PII-Scan" \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "'"$ACCOUNT_ID"'",
      "buckets": ["'"$BUCKET_NAME"'"]
    }]
  }'
```

> [!NOTE]
> Macie uses built-in machine learning models and pattern matching to identify financial records, protected health information (PHI), and PII.

<details>
<summary>Console alternative</summary>

- Navigate to **Amazon Macie** in the console and click **Get started**, then **Enable Macie**.
- In the left menu, click **Jobs > Create job**.
- Select your `brainybee-ai-data-<YOUR_ACCOUNT_ID>` bucket and click **Next**.
- Select **One-time job**.
- Name the job `AI-Dataset-PII-Scan` and click **Submit**.

📸 *Screenshot: The Macie job configuration screen showing the target S3 bucket.*

</details>
### Step 5: Create a Least-Privilege IAM Role for AI Workloads

If we determine the data is safe, the AI service (e.g., Amazon SageMaker) needs permission to read it. We will create a role following the principle of least privilege.

```bash
# 1. Create the trust policy allowing SageMaker to assume the role
cat <<EoF > trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EoF

# 2. Create the IAM role
aws iam create-role \
  --role-name AILab-SageMaker-ExecutionRole \
  --assume-role-policy-document file://trust-policy.json

# 3. Create a restrictive read-only policy for our specific bucket and KMS key
cat <<EoF > ai-access-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::$BUCKET_NAME", "arn:aws:s3:::$BUCKET_NAME/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:us-east-1:$ACCOUNT_ID:key/$KMS_KEY_ID"
    }
  ]
}
EoF

# 4. Attach the inline policy to the role
aws iam put-role-policy \
  --role-name AILab-SageMaker-ExecutionRole \
  --policy-name AI-Data-Access \
  --policy-document file://ai-access-policy.json
```

<details>
<summary>Console alternative</summary>

- Navigate to **IAM > Roles > Create role**.
- Select **AWS service** and choose **SageMaker**.
- Skip attaching managed policies for now and complete role creation, naming it `AILab-SageMaker-ExecutionRole`.
- Open the role, click **Add permissions > Create inline policy**.
- Use the visual editor to allow `s3:GetObject` and `s3:ListBucket` specifically for your bucket ARN, and `kms:Decrypt` for your KMS Key ARN.

</details>
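Optionally (this verification is not part of the original steps), you can confirm the role and its inline policy were created as intended:

```shell
# Optional verification: inspect the trust relationship and the inline policy.
# Assumes the role and policy names used in Step 5.
aws iam get-role --role-name AILab-SageMaker-ExecutionRole \
  --query 'Role.AssumeRolePolicyDocument'
aws iam get-role-policy --role-name AILab-SageMaker-ExecutionRole \
  --policy-name AI-Data-Access --query 'PolicyDocument'
```

The first call should show `sagemaker.amazonaws.com` as the trusted principal; the second should list only the S3 read actions and `kms:Decrypt`.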
## Checkpoints

Verify your progress by running the following checks:

**Checkpoint 1: Verify Encryption.** Check that your bucket applies KMS encryption by default.

```bash
aws s3api get-bucket-encryption --bucket $BUCKET_NAME
```

Expected output: a JSON object showing `"SSEAlgorithm": "aws:kms"`.

**Checkpoint 2: View Macie Findings.** Wait 5-10 minutes for the Macie job to complete, then check Macie for discovered sensitive data.

```bash
aws macie2 list-findings
```

Alternatively, open the **Macie Console > Findings** to view a visual report. You should see findings for `USA_SOCIAL_SECURITY_NUMBER`.
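If you prefer to poll from the CLI instead of waiting and refreshing, a sketch like the following works, assuming you captured the job ID into `JOB_ID` when you created the classification job (e.g., with `--query 'jobId' --output text`):

```shell
# Hypothetical polling helper: wait for the Macie classification job to finish.
# Assumes JOB_ID holds the jobId returned by create-classification-job.
while true; do
  STATUS=$(aws macie2 describe-classification-job --job-id "$JOB_ID" \
    --query 'jobStatus' --output text)
  echo "Macie job status: $STATUS"
  [ "$STATUS" = "COMPLETE" ] && break
  sleep 30
done
```

Once the loop exits, `aws macie2 list-findings` should return finding IDs for the scanned bucket.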
## Clean-Up / Teardown

> [!WARNING]
> Failure to clean up can result in small but continuous AWS charges. Execute all commands below to tear down the lab environment.

```bash
# 1. Disable Macie (this stops all scanning and associated charges)
aws macie2 disable-macie

# 2. Delete the S3 bucket and its contents
aws s3 rm s3://$BUCKET_NAME/raw_training_data.csv
aws s3api delete-bucket --bucket $BUCKET_NAME

# 3. Schedule the KMS key for deletion (7-day waiting period) and remove its alias
aws kms schedule-key-deletion --key-id $KMS_KEY_ID --pending-window-in-days 7
aws kms delete-alias --alias-name alias/ai-lab-key

# 4. Delete the IAM role and its inline policy
aws iam delete-role-policy --role-name AILab-SageMaker-ExecutionRole --policy-name AI-Data-Access
aws iam delete-role --role-name AILab-SageMaker-ExecutionRole

# 5. Clean up local files
rm trust-policy.json ai-access-policy.json raw_training_data.csv
```

## Troubleshooting
| Issue / Error | Cause | Fix |
|---|---|---|
| `AccessDenied` when putting S3 encryption | Your IAM user lacks KMS permissions. | Ensure your IAM user has `kms:CreateKey` and `s3:PutEncryptionConfiguration` permissions. |
| Macie job fails to start | Macie is not enabled in your region. | Run `aws macie2 enable-macie` or enable it via the console first. |
| `MalformedPolicyDocument` | The JSON variables didn't expand correctly. | Copy the `cat <<EoF` blocks exactly as written, or manually insert your account ID into the JSON. |
## Stretch Challenge

**Challenge:** You have secured the data at rest, but what about data in transit?

Research AWS PrivateLink (specifically VPC endpoints). Without step-by-step guidance, configure an Interface VPC Endpoint for Amazon S3 in your default VPC. Attach an endpoint policy that only allows traffic to your `brainybee-ai-data-<YOUR_ACCOUNT_ID>` bucket. This ensures that AI training instances accessing S3 never traverse the public internet, satisfying the "infrastructure protection" domain of AI security.

<details>
<summary>Show solution</summary>

- Open the Amazon VPC console.
- Navigate to **Endpoints > Create endpoint**.
- Select **AWS services** and search for `com.amazonaws.us-east-1.s3` (Interface type).
- Select your default VPC and subnets.
- Under **Policy**, choose **Custom** and enter a policy restricting the `Resource` to `arn:aws:s3:::brainybee-ai-data-<YOUR_ACCOUNT_ID>/*`.
- Click **Create endpoint**.

</details>
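A custom endpoint policy along these lines would restrict the endpoint to the lab bucket. This is a sketch, not a definitive policy: replace `<YOUR_ACCOUNT_ID>` with your account ID, and note that the listed actions are an assumption matching the read-only access used elsewhere in this lab.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::brainybee-ai-data-<YOUR_ACCOUNT_ID>",
        "arn:aws:s3:::brainybee-ai-data-<YOUR_ACCOUNT_ID>/*"
      ]
    }
  ]
}
```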
## Concept Review

This lab explored several critical components necessary for building a compliant and secure AI environment.

| Service / Concept | Role in AI Security | Alternative Solutions |
|---|---|---|
| AWS KMS | Encrypts model weights and training datasets to prevent unauthorized data exfiltration. | CloudHSM, HashiCorp Vault |
| Amazon Macie | Scans S3 buckets for PII to prevent privacy violations and model memorization. | Custom regex scripts, Data Loss Prevention (DLP) tools |
| IAM Roles | Applies least-privilege access, ensuring a compromised AI model cannot access unrelated cloud resources. | IAM Identity Center, external identity providers (Okta) |
| Prompt Injection Defense | Validates incoming prompts (not covered in this lab; handled via app-level guardrails). | Amazon Bedrock Guardrails, LangChain output parsers |