# Hands-On Lab: Implementing Security, Governance, and Privacy for AI Workloads

*Methods to secure AI systems*

## Prerequisites

Before you begin this lab, ensure you have the following ready:

- **Cloud Account:** An active AWS account with administrator access.
- **CLI Tools:** AWS CLI (`aws`) installed and configured with your credentials.
- **Prior Knowledge:** Basic understanding of Amazon S3, IAM, and command-line navigation.
- **Region:** We will use `us-east-1` for consistency across services.
> [!WARNING]
> **Cost Estimate:** This lab utilizes AWS Key Management Service (KMS), Amazon S3, and Amazon Macie. These services are generally covered under the AWS Free Tier for new accounts, or will cost less than $1.50 to run for an hour. Remember to run the teardown commands at the end of the lab to avoid ongoing charges.
## Learning Objectives

Upon completing this 30-minute guided lab, you will be able to:

- **Encrypt AI Training Data:** Provision a Customer Managed Key (CMK) via AWS KMS to encrypt data at rest.
- **Establish Secure Infrastructure:** Create an isolated, encrypted S3 bucket for storing sensitive AI training sets.
- **Detect Sensitive Data:** Use Amazon Macie to scan for and identify Personally Identifiable Information (PII) before it is used for AI model training.
- **Enforce Least Privilege:** Construct an IAM role with fine-grained permissions tailored for AI/ML workloads (such as Amazon SageMaker).
## Architecture Overview

This architecture ensures that AI training data is encrypted at rest, tightly access-controlled, and audited for sensitive privacy information before any model ingests it.

## Step-by-Step Instructions
### Step 1: Create a Customer Managed KMS Key

To protect your AI data against unauthorized access, create a dedicated AWS KMS key for encryption at rest.

```bash
# Create the KMS key and store the key ID in a variable
KMS_KEY_ID=$(aws kms create-key \
  --description "Encryption Key for AI Training Data" \
  --query 'KeyMetadata.KeyId' --output text)

# Create an alias so the key is easy to identify
aws kms create-alias --alias-name alias/ai-lab-key --target-key-id "$KMS_KEY_ID"

echo "Your KMS Key ID is: $KMS_KEY_ID"
```

> [!TIP]
> Always use Customer Managed Keys (CMKs) when you need fine-grained control over who can decrypt your AI datasets.
<details>
<summary>Console alternative</summary>

- Navigate to **KMS > Customer managed keys > Create key**.
- Choose **Symmetric** and click **Next**.
- Set the **Alias** to `ai-lab-key` and click **Next**.
- Assign your current user as the **Key Administrator**.
- Click **Finish** to create the key.

📸 *Screenshot: The KMS console showing the newly created `ai-lab-key` with status "Enabled".*

</details>
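As an optional sanity check (not part of the original lab steps), you can confirm that the alias resolves and the key is active before moving on:

```shell
# Optional check: resolve the alias and confirm the key state is "Enabled".
# Assumes the alias alias/ai-lab-key was created in the step above.
aws kms describe-key --key-id alias/ai-lab-key \
  --query 'KeyMetadata.{KeyId: KeyId, State: KeyState}' --output table
```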
### Step 2: Create a Secure S3 Bucket for AI Training Data

Create an S3 bucket to store the dataset and enforce the new KMS key as the default encryption mechanism.

```bash
# Define a globally unique bucket name
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
BUCKET_NAME="brainybee-ai-data-$ACCOUNT_ID"

# Create the S3 bucket (us-east-1 needs no LocationConstraint)
aws s3api create-bucket --bucket "$BUCKET_NAME" --region us-east-1

# Apply default KMS encryption to the bucket
aws s3api put-bucket-encryption \
  --bucket "$BUCKET_NAME" \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "'"$KMS_KEY_ID"'"
      },
      "BucketKeyEnabled": true
    }]
  }'
```

<details>
<summary>Console alternative</summary>

- Navigate to **S3 > Create bucket**.
- Name the bucket `brainybee-ai-data-<YOUR_ACCOUNT_ID>`.
- Scroll down to **Default encryption**.
- Select **Server-side encryption with AWS Key Management Service keys (SSE-KMS)**.
- Choose the `ai-lab-key` from the dropdown list and click **Create bucket**.

📸 *Screenshot: S3 bucket properties tab highlighting the SSE-KMS encryption configuration.*

</details>
### Step 3: Upload Sample "Sensitive" Training Data

AI models trained on raw user data can inadvertently memorize and leak PII, a data-exfiltration risk that attackers can amplify through prompt injection. Let's simulate a dataset containing sensitive PII.

```bash
# Create a sample CSV file with dummy PII
echo "Name,Email,SSN,CreditScore
John Doe,john@example.com,000-11-2222,750
Jane Smith,jane.smith@email.com,999-88-7777,810" > raw_training_data.csv

# Upload to the encrypted S3 bucket
aws s3 cp raw_training_data.csv s3://$BUCKET_NAME/
```

<details>
<summary>Console alternative</summary>

- Create a local text file named `raw_training_data.csv` with the content above.
- Navigate to your new bucket in the S3 console.
- Click **Upload**, select the file, and click **Upload**.

</details>
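As an optional check (not in the original lab), you can confirm the uploaded object inherited the bucket's default KMS encryption:

```shell
# Optional check: the object should report SSE-KMS encryption inherited
# from the bucket default (assumes BUCKET_NAME is still set in your shell).
aws s3api head-object \
  --bucket "$BUCKET_NAME" \
  --key raw_training_data.csv \
  --query '{Encryption: ServerSideEncryption, KmsKey: SSEKMSKeyId}'
```

If the output shows `"Encryption": "aws:kms"` with your key's ARN, the default encryption rule from Step 2 is working.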
### Step 4: Enable Amazon Macie to Scan for PII

Before allowing an AI model to ingest this dataset, use Amazon Macie to automatically discover and classify the sensitive data.

```bash
# Enable Amazon Macie in the current account/region
aws macie2 enable-macie

# Create a classification job to scan the AI bucket
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "AI-Dataset-PII-Scan" \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "'"$ACCOUNT_ID"'",
      "buckets": ["'"$BUCKET_NAME"'"]
    }]
  }'
```

> [!NOTE]
> Macie uses built-in machine learning models and pattern matching to identify financial records, protected health information (PHI), and PII.

<details>
<summary>Console alternative</summary>

- Navigate to **Amazon Macie** in the console and click **Get started**, then **Enable Macie**.
- In the left menu, click **Jobs > Create job**.
- Select your `brainybee-ai-data-<YOUR_ACCOUNT_ID>` bucket and click **Next**.
- Select **One-time job**.
- Name the job `AI-Dataset-PII-Scan` and click **Submit**.

📸 *Screenshot: The Macie job configuration screen showing the target S3 bucket.*

</details>
### Step 5: Create a Least-Privilege IAM Role for AI Workloads

If we determine the data is safe, the AI service (e.g., Amazon SageMaker) needs permission to read it. We will create a role following the principle of least privilege.

```bash
# 1. Create the trust policy allowing SageMaker to assume the role
cat <<EoF > trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EoF

# 2. Create the IAM role
aws iam create-role \
  --role-name AILab-SageMaker-ExecutionRole \
  --assume-role-policy-document file://trust-policy.json

# 3. Create a restrictive read-only policy for our specific bucket and KMS key
cat <<EoF > ai-access-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::$BUCKET_NAME", "arn:aws:s3:::$BUCKET_NAME/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:us-east-1:$ACCOUNT_ID:key/$KMS_KEY_ID"
    }
  ]
}
EoF

# 4. Attach the inline policy to the role
aws iam put-role-policy \
  --role-name AILab-SageMaker-ExecutionRole \
  --policy-name AI-Data-Access \
  --policy-document file://ai-access-policy.json
```

<details>
<summary>Console alternative</summary>

- Navigate to **IAM > Roles > Create role**.
- Select **AWS service** and choose **SageMaker**.
- Skip attaching managed policies for now and complete role creation, naming it `AILab-SageMaker-ExecutionRole`.
- Open the role, click **Add permissions > Create inline policy**.
- Use the visual editor to allow `s3:GetObject` and `s3:ListBucket` specifically for your bucket ARN, and `kms:Decrypt` for your KMS Key ARN.

</details>
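Optionally (this verification is not part of the original steps), you can confirm the role and its inline policy were created as intended:

```shell
# Optional verification: inspect the trust relationship and the inline policy.
# Assumes the role and policy names used in Step 5.
aws iam get-role --role-name AILab-SageMaker-ExecutionRole \
  --query 'Role.AssumeRolePolicyDocument'
aws iam get-role-policy --role-name AILab-SageMaker-ExecutionRole \
  --policy-name AI-Data-Access --query 'PolicyDocument'
```

The first call should show `sagemaker.amazonaws.com` as the trusted principal; the second should list only the S3 read actions and `kms:Decrypt`.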
## Checkpoints

Verify your progress by running the following checks:

**Checkpoint 1: Verify Encryption.** Check that your bucket applies KMS encryption by default.

```bash
aws s3api get-bucket-encryption --bucket $BUCKET_NAME
```

Expected output: a JSON object showing `"SSEAlgorithm": "aws:kms"`.

**Checkpoint 2: View Macie Findings.** Wait 5-10 minutes for the Macie job to complete, then check Macie for discovered sensitive data.

```bash
aws macie2 list-findings
```

Alternatively, open the **Macie Console > Findings** to view a visual report. You should see findings for `USA_SOCIAL_SECURITY_NUMBER`.
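If you prefer to poll from the CLI instead of waiting and refreshing, a sketch like the following works, assuming you captured the job ID into `JOB_ID` when you created the classification job (e.g., with `--query 'jobId' --output text`):

```shell
# Hypothetical polling helper: wait for the Macie classification job to finish.
# Assumes JOB_ID holds the jobId returned by create-classification-job.
while true; do
  STATUS=$(aws macie2 describe-classification-job --job-id "$JOB_ID" \
    --query 'jobStatus' --output text)
  echo "Macie job status: $STATUS"
  [ "$STATUS" = "COMPLETE" ] && break
  sleep 30
done
```

Once the loop exits, `aws macie2 list-findings` should return finding IDs for the scanned bucket.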
## Clean-Up / Teardown

> [!WARNING]
> Failure to clean up can result in small but continuous AWS charges. Execute all commands below to tear down the lab environment.

```bash
# 1. Disable Macie (this stops all scanning and associated charges)
aws macie2 disable-macie

# 2. Delete the S3 bucket and its contents
aws s3 rm s3://$BUCKET_NAME/raw_training_data.csv
aws s3api delete-bucket --bucket $BUCKET_NAME

# 3. Schedule the KMS key for deletion (7-day waiting period) and remove its alias
aws kms schedule-key-deletion --key-id $KMS_KEY_ID --pending-window-in-days 7
aws kms delete-alias --alias-name alias/ai-lab-key

# 4. Delete the IAM role and its inline policy
aws iam delete-role-policy --role-name AILab-SageMaker-ExecutionRole --policy-name AI-Data-Access
aws iam delete-role --role-name AILab-SageMaker-ExecutionRole

# 5. Clean up local files
rm trust-policy.json ai-access-policy.json raw_training_data.csv
```

## Troubleshooting
| Issue / Error | Cause | Fix |
|---|---|---|
| `AccessDenied` when putting S3 encryption | Your IAM user lacks KMS permissions. | Ensure your IAM user has `kms:CreateKey` and `s3:PutEncryptionConfiguration` permissions. |
| Macie job fails to start | Macie is not enabled in your region. | Run `aws macie2 enable-macie` or enable it via the console first. |
| `MalformedPolicyDocument` | The JSON variables didn't expand correctly. | Copy the `cat <<EoF` blocks exactly as written, or manually insert your account ID into the JSON. |
## Stretch Challenge

**Challenge:** You have secured the data at rest, but what about data in transit?

Research AWS PrivateLink (specifically VPC endpoints). Without step-by-step guidance, configure an Interface VPC Endpoint for Amazon S3 in your default VPC. Attach an endpoint policy that only allows traffic to your `brainybee-ai-data-<YOUR_ACCOUNT_ID>` bucket. This ensures that AI training instances accessing S3 never traverse the public internet, satisfying the "infrastructure protection" domain of AI security.

<details>
<summary>Show solution</summary>

- Open the Amazon VPC console.
- Navigate to **Endpoints > Create endpoint**.
- Select **AWS services** and search for `com.amazonaws.us-east-1.s3` (Interface type).
- Select your default VPC and subnets.
- Under **Policy**, choose **Custom** and enter a policy restricting the `Resource` to `arn:aws:s3:::brainybee-ai-data-<YOUR_ACCOUNT_ID>/*`.
- Click **Create endpoint**.

</details>
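A custom endpoint policy along these lines would restrict the endpoint to the lab bucket. This is a sketch, not a definitive policy: replace `<YOUR_ACCOUNT_ID>` with your account ID, and note that the listed actions are an assumption matching the read-only access used elsewhere in this lab.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::brainybee-ai-data-<YOUR_ACCOUNT_ID>",
        "arn:aws:s3:::brainybee-ai-data-<YOUR_ACCOUNT_ID>/*"
      ]
    }
  ]
}
```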
## Concept Review

This lab explored several critical components necessary for building a compliant and secure AI environment.

| Service / Concept | Role in AI Security | Alternative Solutions |
|---|---|---|
| AWS KMS | Encrypts model weights and training datasets to prevent unauthorized data exfiltration. | CloudHSM, HashiCorp Vault |
| Amazon Macie | Scans S3 buckets for PII to prevent privacy violations and model memorization. | Custom regex scripts, Data Loss Prevention (DLP) tools |
| IAM Roles | Applies least-privilege access, ensuring a compromised AI model cannot access unrelated cloud resources. | IAM Identity Center, external identity providers (Okta) |
| Prompt Injection Defense | Validates incoming prompts (not covered in this lab; handled via app-level guardrails). | Amazon Bedrock Guardrails, LangChain output parsers |