Hands-On Lab: Methods to Secure AI Systems on AWS
Estimated Time: 30 minutes | Difficulty: Guided | Cloud Provider: AWS
Securing AI systems requires a defense-in-depth approach that protects infrastructure, models, and data from unauthorized access, prompt injection, and data exfiltration. In this lab, you will implement core security controls for an AI workflow on AWS, focusing on data encryption at rest, least-privilege IAM access for machine learning services, and sensitive data discovery using Amazon Macie.
Prerequisites
Before starting this lab, ensure you have the following:
- AWS Account: An active AWS account with Administrator access.
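- AWS CLI: Installed and configured with your credentials (aws configure).
- Command Line Interface: Terminal (Linux/macOS) or PowerShell (Windows). The commands in this lab use Bash syntax (export, $RANDOM, backslash line continuations), so on Windows either run them in a Bash-compatible shell such as WSL or Git Bash, or adapt them for PowerShell.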
- Prior Knowledge: Basic understanding of Amazon S3, AWS IAM, and JSON syntax.
Learning Objectives
By completing this lab, you will be able to:
- Enforce Data at Rest Encryption: Provision a Customer Managed AWS KMS key to encrypt AI training datasets.
- Implement Least-Privilege Access: Create tightly scoped IAM roles specifically designed for Amazon SageMaker.
- Detect Vulnerabilities and PII: Enable Amazon Macie to scan storage repositories for sensitive data (PII/PHI) before model training.
Architecture Overview
This lab simulates a secure foundation for an AI data engineering pipeline.
Step-by-Step Instructions
Step 1: Create a Customer Managed KMS Key
To ensure your AI datasets are securely encrypted at rest (a key requirement for AI data governance), you will create an AWS Key Management Service (KMS) key.
# Create a new KMS key and print its KeyId
aws kms create-key --description "Key for AI Training Data" --query 'KeyMetadata.KeyId' --output text
Note the output value (e.g., 1234abcd-12ab-34cd-56ef-1234567890ab); you will need it for the next step. The rest of this lab refers to it as <YOUR_KMS_KEY_ID>.
▶Console alternative
- Navigate to KMS in the AWS Console.
- Click Create key.
- Choose Symmetric and click Next.
- Give it the alias ai-training-key and click Next.
- Select your IAM user as the Key Administrator and click Next.
- Finish the creation and copy the Key ID.
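The CLI command in this step prints the key ID but neither stores it nor creates the alias used in the console path. The following is a minimal sketch of a helper that does both, assuming a Bash-compatible shell. The function name create_training_key and the variable KMS_KEY_ID are hypothetical and not part of the original lab.

```shell
# Hypothetical helper (not part of the original lab): creates the key,
# attaches the alias used in the console path, and prints the key ID.
create_training_key() {
  key_id=$(aws kms create-key \
    --description "Key for AI Training Data" \
    --query 'KeyMetadata.KeyId' \
    --output text) || return 1

  # KMS alias names must start with "alias/"
  aws kms create-alias \
    --alias-name "alias/ai-training-key" \
    --target-key-id "$key_id" >/dev/null || return 1

  printf '%s\n' "$key_id"
}
```

Running export KMS_KEY_ID="$(create_training_key)" creates a new key each time, so treat it as a replacement for the plain create-key command rather than an extra step. The alias call fails if alias/ai-training-key already exists in the account and region.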
Step 2: Create a Secure, Encrypted S3 Bucket
Next, you will create an Amazon S3 bucket to hold your AI training data, then set SSE-KMS with the key you just created as the bucket's default encryption so that new objects are encrypted with it automatically.
# Generate a random bucket name to ensure uniqueness
export BUCKET_NAME="brainybee-ai-sec-lab-$RANDOM"
# Create the bucket
aws s3api create-bucket \
--bucket $BUCKET_NAME \
--region us-east-1
# Apply default KMS encryption to the bucket
aws s3api put-bucket-encryption \
--bucket $BUCKET_NAME \
--server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"<YOUR_KMS_KEY_ID>"},"BucketKeyEnabled":true}]}'
[!TIP] Replace <YOUR_KMS_KEY_ID> with the actual ID from Step 1. Enabling an S3 Bucket Key (BucketKeyEnabled: true) reduces KMS request costs: S3 uses a bucket-level key to generate the data keys for individual objects instead of calling KMS for each one.
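Because the JSON above sits inside single quotes, the shell will not expand a variable there, which makes it easy to submit the literal placeholder by mistake. One way around this is to generate the configuration from a variable. This is a sketch only: the helper name build_encryption_config, the variable KMS_KEY_ID, and the file name encryption-config.json are assumptions, not part of the lab.

```shell
# Hypothetical helper: prints the default-encryption configuration for the
# key ID passed as the first argument.
build_encryption_config() {
  printf '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"%s"},"BucketKeyEnabled":true}]}\n' "$1"
}

# Assumption: KMS_KEY_ID holds the key ID from Step 1. The fallback is the
# example-format placeholder from this lab, not a real key.
KMS_KEY_ID="${KMS_KEY_ID:-1234abcd-12ab-34cd-56ef-1234567890ab}"

build_encryption_config "$KMS_KEY_ID" > encryption-config.json
cat encryption-config.json
```

The generated file can then be passed to put-bucket-encryption as --server-side-encryption-configuration file://encryption-config.json. If you use this approach, delete encryption-config.json along with the other local files during clean-up.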
▶Console alternative
- Navigate to S3 > Create bucket.
- Name it brainybee-ai-sec-lab-<YOUR_INITIALS>-<DATE>.
- Under Default encryption, select Server-side encryption with AWS Key Management Service keys (SSE-KMS).
- Select the option Choose from your AWS KMS keys, then pick the key created in Step 1.
- Click Create bucket.
Step 3: Upload a Sample Dataset with Sensitive Information
To test our vulnerability management and data privacy controls, we will upload a mock dataset containing Personally Identifiable Information (PII).
# Create a mock dataset file
echo "PatientName, Diagnosis, SSN
John Doe, Hypertension, 000-11-2222
Jane Smith, Diabetes, 111-22-3333" > raw_training_data.csv
# Upload the file to the encrypted bucket
aws s3 cp raw_training_data.csv s3://$BUCKET_NAME/raw_training_data.csv
Step 4: Implement Least-Privilege IAM for Amazon SageMaker
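A core tenet of AI security is ensuring that models and compute environments (like SageMaker) have access only to the data they absolutely need. In this step, you will create a role that SageMaker can assume and grant it read-only access to the training bucket and nothing else. Note that because the bucket uses SSE-KMS, a job running under this role would also need kms:Decrypt on the key from Step 1 to actually read the objects; the policy in this step covers the S3 permissions only.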
# 1. Create a trust policy file for SageMaker
echo '{
"Version": "2012-10-17",
"Statement": [ {
"Effect": "Allow",
"Principal": { "Service": "sagemaker.amazonaws.com" },
"Action": "sts:AssumeRole"
} ]
}' > trust-policy.json
# 2. Create the IAM role
aws iam create-role \
--role-name SecureAITrainingRole \
--assume-role-policy-document file://trust-policy.json
# 3. Create a restrictive policy file for the S3 bucket
echo '{
"Version": "2012-10-17",
"Statement": [ {
"Effect": "Allow",
"Action": [ "s3:GetObject", "s3:ListBucket" ],
"Resource": [ "arn:aws:s3:::'$BUCKET_NAME'", "arn:aws:s3:::'$BUCKET_NAME'/*" ]
} ]
}' > data-access-policy.json
# 4. Attach the policy to the role
aws iam put-role-policy \
--role-name SecureAITrainingRole \
--policy-name S3LeastPrivilegeAccess \
--policy-document file://data-access-policy.json
▶Console alternative
- Navigate to IAM > Roles > Create role.
- Select AWS service and choose SageMaker.
- Skip attaching managed policies for now and create the role as SecureAITrainingRole.
- Click on the newly created role, choose Add permissions > Create inline policy.
- Select JSON, paste the data-access-policy.json contents (replacing the bucket name manually), and name it S3LeastPrivilegeAccess.
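The policy in this step lists both actions against both resources, although s3:ListBucket applies only to the bucket ARN and s3:GetObject only to object ARNs. A tighter variant could give each action its own statement and add kms:Decrypt for the SSE-KMS key. The sketch below only writes the policy file. The file name data-access-policy-scoped.json, the Sid values, the KMS_KEY_ARN variable, and the fallback bucket name and key ARN are all illustrative assumptions.

```shell
# Assumptions: BUCKET_NAME is the bucket from Step 2 and KMS_KEY_ARN is the
# full ARN of the key from Step 1. The fallbacks are placeholders only.
BUCKET_NAME="${BUCKET_NAME:-brainybee-ai-sec-lab-example}"
KMS_KEY_ARN="${KMS_KEY_ARN:-arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab}"

# One statement per action, each scoped to the resource type it applies to
cat > data-access-policy-scoped.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListTrainingBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${BUCKET_NAME}"
    },
    {
      "Sid": "ReadTrainingObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${BUCKET_NAME}/*"
    },
    {
      "Sid": "DecryptTrainingData",
      "Effect": "Allow",
      "Action": "kms:Decrypt",
      "Resource": "${KMS_KEY_ARN}"
    }
  ]
}
EOF

cat data-access-policy-scoped.json
```

To try it, point the put-role-policy command at file://data-access-policy-scoped.json instead of data-access-policy.json, and remove the extra file during clean-up.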
Step 5: Enable Amazon Macie for Data Privacy Scanning
Amazon Macie uses machine learning to automatically discover and classify sensitive data in S3. If you prepare training datasets for an AI model, Macie flags PII that needs to be removed or masked.
# Enable Amazon Macie in your current region
aws macie2 enable-macie
Note: Setting up a specific Macie classification job via CLI requires a complex JSON structure. We will view the Macie setup in the console to understand the workflow.
▶Console instructions for Macie Job
- Navigate to Amazon Macie in the console.
- On the left sidebar, click S3 buckets.
- Check the box next to your brainybee-ai-sec-lab-... bucket.
- Click Create job.
- Leave default settings, click Next through the screens, name the job AI-Dataset-Scan, and click Submit.
- Within a few minutes, Macie will generate a finding that it discovered SSNs in your raw_training_data.csv.
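Once Macie has flagged the SSN column, the data needs to be removed or masked before training. As a rough illustration of masking, the sketch below replaces anything shaped like NNN-NN-NNNN with a fixed token using sed. The output file name masked_training_data.csv is an assumption, and a simple pattern like this is not a substitute for a proper de-identification process.

```shell
# Recreate the mock dataset from Step 3 so this sketch is self-contained
printf '%s\n' \
  'PatientName, Diagnosis, SSN' \
  'John Doe, Hypertension, 000-11-2222' \
  'Jane Smith, Diabetes, 111-22-3333' > raw_training_data.csv

# Replace every value shaped like NNN-NN-NNNN with a fixed mask
sed -E 's/[0-9]{3}-[0-9]{2}-[0-9]{4}/XXX-XX-XXXX/g' \
  raw_training_data.csv > masked_training_data.csv

cat masked_training_data.csv
```

The header row and the other columns pass through unchanged. If you create masked_training_data.csv, delete it with the other local files during clean-up.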
Checkpoints
Run these commands to verify your setup was successful:
Checkpoint 1: Verify Bucket Encryption
aws s3api get-bucket-encryption --bucket $BUCKET_NAME
Expected Result: A JSON response showing the aws:kms rule and your specific KMS Key ID.
Checkpoint 2: Verify IAM Role Restrictions
aws iam get-role-policy --role-name SecureAITrainingRole --policy-name S3LeastPrivilegeAccess
Expected Result: The inline policy document, granting only s3:GetObject and s3:ListBucket on your specific bucket.
Troubleshooting
| Issue / Error | Cause | Fix |
|---|---|---|
| An error occurred (AccessDenied) when calling the CreateBucket operation | Your IAM user lacks permissions to create S3 buckets. | Ensure your IAM user has AmazonS3FullAccess or administrator privileges for the lab. |
| An error occurred (InvalidToken) when calling the PutBucketEncryption operation | The $BUCKET_NAME variable is empty or the KMS key ID is malformed. | Ensure you exported the bucket name variable and pasted the exact KMS Key ID from Step 1. |
| Macie is not enabled in the console. | Macie requires manual activation per region. | Run the aws macie2 enable-macie command or click Enable Macie in the AWS console. |
| Cannot delete S3 bucket during teardown. | The bucket contains files. S3 buckets must be empty before deletion. | Run the recursive rm command provided in the teardown steps before deleting the bucket. |
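Several of the issues above trace back to an unset shell variable, for example after opening a new terminal. A small guard can catch this before any AWS command runs. This is a sketch under the assumption of a POSIX-style shell, and the helper name require_var is hypothetical.

```shell
# Hypothetical helper: succeeds if the named variable is set and non-empty,
# otherwise prints a message to stderr and returns non-zero.
require_var() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "Variable $1 is unset or empty; export it before continuing." >&2
    return 1
  fi
}

# Example: check the bucket name before running S3 commands
if require_var BUCKET_NAME; then
  echo "BUCKET_NAME is set to: $BUCKET_NAME"
fi
```

The same check can be chained in front of a command, for instance require_var BUCKET_NAME && aws s3api get-bucket-encryption --bucket "$BUCKET_NAME".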
Clean-Up / Teardown
[!WARNING] Remember to run the teardown commands to avoid ongoing charges. AWS KMS keys and Amazon Macie can incur charges if left active.
Run the following commands to destroy all provisioned resources:
# 1. Empty the S3 bucket
aws s3 rm s3://$BUCKET_NAME --recursive
# 2. Delete the S3 bucket
aws s3api delete-bucket --bucket $BUCKET_NAME
# 3. Delete the IAM inline policy and role
aws iam delete-role-policy --role-name SecureAITrainingRole --policy-name S3LeastPrivilegeAccess
aws iam delete-role --role-name SecureAITrainingRole
# 4. Disable Amazon Macie
# (Note: This deletes Macie configuration in this account/region)
aws macie2 disable-macie
# 5. Schedule KMS Key deletion (Keys cannot be deleted immediately; min 7 days)
aws kms schedule-key-deletion --key-id <YOUR_KMS_KEY_ID> --pending-window-in-days 7
Clean up local files:
rm raw_training_data.csv trust-policy.json data-access-policy.json