Hands-On Lab: Methods to Secure AI Systems on AWS

Estimated Time: 30 minutes | Difficulty: Guided | Cloud Provider: AWS

Securing AI systems requires a defense-in-depth approach that protects infrastructure, models, and data from unauthorized access, prompt injection, and data exfiltration. In this lab, you will implement core security controls for an AI workflow on AWS, focusing on data encryption at rest, least-privilege IAM access for machine learning services, and sensitive data discovery using Amazon Macie.


Prerequisites

Before starting this lab, ensure you have the following:

  • AWS Account: An active AWS account with Administrator access.
  • AWS CLI: Installed and configured with your credentials (aws configure).
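  • Command Line Interface: A Bash-compatible shell (the Linux/macOS terminal, or WSL or Git Bash on Windows). The commands use Bash syntax such as export and $RANDOM and will not run unmodified in PowerShell.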
  • Prior Knowledge: Basic understanding of Amazon S3, AWS IAM, and JSON syntax.

Learning Objectives

By completing this lab, you will be able to:

  1. Enforce Data-at-Rest Encryption: Provision a customer managed AWS KMS key to encrypt AI training datasets.
  2. Implement Least-Privilege Access: Create a tightly scoped IAM role for Amazon SageMaker.
  3. Detect Sensitive Data (PII/PHI): Enable Amazon Macie to scan S3 buckets for sensitive data before model training.

Architecture Overview

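This lab simulates a secure foundation for an AI data engineering pipeline: a customer managed KMS key encrypts the S3 bucket that holds training data, a least-privilege IAM role gives Amazon SageMaker read-only access to that bucket, and Amazon Macie scans the bucket for sensitive data before training.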

Step-by-Step Instructions

Step 1: Create a Customer Managed KMS Key

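To keep your AI datasets encrypted at rest (a key requirement for AI data governance), you will create a customer managed key in AWS Key Management Service (KMS). Unlike an AWS managed key, a customer managed key gives you control over its key policy, rotation, and deletion.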

```bash
# Create a new symmetric KMS key and print only its KeyId
aws kms create-key --description "Key for AI Training Data" --query 'KeyMetadata.KeyId' --output text
```

Note the output value (for example, 1234abcd-12ab-34cd-56ef-1234567890ab); you will need it in Step 2 and again during teardown. This lab refers to it as <YOUR_KMS_KEY_ID>.

Console alternative
  1. Navigate to KMS in the AWS Console.
  2. Click Create key.
  3. Choose Symmetric and click Next.
  4. Give it the alias ai-training-key and click Next.
  5. Select your IAM user as the Key Administrator and click Next.
  6. Finish the creation and copy the Key ID.
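The CLI command in this step creates the key without the friendly alias that the console path assigns. A minimal sketch, assuming you want the CLI-created key to match: the hypothetical helper tag_and_rotate_key below adds the alias alias/ai-training-key and turns on automatic key rotation, using the standard create-alias, enable-key-rotation, and get-key-rotation-status commands.

```bash
# Hypothetical helper (not part of the original lab): alias the CLI-created key
# and enable automatic rotation of its key material.
# Usage: tag_and_rotate_key <YOUR_KMS_KEY_ID>
tag_and_rotate_key() {
  local key_id="$1"
  if [ -z "$key_id" ]; then
    echo "usage: tag_and_rotate_key <KEY_ID>" >&2
    return 1
  fi

  # API alias names always carry the "alias/" prefix
  aws kms create-alias \
    --alias-name alias/ai-training-key \
    --target-key-id "$key_id" || return 1

  # Turn on automatic rotation (the default rotation period is one year)
  aws kms enable-key-rotation --key-id "$key_id" || return 1

  # Print "True" if rotation is now enabled
  aws kms get-key-rotation-status \
    --key-id "$key_id" \
    --query 'KeyRotationEnabled' \
    --output text
}
```

Run it as tag_and_rotate_key <YOUR_KMS_KEY_ID>. If you add the alias, you can remove it during teardown with aws kms delete-alias --alias-name alias/ai-training-key.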

Step 2: Create a Secure, Encrypted S3 Bucket

Next, you will create an Amazon S3 bucket to hold your AI training data and set its default encryption to SSE-KMS with the key you just created, so every object uploaded without its own encryption settings is encrypted with that key.

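```bash
# Generate a random bucket name to help ensure global uniqueness
export BUCKET_NAME="brainybee-ai-sec-lab-$RANDOM"

# Create the bucket. This command assumes us-east-1; in any other region, add
# --create-bucket-configuration LocationConstraint=<region>, and keep the bucket
# in the same region as the KMS key from Step 1 (SSE-KMS requires this).
aws s3api create-bucket \
  --bucket "$BUCKET_NAME" \
  --region us-east-1

# Apply default SSE-KMS encryption with your customer managed key
aws s3api put-bucket-encryption \
  --bucket "$BUCKET_NAME" \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms","KMSMasterKeyID":"<YOUR_KMS_KEY_ID>"},"BucketKeyEnabled":true}]}'
```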

[!TIP] Replace <YOUR_KMS_KEY_ID> with the actual ID from Step 1. Enabling S3 Bucket Keys (BucketKeyEnabled: true) reduces KMS request costs: S3 obtains a short-lived bucket-level key from KMS and uses it to generate data keys for new objects, instead of calling KMS for every object.

Console alternative
  1. Navigate to S3 > Create bucket.
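  2. Name it brainybee-ai-sec-lab-<YOUR_INITIALS>-<DATE> (bucket names must be lowercase and globally unique), then run export BUCKET_NAME=<that name> in your terminal, because later CLI steps use $BUCKET_NAME.
  3. Under Default encryption, select Server-side encryption with AWS Key Management Service keys (SSE-KMS).
  4. Select Choose from your AWS KMS keys, then pick the key created in Step 1.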
  5. Click Create bucket.
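The CLI command in this step pins the bucket to us-east-1, but SSE-KMS requires the KMS key and the bucket to be in the same region, and Macie (Step 5) only inventories buckets in the region where it is enabled. A minimal sketch, assuming you want everything in your CLI's configured region: the hypothetical helper create_lab_bucket reads that region with aws configure get region and adds the LocationConstraint that S3 requires outside us-east-1.

```bash
# Hypothetical helper (not part of the original lab): create the bucket in the
# CLI's configured region so the bucket, the KMS key, and Macie share one region.
# Usage: create_lab_bucket "$BUCKET_NAME"
create_lab_bucket() {
  local bucket="$1"
  local region
  # "aws configure get" exits non-zero when no region is set, so tolerate that
  region=$(aws configure get region || true)
  # Assumption: fall back to us-east-1 (the lab's original choice) when unset
  region="${region:-us-east-1}"

  if [ "$region" = "us-east-1" ]; then
    # us-east-1 is S3's default location and rejects an explicit LocationConstraint
    aws s3api create-bucket --bucket "$bucket" --region "$region"
  else
    # Every other region needs an explicit LocationConstraint
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region"
  fi
}
```

Use it in place of the create-bucket command above; the put-bucket-encryption command stays the same, as long as the key from Step 1 was created in that same region.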

Step 3: Upload a Sample Dataset with Sensitive Information

To test the data privacy controls, you will upload a mock dataset that contains Personally Identifiable Information (PII): fake patient names, diagnoses, and Social Security numbers (SSNs).

```bash
# Create a mock dataset file (fake names, diagnoses, and SSN-formatted values)
echo "PatientName, Diagnosis, SSN
John Doe, Hypertension, 000-11-2222
Jane Smith, Diabetes, 111-22-3333" > raw_training_data.csv

# Upload the file to the encrypted bucket; default SSE-KMS encryption applies automatically
aws s3 cp raw_training_data.csv "s3://$BUCKET_NAME/raw_training_data.csv"
```
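To confirm that default encryption was applied to the upload, you can read the object's metadata with aws s3api head-object, which reports the KMS key in its SSEKMSKeyId field (typically as a full key ARN). A minimal sketch: the hypothetical helper verify_object_kms compares that value against your key ID.

```bash
# Hypothetical helper (not part of the original lab): check that an object was
# encrypted with the expected customer managed key.
# Usage: verify_object_kms "$BUCKET_NAME" raw_training_data.csv <YOUR_KMS_KEY_ID>
verify_object_kms() {
  local bucket="$1" key="$2" expected_key_id="$3"
  if [ -z "$expected_key_id" ]; then
    echo "usage: verify_object_kms <BUCKET> <OBJECT_KEY> <KMS_KEY_ID>" >&2
    return 1
  fi

  local actual
  actual=$(aws s3api head-object \
    --bucket "$bucket" \
    --key "$key" \
    --query 'SSEKMSKeyId' \
    --output text) || return 1

  # SSEKMSKeyId is usually an ARN ending in ".../key/<KEY_ID>", so accept an
  # exact match or a match on that suffix
  case "$actual" in
    "$expected_key_id" | */"$expected_key_id")
      echo "OK: $key is encrypted with $expected_key_id"
      ;;
    *)
      echo "MISMATCH: $key reports key '$actual'" >&2
      return 1
      ;;
  esac
}
```

An unencrypted object or one encrypted with a different key makes the helper print a mismatch and return a non-zero status.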

Step 4: Implement Least-Privilege IAM for Amazon SageMaker

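A core tenet of AI security is that models and compute environments (such as SageMaker) can access only the data they need. The role you create here can list the lab bucket and read its objects, but it cannot write or delete objects or access any other bucket.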

```bash
# 1. Create a trust policy that lets SageMaker assume the role
echo '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "sagemaker.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}' > trust-policy.json

# 2. Create the IAM role
aws iam create-role \
  --role-name SecureAITrainingRole \
  --assume-role-policy-document file://trust-policy.json

# 3. Create a restrictive policy file for the S3 bucket
#    ($BUCKET_NAME is spliced in between the single-quoted segments)
echo '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:ListBucket" ],
      "Resource": [
        "arn:aws:s3:::'"$BUCKET_NAME"'",
        "arn:aws:s3:::'"$BUCKET_NAME"'/*"
      ]
    }
  ]
}' > data-access-policy.json

# 4. Attach the policy to the role as an inline policy
aws iam put-role-policy \
  --role-name SecureAITrainingRole \
  --policy-name S3LeastPrivilegeAccess \
  --policy-document file://data-access-policy.json
```

Console alternative
  1. Navigate to IAM > Roles > Create role.
  2. Select AWS service and choose SageMaker.
  3. Skip attaching managed policies for now and create the role as SecureAITrainingRole.
  4. Click on the newly created role, choose Add permissions > Create inline policy.
  5. Select JSON, paste the data-access-policy.json contents (replacing the bucket name manually), and name it S3LeastPrivilegeAccess.
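Because the training data is encrypted with your customer managed key, S3 permissions alone are not enough to read it: a principal that calls s3:GetObject on an SSE-KMS object also needs kms:Decrypt on the key. A minimal sketch, assuming the key still has the default key policy from Step 1 (which lets IAM policies in your account grant access to it): the hypothetical helper grant_role_kms_decrypt adds a second inline policy, named KmsDecryptTrainingKey here purely for illustration, that allows only kms:Decrypt on that single key.

```bash
# Hypothetical helper (not part of the original lab): allow SecureAITrainingRole
# to decrypt objects encrypted with the lab key, and nothing else in KMS.
# Usage: grant_role_kms_decrypt <YOUR_KMS_KEY_ID>
grant_role_kms_decrypt() {
  local key_id="$1"
  local key_arn
  # IAM policies reference the key by its full ARN
  key_arn=$(aws kms describe-key \
    --key-id "$key_id" \
    --query 'KeyMetadata.Arn' \
    --output text) || return 1

  # Write a policy that allows only kms:Decrypt on this one key
  cat > kms-decrypt-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
      "Resource": "$key_arn"
    }
  ]
}
EOF

  # Attach it as a second inline policy on the training role
  aws iam put-role-policy \
    --role-name SecureAITrainingRole \
    --policy-name KmsDecryptTrainingKey \
    --policy-document file://kms-decrypt-policy.json
}
```

If you attach this policy, also run aws iam delete-role-policy --role-name SecureAITrainingRole --policy-name KmsDecryptTrainingKey before aws iam delete-role during teardown (IAM will not delete a role that still has inline policies), and delete kms-decrypt-policy.json with the other local files.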

Step 5: Enable Amazon Macie for Data Privacy Scanning

Amazon Macie uses machine learning and pattern matching to discover and classify sensitive data in S3. When you prepare training datasets for an AI model, Macie can flag PII that should be removed or masked before training.

```bash
# Enable Amazon Macie in your current region
aws macie2 enable-macie
```

Note: Setting up a specific Macie classification job via CLI requires a complex JSON structure. We will view the Macie setup in the console to understand the workflow.
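If you prefer to stay in the terminal, the same job can be created with aws macie2 create-classification-job. A minimal sketch, assuming a one-time job over the single lab bucket: the hypothetical helper create_macie_job looks up your account ID with aws sts get-caller-identity and passes the nested s3-job-definition JSON, and list_macie_findings prints the type and affected object key of each finding once the job has run. The console steps below achieve the same result.

```bash
# Hypothetical helpers (not part of the original lab): create the Macie job
# from the CLI and list the resulting findings.
# Usage: create_macie_job "$BUCKET_NAME"
create_macie_job() {
  local bucket="$1"
  local account_id
  account_id=$(aws sts get-caller-identity --query 'Account' --output text) || return 1

  # One-time job that scans only the lab bucket in this account;
  # the client token just needs to be unique per request
  aws macie2 create-classification-job \
    --job-type ONE_TIME \
    --name AI-Dataset-Scan \
    --client-token "ai-dataset-scan-$(date +%s)" \
    --s3-job-definition "{\"bucketDefinitions\":[{\"accountId\":\"${account_id}\",\"buckets\":[\"${bucket}\"]}]}"
}

# Usage: list_macie_findings   (run after the job has finished)
list_macie_findings() {
  local ids
  ids=$(aws macie2 list-findings --query 'findingIds' --output text) || return 1
  if [ -z "$ids" ]; then
    echo "No findings yet"
    return 0
  fi
  # Word splitting on $ids is intentional: each ID becomes a separate argument
  # shellcheck disable=SC2086
  aws macie2 get-findings \
    --finding-ids $ids \
    --query 'findings[].[type, resourcesAffected.s3Object.key]' \
    --output text
}
```

create-classification-job returns a jobId; aws macie2 describe-classification-job --job-id <jobId> --query jobStatus reports when the job is COMPLETE. Note that list-findings returns all findings in the account and region, not only those from this job.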

Console instructions for Macie Job
  1. Navigate to Amazon Macie in the console.
  2. On the left sidebar, click S3 buckets.
  3. Check the box next to your brainybee-ai-sec-lab-... bucket.
  4. Click Create job.
  5. Leave default settings, click Next through the screens, name the job AI-Dataset-Scan, and click Submit.
  6. When the job finishes (this can take several minutes or longer), Macie should report a sensitive data finding for the SSNs in raw_training_data.csv.
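Once Macie has confirmed that the raw file contains SSNs, one remediation is to mask them before the data reaches a training job. A minimal sketch, assuming SSNs always appear in the NNN-NN-NNNN format used by the mock dataset: the hypothetical helper mask_ssns rewrites each match to XXX-XX-XXXX with sed -E and fails if any SSN-shaped value survives. This is simple pattern-based redaction, not Macie's own detection logic.

```bash
# Hypothetical helper (not part of the original lab): mask SSN-shaped values
# (NNN-NN-NNNN) in a CSV and verify that none remain.
# Usage: mask_ssns raw_training_data.csv masked_training_data.csv
mask_ssns() {
  local input="$1" output="$2"
  local ssn_pattern='[0-9]{3}-[0-9]{2}-[0-9]{4}'

  # -E enables extended regular expressions so {n} repetition works
  sed -E "s/${ssn_pattern}/XXX-XX-XXXX/g" "$input" > "$output" || return 1

  # grep exits 0 when it finds a match, which here means masking failed
  if grep -Eq "$ssn_pattern" "$output"; then
    echo "SSN-shaped values remain in $output" >&2
    return 1
  fi
  echo "Masked SSNs written to $output"
}
```

You could then upload the masked copy with aws s3 cp masked_training_data.csv "s3://$BUCKET_NAME/masked_training_data.csv" and point training at that object instead of the raw file; delete masked_training_data.csv locally during clean-up.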

Checkpoints

Run these commands to verify your setup was successful:

Checkpoint 1: Verify Bucket Encryption

```bash
aws s3api get-bucket-encryption --bucket "$BUCKET_NAME"
```

Expected Result: A JSON response showing the aws:kms rule and your specific KMS Key ID.

Checkpoint 2: Verify IAM Role Restrictions

```bash
aws iam get-role-policy --role-name SecureAITrainingRole --policy-name S3LeastPrivilegeAccess
```

Expected Result: The inline policy document, granting only s3:GetObject and s3:ListBucket on your specific bucket and its objects.
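Reading the policy document shows what it grants; the IAM policy simulator shows how it evaluates. A minimal sketch: the hypothetical helper check_least_privilege looks up the role ARN and calls aws iam simulate-principal-policy for one read and two write actions on the training file. If the policy is scoped as intended, s3:GetObject should report allowed while s3:PutObject and s3:DeleteObject report implicitDeny.

```bash
# Hypothetical helper (not part of the original lab): simulate what the training
# role may do to the training file, without touching any data.
# Usage: check_least_privilege "$BUCKET_NAME"
check_least_privilege() {
  local bucket="$1"
  local role_arn
  role_arn=$(aws iam get-role \
    --role-name SecureAITrainingRole \
    --query 'Role.Arn' \
    --output text) || return 1

  # Prints one line per simulated action: the action name and its decision
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names s3:GetObject s3:PutObject s3:DeleteObject \
    --resource-arns "arn:aws:s3:::${bucket}/raw_training_data.csv" \
    --query 'EvaluationResults[].[EvalActionName, EvalDecision]' \
    --output text
}
```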


Troubleshooting

| Issue / Error | Cause | Fix |
|---|---|---|
| An error occurred (AccessDenied) when calling the CreateBucket operation | Your IAM user lacks permissions to create S3 buckets. | Ensure your IAM user has AmazonS3FullAccess or administrator privileges for the lab. |
| An error occurred (InvalidToken) when calling the PutBucketEncryption operation | The $BUCKET_NAME variable is empty or the KMS key ID is malformed. | Ensure you exported the bucket name variable and pasted the exact KMS Key ID from Step 1. |
| Macie is not enabled in the console. | Macie requires manual activation per region. | Run the aws macie2 enable-macie command or click Enable Macie in the AWS console. |
| Cannot delete S3 bucket during teardown. | The bucket contains files. S3 buckets must be empty before deletion. | Run the recursive rm command provided in the teardown steps before deleting the bucket. |

Clean-Up / Teardown

[!WARNING] Remember to run the teardown commands to avoid ongoing charges. AWS KMS keys and Amazon Macie can incur charges if left active.

Run the following commands to destroy all provisioned resources:

```bash
# 1. Empty the S3 bucket (a bucket must be empty before it can be deleted)
aws s3 rm "s3://$BUCKET_NAME" --recursive

# 2. Delete the S3 bucket
aws s3api delete-bucket --bucket "$BUCKET_NAME"

# 3. Delete the IAM inline policy first, then the role
aws iam delete-role-policy --role-name SecureAITrainingRole --policy-name S3LeastPrivilegeAccess
aws iam delete-role --role-name SecureAITrainingRole

# 4. Disable Amazon Macie
#    (Note: This deletes Macie configuration in this account/region)
aws macie2 disable-macie

# 5. Schedule KMS key deletion (keys cannot be deleted immediately; the minimum waiting period is 7 days)
aws kms schedule-key-deletion --key-id <YOUR_KMS_KEY_ID> --pending-window-in-days 7
```

Clean up local files:

```bash
rm raw_training_data.csv trust-policy.json data-access-policy.json
```
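Because the KMS key only enters a pending-deletion state, and an error in a long pasted teardown is easy to miss, it can help to confirm the clean-up. A minimal sketch: the hypothetical helper verify_teardown checks that the bucket and role no longer exist (head-bucket and get-role should both fail) and that the key state reads PendingDeletion.

```bash
# Hypothetical helper (not part of the original lab): confirm that teardown worked.
# Usage: verify_teardown "$BUCKET_NAME" <YOUR_KMS_KEY_ID>
verify_teardown() {
  local bucket="$1" key_id="$2"
  local status=0

  # head-bucket fails once the bucket has been deleted
  if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "Bucket $bucket still exists"
    status=1
  fi

  # get-role fails once the role has been deleted
  if aws iam get-role --role-name SecureAITrainingRole >/dev/null 2>&1; then
    echo "Role SecureAITrainingRole still exists"
    status=1
  fi

  # A scheduled key reports PendingDeletion until its waiting period ends
  local key_state
  key_state=$(aws kms describe-key --key-id "$key_id" \
    --query 'KeyMetadata.KeyState' --output text 2>/dev/null)
  if [ "$key_state" != "PendingDeletion" ]; then
    echo "KMS key state is '${key_state:-unknown}', expected PendingDeletion"
    status=1
  fi

  [ "$status" -eq 0 ] && echo "Teardown verified"
  return "$status"
}
```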
