Hands-On Lab: Training and Fine-Tuning Foundation Models on AWS
This hands-on lab walks you through the training and fine-tuning process for foundation models (FMs) on AWS.
Prerequisites
Before starting this lab, ensure you have the following prerequisites in place:
- Cloud Account: An AWS Account with Administrator access.
- CLI Tools: AWS CLI installed and configured (`aws configure`) with your access keys.
- Model Access: Amazon Titan Text Lite model access enabled in the Amazon Bedrock console (under Model access).
- Knowledge: Basic understanding of JSON structures and foundational AI concepts (Pre-training vs. Fine-tuning).
Learning Objectives
By completing this lab, you will be able to:
- Prepare and format a dataset for instruction fine-tuning (JSONL format).
- Upload training data to an Amazon S3 bucket using the AWS CLI.
- Configure and launch a model customization (fine-tuning) job in Amazon Bedrock.
- Understand the deployment phase of a fine-tuned model (Provisioned Throughput) and associated cost considerations.
Architecture Overview
The following diagram illustrates the flow of data and services used in this lab to fine-tune a Foundation Model (FM).
To understand where fine-tuning fits into model customization, keep the key distinction in mind: fine-tuning updates the model's weights using your labeled examples, while retrieval-based approaches (such as RAG) leave the weights unchanged and supply external context at inference time.
Step-by-Step Instructions
Step 1: Create an Amazon S3 Bucket for Training Data
First, we need a secure storage location for our fine-tuning dataset. We will create an S3 bucket.
```shell
aws s3 mb s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID> --region us-east-1
```

▶💻 Console alternative
- Navigate to the S3 Console.
- Click Create bucket.
- Enter the bucket name `brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>`.
- Leave all other settings as default and click Create bucket.
> [!TIP]
> S3 bucket names must be globally unique. Replace `<YOUR_ACCOUNT_ID>` with your actual AWS account number or a random string.
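Because bucket names must be globally unique and follow strict naming rules, it can help to sanity-check a candidate name locally before calling `aws s3 mb`. The following is a minimal sketch (the helper is ours, not an AWS API, and the regex covers only the common rules: 3-63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit):

```python
import re

# Sketch of an S3 bucket-name pre-check. Not exhaustive: it skips the
# "no IP-address-shaped names" and dotted-name rules, and it cannot
# check global uniqueness (only a real CreateBucket call can).
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")

def looks_like_valid_bucket_name(name):
    return bool(BUCKET_RE.match(name))

print(looks_like_valid_bucket_name("brainybee-lab-finetuning-123456789012"))  # True
print(looks_like_valid_bucket_name("BrainyBee_Bucket"))  # False: uppercase and underscore
```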
Step 2: Prepare the Fine-Tuning Dataset
Instruction fine-tuning requires data formatted in JSON Lines (.jsonl). Each line represents a single training example with a prompt and the expected completion.
Create a file named training-data.jsonl and add the following sample content:
```shell
cat <<EOF > training-data.jsonl
{"prompt": "Classify this ticket: The billing page is returning a 500 error.", "completion": "Category: Technical Support | Priority: High"}
{"prompt": "Classify this ticket: I want to upgrade my subscription to the premium tier.", "completion": "Category: Sales | Priority: Medium"}
{"prompt": "Classify this ticket: How do I change my profile picture?", "completion": "Category: General Inquiry | Priority: Low"}
{"prompt": "Classify this ticket: The database is down and no one can log in!", "completion": "Category: Technical Support | Priority: Critical"}
EOF
```
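Bedrock rejects customization jobs whose dataset contains malformed lines, so it is worth validating the file before upload. A hedged Python sketch (the helper name is ours, not an AWS API):

```python
import json

# Each line must be a JSON object with exactly these keys for
# prompt/completion fine-tuning.
REQUIRED_KEYS = {"prompt", "completion"}

def count_valid_records(lines):
    """Count records that are JSON objects with exactly the required keys."""
    count = 0
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {n}: not valid JSON ({exc})") from exc
        if set(record) != REQUIRED_KEYS:
            raise ValueError(f"line {n}: keys must be exactly {sorted(REQUIRED_KEYS)}")
        count += 1
    return count

sample = ['{"prompt": "Classify this ticket: App is slow.", "completion": "Category: Technical Support | Priority: Medium"}']
print(count_valid_records(sample))  # 1
```

To check the lab file itself, pass `open("training-data.jsonl")` to the function; it raises `ValueError` naming the first bad line.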
Upload this file to your S3 bucket:

```shell
aws s3 cp training-data.jsonl s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>/data/training-data.jsonl
```

Step 3: Create the Fine-Tuning Job in Amazon Bedrock
We will now instruct Bedrock to train a custom model using our data. While this can be done via CLI, the Console is highly recommended for beginners as it automatically provisions the necessary IAM roles.
If using the CLI, you must first create an IAM trust policy and role. For this step, we will use the console path.
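For reference, if you later take the CLI or SDK path, the service role you create must trust Bedrock. Here is a sketch of that trust policy as a Python dict (the role name and the commented boto3 call are illustrative; the console wizard creates an equivalent role for you):

```python
import json

# Trust policy allowing the Bedrock service to assume the fine-tuning role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(json.dumps(trust_policy, indent=2))

# To create the role (requires AWS credentials; role name is a placeholder):
# import boto3
# boto3.client("iam").create_role(
#     RoleName="BedrockFineTuningRole",
#     AssumeRolePolicyDocument=json.dumps(trust_policy),
# )
```

The role also needs `s3:GetObject`/`s3:PutObject` permissions on the training and output buckets.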
▶💻 Console Instructions (Recommended)
- Navigate to the Amazon Bedrock Console.
- In the left navigation pane, under Foundation models, select Custom models.
- Click Customize model > Create Fine-tuning job.
- Model details:
- Base model: Select Amazon Titan Text Lite.
- Custom model name: `ticket-classifier-model`.
- Job configuration:
- Job name: `ticket-classifier-job`.
- Input data: Provide the S3 URI `s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>/data/training-data.jsonl`.
- Hyperparameters: Leave defaults (Epochs, Batch size, Learning rate).
- Output data: Specify the same S3 bucket: `s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>/output/`.
- Service access: Select Create and use a new service role.
- Click Create model customization job.
📸 Screenshot Placeholder: Amazon Bedrock Model Customization Job configuration screen.
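The console steps above can also be expressed programmatically. A hedged boto3 sketch follows; the parameter names follow the `bedrock` client, the role ARN is a placeholder you must supply, and the base model identifier assumes the standard Titan Text Lite model ID:

```python
# Request parameters for a Bedrock model customization (fine-tuning) job.
# <YOUR_ACCOUNT_ID> and <YOUR_BEDROCK_ROLE> are placeholders.
params = {
    "jobName": "ticket-classifier-job",
    "customModelName": "ticket-classifier-model",
    "roleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<YOUR_BEDROCK_ROLE>",
    "baseModelIdentifier": "amazon.titan-text-lite-v1",
    "trainingDataConfig": {
        "s3Uri": "s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>/data/training-data.jsonl"
    },
    "outputDataConfig": {
        "s3Uri": "s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>/output/"
    },
}
print(sorted(params))

# To submit (requires credentials, model access, and a valid role ARN):
# import boto3
# boto3.client("bedrock", region_name="us-east-1").create_model_customization_job(**params)
```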
Step 4: Monitor the Customization Job
Fine-tuning takes time (typically 30-60 minutes for small datasets). You can monitor the status using the CLI.
```shell
aws bedrock list-model-customization-jobs --query "modelCustomizationJobSummaries[0].[jobName, status]"
```

> [!NOTE]
> Wait until the status changes from `InProgress` to `Completed` before moving to the next step.
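Rather than re-running the CLI command by hand, you can poll in a loop. A generic sketch follows; `get_status` is injected so the same loop works with a real status call (for example, one that wraps boto3's `get_model_customization_job`) or with a test stub:

```python
import time

def wait_for_job(get_status, poll_seconds=60, max_polls=120):
    """Call get_status() until the job leaves 'InProgress', then return the status."""
    for _ in range(max_polls):
        status = get_status()
        if status != "InProgress":
            return status  # e.g. "Completed" or "Failed"
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling window")

# Example with a stub that completes on the third poll:
states = iter(["InProgress", "InProgress", "Completed"])
print(wait_for_job(lambda: next(states), poll_seconds=0))  # Completed
```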
Step 5: (Optional) Provision Throughput for Inference
> [!WARNING]
> COST ALERT: To query a custom model in Bedrock, you must purchase Provisioned Throughput. This involves an hourly charge and usually requires a 1-month commitment. DO NOT run this step in a personal account unless you are prepared for the cost.
If you are in a provided sandbox environment:
```shell
aws bedrock create-provisioned-model-throughput \
  --provisioned-model-name ticket-classifier-throughput \
  --model-id arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:custom-model/ticket-classifier-model \
  --model-units 1
```

Once provisioned, you can test the model in the Bedrock Playground by selecting your Custom Model from the dropdown list.
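Before committing, it helps to estimate what the commitment will cost. A back-of-the-envelope sketch (the hourly rate below is a made-up placeholder, NOT an actual AWS price; look up the current Provisioned Throughput pricing for your model and region):

```python
# Rough monthly cost for Provisioned Throughput: units x hourly rate x hours.
HOURLY_RATE_USD = 20.00   # placeholder rate, not an actual AWS price
MODEL_UNITS = 1
HOURS_PER_MONTH = 24 * 30

monthly_cost = MODEL_UNITS * HOURLY_RATE_USD * HOURS_PER_MONTH
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")  # $14,400.00
```

Even at modest hourly rates, a month-long commitment adds up quickly, which is why the warning above applies.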
Checkpoints
Verify your progress after Step 2 by checking the S3 bucket contents:
```shell
aws s3 ls s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>/data/
```

Expected Output: `... training-data.jsonl`
Verify your progress after Step 4 by checking the custom models list:
```shell
aws bedrock list-custom-models --query "modelSummaries[?modelName=='ticket-classifier-model'].[modelName, creationTime]"
```

Expected Output: An array containing `ticket-classifier-model` and a timestamp.
Clean-Up / Teardown
> [!IMPORTANT]
> Failure to clean up resources, especially Provisioned Throughput, will result in significant ongoing AWS charges.
Execute the following commands to tear down the lab environment:
1. Delete Provisioned Throughput (If created in Step 5):
```shell
aws bedrock delete-provisioned-model-throughput --provisioned-model-id ticket-classifier-throughput
```

2. Delete the Custom Model:
```shell
aws bedrock delete-custom-model --model-identifier arn:aws:bedrock:us-east-1:<YOUR_ACCOUNT_ID>:custom-model/ticket-classifier-model
```

3. Delete the S3 Bucket and its contents:
```shell
aws s3 rm s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID> --recursive
aws s3 rb s3://brainybee-lab-finetuning-<YOUR_ACCOUNT_ID>
```

Troubleshooting
| Common Error | Cause | Fix |
|---|---|---|
| `ValidationException` during job creation | The JSONL format is incorrect or contains invalid keys. | Ensure your dataset strictly uses the `{"prompt": "...", "completion": "..."}` format without trailing commas. |
| `AccessDeniedException` | The IAM role Bedrock is using lacks permissions to read from your S3 bucket. | If using the CLI, ensure the trust policy allows `bedrock.amazonaws.com` to assume the role, and that the role has `s3:GetObject` on the bucket. Using the console wizard fixes this automatically. |
| `ResourceNotFound` | You have not requested access to the base Titan model in Bedrock. | Navigate to Bedrock > Model access, and request access to the Amazon Titan Text models. |
| Job fails after 5 minutes | Dataset is too small for the selected hyperparameters. | Amazon Bedrock requires a minimum number of tokens/lines depending on the model. Ensure you have at least 100-200 valid JSONL rows for real training. |