Hands-On Lab: Exploring Foundation Model Design Considerations with Amazon Bedrock
Design considerations for applications that use foundation models (FMs)
- Estimated Time: 30 minutes
- Difficulty: Guided
- Cloud Provider: AWS
Foundation models (FMs) act as sophisticated universal translators that can understand and generate human-like text, code, and multimodal content. In this lab, we will explore the practical design considerations of using FMs—specifically how model selection, prompt design, and inference parameters (like temperature and max tokens) impact output, cost, and latency.
Prerequisites
Before starting this lab, ensure you have the following:
- AWS Account: An active AWS account with Administrator or PowerUser permissions.
- AWS CLI: Installed and configured locally with your credentials (`aws configure`).
- Familiarity: Basic understanding of JSON and terminal/command-line operations.
- Region Selection: Use `us-east-1` (N. Virginia) or `us-west-2` (Oregon), as Amazon Bedrock model availability is highest in these regions.
Learning Objectives
By the end of this lab, you will be able to:
- Request and manage Foundation Model access within Amazon Bedrock.
- Invoke an FM using the AWS CLI and the AWS Management Console.
- Modify inference parameters (e.g., temperature, input/output length) and observe their effect on model responses.
- Evaluate design considerations regarding token-based pricing, latency, and response accuracy.
Architecture Overview
The architecture for this lab is straightforward, focusing on the interaction between a client interface and the managed Amazon Bedrock service.
Step-by-Step Instructions
Step 1: Request Model Access in Amazon Bedrock
By default, access to Foundation Models in Amazon Bedrock is not enabled. You must explicitly request access, which represents a key compliance and governance design consideration.
# Check currently available models in your region
aws bedrock list-foundation-models --query "modelSummaries[*].[modelId, modelName]" --output table

▶ Console alternative: Requesting Access
- Log in to the AWS Management Console.
- Navigate to Amazon Bedrock.
- In the left navigation pane, scroll down and click on Model access.
- Click the Manage model access button (top right).
- Check the box next to Titan Text G1 - Lite (under Amazon).
- Scroll to the bottom and click Save changes.
- Wait a few moments until the Access status changes to Access granted.
📸 Screenshot: Checkbox selected next to "Titan Text G1 - Lite" with "Access granted" badge.
[!IMPORTANT] Use the Console rather than the CLI to request model access. AWS requires accepting End User License Agreements (EULAs) for certain third-party models, and this cannot easily be done via the CLI.
Step 2: Invoke a Model with Default Parameters
Now that you have access, let's invoke the amazon.titan-text-lite-v1 model. We will ask it to explain a complex topic to evaluate its default behavior.
Create a file named payload-default.json with the following content:
cat <<EOF > payload-default.json
{
"inputText": "Explain the business value of Generative AI in two sentences.",
"textGenerationConfig": {
"maxTokenCount": 100,
"temperature": 0.7
}
}
EOF

Now, invoke the model using the Bedrock Runtime:
aws bedrock-runtime invoke-model \
--model-id amazon.titan-text-lite-v1 \
--body fileb://payload-default.json \
--cli-binary-format raw-in-base64-out \
--accept "application/json" \
--content-type "application/json" \
output-default.txt
# View the result
cat output-default.txt

▶ Console alternative: Bedrock Playground
- In the Amazon Bedrock console, navigate to Playgrounds > Text.
- Click Select model, choose Amazon, and then select Titan Text G1 - Lite.
- Click Apply.
- In the chat box, type: "Explain the business value of Generative AI in two sentences."
- Click Run and observe the output.
📸 Screenshot: The Bedrock Text Playground showing the prompt and the model's generated response.
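The CLI writes the raw JSON response body to `output-default.txt`. Titan Text models typically return an object with an `inputTextTokenCount` field and a `results` array (the Checkpoint section below greps for `"results"`), but verify the exact shape against your own output. A minimal sketch of pulling the generated text out of such a body, using a hypothetical sample response:

```python
import json

# Sample body in the shape Titan Text models typically return.
# Field names are based on the Titan text schema; check your own output file.
sample_body = """
{
  "inputTextTokenCount": 12,
  "results": [
    {
      "tokenCount": 45,
      "outputText": "Generative AI accelerates content creation and decision-making.",
      "completionReason": "FINISH"
    }
  ]
}
"""

def extract_output(body: str) -> str:
    """Pull the generated text out of a Titan-style response body."""
    data = json.loads(body)
    return data["results"][0]["outputText"]

print(extract_output(sample_body))
```

In a real pipeline you would feed the contents of `output-default.txt` to `extract_output` instead of the hard-coded sample.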
Step 3: Observe the Effect of Inference Parameters (Temperature)
Temperature controls the randomness (or creativity) of the model. A lower temperature (e.g., 0.0) produces deterministic, factual answers. A higher temperature (e.g., 0.9) produces more creative but potentially unpredictable answers (increasing the risk of hallucinations).
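Under the hood, temperature rescales the model's next-token scores (logits) before they are turned into probabilities: dividing by a small temperature sharpens the distribution toward the top token, while a large temperature flattens it. A minimal sketch of this mechanism (the logits here are made up for illustration, and this sketch requires temperature > 0; inference services typically treat 0.0 as greedy argmax selection):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Rescale logits by temperature, then normalize. As temperature -> 0,
    # the distribution concentrates on the highest-scoring token.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                       # hypothetical next-token scores
cool = softmax_with_temperature(logits, 0.1)   # near-deterministic
warm = softmax_with_temperature(logits, 1.5)   # flatter, more random

print([round(p, 3) for p in cool])
print([round(p, 3) for p in warm])
```

At temperature 0.1 the top token takes almost all of the probability mass, which is why repeated runs at low temperature produce near-identical text.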
Create a new payload with a temperature of 0.0:
cat <<EOF > payload-strict.json
{
"inputText": "Explain the business value of Generative AI in two sentences.",
"textGenerationConfig": {
"maxTokenCount": 100,
"temperature": 0.0
}
}
EOF
aws bedrock-runtime invoke-model \
--model-id amazon.titan-text-lite-v1 \
--body fileb://payload-strict.json \
--cli-binary-format raw-in-base64-out \
--accept "application/json" \
--content-type "application/json" \
output-strict.txt
cat output-strict.txt

[!TIP] Compare `output-default.txt` and `output-strict.txt`. If you run the 0.0 temperature prompt multiple times, the output will remain nearly identical. This consistency is a critical design consideration for enterprise applications like customer support bots.
Step 4: Evaluate Output Length and Cost Trade-offs
Generative AI models are billed based on token consumption (input tokens + output tokens). If you set maxTokenCount too high, a verbose model might generate unnecessarily long answers, driving up costs and latency.
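Because billing is per token, cost per invocation is simple arithmetic. A quick sketch using the Titan Text Lite On-Demand prices quoted in the Cost Estimate section of this lab (verify current pricing on the Amazon Bedrock pricing page before relying on these numbers):

```python
# Per-1,000-token prices as quoted in this lab's Cost Estimate section (USD).
INPUT_PRICE_PER_1K = 0.0003
OUTPUT_PRICE_PER_1K = 0.0004

def invocation_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated On-Demand cost of a single model invocation."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A short prompt (~12 tokens) with the response capped at 100 tokens:
print(f"${invocation_cost(12, 100):.6f}")
```

Even at the 100-token cap, a single invocation costs a small fraction of a cent; the design risk is at scale, where verbose outputs multiplied across millions of requests dominate the bill.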
Let's constrain the model to a very short token count:
cat <<EOF > payload-short.json
{
"inputText": "Explain the business value of Generative AI in two sentences.",
"textGenerationConfig": {
"maxTokenCount": 20,
"temperature": 0.5
}
}
EOF
aws bedrock-runtime invoke-model \
--model-id amazon.titan-text-lite-v1 \
--body fileb://payload-short.json \
--cli-binary-format raw-in-base64-out \
--accept "application/json" \
--content-type "application/json" \
output-short.txt
cat output-short.txt

[!NOTE] Notice how the response in `output-short.txt` is likely cut off mid-sentence. When designing applications, you must balance cost (lower max tokens) with performance requirements (ensuring complete, coherent answers).
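An application can detect this truncation programmatically rather than showing users a cut-off answer. Titan-style responses include a `completionReason` per result; a value of `"LENGTH"` rather than `"FINISH"` indicates the token cap was hit (this field name and its values are based on the Titan text schema; confirm them against your own response bodies). A minimal sketch:

```python
import json

def is_truncated(body: str) -> bool:
    """Return True if any result in a Titan-style response body was cut off
    by the maxTokenCount cap (completionReason of "LENGTH" vs "FINISH")."""
    data = json.loads(body)
    return any(r.get("completionReason") == "LENGTH"
               for r in data.get("results", []))

# Hypothetical sample bodies for a capped and an uncapped response:
truncated = '{"results": [{"outputText": "Generative AI helps", "completionReason": "LENGTH"}]}'
complete = '{"results": [{"outputText": "Full answer.", "completionReason": "FINISH"}]}'

print(is_truncated(truncated), is_truncated(complete))
```

A production app might respond to truncation by retrying with a higher `maxTokenCount` or by asking the model to summarize more tightly.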
Checkpoints
Use these commands to verify your progress:
Checkpoint 1: Verify Payload Creation
ls -l payload-*.json
# Expected result: You should see payload-default.json, payload-strict.json, and payload-short.json.

Checkpoint 2: Verify Successful Invocations
cat output-short.txt | grep -q "results"
# If this command returns no error (exit code 0), your JSON response was successfully captured.

Concept Review: Customization vs. Prompting
As you explored inference parameters, remember that modifying inputs is just one way to control FMs. Here is a brief comparison of how you might adapt models for business applications:
| Approach | Cost/Effort | Use Case | Example |
|---|---|---|---|
| Prompt Engineering | Low | Formatting outputs, basic context | Zero-shot, Few-shot learning |
| RAG (Retrieval-Augmented Generation) | Medium | Providing up-to-date, domain-specific facts to prevent hallucinations | Amazon Bedrock Knowledge Bases |
| Fine-Tuning | High | Adapting tone, style, or highly specialized domain language | Instruction tuning a model for medical terminology |
| Pre-training | Very High | Creating a brand new foundational model from scratch | Training a new multi-lingual model |
Teardown
[!WARNING] While keeping Model Access enabled in Bedrock does not incur ongoing charges, generating tokens does. Clean up your local environment to ensure no sensitive data or credentials are left behind.
Run the following commands to delete the local artifacts generated during this lab:
# Remove all generated payload and output files
rm payload-default.json payload-strict.json payload-short.json
rm output-default.txt output-strict.txt output-short.txt
echo "Cleanup complete!"

▶ Console alternative: Revoking Model Access
If you wish to cleanly revoke access:
- Navigate back to Amazon Bedrock > Model access.
- Click Manage model access.
- Uncheck Titan Text G1 - Lite.
- Click Save changes.
Troubleshooting
| Common Error | Cause | Fix |
|---|---|---|
| `AccessDeniedException` | You have not requested access to the specific model in the Bedrock console. | Go to the Bedrock console -> Model access, and explicitly enable the model. |
| `ValidationException` | The JSON payload structure is incorrect for the chosen model. | Different models (e.g., Claude vs. Titan) require different JSON schemas. Check the Bedrock documentation for the specific model's payload structure. |
| `UnrecognizedClientException` | AWS CLI is not configured, or credentials have expired. | Run `aws configure` and provide valid access keys. |
| `ThrottlingException` | Too many requests sent to the model in a short period. | Wait a few seconds and try the invocation again. |
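For throttling in particular, production clients usually retry automatically with exponential backoff and jitter rather than failing. A minimal sketch of the client-side pattern; `ThrottlingError` here is a hypothetical stand-in (real boto3 code would catch `botocore`'s `ClientError` and inspect its error code, and boto3's built-in retry configuration may cover this for you):

```python
import random
import time

class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling exception from the service."""

def invoke_with_backoff(invoke, max_attempts=5, base_delay=0.5):
    """Retry a callable with exponential backoff plus jitter on throttling."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Double the delay each attempt, randomized to avoid thundering herds.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)

# Simulated flaky invocation: throttled twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottlingError()
    return "ok"

print(invoke_with_backoff(flaky, base_delay=0.01))
```

The jitter matters when many clients are throttled at once: without it, they all retry at the same instant and get throttled again.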
Stretch Challenge
Challenge: Try invoking a different model, such as Anthropic's anthropic.claude-v2 or anthropic.claude-3-haiku-20240307-v1:0 (if available in your region).
Constraint: You will need to research the specific JSON payload format required by Anthropic models, as it differs from Amazon Titan. Attempt to prompt the model to adopt a specific "persona" (e.g., "You are a helpful AWS Cloud Architect...") using prompt engineering techniques.
▶Show Solution
# Note: Ensure you have requested access to Anthropic Claude in the Console first.
cat <<EOF > claude-payload.json
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 200,
"messages": [
{
"role": "user",
"content": "You are a helpful AWS Cloud Architect. Explain RAG to a beginner."
}
],
"temperature": 0.5
}
EOF
aws bedrock-runtime invoke-model \
--model-id anthropic.claude-3-haiku-20240307-v1:0 \
--body fileb://claude-payload.json \
--cli-binary-format raw-in-base64-out \
--accept "application/json" \
--content-type "application/json" \
claude-output.txt
cat claude-output.txt

Cost Estimate
Amazon Bedrock charges based on tokens processed (both input and output).
- Amazon Titan Text Lite: ~$0.0003 per 1,000 input tokens / ~$0.0004 per 1,000 output tokens.
- The prompts and responses in this lab consist of less than 500 tokens total.
- Total estimated cost for this lab:
< $0.01(Virtually free). - Note: There are no hourly provisioning charges unless you are using Provisioned Throughput, which we did not use in this lab (we used On-Demand).