Hands-On Lab: Design Considerations & Inference Parameters for Foundation Models
Design considerations for applications that use foundation models (FMs)
Welcome to this guided hands-on lab exploring the design considerations for applications that use Foundation Models (FMs). In this 30-minute lab, you will use Amazon Bedrock to understand how model selection, inference parameters (like temperature), and basic prompt engineering directly affect the output generated by an FM.
Prerequisites
Before you begin, ensure you have the following:
- AWS Account with administrative or appropriate IAM access (`AmazonBedrockFullAccess`).
- AWS CLI (`aws`) installed and configured with your credentials.
- Model Access Requested: You must request access to the Amazon Titan Text G1 - Lite model in the Amazon Bedrock console in your target region.
- Prior Knowledge: Basic familiarity with JSON and terminal commands.
Learning Objectives
By completing this lab, you will be able to:
- Identify available foundation models and their supported modalities via the AWS CLI.
- Manipulate inference parameters (temperature, max tokens) and observe their effect on model determinism and creativity.
- Apply basic prompt engineering techniques (context provision) to guide model outputs.
- Evaluate the tradeoffs between response quality and design constraints.
Architecture Overview
This lab uses a serverless pattern: you invoke an Amazon Bedrock foundation model directly through its API, so there is no infrastructure to provision or manage.
Step-by-Step Instructions
Step 1: Verify Foundation Model Availability
Before building an application, you must select an appropriate model based on modality, cost, and complexity. Let's list the available text models in Amazon Bedrock to verify our options.
```shell
aws bedrock list-foundation-models \
  --by-output-modality TEXT \
  --query "modelSummaries[?providerName=='Amazon'].modelId" \
  --region <YOUR_REGION>
```

📸 Console alternative
- Navigate to the Amazon Bedrock console.
- In the left navigation pane, select Foundation models.
- Filter by Provider: Amazon and Modality: Text to view available models like Titan Text Lite.
> [!TIP]
> For this lab, we will use `amazon.titan-text-lite-v1` due to its speed and cost-effectiveness for simple text tasks.
Step 2: Formulate a Baseline Prompt (Low Temperature)
We will create a JSON file containing our prompt and inference parameters. We'll start with a temperature of 0.0, which forces the model to be highly deterministic and analytical.
```shell
cat <<EOF > prompt-low-temp.json
{
  "inputText": "Explain the business value of generative AI in exactly two sentences.",
  "textGenerationConfig": {
    "maxTokenCount": 100,
    "stopSequences": [],
    "temperature": 0.0,
    "topP": 0.9
  }
}
EOF
```

Step 3: Invoke the Model with Baseline Parameters
Now, send the request to Amazon Bedrock using the Bedrock Runtime API.
```shell
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-low-temp.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-low-temp.json
```

📸 Console alternative
- In the Amazon Bedrock console, go to Playgrounds > Text.
- Select Amazon and Titan Text G1 - Lite.
- Type your prompt in the chat window.
- Open the Configurations panel on the right, set Temperature to `0.0`, and click Run.
Step 4: Increase Temperature for Creative Variance
Increasing a model's temperature makes it more likely to sample lower-probability tokens, producing more varied and creative (but potentially less factual) outputs. Let's test this by raising the temperature to 1.0.
```shell
cat <<EOF > prompt-high-temp.json
{
  "inputText": "Explain the business value of generative AI in exactly two sentences.",
  "textGenerationConfig": {
    "maxTokenCount": 100,
    "stopSequences": [],
    "temperature": 1.0,
    "topP": 0.9
  }
}
EOF
```
```shell
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-high-temp.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-high-temp.json
```

Step 5: Implement Prompt Engineering (Context & Constraint)
An essential design consideration is using prompt engineering to guide the model's behavior and reduce hallucinations. Let's add strict context and constraints.
```shell
cat <<EOF > prompt-context.json
{
  "inputText": "Context: You are a strict financial advisor. You only give advice related to cost optimization.\n\nInstruction: Should a company use the largest available foundation model for a simple spelling checker application? Explain why in one sentence.",
  "textGenerationConfig": {
    "maxTokenCount": 150,
    "temperature": 0.1,
    "topP": 0.9
  }
}
EOF
```
```shell
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-context.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-context.json
```

Checkpoints
Verify the outputs of your invocations to ensure the model responded correctly.
Checkpoint 1: View the low-temperature response
```shell
cat response-low-temp.json
```

Expected Result: A consistent, straightforward two-sentence explanation. At temperature 0.0, repeated runs should produce nearly identical text.
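The raw `invoke-model` response is verbose JSON. The small helper below pulls out just the generated text; it is a sketch that assumes the Titan text response shape (a `results` array whose items carry an `outputText` field), so adjust the path for other model families.

```shell
# Helper: print only the generated text from a saved response file.
# Assumes the Titan text response schema: {"results": [{"outputText": ...}]}
extract_output() {
  python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['results'][0]['outputText'])" "$1"
}

# Usage: extract_output response-low-temp.json
```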
Checkpoint 2: View the context-engineered response
```shell
cat response-context.json
```

Expected Result: The model should adopt the "financial advisor" persona and advise against using large models for simple tasks due to cost implications.
Concept Review: The Temperature Tradeoff
Understanding inference parameters is critical for generative AI application design. The tradeoffs among temperature, creativity, and determinism are summarized in the table below.
| Design Parameter | Description | Business Application Impact |
|---|---|---|
| Temperature | Controls randomness of token selection | Low for code/math (deterministic); High for marketing copy (creative) |
| Max Tokens | Hard limit on output length | Controls API latency and prevents runaway cost per invocation |
| Model Size | Number of parameters in the FM | Larger models = higher accuracy but higher latency and cost |
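To see why low temperature is more deterministic, you can work through the softmax math that temperature scales. The sketch below uses plain `awk` (no AWS calls) to turn three made-up token logits into probabilities at two temperatures; the logit values are illustrative, not taken from any real model.

```shell
# Illustrative only: softmax over three candidate-token logits at two temperatures.
# Dividing logits by a small T sharpens the distribution; T=1.0 leaves it spread out.
for T in 0.2 1.0; do
  echo "2.0 1.0 0.5" | awk -v T="$T" '{
    s = 0
    for (i = 1; i <= NF; i++) { e[i] = exp($i / T); s += e[i] }
    printf "T=%s:", T
    for (i = 1; i <= NF; i++) printf " %.2f", e[i] / s
    print ""
  }'
done
# T=0.2: 0.99 0.01 0.00   <- nearly all mass on the top token (deterministic)
# T=1.0: 0.63 0.23 0.14   <- mass spread across alternatives (creative)
```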
Troubleshooting
| Error / Issue | Probable Cause | Fix / Solution |
|---|---|---|
| `AccessDeniedException` | Model access not enabled | In the Bedrock console, go to Model access -> Manage model access and request access for Amazon Titan. |
| `ValidationException` | Malformed JSON payload | Ensure your prompt JSON files are valid JSON with properly escaped quotes. |
| `ExpiredToken` | AWS credentials expired | Re-authenticate your CLI session (`aws sso login` or export new keys). |
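A `ValidationException` caused by a malformed payload can often be caught locally before making any API call. The helper below is a sketch that uses Python's built-in `json.tool` module to check that a payload file parses as JSON.

```shell
# Helper: verify a payload file parses as JSON before sending it to Bedrock.
validate_payload() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid JSON: $1"
  else
    echo "INVALID JSON: $1"
  fi
}

# Usage: validate_payload prompt-low-temp.json
```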
Cost Estimate
With On-Demand mode, Amazon Bedrock charges per 1,000 tokens processed, with input and output tokens priced separately.
- Amazon Titan Text Lite: ~$0.0003 per 1,000 input tokens / ~$0.0004 per 1,000 output tokens.
- Total Estimated Cost for this Lab: < $0.05 USD.
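The lab total can be sanity-checked against the per-1,000-token prices above. The `awk` one-liner below is a sketch: the token counts (2,000 input / 500 output) are illustrative assumptions, and the prices are the approximate figures listed above, which may vary by region.

```shell
# Illustrative cost check: tokens / 1000 * price-per-1K (prices from the list above)
awk 'BEGIN {
  in_tokens = 2000; out_tokens = 500          # assumed usage for the whole lab
  cost = in_tokens / 1000 * 0.0003 + out_tokens / 1000 * 0.0004
  printf "Estimated lab cost: $%.4f\n", cost  # well under the $0.05 budget
}'
# -> Estimated lab cost: $0.0008
```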
Clean-Up / Teardown
Since we utilized Amazon Bedrock in On-Demand mode, there are no endpoints to delete or ongoing hourly charges. However, you should clean up your local directory to maintain security and order.
> [!WARNING]
> Remember to run the teardown commands to avoid leaving potentially sensitive prompt data on your local machine.

```shell
# Remove local prompt and response JSON files
rm prompt-low-temp.json prompt-high-temp.json prompt-context.json
rm response-low-temp.json response-high-temp.json response-context.json
```

Stretch Challenge
Challenge: Try implementing a Few-Shot Prompting technique. Create a new JSON payload where your inputText provides 3 examples of analyzing customer sentiment (Positive/Negative/Neutral) before asking the model to evaluate a new, ambiguous review. Set the temperature to 0.2 to ensure consistent formatting.
Show solution
```shell
cat <<EOF > prompt-few-shot.json
{
  "inputText": "Analyze the sentiment of the text.\nText: I love the new interface!\nSentiment: Positive\n\nText: The application keeps crashing.\nSentiment: Negative\n\nText: The button is blue.\nSentiment: Neutral\n\nText: The response time is okay, but it could be much faster considering the price.\nSentiment:",
  "textGenerationConfig": {
    "maxTokenCount": 10,
    "temperature": 0.2,
    "topP": 0.9
  }
}
EOF

aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-few-shot.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-few-shot.json
```