Hands-On Lab: Design Considerations & Inference Parameters for Foundation Models
Design considerations for applications that use foundation models (FMs)
Welcome to this guided hands-on lab exploring the design considerations for applications that use Foundation Models (FMs). In this 30-minute lab, you will use Amazon Bedrock to understand how model selection, inference parameters (like temperature), and basic prompt engineering directly affect the output generated by an FM.
Prerequisites
Before you begin, ensure you have the following:
- AWS Account with administrative or appropriate IAM access (`AmazonBedrockFullAccess`).
- AWS CLI (`aws`) installed and configured with your credentials.
- Model Access Requested: You must request access to the Amazon Titan Text G1 - Lite model in the Amazon Bedrock console in your target region.
- Prior Knowledge: Basic familiarity with JSON and terminal commands.
Learning Objectives
By completing this lab, you will be able to:
- Identify available foundation models and their supported modalities via the AWS CLI.
- Manipulate inference parameters (temperature, max tokens) and observe their effect on model determinism and creativity.
- Apply basic prompt engineering techniques (context provision) to guide model outputs.
- Evaluate the tradeoffs between response quality and design constraints.
Architecture Overview
This lab uses a serverless pattern: you invoke an Amazon Bedrock foundation model directly through its API, so there is no infrastructure to provision or manage.
Step-by-Step Instructions
Step 1: Verify Foundation Model Availability
Before building an application, you must select an appropriate model based on modality, cost, and complexity. Let's list the available text models in Amazon Bedrock to verify our options.
```shell
aws bedrock list-foundation-models \
  --by-output-modality TEXT \
  --query "modelSummaries[?providerName=='Amazon'].modelId" \
  --region <YOUR_REGION>
```

📸 Console alternative
- Navigate to the Amazon Bedrock console.
- In the left navigation pane, select Foundation models.
- Filter by Provider: Amazon and Modality: Text to view available models like Titan Text Lite.
> [!TIP]
> For this lab, we will use `amazon.titan-text-lite-v1` due to its speed and cost-effectiveness for simple text tasks.
Step 2: Formulate a Baseline Prompt (Low Temperature)
We will create a JSON file containing our prompt and inference parameters. We'll start with a temperature of 0.0, which forces the model to be highly deterministic and analytical.
```shell
cat <<EOF > prompt-low-temp.json
{
  "inputText": "Explain the business value of generative AI in exactly two sentences.",
  "textGenerationConfig": {
    "maxTokenCount": 100,
    "stopSequences": [],
    "temperature": 0.0,
    "topP": 0.9
  }
}
EOF
```

Step 3: Invoke the Model with Baseline Parameters
Now, send the request to Amazon Bedrock using the Bedrock Runtime API.
```shell
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-low-temp.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-low-temp.json
```

📸 Console alternative
- In the Amazon Bedrock console, go to Playgrounds > Text.
- Select Amazon and Titan Text G1 - Lite.
- Type your prompt in the chat window.
- Open the Configurations panel on the right, set Temperature to `0.0`, and click Run.
Step 4: Increase Temperature for Creative Variance
Increasing a model's temperature makes it more likely to sample lower-probability tokens, producing more varied and creative (but potentially less factual) outputs. Let's test this by raising the temperature to 1.0.
```shell
cat <<EOF > prompt-high-temp.json
{
  "inputText": "Explain the business value of generative AI in exactly two sentences.",
  "textGenerationConfig": {
    "maxTokenCount": 100,
    "stopSequences": [],
    "temperature": 1.0,
    "topP": 0.9
  }
}
EOF
```
```shell
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-high-temp.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-high-temp.json
```

Step 5: Implement Prompt Engineering (Context & Constraint)
An essential design consideration is using prompt engineering to guide the model's behavior and reduce hallucinations. Let's add strict context and constraints.
```shell
cat <<EOF > prompt-context.json
{
  "inputText": "Context: You are a strict financial advisor. You only give advice related to cost optimization.\n\nInstruction: Should a company use the largest available foundation model for a simple spelling checker application? Explain why in one sentence.",
  "textGenerationConfig": {
    "maxTokenCount": 150,
    "temperature": 0.1,
    "topP": 0.9
  }
}
EOF
```
```shell
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-context.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-context.json
```

Checkpoints
Verify the outputs of your invocations to ensure the model responded correctly.
Checkpoint 1: View the low-temperature response
```shell
cat response-low-temp.json
```

Expected Result: A consistent, straightforward two-sentence explanation. At temperature 0.0, repeated runs should produce nearly identical text.
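The raw `invoke-model` response is verbose JSON. The small helper below pulls out just the generated text; it is a sketch that assumes the Titan text response shape (a `results` array whose items carry an `outputText` field), so adjust the path for other model families.

```shell
# Helper: print only the generated text from a saved response file.
# Assumes the Titan text response schema: {"results": [{"outputText": ...}]}
extract_output() {
  python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['results'][0]['outputText'])" "$1"
}

# Usage: extract_output response-low-temp.json
```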
Checkpoint 2: View the context-engineered response
```shell
cat response-context.json
```

Expected Result: The model should adopt the "financial advisor" persona and advise against using large models for simple tasks due to cost implications.
Concept Review: The Temperature Tradeoff
Understanding inference parameters is critical for generative AI application design. The tradeoffs among temperature, creativity, and determinism are summarized in the table below.
| Design Parameter | Description | Business Application Impact |
|---|---|---|
| Temperature | Controls randomness of token selection | Low for code/math (deterministic); High for marketing copy (creative) |
| Max Tokens | Hard limit on output length | Controls API latency and prevents runaway cost per invocation |
| Model Size | Number of parameters in the FM | Larger models = higher accuracy but higher latency and cost |
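To see why low temperature is more deterministic, you can work through the softmax math that temperature scales. The sketch below uses plain `awk` (no AWS calls) to turn three made-up token logits into probabilities at two temperatures; the logit values are illustrative, not taken from any real model.

```shell
# Illustrative only: softmax over three candidate-token logits at two temperatures.
# Dividing logits by a small T sharpens the distribution; T=1.0 leaves it spread out.
for T in 0.2 1.0; do
  echo "2.0 1.0 0.5" | awk -v T="$T" '{
    s = 0
    for (i = 1; i <= NF; i++) { e[i] = exp($i / T); s += e[i] }
    printf "T=%s:", T
    for (i = 1; i <= NF; i++) printf " %.2f", e[i] / s
    print ""
  }'
done
# T=0.2: 0.99 0.01 0.00   <- nearly all mass on the top token (deterministic)
# T=1.0: 0.63 0.23 0.14   <- mass spread across alternatives (creative)
```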
Troubleshooting
| Error / Issue | Probable Cause | Fix / Solution |
|---|---|---|
| `AccessDeniedException` | Model access not enabled | In the Bedrock console, go to Model access -> Manage model access and request access for Amazon Titan. |
| `ValidationException` | Malformed JSON payload | Ensure your prompt JSON files are valid JSON with properly escaped quotes. |
| `ExpiredToken` | AWS credentials expired | Re-authenticate your CLI session (`aws sso login` or export new keys). |
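A `ValidationException` caused by a malformed payload can often be caught locally before making any API call. The helper below is a sketch that uses Python's built-in `json.tool` module to check that a payload file parses as JSON.

```shell
# Helper: verify a payload file parses as JSON before sending it to Bedrock.
validate_payload() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid JSON: $1"
  else
    echo "INVALID JSON: $1"
  fi
}

# Usage: validate_payload prompt-low-temp.json
```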
Cost Estimate
With On-Demand mode, Amazon Bedrock charges per 1,000 tokens processed, with input and output tokens priced separately.
- Amazon Titan Text Lite: ~$0.0003 per 1,000 input tokens / ~$0.0004 per 1,000 output tokens.
- Total Estimated Cost for this Lab: < $0.05 USD.
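The lab total can be sanity-checked against the per-1,000-token prices above. The `awk` one-liner below is a sketch: the token counts (2,000 input / 500 output) are illustrative assumptions, and the prices are the approximate figures listed above, which may vary by region.

```shell
# Illustrative cost check: tokens / 1000 * price-per-1K (prices from the list above)
awk 'BEGIN {
  in_tokens = 2000; out_tokens = 500          # assumed usage for the whole lab
  cost = in_tokens / 1000 * 0.0003 + out_tokens / 1000 * 0.0004
  printf "Estimated lab cost: $%.4f\n", cost  # well under the $0.05 budget
}'
# -> Estimated lab cost: $0.0008
```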
Clean-Up / Teardown
Since we utilized Amazon Bedrock in On-Demand mode, there are no endpoints to delete or ongoing hourly charges. However, you should clean up your local directory to maintain security and order.
> [!WARNING]
> Remember to run the teardown commands to avoid leaving potentially sensitive prompt data on your local machine.

```shell
# Remove local prompt and response JSON files
rm prompt-low-temp.json prompt-high-temp.json prompt-context.json
rm response-low-temp.json response-high-temp.json response-context.json
```

Stretch Challenge
Challenge: Try implementing a Few-Shot Prompting technique. Create a new JSON payload where your inputText provides 3 examples of analyzing customer sentiment (Positive/Negative/Neutral) before asking the model to evaluate a new, ambiguous review. Set the temperature to 0.2 to ensure consistent formatting.
Show solution
```shell
cat <<EOF > prompt-few-shot.json
{
  "inputText": "Analyze the sentiment of the text.\nText: I love the new interface!\nSentiment: Positive\n\nText: The application keeps crashing.\nSentiment: Negative\n\nText: The button is blue.\nSentiment: Neutral\n\nText: The response time is okay, but it could be much faster considering the price.\nSentiment:",
  "textGenerationConfig": {
    "maxTokenCount": 10,
    "temperature": 0.2,
    "topP": 0.9
  }
}
EOF

aws bedrock-runtime invoke-model \
  --model-id amazon.titan-text-lite-v1 \
  --body file://prompt-few-shot.json \
  --cli-binary-format raw-in-base64-out \
  --accept application/json \
  --content-type application/json \
  --region <YOUR_REGION> \
  response-few-shot.json
```