
Curriculum Overview: Foundational Generative AI Concepts

Define foundational GenAI concepts (for example, tokens, chunking, embeddings, vectors, prompt engineering, transformer-based LLMs, foundation models [FMs], multimodal models, diffusion models)

Prerequisites

Before embarking on this curriculum to understand foundational Generative AI (GenAI) concepts, learners should possess a baseline understanding of general computing and data principles.

  • Cloud Computing Basics: Familiarity with high-level cloud concepts (e.g., AWS infrastructure, managed services vs. self-hosted).
  • Traditional Machine Learning: A basic grasp of what traditional AI/ML entails (e.g., classification, prediction, supervised vs. unsupervised learning) to understand how GenAI shifts the paradigm from classification to creation.
  • Data Types: An understanding of structured data (tables, databases) versus unstructured data (raw text, images, video).

[!NOTE] While deep programming or mathematical expertise is not required, comfort with conceptual workflows and basic algebraic representations (e.g., vectors) will accelerate your understanding of embeddings.

Module Breakdown

This curriculum is divided into four progressive modules, taking you from how data is formatted for AI systems to how we control and direct complex foundation models.

| Module | Title | Core Focus | Difficulty |
| --- | --- | --- | --- |
| Module 1 | The Language Pipeline | Tokens, Chunking, Embeddings, Vectors | ⭐ Beginner |
| Module 2 | AI Brain Architecture | Transformers, Self-Attention, LLMs | ⭐⭐ Intermediate |
| Module 3 | Expanding the Senses | Foundation Models (FMs), Multimodality, Diffusion | ⭐⭐ Intermediate |
| Module 4 | Controlling the Output | Prompt Engineering, Parameters, RAG | ⭐⭐⭐ Advanced |

The Text Processing Pipeline

To visualize how we get from raw human text to machine-readable formats in Module 1, trace the following pipeline:

Raw Text → Tokenization → Chunking → Embedding → Vector Representation

Learning Objectives per Module

Module 1: The Language Pipeline

  • Tokenization: Explain how raw text is broken down into atomic units (tokens) such as words, sub-words, or characters.
  • Chunking: Understand the practice of grouping tokens into manageable, context-rich phrases to aid the model in grasping grammatical and structural relationships.
  • Embeddings & Vectors: Describe how tokens are mapped into high-dimensional numerical spaces (vectors $\vec{v} \in \mathbb{R}^d$) to represent semantic meaning.
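The first two steps can be sketched with a toy tokenizer and chunker. This is purely illustrative: production tokenizers (e.g. byte-pair encoding) learn sub-word units from data rather than splitting on whitespace, and the `tokenize`/`chunk` helpers here are hypothetical.

```python
# Toy tokenization and chunking (real tokenizers such as BPE learn
# sub-word units from data; this sketch just splits on whitespace).

def tokenize(text: str) -> list[str]:
    """Split raw text into word-level tokens."""
    return text.lower().split()

def chunk(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Group tokens into fixed-size chunks with optional overlap,
    so neighboring chunks share context."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

tokens = tokenize("Generative AI shifts the paradigm from classification to creation")
chunks = chunk(tokens, size=4, overlap=1)
print(tokens)
print(chunks)
```

Note the one-token overlap between consecutive chunks: a common design choice so that a sentence split across a chunk boundary is not stranded without context.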
Understanding Embedding Spaces

When text is converted into embeddings, words with similar meanings are mapped to nearby points in the vector space. This lets the computer measure semantic similarity mathematically using distance metrics such as cosine similarity:

$$\text{Similarity} = \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|}$$
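The formula above is easy to compute directly. The sketch below uses three hypothetical 3-dimensional vectors (real embeddings typically have hundreds or thousands of dimensions, and the values here are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (A . B) / (|A| * |B|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding values: "king" and "queen" point in similar
# directions, "banana" points elsewhere.
king = [0.9, 0.8, 0.1]
queen = [0.88, 0.82, 0.12]
banana = [0.1, 0.05, 0.95]

print(cosine_similarity(king, queen))   # close to 1.0 (similar meaning)
print(cosine_similarity(king, banana))  # much lower (dissimilar)
```

A similarity near 1.0 means the vectors point in nearly the same direction; values near 0 indicate unrelated meanings.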

Module 2: AI Brain Architecture

  • Transformer-Based LLMs: Grasp the core architecture behind modern Large Language Models (LLMs).
  • Self-Attention Mechanism: Discuss how Transformers process tokens in parallel, dynamically weighting the importance of all words in a sequence to capture long-range dependencies and context.
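The parallel, all-pairs weighting described above can be sketched in a few lines. This is a deliberately minimal, single-head version with no learned projection matrices (queries, keys, and values are all the raw input), so it shows the mechanism, not a full Transformer layer:

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention, single head, no learned
    projections: queries, keys, and values are all the input itself."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ x                              # weighted mix of all tokens

# 4 tokens, each a 3-dimensional embedding (toy values):
# the first two tokens are near-duplicates, so they attend strongly
# to each other.
x = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
out = self_attention(x)
print(out.shape)  # every output token is a context-aware blend of all inputs
```

Because every token attends to every other token in one matrix multiplication, the whole sequence is processed in parallel, which is what lets Transformers capture long-range dependencies efficiently.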

Module 3: Expanding the Senses

  • Foundation Models (FMs): Define FMs as massive, pre-trained models that serve as versatile starting points for a wide array of downstream generative tasks.
  • Multimodal Models: Recognize models capable of processing and generating multiple forms of data simultaneously (e.g., text-to-image, audio-to-text).
  • Diffusion Models: Differentiate diffusion techniques (gradually refining noisy images into clear targets) from text-based transformer generation.
*A simplified 2D representation of how embeddings capture semantic relationships.*
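The "gradually refining noise" idea behind diffusion can be made concrete by running the *forward* process, which corrupts a clean image step by step. Generation runs this in reverse, with a trained network predicting the noise to remove at each step; that network is omitted here, and the 8×8 "image" and `beta` schedule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image: np.ndarray, steps: int, beta: float = 0.1) -> np.ndarray:
    """Forward diffusion: repeatedly blend the image with Gaussian noise.
    After enough steps the result is statistically indistinguishable
    from pure noise."""
    x = image.copy()
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise
    return x

clean = np.ones((8, 8))            # a trivially simple 8x8 "image"
noisy = add_noise(clean, steps=50)
print(float(np.abs(noisy).mean())) # original signal is almost entirely gone
```

Contrast this with a Transformer-based LLM, which generates output one token at a time rather than refining a whole canvas at once.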

Module 4: Controlling the Output

  • Prompt Engineering: Master the art of crafting specific, context-rich inputs to guide a model's outputs and reduce hallucinations.
  • Inference Parameters: Describe how tweaking temperature and top-p affects model creativity and non-determinism.
  • Retrieval-Augmented Generation (RAG): Explain how retrieving relevant documents at query time grounds a model's responses in authoritative sources.

Success Metrics

How will you know you have mastered this foundational curriculum? You should be able to confidently check off the following competencies:

  • I can draw and explain the pipeline from raw text to numerical vector.
  • I can contrast traditional AI (classification/prediction) with Generative AI (creation/generation).
  • I can explain the role of a "Transformer" and its "Self-Attention" mechanism without using overly technical jargon.
  • I can distinguish between the use case for a Transformer-based LLM (e.g., text summarization) and a Diffusion model (e.g., image generation).
  • I can successfully modify a prompt using techniques like "few-shot prompting" to significantly improve an AI's response quality.

[!WARNING] Common Pitfall: A major misconception is treating FMs as factual databases. These models generate text probabilistically, which is why evaluation metrics and guardrails against hallucination are essential.

Real-World Application

Understanding these foundations is critical for architecting modern business solutions. Generative AI is no longer a theoretical novelty; it is actively transforming industries.

  • Customer Service Agents: Leveraging chunking and vector embeddings, companies build Retrieval-Augmented Generation (RAG) systems that search internal documentation and generate highly accurate, conversational responses using Amazon Bedrock.
  • Marketing & Content Creation: Marketers use prompt engineering on Multimodal models to auto-generate customized ad copy and corresponding image assets simultaneously, vastly reducing creative lead time.
  • Software Development: Transformer-based LLMs ingest millions of lines of code to act as AI programming assistants (like Amazon Q), understanding the context of an entire software repository to suggest complex bug fixes.
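The RAG pattern from the first bullet can be sketched end to end. Keyword overlap stands in here for the vector similarity search a real system would run against an embedding store, and the final generation call (for example, an Amazon Bedrock model invocation) is omitted; the document chunks and question are invented examples:

```python
import re

def words(text: str) -> set[str]:
    """Lowercase word set, used as a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the question
    (a real RAG system would rank chunks by vector similarity)."""
    return max(chunks, key=lambda c: len(words(question) & words(c)))

chunks = ["Refunds are processed within 5 business days.",
          "Our support office is closed on public holidays."]
question = "How long do refunds take?"
context = retrieve(question, chunks)

# Ground the model by injecting the retrieved chunk into the prompt;
# this prompt would then be sent to an LLM endpoint.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

Grounding the prompt in retrieved documentation is what lets the system answer from company knowledge instead of relying on the model's parametric memory.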

By laying this groundwork, you are preparing to evaluate, deploy, and securely manage Generative AI pipelines on enterprise cloud platforms like AWS.
