AEO Canon · the reference for answer-engine optimization

Lesson 2 of 6

How LLMs Work

Tokens become embeddings, attention weighs context, and the model predicts the next token. This lesson walks through that loop, from prompt to answer.
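The first step of the loop, text becoming tokens and tokens becoming embeddings, can be sketched in a few lines. This is a toy under stated assumptions: the whitespace tokenizer, the six-word `VOCAB`, and the random `EMBEDDINGS` table are all hypothetical. Real LLMs use learned subword tokenizers (such as BPE) with far larger vocabularies, and their embedding values are learned during training rather than random.

```python
import random

# Hypothetical toy vocabulary; real tokenizers split text into subword pieces.
VOCAB = ["<unk>", "the", "model", "predicts", "next", "token"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

EMBED_DIM = 4

# Embedding table: one vector per token ID. A trained model learns these
# values; here they are random placeholders with a fixed seed.
rng = random.Random(0)
EMBEDDINGS = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text: str) -> list[int]:
    """Split on whitespace and map each word to an ID (unknown words -> <unk>)."""
    return [TOKEN_TO_ID.get(word, TOKEN_TO_ID["<unk>"]) for word in text.lower().split()]


def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up one embedding vector per token ID."""
    return [EMBEDDINGS[i] for i in token_ids]


ids = tokenize("The model predicts the next token")
vectors = embed(ids)
print(ids)  # [1, 2, 3, 1, 4, 5]
print(len(vectors), len(vectors[0]))  # 6 4
```

Note that "the" maps to the same ID, and therefore the same vector, both times it appears; it is the later attention step that lets each occurrence take on meaning from its surroundings.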

Learning objectives

  • Trace the steps from prompt to generated answer.
  • Explain what attention does.
  • Explain why a base model can't cite sources.

The lesson

How Do Large Language Models Work?

LLMs work by breaking text into tokens, converting them to embeddings, using a transformer's attention mechanism to weigh context, and predicting the next token one at a time, then repeating that step to generate a full answer.

3 min read
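Attention is the step where the model weighs context. Below is a minimal sketch of single-head scaled dot-product attention in plain Python. It assumes queries, keys, and values are passed in directly; a real transformer derives them from the embeddings through learned projection matrices, runs many heads in parallel, and, in decoder-style LLMs, masks out future tokens. The function names are illustrative, not any library's API.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no masking, no learned projections).

    For each query: score it against every key, divide by sqrt(d),
    softmax the scores into weights, and return the weighted sum of values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Self-attention: queries, keys, and values all come from the same vectors,
# so each output row is a context-weighted blend of every input row.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(x, x, x)
```

The softmax is what makes the weighting a trade-off: giving one token more attention necessarily gives the others less, because the weights for each query always sum to 1.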

Key takeaways

  • Text → tokens → embeddings → transformer attention → next-token prediction, repeated.
  • Attention lets the model weigh how each token relates to the others in its context; it is the transformer's defining innovation.
  • A base model answers from frozen patterns; it can't look things up unless connected to retrieval.
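The repeated prediction loop, and the point about frozen patterns, can be illustrated with a toy autoregressive generator. This is a sketch under loud assumptions: the hard-coded `LOGITS` table is hypothetical and stands in for a trained transformer, and it scores the next token from only the last token (a bigram model), whereas a real LLM scores from the whole context. The table never changes at answer time, and `generate` has no way to consult anything outside it, which mirrors why a base model cannot look facts up without a retrieval step. The `temperature` parameter shows one common source of variation: at 0 the loop always takes the top-scoring token, and above 0 it samples.

```python
import math
import random

# Hypothetical "frozen" next-token scores (logits), fixed before generation.
# A real LLM computes these with a transformer over the full context.
LOGITS = {
    "the": {"capital": 3.0},
    "capital": {"is": 3.0},
    "is": {"paris": 2.5, "rome": 0.5},
    "paris": {"</s>": 3.0},
    "rome": {"</s>": 3.0},
}


def sample_next(scores: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick the next token: greedy at temperature 0, otherwise softmax sampling."""
    if temperature == 0:
        return max(scores, key=scores.get)  # always the top-scoring token
    tokens = list(scores)
    m = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((scores[t] - m) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(prompt: list[str], max_new_tokens: int = 10,
             temperature: float = 0.0, seed: int = 0) -> list[str]:
    """Autoregressive loop: predict one token, append it, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = LOGITS.get(tokens[-1])
        if scores is None:  # the frozen table has no pattern for this token
            break
        nxt = sample_next(scores, temperature, rng)
        if nxt == "</s>":  # end-of-sequence token stops generation
            break
        tokens.append(nxt)
    return tokens


print(generate(["the"]))  # ['the', 'capital', 'is', 'paris']
```

With `temperature=0` the same prompt always produces the same output; with a positive temperature and different seeds, the continuation after "is" will occasionally be the lower-scoring "rome".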

Knowledge check



  1. What is the transformer's key operation?

  2. What does a token become before the model processes it?

  3. Why does a plain model give different answers to the same prompt?