AEO Canon · the reference for answer-engine optimization

Lesson 2 of 6

How LLMs Work

Tokens become embeddings, attention weighs context, and the model predicts the next token. This lesson walks the loop from prompt to answer.

Learning objectives

▸Trace the steps from prompt to generated answer.
▸Explain what attention does.
▸Explain why a base model can't cite sources.

The lesson

How Do Large Language Models Work?

LLMs work by breaking text into tokens, converting the tokens to embeddings, using a transformer's attention mechanism to weigh context, and predicting the next token. That prediction step repeats, one token at a time, until the full answer is generated.
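
The loop described above can be sketched as a toy program. This is a minimal illustration under loud assumptions, not how a production model is built: the five-word vocabulary, the 2-dimensional embeddings, and the output vectors are all hypothetical and hand-picked, the tokenizer splits on whitespace (real tokenizers use subword units), and the transformer's attention layers are replaced by a stand-in that looks only at the last token. What it does show faithfully is the shape of the loop: text to token IDs, IDs to embeddings, embeddings to a probability for every vocabulary entry, pick one token, append it, repeat.

```python
import math

# Hypothetical toy vocabulary; real tokenizers use subword units.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]
TOKEN_ID = {token: i for i, token in enumerate(VOCAB)}

# Hypothetical hand-picked vectors, one per vocabulary entry.
# A real model learns these values during training.
EMBEDDINGS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
OUTPUT_VECTORS = [(0.0, -1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]


def tokenize(text):
    """Split on whitespace and map each known word to its token ID."""
    return [TOKEN_ID[word] for word in text.lower().split()]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(token_ids):
    """Stand-in for the transformer: score the vocabulary from the last token only."""
    x, y = EMBEDDINGS[token_ids[-1]]
    logits = [x * wx + y * wy for wx, wy in OUTPUT_VECTORS]
    return softmax(logits)


def generate(prompt, max_new_tokens=10):
    """Predict one token at a time and append it until <eos> or the limit."""
    token_ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(token_ids)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
        if VOCAB[next_id] == "<eos>":
            break
        token_ids.append(next_id)
    return " ".join(VOCAB[i] for i in token_ids)


print(generate("the"))  # prints "the cat sat down"
```

The greedy pick always takes the most probable token, so this sketch is deterministic; the numbers it "knows" are fixed in the two tables, which mirrors how a trained model's weights are frozen once training ends.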

Key takeaways

▸Text → tokens → embeddings → transformer attention → next-token prediction, repeated.
▸Attention lets the model weigh how strongly each token relates to the other tokens in the context; it is the transformer's core innovation.
▸A base model answers from patterns frozen at training time; it can't look things up or cite sources unless it is connected to retrieval.
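
The attention takeaway above can be made concrete with a small sketch of scaled dot-product attention, the weighting operation at the heart of a transformer. It is simplified on purpose: the three 2-dimensional token vectors are hypothetical, and the same vectors serve as queries, keys, and values, whereas a real transformer derives each from learned projections, runs many attention heads in parallel, and masks out future tokens during generation.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # How strongly does this token relate to every token in the context?
        weights = softmax([dot(q, k) / scale for k in keys])
        # Blend the value vectors according to those weights.
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical vectors for a three-token context.
tokens = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = attention(vectors, vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how one token distributes its attention across the whole context. Every row sums to 1, and a token gives more weight to the tokens whose vectors point in a similar direction to its own.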

Knowledge check

1. What is the transformer's key operation?
2. What does a token become before the model processes it?
3. Why can a base model give different answers to the same prompt?