Lesson 2 of 6
How LLMs Work
Tokens become embeddings, attention weighs context, and the model predicts the next token. This lesson walks the loop from prompt to answer.
Learning objectives
- ▸Trace the steps from prompt to generated answer.
- ▸Explain what attention does.
- ▸Explain why a base model can't cite sources.
The lesson
How Do Large Language Models Work?
LLMs work by breaking text into tokens, converting the tokens to embeddings, using a transformer's attention mechanism to weigh context, and predicting the next token. Repeating that prediction one token at a time generates a full answer.
Key takeaways
- ▸Text → tokens → embeddings → transformer attention → next-token prediction, repeated.
- ▸Attention lets the model weigh how strongly each token relates to the others; it is the transformer's key operation.
- ▸A base model answers from patterns frozen at training time; it can't look things up unless it is connected to retrieval.
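The loop in these takeaways can be sketched in a few lines of Python. This is a toy illustration, not how a production model is built: the six-word vocabulary, the whitespace tokenizer, the random (untrained) embeddings, and the helper names `tokenize`, `attend`, `next_token_probs`, and `generate` are all assumptions made up for this example. A real LLM uses a learned subword tokenizer, learned projections, and many stacked layers.

```python
import math
import random

# Hypothetical toy vocabulary; real models use learned subword tokenizers.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]
DIM = 4  # embedding size, kept tiny for readability

random.seed(0)
# Random vectors stand in for learned embeddings (assumption: untrained).
EMBED = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def tokenize(text):
    """Text -> token ids. Whitespace split stands in for a real tokenizer;
    words outside VOCAB raise ValueError."""
    return [VOCAB.index(word) for word in text.lower().split()]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors):
    """Scaled dot-product attention for the last position only.

    Queries, keys, and values are the raw embeddings here; a real
    transformer applies learned projections to each of them.
    """
    query = vectors[-1]
    weights = softmax([dot(query, key) / math.sqrt(DIM) for key in vectors])
    # The context vector is a weighted mix of every token's vector.
    context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]
    return weights, context


def next_token_probs(token_ids):
    """Tokens -> embeddings -> attention -> a probability for each vocab entry."""
    vectors = [EMBED[t] for t in token_ids]
    _, context = attend(vectors)
    logits = [dot(context, e) for e in EMBED]
    return softmax(logits)


def generate(prompt, steps=3):
    """Repeat next-token prediction, appending one token per step."""
    token_ids = tokenize(prompt)
    for _ in range(steps):
        probs = next_token_probs(token_ids)
        token_ids.append(probs.index(max(probs)))  # greedy: pick the most likely
    return " ".join(VOCAB[t] for t in token_ids)
```

Calling `generate("the cat sat")` returns the prompt followed by three more tokens. Because the embeddings are random, the continuation is meaningless; the point is the shape of the loop. Note also that everything the sketch "knows" lives in the fixed `EMBED` table, which mirrors why a base model cannot look anything up on its own.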
Knowledge check
1. What is the transformer's key operation?
2. What does a token become before the model processes it?
3. Why does a base model give different answers to the same prompt?