How Large Language Models like ChatGPT actually work

No maths, just understanding. A clear explanation of tokens, predictions and hallucinations - so you know when to trust an LLM.

Anyone using ChatGPT seriously will one day hear someone say: “It makes things up.” True. But why does it invent these specific things, and not others? To grasp that, you don’t need to be a mathematician. You need to remember one principle.

An LLM predicts the next word. That is all.

Tokens, not words

An LLM does not see text the way you do. It sees tokens - chunks of text ranging from a few letters to a whole word. Depending on the tokenizer, “AI literacy” may split into three tokens and “What is” into two. The model takes your prompt as a sequence of tokens and picks a likely next token, then the next, and so on, each time taking into account everything written so far.
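To make the idea of tokens concrete, here is a minimal sketch of a toy tokenizer. The vocabulary is hypothetical and hand-picked so that the two examples above come out as described. Real tokenizers learn their vocabulary from data and usually use a different algorithm (such as byte-pair encoding), so treat this as an illustration, not as how ChatGPT does it.

```python
# Hypothetical, hand-picked vocabulary (an assumption for illustration only).
# Real tokenizers learn tens of thousands of pieces from training data.
VOCAB = {"AI", " liter", "acy", "What", " is"}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into pieces by greedy longest match against the vocabulary."""
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest possible piece first, then shrink.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                tokens.append(piece)
                pos += length
                break
        else:
            # No known piece starts here: emit the single character as a token.
            tokens.append(text[pos])
            pos += 1
    return tokens


print(tokenize("AI literacy", VOCAB))  # ['AI', ' liter', 'acy']
print(tokenize("What is", VOCAB))      # ['What', ' is']
```

Note that the space belongs to the token that follows it. Many real tokenizers do something similar, which is one reason token counts rarely match word counts.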

How did it learn “likely”?

The model was trained on huge volumes of text from the web, books and code. During training, a chunk of text was hidden and the model had to guess what came next. Billions of times. From that it learned patterns - not facts, but relationships between words and pieces of information.

This explains why an LLM sounds so fluent: fluency is exactly what it was trained for. It also explains why factual accuracy is a by-product, not the core.
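The "guess what comes next" idea can be sketched with a drastically simplified stand-in: a model that only counts which word follows which in a tiny, made-up corpus. A real LLM uses a neural network and looks at the whole preceding text, not just one word, so the corpus, the function names and the counting approach here are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Made-up miniature "training data" (an assumption for illustration).
corpus = "the cat sat on the mat . the cat ate the fish ."
tokens = corpus.split()

# "Training": count how often each token is followed by each other token.
follow_counts = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    follow_counts[current][nxt] += 1


def predict_next(token):
    """Return the learned probability of each possible next token."""
    counts = follow_counts.get(token, Counter())
    total = sum(counts.values())
    return {nxt: n / total for nxt, n in counts.items()}


def generate(start, length):
    """Repeatedly append the most likely next token."""
    output = [start]
    for _ in range(length):
        probs = predict_next(output[-1])
        if not probs:
            break  # nothing ever followed this token in the corpus
        output.append(max(probs, key=probs.get))
    return output


print(predict_next("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(generate("the", 3))   # ['the', 'cat', 'sat', 'on']
```

The sketch stores no fact about cats anywhere. It only knows that "cat" often follows "the", which is the sense in which the model learns patterns rather than facts.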

Why do they hallucinate?

Imagine you ask for the author of an obscure article. The model has no “I don’t know” button. It simply picks the most likely next word. The result sounds plausible - a credible name, a fitting title - but the combination may be entirely invented. That is a hallucination: fluent, convincing, wrong text.

Hallucinations are more common for: specific facts (dates, quotes, names), recent events, niche topics, and open questions for which the model has little training data.
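The missing “I don’t know” button can be illustrated with a small sketch. A model's raw scores are typically turned into probabilities with a softmax, and those probabilities always add up to 1, so some candidate always comes out on top. The names and scores below are invented for the example; they do not come from any real model.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Invented scores for "who wrote this obscure article?" (assumption).
# They are nearly equal: the model has no real basis to prefer any name.
scores = {"Smith": 1.2, "Jansen": 1.1, "Garcia": 1.0, "Chen": 0.9}

probs = softmax(scores)
best = max(probs, key=probs.get)

for name, p in probs.items():
    print(f"{name}: {p:.0%}")
print("Answer given:", best)  # Smith, with a probability under 30%
```

The winning name has less than a 30% probability, yet it is what ends up in the answer, phrased just as confidently as a well-supported fact would be.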

How is it different from search?

A search engine such as Google looks up existing pages. An LLM generates text from patterns. Many LLM products now also have a browse function or use “retrieval-augmented generation” (RAG): they pull in external sources and answer from those. That lowers the hallucination risk - but never to zero.
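The RAG idea can be sketched in a few lines: first look up the most relevant text, then put it in the prompt and ask the model to answer from it. Everything below is a simplified assumption. The documents are invented, the retrieval is plain word overlap (real systems usually rely on a search index or embeddings), and the final call to a model is left out.

```python
import re

# Invented document store (assumption for illustration).
DOCUMENTS = [
    "Leave policy: employees receive 25 days of paid holiday per year.",
    "Expense policy: receipts must be submitted within 30 days.",
    "IT policy: passwords must be changed every 90 days.",
]


def words(text):
    """Lower-case the text and return its set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Return the top_k documents sharing the most words with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Combine the retrieved sources and the question into one prompt."""
    sources = retrieve(question, documents)
    numbered = "\n".join(f"[{i}] {src}" for i, src in enumerate(sources, 1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


question = "How many days of paid holiday do employees get?"
prompt = build_prompt(question, DOCUMENTS)
print(prompt)
```

The model still generates its answer token by token, which is why the risk never drops to zero: it can misread or ignore the source it was given.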

Context window: the short-term memory

An LLM “remembers” only what is in the current conversation plus your instructions. That is the context window. The context windows of modern models range from roughly 100,000 to 1,000,000 tokens - enough for a book. But the moment you open a new chat, it is all gone.
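A rough sketch of what a limited context window means in practice: when a conversation grows beyond the limit, something has to be left out. The code below assumes one common strategy, dropping the oldest messages first. It also counts one token per word, which is a crude stand-in for a real tokenizer. Both choices are assumptions, and actual chat products may handle this differently.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per word (assumption)."""
    return len(text.split())


def fit_to_window(instructions, messages, max_tokens):
    """Keep the instructions plus as many of the newest messages as fit."""
    budget = max_tokens - count_tokens(instructions)
    kept = []
    for message in reversed(messages):  # newest first
        cost = count_tokens(message)
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        budget -= cost
    return [instructions] + kept[::-1]  # restore chronological order


history = [
    "My name is Sam and I work in finance.",
    "Please summarise the attached report.",
    "Now make the summary shorter.",
]

# A deliberately tiny window of 16 tokens to make the effect visible.
window = fit_to_window("You are a helpful assistant.", history, max_tokens=16)
print(window)
```

With this tiny window the first message no longer fits, so the model would no longer "know" the user's name. With a window of 100,000 tokens or more the same thing happens, only much later.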

Practical consequences for your work

1. Never ask for pure facts without a source. Ask for the source too, and verify it.
2. Provide context in the prompt - the model has no access to your files unless you paste them.
3. Use LLMs for rewriting, summarising, brainstorming and structuring; they excel there.
4. Don’t trust numbers, quotes or legal articles blindly, even when they sound certain.
5. Reuse a prompt that works - it’s faster than starting over.
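Several of these points (provide context, ask for sources, reuse what works) can be combined in a reusable prompt template. The wording of the template and the function name are only a suggestion, not a proven formula. Adapt them to your own task.

```python
# Hypothetical reusable template; the wording is a suggestion, not a standard.
PROMPT_TEMPLATE = """Task: {task}

Context (pasted by the user):
{context}

Rules:
- Use only the context above.
- Quote the passage that supports each claim.
- If the context does not contain the answer, say so.
"""


def make_prompt(task, context):
    """Fill the template with a task and the pasted context."""
    return PROMPT_TEMPLATE.format(task=task, context=context.strip())


prompt = make_prompt(
    task="Summarise the key figures in three bullet points.",
    context="  Q3 revenue rose 4% compared with Q2.  ",
)
print(prompt)
```

Asking for the supporting passage makes verification quicker, but you still have to check the quote against the original text yourself.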

Which model when?

Roughly three tiers exist: small, fast models (good for classification and short tasks), mid-size models (everyday productivity work), and frontier models (complex reasoning and long documents). For most office work, the mid-tier is enough. For legal analysis or code review, you want the frontier tier.
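If you want to apply this rule of thumb consistently in a team, it can be written down as a simple lookup. The tiers come from the paragraph above. The task labels and the fallback to the mid-tier are assumptions made for the sketch.

```python
# Tiers follow the rule of thumb above; the task labels are hypothetical.
TIER_BY_TASK = {
    "classification": "small",
    "short task": "small",
    "office work": "mid",
    "legal analysis": "frontier",
    "code review": "frontier",
}


def suggest_tier(task, default="mid"):
    """Return the suggested model tier, falling back to the mid-tier."""
    return TIER_BY_TASK.get(task.lower(), default)


print(suggest_tier("Code review"))     # frontier
print(suggest_tier("email drafting"))  # mid
```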

Those who grasp these principles write better prompts within a week than someone collecting tricks for months. Our course lets you practise this with concrete real-world assignments.

