Training a major LLM typically involves three stages: pre-training (learning language patterns from massive amounts of text), supervised fine-tuning, or SFT (learning how to answer questions), and reinforcement learning from human feedback, or RLHF (aligning with human preferences about what makes a “good” answer). Understanding these stages explains why AI models favor some types of content over others.
Three Stages
Stage 1: Pre-training — “Reading Everything”
The model trains on trillions of tokens and learns the fundamentals of language: grammar, semantics, facts, and reasoning patterns. What this means for GEO (generative engine optimization): this stage determines the model’s parametric memory, the knowledge stored in its weights. If your brand appears frequently in authoritative pre-training data, the model “knows” you. → Chapter 3 of Get AI to Speak for You: The Definitive Guide to GEO
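To make the frequency point concrete, here is a deliberately tiny sketch that swaps a real LLM for a bigram counter. It is only an illustration under that assumption: the corpus, the brand names `acme` and `zenlo`, and both function names are hypothetical, and real pre-training is far more complex than counting word pairs.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which token follows which: a toy stand-in for pre-training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()} if total else {}

# Hypothetical corpus: "acme" is mentioned three times, "zenlo" once.
corpus = [
    "best crm is acme",
    "top crm is acme",
    "leading crm is acme",
    "new crm is zenlo",
]
counts = train_bigram(corpus)
probs = next_token_probs(counts, "is")
print(probs)  # {'acme': 0.75, 'zenlo': 0.25}
```

The brand that appears more often in the training text ends up with the higher probability of being produced, which is the intuition behind parametric memory in this toy setting.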
Stage 2: SFT — “Learning to Answer”
Training on curated question-answer pairs teaches the model to respond conversationally. What this means for GEO: SFT data typically follows a “definition → explanation → example → summary” structure. Content that already matches this structure is closer to the answer format the model has learned, so it meets less “resistance” when cited. → Strategy 05
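As a rough illustration of that four-part structure, the sketch below assembles one question-answer pair in the stated order. The `prompt` and `response` field names, the `build_sft_record` helper, and the sample text are assumptions made for this example; real SFT datasets use a variety of schemas.

```python
SECTIONS = ("definition", "explanation", "example", "summary")

def build_sft_record(question, parts):
    """Assemble one question-answer pair in the four-part order."""
    missing = [s for s in SECTIONS if s not in parts]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    # Join in the fixed order, regardless of the order the parts were supplied in.
    response = "\n\n".join(parts[s].strip() for s in SECTIONS)
    return {"prompt": question, "response": response}

record = build_sft_record(
    "What is parametric memory?",
    {
        "summary": "In short, it is what the model knows without looking anything up.",
        "definition": "Parametric memory is knowledge stored in a model's weights.",
        "explanation": "It is acquired during pre-training rather than retrieved at query time.",
        "example": "A model that describes a well-known brand with no search step is drawing on it.",
    },
)
print(record["response"])
```

Writing source content in the same order (define, explain, illustrate, summarize) gives each paragraph a clear role that maps onto this kind of answer.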
Stage 3: RLHF — “Learning What’s Good”
Human annotators rank several candidate answers to the same prompt by preference. Those rankings train a reward model, and the LLM is then optimized to produce answers that score well on it. The model learns to prefer answers that are helpful (direct), harmless (nothing harmful or misleading), and honest (accurate, admitting uncertainty). What this means for GEO: RLHF-trained models tend to prefer “objective, direct, data-backed” content and to reject “vague, exaggerated, unsourced” content. → Strategy 25
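The ranking step can be sketched with the pairwise loss commonly used to train reward models, `-log(sigmoid(r_chosen - r_rejected))`. This is a minimal sketch, not a description of any particular vendor's pipeline; the function name `preference_loss` and the reward values are hypothetical.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # Both branches compute log(1 + exp(-margin)); the split avoids overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical reward scores for two answers to the same prompt.
agrees_with_annotators = preference_loss(2.0, 0.0)     # preferred answer scored higher
disagrees_with_annotators = preference_loss(0.0, 2.0)  # preferred answer scored lower

print(agrees_with_annotators < disagrees_with_annotators)  # True
```

The loss is small when the reward model scores the human-preferred answer higher and large when it does not, so training pushes the scores toward the annotators' rankings.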
Why Marketing Copy Increasingly Fails
The three training stages compound: pre-training favors authoritative sources, SFT favors structured answers, and RLHF penalizes exaggeration and vagueness. Marketing copy loses at every layer. This isn’t deliberate product design; it’s a natural result of the training process.
Further Reading
- Get AI to Speak for You: The Definitive Guide to GEO, Chapter 2, Section 2.5; Strategies 05/25
