
How LLMs Work: What Every Prompter Should Know

A practical, non-technical explanation of how large language models work — and why this understanding makes you a dramatically better prompt engineer.


You don't need to understand the math behind large language models to use them well. But understanding the basics of how they work will make you a noticeably better prompt engineer. This guide covers everything you need to know — no technical background required.


The Core Mechanism: Next Token Prediction

Large language models do one thing: predict the next token.

A token is roughly a word or word-fragment (the word "prompting" might be one token; "unbelievable" might be two). Given a sequence of tokens, the model calculates the probability of every possible next token and picks one.

Then it does it again. And again, building your response one token at a time.

That's it. There's no understanding, no reasoning, no "thinking" in the human sense. Just extremely sophisticated pattern prediction trained on a massive amount of text.
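The loop described above can be sketched in a few lines. This is a toy illustration, not a real model: the probability table is invented, and a real LLM computes probabilities over ~100K possible tokens with a neural network. But the generate-one-token-then-repeat structure is the same.

```python
import random

# Toy "model": maps a context string to candidate next tokens with
# probabilities. A stand-in for the neural network in a real LLM.
TOY_MODEL = {
    "The cat sat on the": [("mat", 0.6), ("sofa", 0.3), ("roof", 0.1)],
}

def predict_next_token(context: str) -> str:
    """Sample one next token according to the model's probabilities."""
    candidates = TOY_MODEL.get(context, [("<end>", 1.0)])
    tokens = [t for t, _ in candidates]
    weights = [p for _, p in candidates]
    return random.choices(tokens, weights=weights, k=1)[0]

def generate(prompt: str, max_tokens: int = 5) -> str:
    """Build a response one token at a time: predict, append, repeat."""
    text = prompt
    for _ in range(max_tokens):
        token = predict_next_token(text)
        if token == "<end>":
            break
        text = text + " " + token
    return text
```

Notice that `generate` never "decides what to say" up front; each token depends only on everything generated so far, which is why the tokens already in your prompt matter so much.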


Why This Matters for Prompting

This mechanism explains several behaviors you've probably noticed:

Why vague prompts give generic results. When your prompt is vague, the most statistically "likely" continuation is generic — because most text in training data is generic. Specific prompts push the model toward less common, more targeted patterns.

Why context helps so much. Every token in your prompt is context that shapes the next prediction. More relevant context → better-targeted predictions → better outputs.

Why position matters. Research on long contexts (sometimes called the "lost in the middle" effect) suggests models attend more reliably to the beginning and end of a prompt than to the middle. Put your most important instructions at the top and/or repeat key constraints at the bottom.

Why the model can "hallucinate". The model predicts plausible-sounding text, not necessarily true text. If your prompt has a gap in context, the model fills it with what sounds right — which may be fabricated.

Why it's not "looking things up". LLMs don't retrieve information from a database at inference time. They compress patterns from training data into weights. This is why they can be confidently wrong.


The Context Window

Every LLM has a context window — the maximum number of tokens it can "see" at once when generating a response. This includes your prompt, any prior conversation, and the response it's generating.

| Model | Context Window |
|-------|----------------|
| GPT-4o | 128K tokens |
| Claude 3.5 Sonnet | 200K tokens |
| Gemini 1.5 Pro | 1M tokens |

Practical implications:

  • Keep essential instructions in the prompt — don't bury them deep in a long document
  • In long conversations, early messages may "fall out" of context
  • Longer context = more expensive API calls (you pay per token)
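To budget against the context window, a common rule of thumb is that English text averages roughly four characters per token. The sketch below uses that heuristic; for exact counts you'd use your provider's tokenizer, and the 128K window and response reserve here are illustrative defaults, not fixed values.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: English text averages ~4 characters per token.
    Use the provider's tokenizer for exact counts."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, context_window: int = 128_000,
                    reserve_for_response: int = 4_000) -> bool:
    """Check whether a prompt leaves room for the model's response,
    since prompt and response share the same context window."""
    return estimate_tokens(prompt) + reserve_for_response <= context_window
```

A check like this is useful before stuffing a long document into a prompt: if it doesn't fit, summarize or trim the document rather than letting the API silently truncate or reject it.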

Temperature: Controlling Randomness

When an LLM picks the next token, it doesn't always pick the highest-probability one. There's a parameter called temperature that controls how much randomness is introduced.

  • Low temperature (0–0.3): More deterministic. The model almost always picks the most likely token. Great for factual tasks, code, JSON output.
  • Medium temperature (0.5–0.7): Balanced. Good for general writing.
  • High temperature (0.8–1.0+): More creative and unpredictable. The model explores less likely tokens. Great for brainstorming, creative writing.

If you're using the API, you control temperature directly. If you're using the ChatGPT/Claude/Gemini chat interfaces, the provider picks a default for you; it isn't always documented, but it typically sits in the middle of the range.

In your prompts: You can mimic low-temperature behavior by adding explicit constraints. "Be precise and consistent" pulls the model toward deterministic patterns.
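Under the hood, temperature divides the model's raw scores (logits) before they are converted into probabilities with a softmax. This short sketch shows the standard calculation; the three logit values are made up for illustration.

```python
import math

def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax with temperature scaling. Lower temperature sharpens the
    distribution toward the top token; higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate tokens with raw scores 2.0, 1.0, 0.1
logits = [2.0, 1.0, 0.1]
cold = apply_temperature(logits, 0.2)  # top token dominates
warm = apply_temperature(logits, 1.0)  # probability spread out more evenly
```

At temperature 0.2 the top token gets nearly all the probability mass, which is why low temperatures feel deterministic; at 1.0 the alternatives stay plausible, which is where the "creativity" comes from.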


System Prompts vs User Prompts

In most LLM APIs, there are two main input types:

System prompt: Instructions that define the AI's overall behavior, persona, and constraints. Set once, applies to the whole conversation.

User prompt: The actual request or question in each turn of the conversation.

In the chat interfaces (ChatGPT, Claude.ai), each message you send is a user prompt, and the provider supplies its own system prompt behind the scenes. Some interfaces let you add system-level instructions yourself (e.g. custom instructions or project settings).

When prompting via API or building products: always put persistent instructions (role, constraints, output format) in the system prompt and task-specific instructions in the user prompt.
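In practice this split looks like the request below, following the widely used chat-message format. The model name, wording, and `max_tokens` value are placeholders for illustration; check your provider's documentation for the exact fields.

```python
# Sketch of a chat-style API request body. Persistent instructions live
# in the "system" message; the task for this turn lives in the "user"
# message.
request = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [
        {
            # Persistent behavior: role, constraints, output format
            "role": "system",
            "content": (
                "You are a senior technical editor. "
                "Respond in plain English, under 200 words."
            ),
        },
        {
            # Task-specific instruction for this turn
            "role": "user",
            "content": "Summarize these release notes for customers.",
        },
    ],
    "max_tokens": 500,
}
```

Keeping the system message stable across turns is what makes the persona and constraints persist, while the user messages change with each request.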


What LLMs Are Good and Bad At

Good at:

  • Transforming text (summarizing, rewriting, translating, formatting)
  • Generating plausible, fluent text in any style
  • Pattern matching and classification
  • Reasoning through problems step by step (with the right prompts)
  • Writing code based on natural language descriptions

Bad at:

  • Precise arithmetic (they predict digits, not calculate)
  • Knowing recent events (knowledge cutoff)
  • Citing specific sources reliably (they paraphrase, not retrieve)
  • Tasks requiring exact, consistent memory across long sessions
  • Anything that requires genuinely novel reasoning (not seen in training)

Key Takeaway

LLMs predict text — they don't think. Understanding this shifts how you prompt: you're not asking a person to help you, you're guiding a powerful pattern-matching system toward the output you want. Give it context, structure, examples, and constraints — and it performs remarkably well. Leave gaps, and it fills them with plausible-sounding guesses.

This concludes the Beginner Track. You now understand: what a prompt is, how to be specific, how to assign roles, how to format output, and how the underlying model works. You're ready for the Intermediate Track.