The Four Building Blocks
Every AI agent — from a simple web-search bot to a complex coding assistant — is assembled from the same four components. Frameworks differ (LangChain, CrewAI, AutoGen, custom code), but the underlying architecture is always some variation on:
- Memory — What the agent knows and can remember
- Tools — What the agent can do
- Planning — How the agent decides what to do next
- Perception — What inputs the agent can observe
Understanding these components lets you reason clearly about any agent system you encounter or build.
Component 1: Memory
Memory is how an agent maintains state — both within a single run and across multiple runs.
In-Context Memory (Working Memory)
This is the conversation history and tool results currently sitting inside the model's context window. It's the most immediate form of memory — everything the agent has seen in this session.
Limitation: context windows have finite sizes. A long-running agent will eventually fill its context, causing it to "forget" earlier information unless you manage this carefully.
```
[System prompt]
[Tool result from turn 1]
[Reasoning from turn 2]
[Tool result from turn 2]
[Current task...]
          ← all of this lives in working memory
```
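One common way to keep working memory from overflowing is to trim the oldest turns once the history exceeds a token budget. Here is a minimal sketch: the message format mirrors typical chat APIs, and the token count is a rough word-count stand-in, not a real tokenizer.

```python
# Minimal working-memory management: keep the system prompt, then keep
# the most recent turns that still fit within a token budget.

def trim_history(messages, max_tokens=1000):
    """Return the system message plus the newest turns that fit the budget."""
    def rough_tokens(msg):
        # Stand-in for a real tokenizer: count whitespace-separated words.
        return len(msg["content"].split())

    system, turns = messages[0], messages[1:]
    kept = []
    budget = max_tokens - rough_tokens(system)
    for msg in reversed(turns):          # walk newest-first
        cost = rough_tokens(msg)
        if cost > budget:
            break                        # oldest turns fall off here
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a research agent."},
    {"role": "user", "content": "old question " * 400},
    {"role": "assistant", "content": "old answer " * 400},
    {"role": "user", "content": "current task: summarize the report"},
]
trimmed = trim_history(history, max_tokens=500)
# The bulky old turns are dropped; the system prompt and latest task survive.
```

Real agents often summarize dropped turns instead of discarding them outright, trading a little fidelity for continuity.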
External Memory (Long-Term Memory)
Information stored outside the model — in a vector database, file system, or key-value store. The agent retrieves relevant chunks when needed.
Example: A customer support agent might store all past tickets in a vector database. When a new ticket arrives, the agent retrieves the 5 most similar past tickets and uses them to craft a response.
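The support-ticket example can be sketched end to end with a toy retriever: bag-of-words vectors and cosine similarity stand in for a real vector database with learned embeddings. The ticket texts and the `embed()` function are illustrative stand-ins.

```python
# Toy external memory: embed past tickets as word-count vectors and
# retrieve the most similar ones with cosine similarity.
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: a bag-of-words Counter.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

past_tickets = [
    "refund request for duplicate charge",
    "password reset link not arriving",
    "duplicate charge on credit card statement",
]

def retrieve(query, store, k=2):
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]

top = retrieve("duplicate charge refund", past_tickets)
# The two billing tickets rank above the unrelated password ticket.
```

A production system would swap in a real embedding model and an approximate-nearest-neighbor index, but the retrieve-then-inject pattern is the same.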
Episodic Memory
Logs or summaries of past agent runs, stored externally and retrieved in future sessions. This lets an agent "remember" that it tried approach X last week and it failed.
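Episodic memory can be as simple as an append-only run log that the agent consults before its next attempt. A minimal sketch, with illustrative record fields and an in-memory list standing in for durable storage:

```python
# Episodic memory as a run log: record each run's outcome, then surface
# past attempts at the same task before the next run.

LOG = []  # in a real agent this would live on disk or in a database

def record_run(task, approach, outcome):
    LOG.append({"task": task, "approach": approach, "outcome": outcome})

def recall_runs(task):
    """Return all logged episodes for this task."""
    return [r for r in LOG if r["task"] == task]

record_run("scrape pricing page", "requests + regex", "failed: page is JS-rendered")
record_run("scrape pricing page", "headless browser", "succeeded")

episodes = recall_runs("scrape pricing page")
# Injected into the next run's context, this tells the agent not to
# retry the approach that already failed.
```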
Semantic Memory
The world knowledge baked into the model during pretraining. The agent "knows" things without being explicitly told — like what Python is, how photosynthesis works, or what JSON looks like.
Component 2: Tools
Tools are what transform an LLM from a text generator into an agent that can affect the world.
A tool is simply a function the model can call by name: the model passes arguments and receives a structured result back.
Common Tool Types
| Tool Type | Example | What it enables |
|---|---|---|
| Web search | search("quantum computing 2026") | Real-time information |
| Code execution | run_python("import pandas...") | Computation, data analysis |
| File I/O | read_file("report.pdf") | Process documents |
| API calls | get_weather("London") | External data sources |
| Database query | sql_query("SELECT...") | Structured data retrieval |
| Browser control | click(selector) | Web automation |
| Agent spawning | delegate_to(sub_agent, task) | Multi-agent coordination |
How Tool Calling Works
Modern LLMs support "function calling" or "tool use" — a structured way to define tools and have the model invoke them. You describe each tool with:
- A name (e.g., search_web)
- A description (what it does — this is what the model reads)
- Parameters with types and descriptions
The model decides when and how to call the tool. You write the actual function that runs when it does.
```json
{
  "name": "search_web",
  "description": "Search the internet for current information. Use this when you need recent data or facts not in your training.",
  "parameters": {
    "query": {
      "type": "string",
      "description": "The search query to use"
    }
  }
}
```
Pro tip: The tool description is a prompt. Write it clearly, and state when the model should use the tool. A vague description leads the model to call it at the wrong times.
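On the application side, you register a real function under each tool name and dispatch the model's call to it. A minimal sketch of that loop, with the model's tool call mocked as a plain dict (real APIs return an equivalent structure) and a stand-in search implementation:

```python
# Dispatching a model-emitted tool call to the actual Python function.

def search_web(query):
    # Stand-in implementation; a real tool would hit a search API.
    return {"results": [f"top result for: {query}"]}

TOOLS = {"search_web": search_web}

def dispatch(tool_call):
    """Look up the named tool and invoke it with the model's arguments."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# What the model might emit after reading the search_web schema:
call = {
    "name": "search_web",
    "arguments": {"query": "most populous city in Europe 2026"},
}
result = dispatch(call)
# result is serialized and fed back into the context window as a tool message
```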
Component 3: Planning
Planning is the reasoning mechanism that decides what action to take next.
Reactive Planning
The simplest form — the agent just picks the best immediate action given its current observation. No multi-step lookahead. Fast but can get stuck.
Chain-of-Thought Planning
The agent thinks through its plan before acting. "To answer this, I need X. To get X, I should search for Y. Let me do that first."
```
Thought: The user wants to know which city has the highest population in Europe.
Thought: I don't know the current figure — this might have changed recently.
Thought: I'll search for it.
Action: search("most populous city in Europe 2026")
```
ReAct (Reason + Act)
A formalized loop: Reason → Act → Observe → Reason again. Each turn the agent writes out its thinking before choosing an action. This dramatically improves reliability on complex tasks. We'll cover this in depth in lesson 4.
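The Reason → Act → Observe loop can be sketched as a short driver. The model is mocked here by a scripted function that first requests a search and then answers from the observation; a real agent would send the transcript to an LLM and parse its reply.

```python
# Skeleton of a ReAct loop with a scripted stand-in for the model.

def scripted_model(transcript):
    # Stand-in for an LLM call: reason, then either act or finish.
    if "Observation:" not in transcript:
        return {"thought": "I need current data.",
                "action": ("search", "most populous city in Europe 2026")}
    return {"thought": "The observation answers the question.",
            "final": "Istanbul"}

def run_tool(name, arg):
    return f"search results for '{arg}'"   # mocked tool

def react(task, max_turns=5):
    """Reason -> Act -> Observe until the model emits a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = scripted_model(transcript)
        transcript += f"Thought: {step['thought']}\n"
        if "final" in step:
            return step["final"], transcript
        name, arg = step["action"]
        obs = run_tool(name, arg)
        transcript += f"Action: {name}({arg!r})\nObservation: {obs}\n"
    return None, transcript                # safety cap on turns

answer, trace = react("Which city has the highest population in Europe?")
```

Note the `max_turns` cap: every real ReAct driver needs one, or a confused model can loop forever.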
Plan-and-Execute
The agent creates a full plan upfront, then executes each step. Better for predictable tasks; less adaptive to unexpected results mid-task.
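The contrast with ReAct is visible in the control flow: planning happens once, up front. A minimal sketch, with the planner and executor both mocked (a real agent would make one LLM call for the plan, then pick a tool per step):

```python
# Plan-and-execute: build the full step list first, then run it in order.

def make_plan(task):
    # Stand-in for one LLM call that returns an ordered plan.
    return ["search for source data", "extract key figures", "write summary"]

def execute_step(step, prior_results):
    # Stand-in executor; a real agent would choose and run a tool here.
    return f"done: {step}"

def plan_and_execute(task):
    plan = make_plan(task)          # planning happens once, up front
    results = []
    for step in plan:
        results.append(execute_step(step, results))
    return results

out = plan_and_execute("summarize Q3 revenue")
```

Because the plan is fixed before any results come back, a surprising tool result cannot redirect the remaining steps — the trade-off the text describes.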
Component 4: Perception
Perception is everything the agent can observe — the inputs it uses to reason.
An agent might perceive:
- Text: the user's request, tool results, retrieved documents
- Structured data: JSON, CSV, database query results
- Images: screenshots, diagrams, photos (in multimodal agents)
- Code: the current state of a codebase
- Agent messages: outputs from other agents in a multi-agent system
The key design question: what should the agent see, and when?
Too much context overwhelms the model and wastes tokens. Too little leaves it guessing. Good agent design carefully controls what enters the context window at each step.
How the Components Interact
Here's a single agent turn showing all four components working together:
```
Perception: Agent observes [task from user] + [result from last tool call]
    ↓
Planning:   Agent reasons: "I have the data I need. Now I'll summarize it."
    ↓
Tool use:   Agent calls: write_summary(data=..., format="bullet points")
    ↓
Memory:     Summary result added to in-context memory for next turn
```
This cycle repeats until the agent's planning mechanism decides the task is done.
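The same turn can be wired end to end in a few lines. Every model and tool call below is mocked, and the function names (`perceive`, `plan`, `write_summary`) are illustrative; the point is the data flow between the four components.

```python
# One agent turn: perceive, plan, act, remember.

def perceive(memory, new_input):
    # Perception: assemble what the model will see this turn.
    return memory + [new_input]

def plan(context):
    # Planning (mocked): decide the next action from the visible context.
    return {"tool": "write_summary", "args": {"data": context[-1]}}

def write_summary(data):
    # Tool (mocked): a real agent would call a registered function here.
    return f"summary of {data}"

def agent_turn(memory, new_input):
    context = perceive(memory, new_input)
    decision = plan(context)
    result = write_summary(**decision["args"])
    return context + [result]      # Memory: carry the result into next turn

memory = ["task: summarize the report"]
memory = agent_turn(memory, "tool result: raw report text")
# memory now ends with the summary, ready for the next turn's perception
```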
Designing Components Well
Memory design
- Keep in-context memory focused — summarize or compress old turns rather than letting them pile up
- Use external memory for anything that won't fit in context or needs to persist across runs
- Consider what the agent needs to remember vs. what it's tempted to over-retain
Tool design
- Write tool descriptions as if you're instructing a smart person who has never seen the tool
- Include negative examples: "Use this for X, NOT for Y"
- Keep tools focused — a tool that does one thing well is better than a multi-purpose tool that confuses the model
Planning design
- Use chain-of-thought or ReAct for complex tasks — explicit reasoning improves reliability
- Add a "stop condition" to your system prompt: tell the agent explicitly when it should stop and return an answer rather than continuing to act
Perception design
- Be selective about what goes into context — relevance over completeness
- Structure tool results clearly (JSON with descriptive keys, not raw blobs)
- If images are involved, caption them or describe what the agent should focus on
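The "structure tool results clearly" advice above looks like this in practice: wrap a raw payload in descriptive keys before it enters the context window. The field names here are illustrative.

```python
# Shaping a tool result for the context window: descriptive keys
# instead of a raw blob of rows.
import json

raw_rows = [("London", 8_900_000), ("Paris", 2_100_000)]

structured = {
    "tool": "sql_query",
    "row_count": len(raw_rows),
    "rows": [{"city": c, "population": p} for c, p in raw_rows],
}

# This JSON string is what the agent actually sees:
context_chunk = json.dumps(structured, indent=2)
```

Named fields like `row_count` let the model check its own retrieval ("did I get all the rows?") without re-parsing the payload.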
Key Takeaways
- Every agent is built from four components: Memory, Tools, Planning, Perception
- Memory comes in four types: in-context (working), external (long-term), episodic (past runs), semantic (pretrained knowledge)
- Tools are functions the LLM can call — and their descriptions are prompts that determine how and when the model uses them
- Planning ranges from simple reactive decisions to explicit multi-step reasoning (ReAct, plan-and-execute)
- Perception design determines what the agent sees — be selective, not exhaustive
- Next lesson: function calling — the technical mechanism that makes tools work