The Four Building Blocks
Every AI agent — from a simple web-search bot to a complex coding assistant — is assembled from the same four components. Frameworks differ (LangChain, CrewAI, AutoGen, custom code), but the underlying architecture is always some variation on:
- Memory — What the agent knows and can remember
- Tools — What the agent can do
- Planning — How the agent decides what to do next
- Perception — What inputs the agent can observe
Understanding these components lets you reason clearly about any agent system you encounter or build.
Component 1: Memory
Memory is how an agent maintains state — both within a single run and across multiple runs.
In-Context Memory (Working Memory)
This is the conversation history and tool results currently sitting inside the model's context window. It's the most immediate form of memory — everything the agent has seen in this session.
Limitation: context windows have finite sizes. A long-running agent will eventually fill its context, causing it to "forget" earlier information unless you manage this carefully.
```
[System prompt]
[Tool result from turn 1]
[Reasoning from turn 2]
[Tool result from turn 2]
[Current task...]
          ← all of this lives in working memory
```
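One common way to keep working memory from overflowing is to trim the oldest turns once the history exceeds a token budget. Here is a minimal sketch: the message format mirrors typical chat APIs, and the token count is a rough word-count stand-in, not a real tokenizer.

```python
# Minimal working-memory management: keep the system prompt, then keep
# the most recent turns that still fit within a token budget.

def trim_history(messages, max_tokens=1000):
    """Return the system message plus the newest turns that fit the budget."""
    def rough_tokens(msg):
        # Stand-in for a real tokenizer: count whitespace-separated words.
        return len(msg["content"].split())

    system, turns = messages[0], messages[1:]
    kept = []
    budget = max_tokens - rough_tokens(system)
    for msg in reversed(turns):          # walk newest-first
        cost = rough_tokens(msg)
        if cost > budget:
            break                        # oldest turns fall off here
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a research agent."},
    {"role": "user", "content": "old question " * 400},
    {"role": "assistant", "content": "old answer " * 400},
    {"role": "user", "content": "current task: summarize the report"},
]
trimmed = trim_history(history, max_tokens=500)
# The bulky old turns are dropped; the system prompt and latest task survive.
```

Real agents often summarize dropped turns instead of discarding them outright, trading a little fidelity for continuity.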
External Memory (Long-Term Memory)
Information stored outside the model — in a vector database, file system, or key-value store. The agent retrieves relevant chunks when needed.
Example: A customer support agent might store all past tickets in a vector database. When a new ticket arrives, the agent retrieves the 5 most similar past tickets and uses them to craft a response.
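The support-ticket example can be sketched end to end with a toy retriever: bag-of-words vectors and cosine similarity stand in for a real vector database with learned embeddings. The ticket texts and the `embed()` function are illustrative stand-ins.

```python
# Toy external memory: embed past tickets as word-count vectors and
# retrieve the most similar ones with cosine similarity.
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: a bag-of-words Counter.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

past_tickets = [
    "refund request for duplicate charge",
    "password reset link not arriving",
    "duplicate charge on credit card statement",
]

def retrieve(query, store, k=2):
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]

top = retrieve("duplicate charge refund", past_tickets)
# The two billing tickets rank above the unrelated password ticket.
```

A production system would swap in a real embedding model and an approximate-nearest-neighbor index, but the retrieve-then-inject pattern is the same.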
Episodic Memory
Logs or summaries of past agent runs, stored externally and retrieved in future sessions. This lets an agent "remember" that it tried approach X last week and it failed.
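Episodic memory can be as simple as an append-only run log that the agent consults before its next attempt. A minimal sketch, with illustrative record fields and an in-memory list standing in for durable storage:

```python
# Episodic memory as a run log: record each run's outcome, then surface
# past attempts at the same task before the next run.

LOG = []  # in a real agent this would live on disk or in a database

def record_run(task, approach, outcome):
    LOG.append({"task": task, "approach": approach, "outcome": outcome})

def recall_runs(task):
    """Return all logged episodes for this task."""
    return [r for r in LOG if r["task"] == task]

record_run("scrape pricing page", "requests + regex", "failed: page is JS-rendered")
record_run("scrape pricing page", "headless browser", "succeeded")

episodes = recall_runs("scrape pricing page")
# Injected into the next run's context, this tells the agent not to
# retry the approach that already failed.
```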
Semantic Memory
The world knowledge baked into the model during pretraining. The agent "knows" things without being explicitly told — like what Python is, how photosynthesis works, or what JSON looks like.
Component 2: Tools
Tools are what transform an LLM from a text generator into an agent that can affect the world.
A tool is simply a function the model can call by name: the model passes arguments and receives a structured result back.
Common Tool Types
| Tool Type | Example | What it enables |
|---|---|---|
| Web search | search("quantum computing 2026") | Real-time information |
| Code execution | run_python("import pandas...") | Computation, data analysis |
| File I/O | read_file("report.pdf") | Process documents |
| API calls | get_weather("London") | External data sources |
| Database query | sql_query("SELECT...") | Structured data retrieval |
| Browser control | click(selector) | Web automation |
| Agent spawning | delegate_to(sub_agent, task) | Multi-agent coordination |
How Tool Calling Works
Modern LLMs support "function calling" or "tool use" — a structured way to define tools and have the model invoke them. You describe each tool with:
- A name (e.g., search_web)
- A description (what it does — this is what the model reads)
- Parameters with types and descriptions
The model decides when and how to call the tool. You write the actual function that runs when it does.
```json
{
  "name": "search_web",
  "description": "Search the internet for current information. Use this when you need recent data or facts not in your training.",
  "parameters": {
    "query": {
      "type": "string",
      "description": "The search query to use"
    }
  }
}
```
Pro tip: The tool description is a prompt. Write it clearly, and state when the model should use the tool. A vague description leads the model to call it at the wrong times.
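On the application side, you register a real function under each tool name and dispatch the model's call to it. A minimal sketch of that loop, with the model's tool call mocked as a plain dict (real APIs return an equivalent structure) and a stand-in search implementation:

```python
# Dispatching a model-emitted tool call to the actual Python function.

def search_web(query):
    # Stand-in implementation; a real tool would hit a search API.
    return {"results": [f"top result for: {query}"]}

TOOLS = {"search_web": search_web}

def dispatch(tool_call):
    """Look up the named tool and invoke it with the model's arguments."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# What the model might emit after reading the search_web schema:
call = {
    "name": "search_web",
    "arguments": {"query": "most populous city in Europe 2026"},
}
result = dispatch(call)
# result is serialized and fed back into the context window as a tool message
```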
Component 3: Planning
Planning is the reasoning mechanism that decides what action to take next.
Reactive Planning
The simplest form — the agent just picks the best immediate action given its current observation. No multi-step lookahead. Fast but can get stuck.
Chain-of-Thought Planning
The agent thinks through its plan before acting. "To answer this, I need X. To get X, I should search for Y. Let me do that first."
```
Thought: The user wants to know which city has the highest population in Europe.
Thought: I don't know the current figure — this might have changed recently.
Thought: I'll search for it.
Action: search("most populous city in Europe 2026")
```
ReAct (Reason + Act)
A formalized loop: Reason → Act → Observe → Reason again. Each turn the agent writes out its thinking before choosing an action. This dramatically improves reliability on complex tasks. We'll cover this in depth in lesson 4.
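The Reason → Act → Observe loop can be sketched as a short driver. The model is mocked here by a scripted function that first requests a search and then answers from the observation; a real agent would send the transcript to an LLM and parse its reply.

```python
# Skeleton of a ReAct loop with a scripted stand-in for the model.

def scripted_model(transcript):
    # Stand-in for an LLM call: reason, then either act or finish.
    if "Observation:" not in transcript:
        return {"thought": "I need current data.",
                "action": ("search", "most populous city in Europe 2026")}
    return {"thought": "The observation answers the question.",
            "final": "Istanbul"}

def run_tool(name, arg):
    return f"search results for '{arg}'"   # mocked tool

def react(task, max_turns=5):
    """Reason -> Act -> Observe until the model emits a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = scripted_model(transcript)
        transcript += f"Thought: {step['thought']}\n"
        if "final" in step:
            return step["final"], transcript
        name, arg = step["action"]
        obs = run_tool(name, arg)
        transcript += f"Action: {name}({arg!r})\nObservation: {obs}\n"
    return None, transcript                # safety cap on turns

answer, trace = react("Which city has the highest population in Europe?")
```

Note the `max_turns` cap: every real ReAct driver needs one, or a confused model can loop forever.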
Plan-and-Execute
The agent creates a full plan upfront, then executes each step. Better for predictable tasks; less adaptive to unexpected results mid-task.
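The contrast with ReAct is visible in the control flow: planning happens once, up front. A minimal sketch, with the planner and executor both mocked (a real agent would make one LLM call for the plan, then pick a tool per step):

```python
# Plan-and-execute: build the full step list first, then run it in order.

def make_plan(task):
    # Stand-in for one LLM call that returns an ordered plan.
    return ["search for source data", "extract key figures", "write summary"]

def execute_step(step, prior_results):
    # Stand-in executor; a real agent would choose and run a tool here.
    return f"done: {step}"

def plan_and_execute(task):
    plan = make_plan(task)          # planning happens once, up front
    results = []
    for step in plan:
        results.append(execute_step(step, results))
    return results

out = plan_and_execute("summarize Q3 revenue")
```

Because the plan is fixed before any results come back, a surprising tool result cannot redirect the remaining steps — the trade-off the text describes.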
Component 4: Perception
Perception is everything the agent can observe — the inputs it uses to reason.
An agent might perceive:
- Text: the user's request, tool results, retrieved documents
- Structured data: JSON, CSV, database query results
- Images: screenshots, diagrams, photos (in multimodal agents)
- Code: the current state of a codebase
- Agent messages: outputs from other agents in a multi-agent system
The key design question: what should the agent see, and when?
Too much context overwhelms the model and wastes tokens. Too little leaves it guessing. Good agent design carefully controls what enters the context window at each step.
How the Components Interact
Here's a single agent turn showing all four components working together:
```
Perception: Agent observes [task from user] + [result from last tool call]
    ↓
Planning:   Agent reasons: "I have the data I need. Now I'll summarize it."
    ↓
Tool use:   Agent calls: write_summary(data=..., format="bullet points")
    ↓
Memory:     Summary result added to in-context memory for next turn
```
This cycle repeats until the agent's planning mechanism decides the task is done.
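The same turn can be wired end to end in a few lines. Every model and tool call below is mocked, and the function names (`perceive`, `plan`, `write_summary`) are illustrative; the point is the data flow between the four components.

```python
# One agent turn: perceive, plan, act, remember.

def perceive(memory, new_input):
    # Perception: assemble what the model will see this turn.
    return memory + [new_input]

def plan(context):
    # Planning (mocked): decide the next action from the visible context.
    return {"tool": "write_summary", "args": {"data": context[-1]}}

def write_summary(data):
    # Tool (mocked): a real agent would call a registered function here.
    return f"summary of {data}"

def agent_turn(memory, new_input):
    context = perceive(memory, new_input)
    decision = plan(context)
    result = write_summary(**decision["args"])
    return context + [result]      # Memory: carry the result into next turn

memory = ["task: summarize the report"]
memory = agent_turn(memory, "tool result: raw report text")
# memory now ends with the summary, ready for the next turn's perception
```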
Designing Components Well
Memory design
- Keep in-context memory focused — summarize or compress old turns rather than letting them pile up
- Use external memory for anything that won't fit in context or needs to persist across runs
- Consider what the agent needs to remember vs. what it's tempted to over-retain
Tool design
- Write tool descriptions as if you're instructing a smart person who has never seen the tool
- Include negative examples: "Use this for X, NOT for Y"
- Keep tools focused — a tool that does one thing well is better than a multi-purpose tool that confuses the model
Planning design
- Use chain-of-thought or ReAct for complex tasks — explicit reasoning improves reliability
- Add a "stop condition" to your system prompt: tell the agent explicitly when it should stop and return an answer rather than continuing to act
Perception design
- Be selective about what goes into context — relevance over completeness
- Structure tool results clearly (JSON with descriptive keys, not raw blobs)
- If images are involved, caption them or describe what the agent should focus on
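The "structure tool results clearly" advice above looks like this in practice: wrap a raw payload in descriptive keys before it enters the context window. The field names here are illustrative.

```python
# Shaping a tool result for the context window: descriptive keys
# instead of a raw blob of rows.
import json

raw_rows = [("London", 8_900_000), ("Paris", 2_100_000)]

structured = {
    "tool": "sql_query",
    "row_count": len(raw_rows),
    "rows": [{"city": c, "population": p} for c, p in raw_rows],
}

# This JSON string is what the agent actually sees:
context_chunk = json.dumps(structured, indent=2)
```

Named fields like `row_count` let the model check its own retrieval ("did I get all the rows?") without re-parsing the payload.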
Key Takeaways
- Every agent is built from four components: Memory, Tools, Planning, Perception
- Memory comes in four types: in-context (working), external (long-term), episodic (past runs), semantic (pretrained knowledge)
- Tools are functions the LLM can call — and their descriptions are prompts that determine how and when the model uses them
- Planning ranges from simple reactive decisions to explicit multi-step reasoning (ReAct, plan-and-execute)
- Perception design determines what the agent sees — be selective, not exhaustive
- Next lesson: function calling — the technical mechanism that makes tools work