The first time I built an AI agent, I expected it to be much harder than it was. All the frameworks and jargon make it sound complicated. But the core loop — model thinks, model calls a tool, tool returns a result, model thinks again — is actually simple once you strip away the abstraction.
This tutorial builds a minimal but working agent from scratch. By the end you'll have something that can search for information, reason about what it finds, and give you a grounded answer.
What We're Building
A research agent that:
- Accepts a question
- Searches for relevant information (we'll simulate this)
- Decides whether it has enough information or needs to search more
- Returns a final answer based on what it found
This covers the full agent loop in the simplest possible form.
Prerequisites
- Python 3.9+
- The anthropic Python SDK (`pip install anthropic`)
- An Anthropic API key
Step 1: The Core Loop
An agent isn't a single API call — it's a loop. The model runs, may request a tool, you execute the tool, you run the model again with the result. Repeat until the model says it's done.
```python
import anthropic

client = anthropic.Anthropic()

def run_agent(initial_question: str) -> str:
    """Run the agent loop until it produces a final answer."""
    messages = [
        {"role": "user", "content": initial_question}
    ]
    max_steps = 5  # Prevent infinite loops
    step = 0

    while step < max_steps:
        step += 1

        # Ask the model what to do
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system="You are a research assistant. Use the search tool to find information before answering. Always search before giving a final answer on factual topics.",
            tools=TOOLS,
            messages=messages
        )

        # Check if the model is done
        if response.stop_reason == "end_turn":
            # Extract and return the text response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return "No response generated"

        # Model wants to use a tool
        messages.append({"role": "assistant", "content": response.content})

        # Execute tool calls and collect results
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"  → Agent is using tool: {block.name}({block.input})")
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        messages.append({"role": "user", "content": tool_results})

    return "Agent reached maximum steps without completing the task."
```
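To make the bookkeeping in run_agent concrete, here is the shape the messages list takes after one search round. The IDs and text below are illustrative placeholders, not real API output:

```python
# Illustrative message history after one tool round (values are made up).
messages = [
    {"role": "user", "content": "What is the population of London?"},
    {
        "role": "assistant",
        "content": [  # the model's turn: a tool_use block
            {"type": "tool_use", "id": "toolu_01", "name": "search",
             "input": {"query": "population London"}},
        ],
    },
    {
        "role": "user",  # tool results always go back in a user turn
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01",
             "content": "London's population is approximately 9.7 million..."},
        ],
    },
]
```

Note the pairing: every tool_result must reference the id of the tool_use block it answers, and results are sent back under the user role.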
Step 2: Define the Tools
Tools are what make an agent more than a chatbot. Here we'll use a simulated search tool (replace with a real API for production):
```python
TOOLS = [
    {
        "name": "search",
        "description": "Search for information on a topic. Returns a summary of relevant results. Use this to find current facts, statistics, or information you don't know.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use this for any arithmetic rather than calculating in your head.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A valid Python mathematical expression, e.g. '(150 * 0.15) + 50'"
                }
            },
            "required": ["expression"]
        }
    }
]

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name == "search":
        return simulate_search(tool_input["query"])
    elif tool_name == "calculate":
        try:
            # Restricted eval: no builtins, only a few math helpers
            result = eval(tool_input["expression"], {"__builtins__": {}},
                          {"abs": abs, "round": round, "min": min, "max": max})
            return str(result)
        except Exception as e:
            return f"Error: {e}"
    return f"Unknown tool: {tool_name}"

def simulate_search(query: str) -> str:
    """Simulate a search engine. Replace with a real API in production."""
    # In a real agent, you'd call DuckDuckGo, Tavily, Serper, or similar
    results = {
        "population london": "London's population is approximately 9.7 million in Greater London (2024 estimate). The city is the UK's largest urban area.",
        "population paris": "Paris has approximately 2.1 million people in the city proper and 12.3 million in the greater metropolitan area (2024).",
        "gdp france": "France's GDP was approximately $3.1 trillion USD in 2024, making it the 7th largest economy globally.",
    }
    query_lower = query.lower()
    # Require every keyword to match, so "population Paris" doesn't hit the London entry
    for key, value in results.items():
        if all(word in query_lower for word in key.split()):
            return value
    return f"Search results for '{query}': Found general information but no specific data in the knowledge base."
```
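Stripping `__builtins__` narrows what eval can reach, but eval is still a blunt instrument. A stricter approach, sketched below, parses the expression and walks its AST, allowing only numeric literals and a whitelist of operators. `safe_eval` is an illustrative name, not part of any library:

```python
import ast
import operator

# Whitelisted operators; anything outside this table is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a math expression by walking its AST: no names, no calls."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Disallowed expression: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval"))
```

Because names, attribute access, and function calls are never evaluated, payloads like `__import__('os')` fail with a ValueError instead of executing.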
Step 3: Run It
```python
if __name__ == "__main__":
    questions = [
        "What is the population of London?",
        "Which is larger — Paris or London? By what percentage?",
    ]
    for question in questions:
        print(f"\nQuestion: {question}")
        print("Agent working...")
        answer = run_agent(question)
        print(f"Answer: {answer}")
```
Output:
```
Question: Which is larger — Paris or London? By what percentage?
Agent working...
  → Agent is using tool: search({'query': 'population London'})
  → Agent is using tool: search({'query': 'population Paris'})
  → Agent is using tool: calculate({'expression': '(9.7 - 2.1) / 2.1 * 100'})
Answer: London is significantly larger than Paris. London has approximately 9.7 million
people in Greater London, while Paris has about 2.1 million in its city proper. That
makes London roughly 362% larger than Paris in terms of city population.

Note: if comparing metropolitan areas, Paris has about 12.3 million vs London's 9.7 million,
which would make Paris's metro area about 27% larger than London's.
```
The agent searched twice (once per city), ran a calculation, and synthesized a nuanced answer that accounted for the ambiguity in "population of Paris."
Step 4: What to Improve Next
This is a minimal agent. Real agents need:
Better search — replace simulate_search with a real API:
- Tavily — designed for LLM agents, clean API
- Serper — Google search results via API
- DuckDuckGo — free, no key needed for basic use
Memory — the agent above has no memory between conversations. Add a memory store (even just a file) to persist what it's learned.
Error handling — what happens when search fails? When the model exceeds max_steps? Add graceful degradation.
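One simple pattern: wrap each tool call in a retry helper that backs off between attempts and, after the final failure, returns an error string instead of raising, so the model can see the failure and route around it. `with_retries` is a hypothetical helper name:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0,
                 fallback: str = "Tool unavailable.") -> str:
    """Call fn(); retry with exponential backoff, then degrade gracefully.

    Returning an error string (rather than raising) lets the model read the
    failure in the tool_result and decide what to do next.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == attempts - 1:
                return f"{fallback} (last error: {e})"
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Usage inside execute_tool might look like `with_retries(lambda: simulate_search(query))`.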
Streaming — for longer agent runs, stream the model's responses so users see progress rather than waiting.
Logging — log every tool call and model response for debugging. Agent issues are notoriously hard to debug without complete logs.
The Pattern You Just Learned
Strip everything away and the agent pattern is:
```python
while not done:
    response = model.generate(messages, tools)
    if response.is_final_answer:
        return response.text
    messages.append(response)          # assistant turn, including its tool calls
    tool_results = execute(response.tool_calls)
    messages.append(tool_results)      # user turn carrying the results
```
That's it. Frameworks like LangGraph, CrewAI, and AutoGen add structure for complex multi-agent systems, persistent state, and parallel execution — but they're all implementations of this same basic loop.
If you want to go deeper on the reasoning patterns that make agents work — ReAct loops, planning strategies, how to structure context for agents — the AI Agents track on MasterPrompting.net covers all of it with dedicated lessons on each concept.