An AI agent isn't a chatbot with extra steps. A chatbot responds to a message. An agent reasons about a goal, decides what actions to take, executes those actions using tools, observes the results, and repeats until the job is done. The difference is autonomy over a sequence of steps — not just generating text.
Claude is particularly well-suited for agent work. It follows instructions reliably, handles complex tool schemas without hallucinating calls, and its long context window means it can maintain state across many steps without losing track of what's happened.
This tutorial builds a real agent from scratch: one that can search for information, process the results, and return a structured answer. You'll have working code by the end.
What you need
- A Claude API key (get one at console.anthropic.com)
- Python 3.9+ with the anthropic package installed (pip install anthropic)
- Basic Python comfort — you don't need to understand transformers
How Claude tool use works
Before writing code, understand the loop. When you give Claude tools, every conversation follows this pattern:
- You send a message with a list of available tools (name, description, input schema)
- Claude responds — either with a text answer, or with a tool_use block requesting a specific tool call
- You execute the tool and send back the result in a tool_result block
- Claude continues — it reads the result and either calls another tool or gives a final answer
- Repeat until Claude returns a final text response
This loop is what makes agents different from single-shot completions. Claude is directing its own workflow, not just responding to yours.
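Concretely, the two content-block shapes the loop exchanges look like this (illustrative values; the id is whatever the API generates):

```python
tool_use_block = {          # Claude's request, inside an assistant message
    "type": "tool_use",
    "id": "toolu_01A",      # illustrative id, generated by the API
    "name": "web_search",
    "input": {"query": "latest Claude model"},
}

tool_result_block = {       # your reply, inside the next user message
    "type": "tool_result",
    "tool_use_id": "toolu_01A",  # must match the tool_use block's id
    "content": "Claude Sonnet 4.6 was released in February 2026 ...",
}
```

The pairing via tool_use_id is what lets Claude match each result back to the call that produced it when several tools run in one turn.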
Step 1: Define your tools
Tools are just JSON schemas. You describe what a function does and what parameters it takes, and Claude decides when to call it.
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use this when you need facts, recent events, or specific data you don't know.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query. Be specific — use keywords, not natural language questions."
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression and return the result. Use for arithmetic, unit conversions, or any calculation.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A valid Python math expression, e.g. '150 * 0.07' or '(42 + 18) / 3'"
                }
            },
            "required": ["expression"]
        }
    }
]
Tool description quality matters a lot. Claude decides which tool to call (and whether to call one at all) based entirely on the description. Be specific about when to use each tool, not just what it does. A vague description produces unreliable tool selection.
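For example, compare a vague description with a specific one (both hypothetical):

```python
# Vague: gives Claude no signal about when to reach for the tool.
bad = {"name": "web_search", "description": "Searches the web."}

# Specific: states when to use it and when not to.
good = {
    "name": "web_search",
    "description": (
        "Search the web for current information. Use when you need facts, "
        "recent events, or data you don't know. Do not use for math or for "
        "anything already stated in the conversation."
    ),
}
```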
Step 2: Implement the tool functions
These are regular Python functions. Claude doesn't execute them — you do. Claude just tells you which one to call and with what arguments.
import anthropic
import json

def web_search(query: str) -> str:
    # In a real agent, wire this to SerpAPI, Brave Search, Tavily, etc.
    # For this tutorial, we mock it.
    mock_results = {
        "latest Claude model": "Claude Sonnet 4.6 was released in February 2026 with a 1M token context window.",
        "Python version": "Python 3.13 is the latest stable version as of early 2026.",
    }
    for key, value in mock_results.items():
        if key.lower() in query.lower():
            return value
    return f"Search results for '{query}': No specific results found in mock database."

def calculate(expression: str) -> str:
    try:
        # eval is only acceptable here because builtins are stripped and
        # this is a tutorial — don't eval untrusted input in production.
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error evaluating expression: {e}"

def execute_tool(tool_name: str, tool_input: dict) -> str:
    if tool_name == "web_search":
        return web_search(tool_input["query"])
    elif tool_name == "calculate":
        return calculate(tool_input["expression"])
    else:
        return f"Unknown tool: {tool_name}"
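Even with builtins stripped, calling eval on model-generated strings is uncomfortable. A safer sketch walks the expression's AST and allows only numeric literals and arithmetic operators (the function name and error messages here are ours, not part of the tutorial's API):

```python
import ast
import operator

# Whitelist of allowed operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> str:
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("disallowed syntax in expression")
    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except Exception as e:
        return f"Error evaluating expression: {e}"
```

This drops into execute_tool as a replacement for calculate without changing the tool schema.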
Step 3: Build the agent loop
This is the core of the agent. It handles the back-and-forth between Claude and your tools until Claude signals it's done.
from typing import Optional

def run_agent(user_message: str, system_prompt: Optional[str] = None) -> str:
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]
    if not system_prompt:
        system_prompt = """You are a helpful research assistant.
You have access to web search and calculation tools.
Always use tools to verify facts rather than relying on your training data for current information.
When you have enough information to answer the question, stop calling tools and give a clear, direct answer."""
    print(f"\nUser: {user_message}\n")
    # Agent loop — runs until Claude stops requesting tools
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        # Add Claude's response to message history
        messages.append({"role": "assistant", "content": response.content})
        # Check if Claude is done (no tool calls)
        if response.stop_reason == "end_turn":
            # Extract and return the final text response
            return "".join(
                block.text for block in response.content if hasattr(block, "text")
            )
        # Process tool calls
        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  → Calling tool: {block.name}({block.input})")
                    result = execute_tool(block.name, block.input)
                    print(f"  ← Result: {result}\n")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            # Add tool results to message history and continue the loop
            messages.append({"role": "user", "content": tool_results})
            continue
        # Anything else (e.g. max_tokens): stop rather than loop forever
        return f"Agent stopped: unexpected stop_reason {response.stop_reason!r}"
Step 4: Run it
if __name__ == "__main__":
    answer = run_agent(
        "What's the latest Claude model, and if it costs $3 per million input tokens, "
        "how much would 50 million tokens cost?"
    )
    print(f"\nFinal answer:\n{answer}")
Output:
User: What's the latest Claude model, and if it costs $3 per million input tokens, how much would 50 million tokens cost?
→ Calling tool: web_search({'query': 'latest Claude model 2026'})
← Result: Claude Sonnet 4.6 was released in February 2026 with a 1M token context window.
→ Calling tool: calculate({'expression': '50 * 3'})
← Result: 150
Final answer:
The latest Claude model is Claude Sonnet 4.6, released in February 2026. At $3 per million input tokens, 50 million tokens would cost $150.
Claude searched for the model, ran the calculation, and combined the results — all without you orchestrating which tools to call or in what order.
Making it production-ready
The tutorial above works. Here's what you'd add for anything beyond a prototype:
Error handling in the loop
# Add a max_iterations guard to prevent infinite loops
max_iterations = 10
iteration = 0
while iteration < max_iterations:
    iteration += 1
    # ... rest of the loop body from Step 3 (it returns from inside
    # the loop when Claude produces a final answer)
# Falling out of the loop means Claude never finished
return "Agent reached maximum iterations without completing the task."
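Transient API failures (rate limits, network blips) deserve retries too. Here is a generic sketch; the exception tuple is a placeholder you would swap for your SDK's transient error types, and the backoff schedule is an assumption:

```python
import time

def with_retries(fn, retries=3, transient=(ConnectionError,), base_delay=1.0):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except transient:
            if attempt == retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))
```

In the agent loop you would wrap the API call: response = with_retries(lambda: client.messages.create(...)).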
Persistent memory
For a real agent, you need memory that persists across conversations. Options:
- Simple: store message history in a database, load it at the start of each conversation
- Advanced: use a vector database to retrieve relevant past context (this is agentic RAG)
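The simple option can be sketched in a few lines with SQLite (table and function names here are ours, for illustration):

```python
import json
import sqlite3

def init_db(path=":memory:"):
    # One row per conversation; the message list is stored as JSON.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations (id TEXT PRIMARY KEY, messages TEXT)"
    )
    return conn

def save_messages(conn, conv_id, messages):
    conn.execute(
        "INSERT OR REPLACE INTO conversations VALUES (?, ?)",
        (conv_id, json.dumps(messages)),
    )
    conn.commit()

def load_messages(conn, conv_id):
    row = conn.execute(
        "SELECT messages FROM conversations WHERE id = ?", (conv_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Load at the start of run_agent, save after the loop returns, and the agent picks up where it left off.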
Real tool implementations
Replace the mock web_search with an actual search API. Tavily and Brave Search both have Python SDKs and are commonly used in agent setups. Tavily is particularly popular because it returns clean, structured results that Claude can reason about easily.
Logging and observability
Log every tool call, every response, and every tool result. When agents fail in production, the logs are your only way to understand what happened. Tools like LangSmith and Braintrust are designed specifically for this.
System prompt hardening
Add explicit failure handling to your system prompt:
- If a tool returns an error, try a different approach — don't just report the error.
- If you've tried three times and still can't complete the task, explain what you were unable to do and why.
- Never make up information to fill gaps — use tools or say you don't know.
What to build next
Once you have the basic loop working, the interesting problems are:
- Multiple agents — one agent decomposes a task, others execute subtasks in parallel. The multi-agent systems lesson covers the patterns.
- Structured output — make Claude return JSON instead of text so your application can parse the result programmatically.
- MCP tools — instead of defining tool schemas manually, connect Claude to MCP servers that already expose tools for Notion, GitHub, Postgres, etc. See the MCP protocol guide.
- Evaluation — before deploying, build a test suite of known inputs and expected outputs. The evaluating agents lesson covers how.
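One common pattern for structured output (a sketch, not the only approach): define an "answer" tool whose input schema is your output schema, then force Claude to call it by passing tool_choice={"type": "tool", "name": "record_answer"} to messages.create. The tool and field names below are hypothetical:

```python
answer_tool = {
    "name": "record_answer",  # hypothetical tool name
    "description": "Record the final answer in structured form.",
    "input_schema": {
        "type": "object",
        "properties": {
            "model_name": {"type": "string"},
            "total_cost_usd": {"type": "number"},
        },
        "required": ["model_name", "total_cost_usd"],
    },
}

# The resulting tool_use block's `input` arrives as already-parsed
# JSON matching the schema, e.g.:
example_input = {"model_name": "Claude Sonnet 4.6", "total_cost_usd": 150.0}
```

Because the API validates tool input against the schema, this is more reliable than asking for JSON in prose and parsing the text reply.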
The agent loop itself is simple. Everything interesting happens in the quality of your tools, the robustness of your error handling, and the precision of your system prompt. Start small, log everything, and iterate.
The full code for this tutorial is in the coding section of the prompt library — with a copy button.