I've reviewed more than 50 open-source agent repositories in the past year. The model choice is almost never the problem. The tool design almost always is.
When an agent behaves unexpectedly, developers reach for prompt engineering fixes — better system prompts, few-shot examples, chain-of-thought instructions. Sometimes that helps. But if the underlying tools are poorly designed, better prompting is just lipstick on a broken interface.
This post covers the specific mistakes I see most often and how to fix them.
Why tool design matters more than model choice
Here's the core insight: the model does exactly what the tool descriptions say, not what you intend. If your tool description is vague, the model will use it in ways you didn't anticipate. If your parameter names are ambiguous, the model will fill them with plausible but wrong values.
The model isn't reasoning about your business logic. It's pattern-matching on the words you put in the name and description fields. Those words are the actual interface.
This connects to a broader point in the agent components lesson — the tool layer is where agent architecture decisions have the most leverage.
Naming: use <verb>_<noun>
The tool name is the first thing the model uses to decide whether to call it. Make it unambiguous.
Bad names:
web # What does it do? Search? Fetch? Post?
customer # Get? Update? Delete? Create?
process_data # Process how? What data?
Good names:
search_web # clear action + target
get_customer_by_email # action + target + filter
classify_support_ticket # action + what's being classified
send_slack_message # action + channel
The verb matters. get_ means read. create_ means write. search_ means query with filtering. update_ means modify existing. delete_ means remove permanently — add this word only when you're sure you want the model calling it.
If a tool does more than one thing, split it. get_or_create_customer is an instruction to call two tools. Make two tools.
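One way to keep names honest is to check them when tools are registered. The sketch below is only an illustration: `ALLOWED_VERBS` and `check_tool_name` are hypothetical names, and the verb list is an assumption you would adapt to your own tool set.

```python
import re

# Hypothetical verb whitelist; extend it to match your own tools.
ALLOWED_VERBS = {"get", "search", "create", "update", "delete", "send", "classify"}

# snake_case: a lowercase word followed by one or more _word groups.
_NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")


def check_tool_name(name: str) -> list[str]:
    """Return a list of naming problems; an empty list means the name passes."""
    if not _NAME_PATTERN.match(name):
        return ["name should be snake_case with at least two words"]
    problems = []
    verb = name.split("_", 1)[0]
    if verb not in ALLOWED_VERBS:
        problems.append(f"'{verb}' is not a recognised verb")
    if "_or_" in name or "_and_" in name:
        problems.append("name suggests two actions; consider splitting the tool")
    return problems
```

With this list, `search_web` passes, `web` fails the two-word check, and `get_or_create_customer` is flagged for splitting.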
Descriptions: write them for a junior developer
The description field is what the model reads when deciding when and how to call a tool. It needs to answer three questions:
- What does this tool do?
- When should I call it (and when shouldn't I)?
- What will I get back?
Bad description:
{
    "name": "get_user",
    "description": "Gets user data."
}
The model has no idea what "user data" means or when to use this.
Good description:
{
    "name": "get_user_profile",
    "description": (
        "Retrieve a user's profile including their name, email, subscription plan, "
        "and last 5 order summaries. Use this when the customer mentions their email address "
        "and you need account details to answer their question. "
        "Returns null if no account exists for that email."
    )
}
Three things the good description adds:
- What it returns: "profile including name, email, subscription plan, last 5 orders"
- When to use it: "when the customer mentions their email address"
- Edge case: "returns null if no account exists"
That last line prevents a common failure where the model tries to call downstream tools on a null response.
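If you want the three questions to be hard to skip, you can assemble descriptions from named parts. This is a rough sketch, not a framework feature: `build_tool_description` is a hypothetical helper, and it raises at definition time (a developer error), which is separate from how the tool itself reports failures to the model.

```python
def build_tool_description(what: str, when: str, returns: str) -> str:
    """Join the three parts into one description, refusing empty parts."""
    parts = {"what": what, "when": when, "returns": returns}
    missing = [key for key, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError(f"description is missing: {', '.join(missing)}")
    return " ".join(text.strip() for text in parts.values())


get_user_profile_tool = {
    "name": "get_user_profile",
    "description": build_tool_description(
        what=(
            "Retrieve a user's profile including their name, email, "
            "subscription plan, and last 5 order summaries."
        ),
        when=(
            "Use this when the customer mentions their email address "
            "and you need account details to answer their question."
        ),
        returns="Returns null if no account exists for that email.",
    ),
}
```

A description like "Gets user data." cannot be written this way without visibly leaving two arguments empty.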
Parameters: keep them simple
Every parameter adds cognitive load for the model. The more complex your parameter schema, the more likely the model fills something in wrong.
Use enums for constrained values
If a parameter has a fixed set of valid values, enumerate them. The model will otherwise hallucinate plausible strings.
Bad:
{"name": "priority", "type": "string", "description": "The ticket priority level"}
The model might pass "high", "urgent", "P1", "critical", "HIGH" — all technically valid strings, all potentially breaking your downstream code.
Good:
{
    "name": "priority",
    "type": "string",
    "enum": ["low", "medium", "high", "urgent"],
    "description": "Priority level for the support ticket"
}
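The schema only constrains what the model is told; it is still worth validating the value inside the tool. A minimal sketch, assuming you keep the allowed values in a Python `Enum` and derive the schema from it (`Priority`, `PRIORITY_PARAM`, and `parse_priority` are illustrative names):

```python
from enum import Enum


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    URGENT = "urgent"


# The schema's enum list comes from the same source of truth as the code.
PRIORITY_PARAM = {
    "name": "priority",
    "type": "string",
    "enum": [p.value for p in Priority],
    "description": "Priority level for the support ticket",
}


def parse_priority(raw: str) -> dict:
    """Validate a model-supplied priority and report problems as data."""
    try:
        return {"ok": True, "priority": Priority(raw).value}
    except ValueError:
        return {
            "ok": False,
            "error": f"Unknown priority '{raw}'",
            "suggestion": f"Use one of: {', '.join(PRIORITY_PARAM['enum'])}",
        }
```

Values such as "P1" or "HIGH" are rejected with a list of the accepted strings instead of reaching downstream code.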
Avoid optional parameters when you can
Optional parameters confuse the model. Should I pass this? What happens if I don't? Unless you have a strong reason, make parameters required and handle defaults inside the tool function.
Bad:
{
    "name": "search_products",
    "parameters": {
        "query": {"type": "string"},
        "category": {"type": "string"},  # optional?
        "min_price": {"type": "number"},  # optional?
        "max_price": {"type": "number"},  # optional?
        "sort_by": {"type": "string"},  # optional?
        "in_stock_only": {"type": "boolean"}  # optional?
    }
}
The model gets overwhelmed. It either passes everything (with guessed values) or passes too little.
Better: narrow the interface to what the agent actually needs, and give the one optional parameter that remains an explicit default:
{
    "name": "search_products",
    "parameters": {
        "query": {
            "type": "string",
            "description": "Search terms: product name, category, or description"
        },
        "max_results": {
            "type": "integer",
            "default": 5,
            "description": "Number of results to return (1–20)"
        }
    },
    "required": ["query"]
}
Filtering by price and category? Make that a separate filter_products tool if you actually need it.
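Handling defaults inside the tool function might look like the sketch below. Everything beyond the `search_products` name and the 1 to 20 range is an assumption for illustration: the in-memory catalogue, the constants, and the substring matching all stand in for a real search backend.

```python
from typing import Optional

# Hypothetical in-memory catalogue standing in for a real product search.
_CATALOGUE = ["red mug", "blue mug", "green mug", "mug rack", "tea towel"]

DEFAULT_MAX_RESULTS = 5
MAX_RESULTS_LIMIT = 20


def search_products(query: str, max_results: Optional[int] = None) -> dict:
    """Search the catalogue; missing or out-of-range limits are fixed here."""
    if max_results is None:
        max_results = DEFAULT_MAX_RESULTS
    # Clamp to the 1-20 range promised in the schema description.
    max_results = max(1, min(max_results, MAX_RESULTS_LIMIT))
    matches = [name for name in _CATALOGUE if query.lower() in name]
    results = matches[:max_results]
    return {"count": len(results), "results": results}
```

The model can omit `max_results` or pass something unreasonable, and the tool still behaves predictably.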
Keep numbers in the right unit
If your tool takes an amount in paise (integers), say so explicitly. If it takes rupees, say that. I've seen payment agents charge ₹1 instead of ₹100 because the model didn't know the unit.
"amount": {
    "type": "integer",
    "description": "Payment amount in INR rupees (e.g. 2499 for ₹2,499). Do not use paise."
}
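It also helps to do any unit conversion in exactly one place inside the tool. The sketch below assumes a payment gateway that bills in paise, which is an assumption for illustration rather than something this post specifies; `create_payment` and `rupees_to_paise` are hypothetical names.

```python
PAISE_PER_RUPEE = 100


def rupees_to_paise(amount_rupees: int) -> int:
    """Convert whole rupees to paise for a gateway assumed to bill in paise."""
    return amount_rupees * PAISE_PER_RUPEE


def create_payment(amount: int) -> dict:
    """Tool entry point: `amount` is in rupees, matching the schema description."""
    # bool is a subclass of int in Python, so reject it explicitly.
    is_whole_number = isinstance(amount, int) and not isinstance(amount, bool)
    if not is_whole_number or amount <= 0:
        return {
            "ok": False,
            "error": "amount must be a positive whole number of rupees",
            "suggestion": "Pass 2499 for ₹2,499; do not convert to paise.",
        }
    return {
        "ok": True,
        "amount_rupees": amount,
        "gateway_amount_paise": rupees_to_paise(amount),
    }
```

The model only ever deals in the unit named in the description; the conversion is the tool's job.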
Return values: return only what the model needs
This is the mistake I see most in production agents: a tool makes an API call, gets back a 500-token JSON response, and passes the entire thing back to the model.
The model then has to parse that response, which it will do imperfectly. And those 500 tokens of irrelevant data are now part of the context for every subsequent call.
Extract the relevant fields in the tool function:
Bad:
def get_order(order_id: str) -> dict:
    response = requests.get(f"/api/orders/{order_id}")
    return response.json()  # Returns 3KB of nested JSON
Good:
def get_order(order_id: str) -> dict:
    data = requests.get(f"/api/orders/{order_id}").json()
    return {
        "order_id": data["id"],
        "status": data["status"],
        "item_count": len(data["line_items"]),
        "total": data["total_amount"],
        "estimated_delivery": data["shipping"]["estimated_date"],
        "tracking_url": data["shipping"].get("tracking_url"),
    }
Six fields instead of fifty. The model gets exactly what it needs to answer a shipping question. No irrelevant nested objects, no fields that don't belong in your context window.
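To see the effect without calling a real API, you can run the same trimming against a sample payload and compare sizes. The payload below is made up for illustration, `summarize_order` is a hypothetical name for the extraction step, and character count is only a rough proxy for tokens.

```python
import json

# Made-up payload standing in for a real order API response.
raw_order = {
    "id": "ord_123",
    "status": "shipped",
    "total_amount": 2499,
    "line_items": [{"sku": "MUG-RED", "qty": 2, "warehouse": {"id": 7, "bin": "A4"}}],
    "shipping": {"estimated_date": "2024-06-01", "carrier_meta": {"route": [1, 2, 3]}},
    "audit": {"created_by": "system", "revisions": [0, 1, 2, 3]},
}


def summarize_order(data: dict) -> dict:
    """Keep only the fields needed to answer a shipping question."""
    shipping = data.get("shipping", {})
    return {
        "order_id": data["id"],
        "status": data["status"],
        "item_count": len(data.get("line_items", [])),
        "total": data["total_amount"],
        "estimated_delivery": shipping.get("estimated_date"),
        "tracking_url": shipping.get("tracking_url"),
    }


summary = summarize_order(raw_order)
raw_chars = len(json.dumps(raw_order))
summary_chars = len(json.dumps(summary))
print(f"raw: {raw_chars} chars, summary: {summary_chars} chars")
```

On a real response with dozens of nested fields the gap is much larger than in this toy payload.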
Error handling: return structured errors, never raise
This is the mistake that causes the most agent loops in production.
Bad:
def get_customer(email: str) -> dict:
    customer = db.find_by_email(email)
    if not customer:
        raise ValueError(f"No customer found with email {email}")
    return customer
The model doesn't understand Python exceptions. When the tool raises, the framework catches it and usually returns a generic error string. The model sees "Error: No customer found with email x@y.com" and often just tries the same call again with a slightly different email.
Good:
def get_customer(email: str) -> dict:
    customer = db.find_by_email(email)
    if not customer:
        return {
            "found": False,
            "error": f"No account found for {email}",
            "suggestion": "Ask the customer to confirm their email, or search by phone number instead."
        }
    return {"found": True, "customer": {...}}
The model can reason about a structured error. It reads the suggestion field and has a concrete next step to take, so it is far less likely to retry the same call. In most cases, the loop stops.
Always include a suggestion in error returns for tool failures the model can recover from.
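For failures you did not anticipate, a thin wrapper at the tool boundary can apply the same rule. This is a sketch under assumptions: `returns_structured_errors` and `get_order_status` are hypothetical names, and the `ok`/`error`/`suggestion` keys are one possible shape, not a standard.

```python
import functools


def returns_structured_errors(suggestion: str):
    """Wrap a tool so unexpected exceptions come back as data, not a raise."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            try:
                return tool(*args, **kwargs)
            except Exception as exc:  # broad on purpose: this is the tool boundary
                return {
                    "ok": False,
                    "error": f"{type(exc).__name__}: {exc}",
                    "suggestion": suggestion,
                }
        return wrapper
    return decorator


@returns_structured_errors("Ask the customer to confirm the order number.")
def get_order_status(order_id: str) -> dict:
    orders = {"A100": "shipped"}  # stand-in for a real lookup
    return {"ok": True, "status": orders[order_id]}  # KeyError if missing
```

Explicit checks inside the tool are still better where you know the failure mode, because they let you write a more specific suggestion; the wrapper is a safety net for everything else.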
Testing tools in isolation
Build and test each tool as a standalone function before wiring it to an agent. The agent is the hardest layer to debug — if the tool itself is broken, you'll spend hours confused.
# Test in isolation first
result = get_customer("test@example.com")
assert result["found"] is True
assert "email" in result["customer"]

result = get_customer("nonexistent@example.com")
assert result["found"] is False
assert "suggestion" in result
Once every tool passes its own tests, wire them together. Agent failures at that point are almost always prompt or orchestration issues, not tool bugs.
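Testing in isolation is easier when the tool's dependencies can be swapped out. The sketch below assumes you are willing to pass the database client in as an argument; `FakeCustomerDB` is a hypothetical in-memory stand-in, and returning the raw customer record is a simplification for the test.

```python
class FakeCustomerDB:
    """In-memory stand-in for the real database client."""

    def __init__(self, rows: dict):
        self._rows = rows

    def find_by_email(self, email: str):
        return self._rows.get(email)


def get_customer(email: str, db) -> dict:
    """The customer lookup tool, with the database injected so tests can fake it."""
    customer = db.find_by_email(email)
    if not customer:
        return {
            "found": False,
            "error": f"No account found for {email}",
            "suggestion": "Ask the customer to confirm their email, or search by phone number instead.",
        }
    return {"found": True, "customer": customer}


def test_get_customer_found():
    db = FakeCustomerDB({"test@example.com": {"email": "test@example.com", "name": "Test"}})
    result = get_customer("test@example.com", db)
    assert result["found"] is True
    assert "email" in result["customer"]


def test_get_customer_missing():
    result = get_customer("nonexistent@example.com", FakeCustomerDB({}))
    assert result["found"] is False
    assert "suggestion" in result
```

The `test_` prefix follows the pytest naming convention, so a test runner can pick these up without a live database.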
The granularity rule
One tool should do one thing. If you're writing a tool called get_or_create_customer or search_and_filter_products, split it.
The agent chains tools together — that's literally what it's for. get_customer + create_customer are two clear tools. get_or_create_customer is a hidden branch that the model can't reason about cleanly.
When in doubt, err toward more, smaller tools over fewer, larger ones.
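As a rough illustration of the split, here is what the two tools could look like against an in-memory store. The store, the function names, and the return shapes are assumptions made for this sketch.

```python
# In-memory store standing in for a real customer database.
_customers: dict[str, dict] = {}


def get_customer_by_email(email: str) -> dict:
    """Read-only lookup; never creates anything."""
    customer = _customers.get(email)
    if customer is None:
        return {
            "found": False,
            "suggestion": (
                "Confirm the email with the customer, then call "
                "create_customer if they want an account."
            ),
        }
    return {"found": True, "customer": customer}


def create_customer(email: str, name: str) -> dict:
    """Explicit write; refuses to overwrite an existing account."""
    if email in _customers:
        return {"created": False, "error": f"An account already exists for {email}"}
    _customers[email] = {"email": email, "name": name}
    return {"created": True, "customer": _customers[email]}
```

The branch that `get_or_create_customer` would hide is now visible in the tool call history: a failed lookup followed by an explicit create.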
Tool design is the unsexy part of agent development, but it's where most of the reliability comes from. The function calling lesson covers the mechanics; this post covers the craft. Get both right and you'll find your agents are far less frustrating to debug.



