Every AI agent is only as capable as its tools. The model provides the reasoning; the tools provide the action. A research agent with a great search tool will outperform a reasoning-optimized model paired with a poor one. But tools don't just need to exist — they need to be designed so the LLM uses them correctly, at the right time, with the right parameters.
This is the part most agent tutorials skip. They show you how to register a tool. They don't show you what makes a tool good.
How LLMs "read" tool definitions
When you define a tool for an agent, you provide three things: a name, a description, and a parameter schema (typically JSON Schema).
The model reads the name and description to decide when to call the tool. It reads the parameter schema to know what to pass.
Here's the implication: the description is the UI. It's the only instruction the model gets about the tool's purpose and correct usage. A vague description leads to incorrect tool selection and wrong parameter values — not because the model is weak, but because it was given insufficient information. Treat tool descriptions like you'd treat documentation for a colleague who has never seen your codebase.
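To make this concrete, here is a minimal sketch of what such a definition looks like in the JSON-Schema style most function-calling APIs accept. The tool name, description text, and ID format are illustrative, not from any particular product:

```python
# A hypothetical tool definition: name + description + parameter schema.
# The description carries the "when to use it" guidance; the schema
# carries the "what to pass" guidance.
get_order_status_tool = {
    "name": "get_order_status",
    "description": (
        "Look up the current status of a customer order. Use this when the "
        "customer asks where their order is or when it will arrive. "
        "Read-only: this never modifies the order."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID. Format: 'ORD-XXXXX', e.g. 'ORD-10234'.",
            }
        },
        "required": ["order_id"],
    },
}
```

Everything the model will ever know about this tool is in that one object, which is why each field deserves the same care as user-facing documentation.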
A taxonomy of tools
Before designing individual tools, it helps to have a mental model of the four categories agents typically need.
Retrieval tools
Read-only access to data. No side effects.
Examples: search_knowledge_base(query), get_order_status(order_id), search_web(query), read_file(path)
Design principle: retrieval tools should never modify state. They should return structured data — not a wall of prose that the agent has to parse. A tool that returns {"status": "shipped", "eta": "2026-03-07"} is better than one that returns "Your order shipped and should arrive Friday." The agent can reason about the first; it has to summarize the second.
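A retrieval tool in this style might look like the following sketch. The order data is hardcoded for illustration; a real implementation would query a database or API:

```python
def get_order_status(order_id: str) -> dict:
    """Read-only lookup. Returns structured fields the agent can reason over,
    never a prose summary. No side effects."""
    # Stand-in for a database query (hypothetical data).
    orders = {"1234": {"status": "shipped", "eta": "2026-03-07"}}
    order = orders.get(order_id)
    if order is None:
        return {"success": False, "error": "order_not_found"}
    return {"success": True, "order_id": order_id, **order}
```

Because the return value is a dict, the agent can pull out `status` or `eta` directly in its next reasoning step instead of re-parsing a sentence.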
Action tools
Write operations that change state. These have side effects.
Examples: create_ticket(title, description, priority), send_email(to, subject, body), update_record(id, fields), delete_item(id)
Design principle: action tools should be atomic — one tool, one action. They should return a confirmation of exactly what was done, including any IDs or timestamps generated. If the agent doesn't know whether the action succeeded, it may retry (and duplicate) or give incorrect status to the user.
Computation tools
Perform calculations or transformations on data.
Examples: calculate_price(base, discount, tax_rate), convert_currency(amount, from, to), extract_entities(text), summarize_document(content, max_words)
Design principle: computation tools should be deterministic — same inputs, same output, no side effects. They exist because LLMs are unreliable at precise arithmetic and should not be trusted for financial calculations. Always offload math to a computation tool.
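A deterministic pricing tool could be sketched like this, using Python's `decimal` module so money math doesn't drift the way floating point does (parameter names mirror the example above; the rounding policy is an assumption):

```python
from decimal import Decimal, ROUND_HALF_UP

def calculate_price(base: str, discount: str, tax_rate: str) -> dict:
    """Deterministic: same inputs always produce the same output, no side
    effects. Decimal avoids binary floating-point rounding errors."""
    cents = Decimal("0.01")
    subtotal = Decimal(base) * (Decimal("1") - Decimal(discount))
    total = subtotal * (Decimal("1") + Decimal(tax_rate))
    return {
        "subtotal": str(subtotal.quantize(cents, rounding=ROUND_HALF_UP)),
        "total": str(total.quantize(cents, rounding=ROUND_HALF_UP)),
    }
```

For example, `calculate_price("100.00", "0.10", "0.08")` returns a total of `"97.20"`, and it returns that every time, which is exactly the guarantee you cannot get from asking the model to do the arithmetic itself.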
Communication tools
Connect to external services to send messages or events.
Examples: send_sms(phone, message), post_slack_message(channel, text), create_calendar_event(title, date, attendees)
Design principle: communication tools send things to real people in the real world. This makes them the highest-stakes category. The agent's system prompt should explicitly instruct it to confirm with the user before calling any communication tool. This is not optional — a misfire here is visible, embarrassing, or worse.
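Beyond the system-prompt instruction, the tool itself can enforce the confirmation step. This sketch (the `user_confirmed` flag and the refusal message are illustrative, not a standard API) refuses to send until the agent asserts it has confirmed with the user:

```python
def send_sms(phone: str, message: str, user_confirmed: bool = False) -> dict:
    """Highest-stakes category: reaches a real person. Refuses to send
    unless the agent has explicitly confirmed with the user first."""
    if not user_confirmed:
        return {
            "success": False,
            "error": "confirmation_required",
            "message": "Confirm the message with the user, then call again "
                       "with user_confirmed=true.",
        }
    # Real delivery would happen here via an SMS provider's API.
    return {"success": True, "to": phone, "chars": len(message)}
```

A guardrail in the schema is sturdier than one in the prompt alone: even if the model skips the confirmation, the tool fails safely and tells it what to do next.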
Principles of good tool design
1. Atomic
One tool = one action. Don't create send_email_and_create_ticket(). Separate concerns. Composite tools are harder for the agent to reason about, harder for you to test, and make failure modes more complex. If you find yourself naming a tool with "and," split it.
2. Predictable
Same inputs should produce the same effect. Avoid tools that behave differently based on hidden state or ambient conditions the agent can't observe. If a tool's behavior depends on the current time, the user's account tier, or a feature flag, that context should either be passed as a parameter or documented explicitly in the description.
3. Self-documenting
The description should be clear enough that a human — one who has never seen the codebase — would know exactly when to use it and when not to.
Poor example:
name: "process_request"
description: "Process a request"
The agent has no idea what this does, what inputs it expects, or when it's appropriate.
Good example:
name: "create_support_ticket"
description: "Create a new support ticket in the help desk system. Use this when
the customer's issue cannot be resolved in this conversation and needs human
follow-up. Do NOT use for general inquiries that can be answered directly.
Priority levels: low (cosmetic issues), medium (functional bugs), high (blocking
issues), critical (system-wide outages or data loss)."
The agent knows what the tool does, when to use it, when not to use it, and what the parameter values mean.
4. Fail gracefully
Return structured error information, not exceptions. The agent needs to know what went wrong in order to decide what to do next.
Poor failure response: the tool throws an exception that the orchestration framework converts to a cryptic string like "Error: 500 Internal Server Error".
Good failure response:
{
  "success": false,
  "error": "order_not_found",
  "message": "No order with ID 1234 exists in the system.",
  "suggestion": "Verify the order ID with the customer or search by email instead."
}
With the good response, the agent can tell the user specifically what happened and suggest a next step.
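One way to get this behavior everywhere is a small wrapper that converts exceptions into structured error objects. This is a sketch under the assumption that your tools are plain Python functions; the error vocabulary (`not_found`, etc.) is illustrative:

```python
import functools

def graceful(tool):
    """Wrap a tool so failures come back as structured data the agent can
    act on, rather than exceptions that surface as cryptic strings."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        try:
            return tool(*args, **kwargs)
        except KeyError as exc:
            return {
                "success": False,
                "error": "not_found",
                "message": f"No record for {exc.args[0]!r} exists in the system.",
                "suggestion": "Verify the ID with the customer or search by another field.",
            }
        except Exception as exc:
            return {"success": False, "error": type(exc).__name__, "message": str(exc)}
    return wrapper

@graceful
def get_order(order_id: str) -> dict:
    orders = {"1234": {"status": "shipped"}}  # stand-in for a database
    return {"success": True, **orders[order_id]}  # raises KeyError if missing
```

The agent sees the same response shape whether the call succeeded or failed, which makes its error-recovery behavior far more predictable.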
5. Return structured data
Return JSON objects, not prose paragraphs. The agent reads tool output as part of its reasoning context.
Poor return value: "The order 1234 was placed on January 15th and is currently being processed. Expected delivery is January 20th."
Good return value:
{
  "order_id": "1234",
  "status": "processing",
  "placed_at": "2026-01-15",
  "estimated_delivery": "2026-01-20",
  "items": [{"sku": "WIDGET-01", "qty": 2}]
}
Structured data lets the agent pick out the specific field it needs for its next reasoning step, rather than re-parsing natural language.
6. Appropriate scope
Don't make tools that do too much or too little. A tool named search_everything(query) with no documented scope is too broad — the agent can't know what it searches, and it may call it redundantly. Having 50 single-purpose micro-tools is too fragmented — the agent spends its context window deciding between get_user_name and get_user_email when a single get_user_profile would work.
A useful heuristic: if you can describe exactly what the tool does in one sentence without any "or" clauses, the scope is right.
7. Clear names
Use verb_noun format consistently: create_ticket, search_customer, send_notification, update_order_status. Avoid generic verbs like process, handle, or execute — they reveal nothing about what the tool actually does.
JSON Schema best practices
The parameter schema is how you communicate what data the tool expects. Most developers fill out the top-level description and leave property-level descriptions empty. This is a mistake.
Include descriptions on every property, not just the tool itself:
{
  "name": "create_support_ticket",
  "parameters": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "Short summary of the issue. Max 100 characters."
      },
      "priority": {
        "type": "string",
        "enum": ["low", "medium", "high", "critical"],
        "description": "Issue severity. Use 'critical' only for system-wide outages."
      },
      "customer_id": {
        "type": "string",
        "description": "Customer ID from the CRM. Format: 'CUST-XXXXX', e.g. 'CUST-10234'."
      }
    },
    "required": ["title", "priority", "customer_id"]
  }
}
Four patterns that improve schema quality:
- Use enum for constrained values — this prevents the model from inventing values like "urgent" instead of "high".
- Include format examples in descriptions — "Format: 'CUST-XXXXX', e.g. 'CUST-10234'" is unambiguous.
- Mark required parameters explicitly in the required array — don't leave the model guessing.
- Set sensible defaults for optional parameters — document what happens if an optional field is omitted.
The too-many-tools problem
Models struggle when presented with more than roughly 15 tools. The problem isn't capability — it's attention and disambiguation. With 25 tools, the model spends a meaningful portion of its reasoning budget evaluating options, and wrong selections increase.
If your agent needs more than 15 tools, the solution is usually one of:
- Split into specialist agents: a research sub-agent with search tools, an action sub-agent with write tools. The orchestrating agent routes to the specialist.
- Dynamic tool loading: retrieve the relevant tools based on the user's query before constructing the agent call. A query about orders loads order tools; a query about billing loads billing tools.
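The dynamic-loading approach can be sketched as a simple keyword-based router. The tool groups, keywords, and matching logic here are all illustrative; a production router would more likely use embedding similarity or an intent classifier:

```python
# Hypothetical registry: tool names grouped by domain.
TOOL_GROUPS = {
    "orders": ["get_order_status", "update_order_status"],
    "billing": ["get_invoice", "refund_payment"],
}

# Hypothetical keyword triggers for each group.
KEYWORDS = {
    "orders": ["order", "shipping", "delivery"],
    "billing": ["invoice", "refund", "charge", "bill"],
}

def select_tools(query: str) -> list[str]:
    """Pick the relevant subset of tools for this query, so the agent
    call is constructed with a small tool list instead of all of them."""
    q = query.lower()
    selected: list[str] = []
    for group, words in KEYWORDS.items():
        if any(word in q for word in words):
            selected.extend(TOOL_GROUPS[group])
    return selected
```

A query about shipping loads only the order tools, keeping the agent's decision space well under the ~15-tool threshold even when the full registry is much larger.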
Common tool design mistakes
Overlapping tools. You create search_faq and search_docs, both of which search text content. The agent picks the wrong one half the time because the distinction isn't meaningful from the description. Fix: merge them into search_knowledge_base(query, source: "faq"|"docs"|"all"), or write descriptions that clearly delineate the specific content each one covers.
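The merged version of that tool might be defined as follows. This is a sketch of the schema only; the descriptions of what each corpus contains are placeholders you would fill in with your actual content boundaries:

```python
# One search tool with an explicit source parameter, instead of two
# overlapping tools the model must disambiguate from vague descriptions.
search_knowledge_base_tool = {
    "name": "search_knowledge_base",
    "description": (
        "Search the support knowledge base. Covers both FAQ entries and "
        "product documentation. Use the 'source' parameter to narrow the "
        "search when the customer's question clearly fits one corpus."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms."},
            "source": {
                "type": "string",
                "enum": ["faq", "docs", "all"],
                "description": "Which corpus to search. Default: 'all'.",
            },
        },
        "required": ["query"],
    },
}
```

The enum makes the choice between corpora an explicit, validated parameter rather than a coin flip between two near-identical tool names.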
Missing behavioral guardrails in descriptions. update_user(id, fields) with no note that this requires user confirmation. The agent updates records silently because nothing told it not to. The description should include: "Only call this after explicitly confirming the changes with the user."
Ambiguous return values. The tool returns {"result": "success"} without indicating what was actually done. The agent can't verify the outcome and may retry the operation or give the user incorrect information. Return the actual state: {"ticket_id": "TKT-5589", "status": "created", "assigned_to": "support-tier-1"}.
Side-effect tools named like retrieval tools. A tool named get_invoice(invoice_id) that also marks the invoice as viewed is dangerous — agents will call it freely, not expecting side effects. If a tool has side effects, the name and description should make that clear.
What to read next
Understanding how the model calls tools at a protocol level — what gets sent in the API request, what gets returned — is covered in the function calling lesson. Once you have your tools designed and your agent working, Building Production-Ready Agents covers what's required to take it from a prototype to a deployment that handles real users reliably.