Why Function Calling Matters
An LLM without tools is limited to what it knows from training. Function calling changes that fundamentally — it gives the model a way to request actions in the real world and receive structured results back.
This is what enables agents to:
- Look up real-time information
- Run calculations
- Read and write files
- Query databases
- Control browsers
- Call APIs
Function calling is the technical plumbing behind all of that.
How It Works: The Three-Step Flow
Function calling follows a predictable three-step loop:
Step 1: You Define the Tools
Before the conversation starts, you tell the model what tools are available by providing a schema for each one:
```json
[
  {
    "name": "get_weather",
    "description": "Get the current weather for a city. Use this when the user asks about weather conditions.",
    "input_schema": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. 'London' or 'Tokyo'"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit. Default to celsius."
        }
      },
      "required": ["city"]
    }
  }
]
```
Step 2: The Model Decides to Call a Tool
When relevant, the model responds not with text but with a tool call request — a structured message specifying the tool name and arguments:
```json
{
  "type": "tool_use",
  "name": "get_weather",
  "input": {
    "city": "London",
    "unit": "celsius"
  }
}
```
Your code detects this, runs the actual get_weather("London", "celsius") function, and returns the result.
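In code, that dispatch step can be as simple as a dictionary mapping tool names to functions. A minimal sketch, where `get_weather` is a hypothetical stand-in (a real implementation would call a weather API):

```python
# Hypothetical tool implementation — a real one would call a weather service.
def get_weather(city, unit="celsius"):
    return f"Current weather in {city}: 12°{'C' if unit == 'celsius' else 'F'}, overcast."

# Map each tool name from your schema to the function that implements it.
TOOL_REGISTRY = {
    "get_weather": get_weather,
}

def dispatch_tool_call(tool_call):
    """Look up the function named in a tool_use block and run it with the model's arguments."""
    func = TOOL_REGISTRY[tool_call["name"]]
    return func(**tool_call["input"])

result = dispatch_tool_call({
    "type": "tool_use",
    "name": "get_weather",
    "input": {"city": "London", "unit": "celsius"},
})
```

Keeping a single registry means adding a new tool is one schema entry plus one dictionary entry, with no change to the dispatch logic.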
Step 3: The Model Receives the Result and Continues
You send the tool result back to the model:
```json
{
  "type": "tool_result",
  "content": "Current weather in London: 12°C, overcast with light rain. Wind: 15 km/h from the southwest."
}
```
The model now incorporates this into its response, continuing the conversation or making another tool call if needed.
Writing Good Tool Descriptions
The description field is the most important part of any tool definition. It's what the model reads to decide when to call the tool.
Bad description:
```json
{
  "name": "search",
  "description": "Search for information"
}
```
The model has no idea when to use this versus just answering from memory.
Good description:
```json
{
  "name": "search_web",
  "description": "Search the internet for current, real-time information. Use this when: (1) the question requires up-to-date data like prices, news, or recent events, (2) you are not confident your training data is accurate or recent enough, or (3) the user asks about something that changes frequently. Do NOT use this for general knowledge questions you can answer reliably."
}
```
This tells the model exactly when to reach for this tool — and when not to.
Rule: Write your description as if you're telling a capable but literal junior employee when to use this resource.
Parallel Tool Calls
Modern models can call multiple tools simultaneously in a single turn when the tasks are independent. This dramatically speeds up complex workflows.
Example: A research agent building a competitive analysis might simultaneously:
- Search for Company A's recent news
- Search for Company B's recent news
- Fetch Company A's latest financial data
- Fetch Company B's latest financial data
Instead of 4 sequential turns, the agent does all 4 in one turn and waits for all results.
Enable parallel tool calls in your API configuration — most providers support this by default on capable models.
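On the client side, when one response contains several independent tool-call requests, you can execute them concurrently before sending all the results back. A sketch using a thread pool; `search_news` and `fetch_financials` are placeholders for real I/O-bound tools:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder tools standing in for real network-bound lookups.
def search_news(company):
    return f"Top headlines for {company}"

def fetch_financials(company):
    return f"Latest financials for {company}"

TOOLS = {"search_news": search_news, "fetch_financials": fetch_financials}

def run_tool_calls_in_parallel(tool_calls):
    """Execute independent tool calls concurrently, preserving the request order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["input"]) for c in tool_calls]
        return [f.result() for f in futures]

calls = [
    {"name": "search_news", "input": {"company": "Company A"}},
    {"name": "search_news", "input": {"company": "Company B"}},
    {"name": "fetch_financials", "input": {"company": "Company A"}},
    {"name": "fetch_financials", "input": {"company": "Company B"}},
]
results = run_tool_calls_in_parallel(calls)
```

Threads work well here because tool calls are usually network-bound; for CPU-heavy tools, a process pool is the better fit.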
Handling Tool Results Well
How you structure your tool results affects agent reliability significantly.
Include context, not just data
Poor result:
"12°C"
Better result:
"Current weather in London (as of 14:32 UTC, Feb 26 2026): 12°C, overcast with light rain. Forecast: rain continuing through the evening."
The richer result gives the model more to work with and reduces follow-up tool calls.
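One way to enforce this consistently is a small formatting layer that stamps every result with a retrieval time and source before it goes back to the model. A sketch; the function and field names are illustrative:

```python
from datetime import datetime, timezone

def format_tool_result(source, data, extra=None):
    """Wrap raw tool output with a timestamp and source label so the model has context."""
    timestamp = datetime.now(timezone.utc).strftime("%H:%M UTC, %b %d %Y")
    result = f"{source} (as of {timestamp}): {data}"
    if extra:
        result += f" {extra}"
    return result

message = format_tool_result(
    "Current weather in London",
    "12°C, overcast with light rain.",
    extra="Forecast: rain continuing through the evening.",
)
```

Centralizing this means every tool gets timestamps for free, rather than relying on each tool author to remember them.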
Surface errors clearly
If a tool fails, return a structured error:
```json
{
  "error": true,
  "message": "Could not retrieve weather data for 'Londn' — city not found. Did you mean 'London'?"
}
```
This gives the agent a chance to self-correct rather than silently failing.
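A thin wrapper around tool execution can convert exceptions into this kind of structured error instead of crashing the loop. A minimal sketch, with a toy `get_weather` standing in for a real tool:

```python
def safe_tool_call(func, **kwargs):
    """Run a tool and return either its result or a structured error the model can read."""
    try:
        return {"error": False, "content": func(**kwargs)}
    except Exception as exc:
        # Return the failure as data so the model can retry or ask the user.
        return {"error": True, "message": f"{type(exc).__name__}: {exc}"}

def get_weather(city):
    known = {"London": "12°C, overcast"}
    if city not in known:
        raise ValueError(f"Could not retrieve weather data for '{city}': city not found.")
    return known[city]

ok = safe_tool_call(get_weather, city="London")
failed = safe_tool_call(get_weather, city="Londn")
```

Wrapping at the dispatch layer, rather than inside each tool, guarantees no single misbehaving tool can take down the whole agent loop.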
Keep results focused
Don't flood the model with unnecessary data. If a database query returns 500 rows, summarize or paginate. Too much data can overwhelm the context window and obscure the relevant information.
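For large query results, a simple guard that truncates the output and reports what was cut keeps the context window manageable. A sketch, assuming rows arrive as strings:

```python
def cap_rows(rows, limit=20):
    """Return at most `limit` rows, plus a note about how many were omitted."""
    if len(rows) <= limit:
        return "\n".join(rows)
    shown = "\n".join(rows[:limit])
    return f"{shown}\n... ({len(rows) - limit} more rows omitted; refine the query to see them)"

# A 500-row result gets capped to 20 rows and an omission note.
summary = cap_rows([f"row {i}" for i in range(500)], limit=20)
```

Telling the model *that* data was omitted, and how to get more, is what lets it decide whether to refine the query instead of assuming it saw everything.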
A Complete Example: Claude with Tool Use
Here's how function calling looks with Claude's API (simplified):
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_web",
        "description": "Search the internet for current information.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"]
        }
    }
]

# First turn — model decides to call a tool
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the latest news about fusion energy?"}]
)

# Check if the model wants to use a tool
if response.stop_reason == "tool_use":
    tool_call = next(b for b in response.content if b.type == "tool_use")

    # Run the actual function
    search_result = search_web(tool_call.input["query"])

    # Send the result back
    final_response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the latest news about fusion energy?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [
                {"type": "tool_result", "tool_use_id": tool_call.id, "content": search_result}
            ]}
        ]
    )
```
Common Pitfalls
Pitfall 1: Too many tools
Giving the model 30 tools at once leads to confusion and wrong selections. Keep your active tool set small and relevant. If you need many tools, use a routing layer to present only the relevant subset for each task type.
Pitfall 2: Vague tool names
process_data tells the model nothing. extract_entities_from_text is immediately clear.
Pitfall 3: Not handling tool failures
If your tool throws an exception and crashes the loop, the agent stops entirely. Always wrap tool execution in error handling and return structured error messages the model can reason about.
Pitfall 4: Ignoring the result
Always send tool results back to the model in the next message. A common mistake is running the tool but not adding the result to the conversation, leaving the model to hallucinate what the result might have been.
Key Takeaways
- Function calling is a three-step loop: define tools → model requests a call → you run it and return the result
- Tool descriptions are prompts — write them clearly to guide the model's selection
- Good results include context, timestamps, and clear error messages
- Parallel tool calls speed up multi-step tasks dramatically
- Keep tool sets small and focused; use routing if you need many tools
- Next lesson: ReAct prompting — the reasoning pattern that makes function-calling agents reliable