The Assistants API always felt like it was trying too hard. Threads, runs, run steps, polling for status — a lot of machinery for what should be a simple "call the model, get a result" interaction. OpenAI clearly agreed, because the Responses API dropped most of that complexity while keeping what actually mattered: built-in tools.
If you're building anything that needs web search, data analysis, or document Q&A, the Responses API is the right starting point in 2026.
## What changed from the Assistants API
The Assistants API required you to create and manage Assistants (persistent configurations), Threads (conversation history), and Runs (individual execution attempts). You'd kick off a run, poll until it completed, retrieve messages, handle tool calls in a loop. For a simple chatbot, this meant writing 80 lines of boilerplate before you got to actual logic.
The Responses API is stateless-first. You send a request, you get a response. No threads to manage, no polling, no run lifecycle. If you need conversation history, you pass previous messages yourself — just like the Chat Completions API.
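Concretely, a follow-up turn is just a request whose input carries the earlier turns; a minimal sketch (the messages are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# You own the transcript: send prior turns as an ordinary message list
response = client.responses.create(
    model="gpt-4o",
    input=[
        {"role": "user", "content": "Recommend a Python library for reading CSVs."},
        {"role": "assistant", "content": "Use the built-in csv module, or pandas for analysis work."},
        {"role": "user", "content": "Show me the pandas version."},
    ],
)
print(response.output_text)
```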
The built-in tools are the reason to switch: web_search_preview, code_interpreter, and file_search work without any setup. No tool definitions, no function schemas, no webhook URLs. You enable them in the request and the model uses them when it decides they're needed.
## The three built-in tools
### web_search_preview
Gives the model access to real-time web search. It's useful for anything requiring current information: recent news, current pricing, competitor research, live documentation.
The model decides when to search. When it does, OpenAI handles the search request internally and injects the results into context before generating the response. You don't see the search queries or the raw results unless you inspect the tool-call items in the response output.
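If you do want that visibility, the search calls surface as their own items in the response output; a sketch of inspecting them (assuming the SDK exposes them as web_search_call items, as current versions do):

```python
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What did OpenAI announce this week?",
)

# Tool calls are interleaved with the final message in response.output
for item in response.output:
    if item.type == "web_search_call":
        print("search performed:", item.id, item.status)
print(response.output_text)
```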
Best for: research bots, competitive intelligence tools, news summarization, anything that gets stale without live data.
### code_interpreter
A sandboxed Python environment the model can write and execute code in. The model can generate code, run it, inspect the output, fix errors, and iterate — all within a single response. It can also read uploaded files (CSV, Excel, images) and write output files (charts, processed data).
This is more powerful than it sounds. You can send a messy CSV and say "find anomalies in this dataset" and the model will actually execute Python to do it, not just describe how you could. It handles matplotlib, pandas, numpy, and most standard libraries.
Best for: data analysis, chart generation, math-heavy computations, file format conversions, anything that benefits from running actual code rather than reasoning about code.
### file_search
Searches across files you've uploaded to OpenAI's storage. It handles chunking, embedding, and retrieval automatically. You upload PDFs, Word docs, code files — whatever your knowledge base contains — and the model can search and cite them in responses.
It's a managed RAG system. You get only coarse control over chunking and none over the retrieval pipeline itself, which is a limitation for advanced use cases but a significant convenience for most applications.
Best for: document Q&A, internal knowledge bases, support bots grounded in product documentation.
## Basic API usage

```python
from openai import OpenAI

client = OpenAI()

# Simple request with web search enabled
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="What's the current pricing for Anthropic's Claude API?",
)

print(response.output_text)
```
To use multiple tools:

```python
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {"type": "web_search_preview"},
        # code_interpreter requires a container; "auto" lets OpenAI provision one
        {"type": "code_interpreter", "container": {"type": "auto"}},
    ],
    input="Search for recent benchmark comparisons between GPT-4o and Claude, then create a summary table.",
)
```
Force tool use with tool_choice:

```python
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    tool_choice={"type": "code_interpreter"},  # always use code interpreter
    input="Calculate the compound interest on $10,000 at 7% over 20 years.",
)
```
## Streaming responses with tool calls
Streaming is important for user-facing applications — nobody wants to stare at a spinner for 8 seconds while code interpreter runs.
```python
with client.responses.stream(
    model="gpt-4o",
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
    input="Analyze this sales data and create a bar chart: Q1: $120k, Q2: $145k, Q3: $132k, Q4: $178k",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            # Print text tokens as they arrive
            print(event.delta, end="", flush=True)
        elif event.type == "response.completed":
            # Handle any output files (like generated charts)
            for item in event.response.output:
                if item.type == "code_interpreter_call":
                    for output in item.outputs or []:
                        if output.type == "image":
                            # Save or display the generated image
                            print(f"\nGenerated chart: {output.image_url}")
```
## Three practical examples

### Research bot with web search
```python
def research_topic(query: str) -> str:
    response = client.responses.create(
        model="gpt-4o",
        tools=[{"type": "web_search_preview"}],
        instructions=(
            "You are a research assistant. Search for current, accurate information. "
            "Always cite your sources with URLs. Be concise and factual."
        ),
        input=query,
    )
    return response.output_text


# Usage
report = research_topic(
    "What are the main LLM providers competing with OpenAI in 2026 and what are their latest models?"
)
```
### Data analyst with code interpreter
```python
def analyze_csv(file_path: str, analysis_request: str) -> dict:
    # Upload the file first
    with open(file_path, "rb") as f:
        file = client.files.create(file=f, purpose="assistants")

    response = client.responses.create(
        model="gpt-4o",
        # Attach the upload to the code interpreter's container so the
        # generated Python can read it from the sandbox filesystem
        tools=[{
            "type": "code_interpreter",
            "container": {"type": "auto", "file_ids": [file.id]},
        }],
        input=analysis_request,
    )

    # Extract text response and any generated files
    result = {"analysis": response.output_text, "charts": []}
    for item in response.output:
        if item.type == "code_interpreter_call":
            for output in item.outputs or []:
                if output.type == "image":
                    result["charts"].append(output.image_url)
    return result


# Usage
result = analyze_csv(
    "sales_data.csv",
    "Find the top 5 products by revenue, identify seasonal trends, and create a chart.",
)
```
### Document Q&A with file search
```python
def setup_knowledge_base(file_paths: list[str]) -> str:
    # file_search retrieves from a vector store, not raw file IDs,
    # so create one and upload each document into it
    vector_store = client.vector_stores.create(name="knowledge-base")
    for path in file_paths:
        with open(path, "rb") as f:
            client.vector_stores.files.upload_and_poll(
                vector_store_id=vector_store.id, file=f
            )
    return vector_store.id


def query_documents(question: str, vector_store_id: str) -> str:
    response = client.responses.create(
        model="gpt-4o",
        tools=[{
            "type": "file_search",
            "vector_store_ids": [vector_store_id],
        }],
        input=question,
    )
    return response.output_text


# Usage
vector_store_id = setup_knowledge_base(["product_docs.pdf", "faq.pdf", "changelog.md"])
answer = query_documents("What changed in version 3.2 of the product?", vector_store_id)
```
## Cost considerations
Built-in tools add overhead. Web search adds a fixed fee per search call (currently $0.025 per search in addition to token costs). Code interpreter charges $0.03 per session, and a "session" resets after an hour of inactivity — so long batch jobs can accumulate multiple session fees.
File search has two cost components: the storage cost for uploaded files ($0.10/GB/day after the free tier) and the vector store search cost ($0.10 per 1,000 queries after the first 1,000 free per day).
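To get a feel for how this adds up, here's a back-of-the-envelope estimate using the per-call prices above; the daily volumes are made-up assumptions:

```python
# Hypothetical workload: volumes are assumptions, prices from above
WEB_SEARCH_FEE = 0.025  # $ per web search call
CI_SESSION_FEE = 0.03   # $ per code interpreter session

searches_per_day = 2_000
ci_sessions_per_day = 500

monthly = 30 * (searches_per_day * WEB_SEARCH_FEE + ci_sessions_per_day * CI_SESSION_FEE)
print(f"~${monthly:,.0f}/month in tool fees, before token costs")  # ~$1,950
```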
For function calling use cases where you're calling your own APIs rather than OpenAI's built-in tools, the standard Chat Completions API is still more cost-effective — you only pay for tokens. The Responses API's value is in the managed tool infrastructure.
Structured outputs work with the Responses API too, and they're worth using when you need predictable JSON shapes from responses.
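One way to wire them up is the Python SDK's parse helper with a Pydantic model; a hedged sketch, with a schema invented for this example:

```python
from pydantic import BaseModel


class PricingSummary(BaseModel):
    provider: str
    model_name: str
    input_price_per_million_tokens: float
    output_price_per_million_tokens: float


parsed = client.responses.parse(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],
    input="Find the current per-token API pricing for Anthropic's Claude models.",
    text_format=PricingSummary,  # the SDK validates the output against this schema
)
print(parsed.output_parsed)  # a PricingSummary instance
```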
## Responses API vs Chat Completions API
Use Responses API when:
- You need web search, code execution, or document retrieval
- You want OpenAI to manage tool execution complexity
- You're prototyping and want to move fast
Use Chat Completions API when:
- You're calling your own tools/functions
- You need maximum control over system behavior
- Cost efficiency is critical and you don't need built-in tools
- You're building something the Responses API's abstraction doesn't fit
The Responses API doesn't support every feature the Chat Completions API has. If you're using custom function tools extensively, you'll hit friction. For straightforward applications that need one or more of the three built-in tools, it's meaningfully simpler.
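For what it's worth, custom function tools do exist in the Responses API; part of the friction is that the definition is flattened relative to Chat Completions (no nested "function" key) and you still run the function yourself. A sketch, with a hypothetical get_weather function:

```python
response = client.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "function",
        "name": "get_weather",  # hypothetical function, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    input="What's the weather in Berlin?",
)

# Executing the call and sending back the result is still your job
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
```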
## Migrating from the Assistants API
The migration is largely a simplification. Things you can delete:
- Thread creation and management
- Run lifecycle handling (polling, status checks)
- Message retrieval from thread
Things that carry over directly:
- Tool definitions (code_interpreter and file_search carry over in broadly the same shape)
- File uploads and file IDs
- System instructions (now the instructions parameter in the request)
The main behavior change: conversation state is now your responsibility. If you were using Threads for persistent history, you'll need to pass previous messages explicitly in each request. For most applications, this is actually cleaner — you control exactly what context the model sees.
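If you'd rather not resend the transcript on every turn, the API also offers previous_response_id, which chains a request onto the stored context of an earlier response; a minimal sketch:

```python
first = client.responses.create(
    model="gpt-4o",
    input="Draft a launch tweet for our new analytics feature.",
)

# Chains onto the stored context of `first` instead of resending history
# (the earlier response must have been stored, which is the default)
follow_up = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="Shorter, and add one emoji.",
)
print(follow_up.output_text)
```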
One thing that doesn't yet exist in the Responses API: background runs for long-running async tasks. The Assistants API had a run/polling model that handled tasks taking minutes. If you have heavy code interpreter jobs that time out, you may need to keep that workload on the Assistants API for now — or handle async execution yourself.
The direction of travel is clear though. Responses API is OpenAI's preferred path forward, and the Assistants API isn't getting meaningful new features.