Most "AI research tools" are search engine wrappers. Paste a question, the agent searches, returns a list of sources with one-line summaries. That's not research. That's an interface over Google.
Real research means: breaking the question apart, chasing each thread separately, reading actual sources, noticing when two sources contradict each other, and synthesizing a structured answer you can actually trust.
This post builds that: a research agent that decomposes questions, searches multiple sub-queries, reads full pages with Firecrawl, stores extracted claims, flags contradictions, and produces a structured report with inline citations. The whole thing runs in Python, costs roughly $0.13–0.23 per report (itemized in the cost breakdown), and you control what sources it reads.
Why not just use Perplexity or ChatGPT Deep Research?
Both are excellent for quick answers. Deep Research in particular does a real job on complex questions. But there are reasons to build your own:
- Custom sources: you can point it at internal docs, your company's knowledge base, specific domain sources — not just the public web
- No subscription caps: subscription plans cap usage; your own agent is bounded only by API rate limits and what you are willing to spend
- Structured output: you control the output format — the exact sections, citation style, and confidence levels you need
- Full pipeline visibility: you can see every search query it ran, every page it read, every claim it extracted
Perplexity is great for "what's the current funding round of company X." This is for when you need a report you're going to share with clients.
Architecture
Question
  → Decompose into 3–5 searchable sub-questions (Claude)
  → For each sub-question:
      → Web search (Brave Search API)
      → Fetch top 3 results (Firecrawl)
      → Extract key claims + confidence (Claude)
      → Store in fact memory
  → Detect contradictions across all stored claims
  → Synthesize structured report with inline citations
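Each reasoning step (decomposition, claim extraction, contradiction detection, synthesis) is a single Claude call with a tight, specific prompt; search and page fetching are plain HTTP requests. The fact memory is just a Python list. No vector database is needed for reports under 50 sources.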
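To illustrate why a plain list is enough at this scale, here is a minimal sketch of querying the fact memory with ordinary comprehensions. The `Claim` fields mirror the dataclass in the implementation; `claims_for` and `strongest` are hypothetical helpers that are not part of the agent.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    source_url: str
    source_title: str
    sub_question: str
    confidence: float  # 0.0-1.0


def claims_for(facts: list[Claim], sub_question: str) -> list[Claim]:
    """Hypothetical helper: all claims collected for one sub-question."""
    return [c for c in facts if c.sub_question == sub_question]


def strongest(facts: list[Claim], min_confidence: float = 0.8) -> list[Claim]:
    """Hypothetical helper: claims at or above a threshold, highest first."""
    kept = [c for c in facts if c.confidence >= min_confidence]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


# Placeholder data, only to show the calls
facts = [
    Claim("Claim A", "https://example.com/a", "Source A", "q1", 0.9),
    Claim("Claim B", "https://example.com/b", "Source B", "q1", 0.6),
    Claim("Claim C", "https://example.com/c", "Source C", "q2", 0.85),
]
print([c.text for c in claims_for(facts, "q1")])
print([c.text for c in strongest(facts)])
```

A linear scan over a few dozen claims is effectively free, which is why an index or embedding store would add complexity without a payoff here.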
Setup
pip install anthropic requests firecrawl-py
You need:
- ANTHROPIC_API_KEY
- BRAVE_SEARCH_API_KEY: free tier at brave.com/search/api, 2,000 queries/month
- FIRECRAWL_API_KEY: for full-page extraction
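A small startup check can report a missing key before any API call is made. This is a sketch: `missing_keys` is a hypothetical helper, and only the three variable names come from the list above.

```python
import os

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "BRAVE_SEARCH_API_KEY", "FIRECRAWL_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
print("Missing:", ", ".join(missing) if missing else "none")
```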
The full implementation
import json
import os
from dataclasses import dataclass

import anthropic
import requests

@dataclass
class Claim:
    text: str
    source_url: str
    source_title: str
    sub_question: str
    confidence: float  # 0.0–1.0


class ResearchAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.facts: list[Claim] = []
        self.sources_read: list[str] = []

    def decompose(self, question: str) -> list[str]:
        """Break the main question into 3–5 searchable sub-questions."""
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=400,
            messages=[{
                "role": "user",
                "content": f"""Break this research question into 3–5 specific, searchable sub-questions.
Each sub-question should be answerable with a web search.
Return as a JSON array of strings. No other text.
Question: {question}"""
            }],
        )
        try:
            text = response.content[0].text.strip()
            # Handle markdown code blocks if present
            if text.startswith("```"):
                text = text.split("```")[1]
                if text.startswith("json"):
                    text = text[4:]
            return json.loads(text)
        except (json.JSONDecodeError, IndexError):
            return [question]  # Fall back to original question

    def search(self, query: str, num_results: int = 5) -> list[dict]:
        """Brave Search API."""
        try:
            response = requests.get(
                "https://api.search.brave.com/res/v1/web/search",
                headers={
                    "Accept": "application/json",
                    "Accept-Encoding": "gzip",
                    "X-Subscription-Token": os.environ["BRAVE_SEARCH_API_KEY"],
                },
                params={"q": query, "count": num_results},
                timeout=15,  # don't let one slow request hang the whole run
            )
        except requests.RequestException:
            return []  # treat network errors like an empty result set
        if not response.ok:
            return []
        results = response.json().get("web", {}).get("results", [])
        return [
            {"url": r["url"], "title": r["title"], "description": r.get("description", "")}
            for r in results
        ]
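
    def fetch_page(self, url: str) -> str:
        """Firecrawl for full-page text extraction."""
        try:
            response = requests.post(
                "https://api.firecrawl.dev/v1/scrape",
                headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
                json={"url": url, "formats": ["markdown"]},
                timeout=60,  # scrapes can be slow, but should not block forever
            )
        except requests.RequestException:
            return ""
        if response.ok:
            # The v1 scrape response nests the page content under "data"
            data = response.json().get("data") or {}
            return (data.get("markdown") or "")[:8000]  # cap at 8K chars
        return ""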

    def extract_claims(self, page_text: str, source_url: str, source_title: str, sub_question: str) -> list[Claim]:
        """Extract key factual claims from a page relevant to the sub-question."""
        response = self.client.messages.create(
            model="claude-haiku-4-5-20251001",  # Haiku for extraction: cheap
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Extract factual claims from this text that are relevant to answering: "{sub_question}"
Extract 2–5 specific, verifiable claims. For each claim, estimate confidence (0.0–1.0) based on how clearly it's stated and whether it appears to be based on evidence.
Return JSON array:
[{{"claim": "...", "confidence": 0.9}}]
Text:
{page_text[:4000]}"""
            }],
        )
        try:
            text = response.content[0].text.strip()
            if text.startswith("```"):
                text = text.split("```")[1]
                if text.startswith("json"):
                    text = text[4:]
            claims_raw = json.loads(text)
            return [
                Claim(
                    text=c["claim"],
                    source_url=source_url,
                    source_title=source_title,
                    sub_question=sub_question,
                    confidence=c.get("confidence", 0.7),
                )
                for c in claims_raw
            ]
        except (json.JSONDecodeError, KeyError):
            return []

    def search_and_extract(self, sub_question: str, top_k: int = 3):
        """Search, fetch top results, and extract claims into fact memory."""
        print(f" Searching: {sub_question[:60]}...")
        results = self.search(sub_question, num_results=top_k + 2)  # extra in case some fail
        fetched = 0
        for result in results:
            if fetched >= top_k:
                break
            if result["url"] in self.sources_read:
                continue
            page_text = self.fetch_page(result["url"])
            if not page_text:
                continue
            claims = self.extract_claims(
                page_text, result["url"], result["title"], sub_question
            )
            self.facts.extend(claims)
            self.sources_read.append(result["url"])
            fetched += 1
            print(f" Extracted {len(claims)} claims from {result['title'][:50]}")

    def detect_contradictions(self) -> list[dict]:
        """Find contradicting claims across all sources."""
        if len(self.facts) < 4:
            return []
        claims_text = "\n".join(
            f"[{i+1}] ({c.source_title[:40]}): {c.text}"
            for i, c in enumerate(self.facts)
        )
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=600,
            messages=[{
                "role": "user",
                "content": f"""Review these research claims and identify any direct contradictions or significant disagreements between sources.
Claims:
{claims_text}
Return a JSON array of contradictions. If none: return [].
Format: [{{"claim_a_idx": 1, "claim_b_idx": 5, "description": "Source A says X while Source B says Y"}}]"""
            }],
        )
        try:
            text = response.content[0].text.strip()
            if text.startswith("```"):
                text = text.split("```")[1]
                if text.startswith("json"):
                    text = text[4:]
            return json.loads(text)
        except (json.JSONDecodeError, ValueError):
            return []

    def synthesize(self, original_question: str) -> str:
        """Write the final research report from all extracted facts."""
        claims_with_sources = "\n".join(
            f"[{i+1}] {c.text} (Source: {c.source_title}, URL: {c.source_url}, confidence: {c.confidence})"
            for i, c in enumerate(self.facts)
        )
        contradictions = self.detect_contradictions()
        contradictions_text = (
            "\n".join(f"- {c['description']}" for c in contradictions)
            if contradictions else "None detected."
        )
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2000,
            messages=[{
                "role": "user",
                "content": f"""Write a structured research report answering: "{original_question}"
Use the extracted claims below as your source material. Cite claims inline as [1], [2], etc.
EXTRACTED CLAIMS:
{claims_with_sources}
CONTRADICTIONS BETWEEN SOURCES:
{contradictions_text}
Format the report as:
## Executive Summary
[2–3 sentence answer to the question]
## Key Findings
[Bullet points with inline citations like [1], [3]]
## Contradictions and Uncertainties
[Flag any conflicting information from sources; explain both sides]
## Confidence Assessment
[Overall confidence: High/Medium/Low, and why]
## Sources
[Numbered list matching the citation numbers above, with titles and URLs]
Be direct. No hedging. If the evidence is strong, say so. If sources conflict, show the conflict."""
            }],
        )
        return response.content[0].text

    def research(self, question: str) -> str:
        """Run the full research pipeline."""
        print(f"\nResearching: {question}\n")

        # Step 1: Decompose
        sub_questions = self.decompose(question)
        print(f"Sub-questions ({len(sub_questions)}):")
        for q in sub_questions:
            print(f" - {q}")
        print()

        # Step 2: Search and extract for each sub-question
        for sub_q in sub_questions:
            self.search_and_extract(sub_q, top_k=3)
        print(f"\nTotal claims extracted: {len(self.facts)}")
        print(f"Sources read: {len(self.sources_read)}\n")

        # Step 3: Synthesize
        print("Synthesizing report...")
        report = self.synthesize(question)
        return report

# Usage
import os
agent = ResearchAgent()
report = agent.research(
"What are the best payment gateways for Indian SaaS companies in 2026, and what are the fee differences?"
)
print(report)
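The fence-stripping logic is repeated in `decompose`, `extract_claims`, and `detect_contradictions`. One way to factor it out is sketched below. `parse_json_reply` is a hypothetical helper (not part of the Anthropic SDK), and it assumes the model returns either bare JSON or JSON inside a single fenced block.

```python
import json
import re

# Matches one fenced block, with or without a "json" language tag.
_FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def parse_json_reply(text: str, default):
    """Parse a model reply as JSON, tolerating one surrounding code fence.

    Returns `default` when the reply is not valid JSON.
    """
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError:
        return default


fence = "`" * 3  # built at runtime so the example contains no literal fence
reply = fence + 'json\n["q1", "q2"]\n' + fence
print(parse_json_reply(reply, default=[]))
print(parse_json_reply("Sorry, I cannot help with that.", default=[]))
```

With a helper like this, `decompose` could end with `return parse_json_reply(response.content[0].text, [question])`, and the other two methods could pass `[]` as the default.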
Example output structure
For the question "What are the tradeoffs between PostgreSQL and MySQL for Indian startups in 2026?", the agent produces:
## Executive Summary
PostgreSQL has become the default choice for Indian startups building on modern stacks,
while MySQL remains dominant in legacy systems and LAMP-stack deployments. The feature
gap has narrowed significantly, but PostgreSQL's JSON support, extensions ecosystem,
and ACID compliance make it the better choice for new projects [1][3].
## Key Findings
- PostgreSQL's JSONB indexing is 3–5× faster than MySQL's JSON column for document-style queries [2]
- MySQL 8.0 and PostgreSQL 15 have comparable performance on OLTP workloads under 1M rows/day [4]
- Managed PostgreSQL on AWS RDS costs ~15% more than MySQL in ap-south-1 region [6]
- Supabase (PostgreSQL-based) is the fastest-growing database platform among Indian startups in 2025 [3]
## Contradictions and Uncertainties
- Sources disagree on replication performance: [2] says MySQL replication lag is lower,
while [5] cites PostgreSQL logical replication as superior for multi-region setups
## Confidence Assessment
Medium-High. Most claims are supported by multiple sources, but benchmark numbers
vary by workload type. The startup-preference data is anecdotal.
## Sources
[1] StackOverflow Developer Survey 2025 — https://...
[2] Percona MySQL vs PostgreSQL 2025 Benchmark — https://...
...
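Because the report cites claims by number, a cheap sanity check is to confirm that every cited number actually exists in the fact memory. A sketch, assuming citations are written as `[n]` with 1-based numbering as in the synthesis prompt; `invalid_citations` is a hypothetical helper.

```python
import re


def invalid_citations(report: str, num_claims: int) -> list[int]:
    """Return cited numbers outside 1..num_claims, sorted and de-duplicated."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", report)}
    return sorted(n for n in cited if not 1 <= n <= num_claims)


sample = "PostgreSQL is the better choice for new projects [1][3]. Costs differ [7]."
print(invalid_citations(sample, num_claims=6))
```

In the agent this could be called as `invalid_citations(report, len(agent.facts))`; a non-empty result suggests the model cited a claim that was never extracted.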
Handling long questions and follow-up
For multi-part questions, call research() once per distinct question rather than bundling. The fact memory lives on the agent instance and is never cleared, so claims from an earlier call carry over into the next report. Instantiate a new agent for each independent topic.
For follow-up questions on the same topic:
# Research initial question
agent = ResearchAgent()
report1 = agent.research("What is the market size of Indian SaaS in 2026?")
# Continue research — reuse the agent to keep accumulated facts
report2 = agent.synthesize(
"Given the facts already collected, what are the fastest-growing segments?"
)
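If the follow-up happens in a later session, the accumulated facts have to be written somewhere first. Below is a minimal sketch using a JSON file. `save_facts` and `load_facts` are hypothetical helpers, and the `Claim` fields are copied from the dataclass in the implementation.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Claim:
    text: str
    source_url: str
    source_title: str
    sub_question: str
    confidence: float


def save_facts(facts: list[Claim], path) -> None:
    """Write the fact memory to disk as a JSON array of objects."""
    rows = [asdict(c) for c in facts]
    Path(path).write_text(json.dumps(rows, indent=2), encoding="utf-8")


def load_facts(path) -> list[Claim]:
    """Read a fact memory previously written by save_facts."""
    rows = json.loads(Path(path).read_text(encoding="utf-8"))
    return [Claim(**row) for row in rows]


# Round trip through a temporary file, using placeholder data
original = [Claim("Claim A", "https://example.com/a", "Source A", "q1", 0.9)]
with tempfile.TemporaryDirectory() as tmp:
    facts_path = Path(tmp) / "facts.json"
    save_facts(original, facts_path)
    restored = load_facts(facts_path)
print(restored == original)
```

To resume, one would assign the loaded list to `agent.facts` on a fresh `ResearchAgent` before calling `synthesize`. Restoring `sources_read` the same way would avoid re-fetching pages already read.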
Cost breakdown
For a 10-source research report:
- 1 decomposition call (Sonnet): ~$0.003
- 10 Firecrawl scrapes: ~$0.005 each = $0.05
- 10 claim extraction calls (Haiku): ~$0.002 each = $0.02
- 1 contradiction detection (Sonnet): ~$0.01
- 1 synthesis call (Sonnet): ~$0.05–0.15 depending on report length
Total: $0.13–0.23 per report. For a team producing 20 research reports/week (about 87 per month), that's roughly $12–20/month.
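The per-report arithmetic can be written as a small calculator, which makes it easy to see how the total moves with the number of sources. The unit prices are the estimates from the list above, not quoted vendor prices, and `estimate_report_cost` is a hypothetical helper.

```python
def estimate_report_cost(num_sources: int = 10, synthesis_cost: float = 0.05) -> float:
    """Estimated USD cost of one report, using the per-call figures listed above."""
    decomposition = 0.003               # 1 Sonnet call
    scrapes = 0.005 * num_sources       # 1 Firecrawl scrape per source
    extraction = 0.002 * num_sources    # 1 Haiku call per source
    contradiction_check = 0.01          # 1 Sonnet call
    return decomposition + scrapes + extraction + contradiction_check + synthesis_cost


low = estimate_report_cost(synthesis_cost=0.05)
high = estimate_report_cost(synthesis_cost=0.15)
print(f"10 sources: ${low:.3f} to ${high:.3f} per report")
print(f"15 sources: ${estimate_report_cost(15, 0.05):.3f} to ${estimate_report_cost(15, 0.15):.3f} per report")
```

The per-source terms (scrape plus extraction) scale linearly, so source count and synthesis length are the two levers that matter.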
Compared to Perplexity and Deep Research
| | This agent | Perplexity Pro | ChatGPT Deep Research |
|---|---|---|---|
| Cost per report | $0.13–0.23 | ~$0.50 subscription cost | ~$1–2 |
| Custom sources | Yes | No | No |
| Contradiction detection | Yes | No | Partial |
| Output format control | Full | Limited | Limited |
| Speed | 2–5 min | 30 sec | 5–15 min |
| Requires engineering | Yes | No | No |
The Firecrawl extraction post covers more advanced extraction patterns when you need structured data from sources rather than full text. The AI research workflows post covers how to integrate this into a full research pipeline with document storage and retrieval.



