OpenClaw is LLM-agnostic — it works with OpenAI, Anthropic, Google, or any local model via Ollama. Which one you choose makes a real difference to the quality, cost, and privacy of your AI agent.
This is a practical breakdown based on using each model as an OpenClaw backend for real assistant tasks — not benchmark scores.
What Matters for an OpenClaw LLM
For a daily AI agent, the relevant criteria differ from those for a one-off chat interface:
- Instruction-following — does it actually respect the rules you set in SOUL.md?
- Tool use / function calling — does it correctly trigger skills at the right time?
- Memory integration — can it reason coherently when given long memory context?
- Context window — how much conversation + memory + SOUL.md can it handle at once?
- Cost — you're paying per message, every day, indefinitely
- Latency — you notice if your WhatsApp AI takes 8 seconds to reply
Option 1: Claude (Anthropic)
Best for: highest quality, best instruction-following
```bash
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-6
```
Claude's strongest advantage for OpenClaw use is instruction-following. SOUL.md is essentially a long system prompt with detailed rules about how to communicate, what to prioritise, and what to never do. Claude respects those rules more consistently than other models in practice.
The 200k token context window means Claude can hold significantly more memory context + conversation history without degrading. For long-term users with extensive memory databases, this matters.
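To make the 200k figure concrete, here's a back-of-envelope budget. The token counts below are illustrative assumptions, not measurements from OpenClaw:

```python
# Back-of-envelope budget for Claude's 200k-token context window.
# All token counts below are illustrative assumptions.
CONTEXT_WINDOW = 200_000

soul_md = 3_000          # a detailed SOUL.md rule set
memory = 20_000          # retrieved long-term memory context
history = 30_000         # recent conversation turns
reserved_output = 4_000  # head-room for the model's reply

remaining = CONTEXT_WINDOW - (soul_md + memory + history + reserved_output)
print(f"{remaining:,} tokens still free")  # 143,000 tokens still free
```

Even with a heavy memory database, the window leaves room to spare — which is why long-term users degrade less on Claude.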
Models to consider:
| Model | Use case | Approx cost per 1M tokens |
|---|---|---|
| claude-opus-4-6 | Highest quality, complex tasks | $15 input / $75 output |
| claude-sonnet-4-6 | Best balance for daily use | $3 / $15 |
| claude-haiku-4-5-20251001 | Fast, cheap, good for simple tasks | $0.25 / $1.25 |
Recommendation: claude-sonnet-4-6 for most people. It handles complex requests well at a cost that stays reasonable for daily use.
For low-intensity personal assistant tasks — scheduling, reminders, short answers — claude-haiku-4-5-20251001 at a fraction of the cost is perfectly capable.
Option 2: OpenAI (GPT-4o)
Best for: versatility, tool use, familiarity
```bash
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o
```
GPT-4o has the most mature function-calling and tool-use implementation. If you're building complex custom skills with multiple parameters and edge cases, GPT-4o's tool use is the most reliable.
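As a sketch of what a custom skill looks like from the model's side: tool definitions are JSON schemas sent alongside your message, and the model decides when to call them and with what arguments. The `create_reminder` skill below is a hypothetical example, not a built-in OpenClaw skill:

```python
# Hypothetical OpenClaw skill expressed as an OpenAI-style tool schema.
# "create_reminder" and its parameters are illustrative, not a real skill.
create_reminder_tool = {
    "type": "function",
    "function": {
        "name": "create_reminder",
        "description": "Create a reminder for the user at a specific time.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "What to remind the user about"},
                "when": {"type": "string", "description": "ISO 8601 time for the reminder"},
            },
            "required": ["text", "when"],
        },
    },
}
```

How reliably the model fills in `text` and `when` — and whether it triggers the call at the right moment — is exactly the "tool use" criterion above, and it's where GPT-4o leads.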
Models to consider:
| Model | Use case | Approx cost per 1M tokens |
|---|---|---|
| gpt-4o | Best quality, best tool use | $5 input / $15 output |
| gpt-4o-mini | Sweet spot for daily use | $0.15 / $0.60 |
| o3-mini | Complex reasoning tasks | $1.10 / $4.40 |
Recommendation: gpt-4o-mini for everyday personal assistant use. The quality gap between mini and full 4o is small for simple tasks (scheduling, file management, short summaries), and the cost difference is enormous.
Use gpt-4o for tasks where you notice mini falling short — complex multi-step planning, code review, nuanced analysis.
Option 3: Gemini (Google)
Best for: speed, Google Workspace tasks
```bash
GOOGLE_AI_API_KEY=your-key
GOOGLE_MODEL=gemini-1.5-flash
```
Gemini Flash is extremely fast — the fastest cloud option by latency for short responses. If you prioritise snappy replies over maximum reasoning quality, Flash is hard to beat.
Important caveat: Given the ongoing Google/OpenClaw API ToS situation, using Gemini as your LLM while also connecting Google Calendar or Gmail integrations puts more of your activity through Google's systems. Some users prefer to keep their LLM provider separate from their connected services.
| Model | Cost per 1M tokens |
|---|---|
| gemini-1.5-flash | $0.075 / $0.30 (cheapest cloud option) |
| gemini-1.5-pro | $1.25 / $5.00 |
Option 4: Local Models via Ollama
Best for: zero cost, full privacy, offline use
```bash
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
OLLAMA_BASE_URL=http://localhost:11434
```
Full guide: Run OpenClaw with Ollama →
Model quality comparison for assistant tasks:
| Model | RAM needed | Quality (1-10) | Speed |
|---|---|---|---|
| Phi-3 Mini 3.8B | 4 GB | 6 | Very fast |
| Llama 3.1 8B | 8 GB | 7.5 | Fast |
| Mistral 7B | 8 GB | 7.5 | Fast |
| Llama 3.1 70B (Q4) | 40 GB | 9 | Moderate |
| Llama 3.1 70B (Q2) | 24 GB | 8.5 | Moderate |
For everyday personal assistant tasks on a machine with 16 GB RAM: llama3.1:8b is the right choice. Response quality is comparable to GPT-3.5 for instruction-following and short tasks.
Where local models fall short: complex multi-step reasoning, nuanced code review, tasks that require precise adherence to detailed SOUL.md rules. Cloud models still have a clear edge here.
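If you want to sanity-check a local model before pointing OpenClaw at it, Ollama exposes a simple REST API. A minimal sketch using only the standard library — it assumes Ollama is running on the default port with `llama3.1:8b` already pulled:

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # same default as OLLAMA_BASE_URL above

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to Ollama's /api/generate endpoint (non-streaming)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_generate_request("llama3.1:8b", "Reply with one word: ready?")
    with urllib.request.urlopen(req) as resp:  # requires a running Ollama
        print(json.loads(resp.read())["response"])
```

If this round-trips quickly on your hardware, OpenClaw will feel responsive with the same model.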
Cost Comparison: Real Numbers
Estimating based on 75 messages/day (moderate personal assistant usage), with ~2,000 tokens per exchange (including memory context and SOUL.md):
| Model | Monthly cost estimate |
|---|---|
| claude-opus-4-6 | ~₹7,500–10,000 |
| claude-sonnet-4-6 | ~₹1,500–2,500 |
| claude-haiku-4-5-20251001 | ~₹150–300 |
| gpt-4o | ~₹2,000–4,000 |
| gpt-4o-mini | ~₹50–150 |
| gemini-1.5-flash | ~₹40–100 |
| llama3.1:8b (Ollama) | ₹0 |
These are rough estimates — actual costs depend on your message length, memory size, and SOUL.md length.
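To see where these estimates come from, here's the arithmetic for gpt-4o-mini. The 1,500-input / 500-output token split per exchange is an assumption, and the rupee conversion assumes roughly ₹84 per dollar:

```python
def monthly_cost_usd(msgs_per_day: int, in_tok: int, out_tok: int,
                     in_price: float, out_price: float, days: int = 30) -> float:
    """Rough monthly cost; prices are USD per 1M tokens."""
    msgs = msgs_per_day * days
    return (msgs * in_tok * in_price + msgs * out_tok * out_price) / 1_000_000

# gpt-4o-mini at $0.15 input / $0.60 output, 75 messages/day,
# assuming each ~2,000-token exchange splits 1,500 in / 500 out
cost = monthly_cost_usd(75, 1500, 500, 0.15, 0.60)
print(f"${cost:.2f}/month")  # $1.18/month — roughly ₹100 at ~₹84/$
```

Swap in the per-1M prices from the tables above to reproduce any row.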
Key insight: For most personal assistant use, gpt-4o-mini or claude-haiku-4-5-20251001 are the cost-efficient choices. The flagship models are overkill for "remind me about my meeting" and "what's on my calendar today."
The Hybrid Approach (Best of All)
The setup most power users converge on:
```bash
# Primary: cheap and fast for routine tasks
OPENAI_MODEL=gpt-4o-mini  # or claude-haiku / gemini-flash

# Override per-request via SOUL.md instruction:
# "When I prefix with 'deep:' — use GPT-4o"
# "For code review tasks — use Claude Sonnet"
```
In SOUL.md:
```markdown
## LLM Selection
- Default: use the fast, cheap model for all routine tasks
- When my message starts with "deep:" — switch to the best available model
- For tasks involving code review, strategic planning, or anything I mark as important — use the premium model
```
This keeps daily costs low while preserving access to maximum quality when you actually need it.
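OpenClaw applies these rules through SOUL.md itself, but the selection logic amounts to something like this sketch — model names are the document's examples, and the "deep:" prefix matches the rule above:

```python
# Illustrative prefix router mirroring the SOUL.md rules; OpenClaw's
# actual routing is driven by SOUL.md, not by code you write yourself.
DEFAULT_MODEL = "gpt-4o-mini"        # cheap, fast daily driver
PREMIUM_MODEL = "claude-sonnet-4-6"  # for "deep:" requests

def pick_model(message: str) -> tuple[str, str]:
    """Return (model, message with any routing prefix stripped)."""
    if message.startswith("deep:"):
        return PREMIUM_MODEL, message[len("deep:"):].strip()
    return DEFAULT_MODEL, message

model, text = pick_model("deep: review my quarterly plan")
print(model)  # claude-sonnet-4-6
```

The same pattern extends to task-based rules (e.g. route anything mentioning "code review" to the premium model).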
My Personal Setup
Running on Hostinger KVM 2 (8 GB RAM):
- Default: claude-haiku-4-5-20251001 — fast, cheap, handles 90% of daily tasks
- On-demand: claude-sonnet-4-6 when I prefix with "think:" — complex tasks, analysis
- Offline fallback: Ollama with llama3.1:8b — for when I want zero API usage or privacy
Monthly LLM cost: ~₹200–400 (mostly Haiku with occasional Sonnet)
VPS cost: ~₹700/month (Hostinger KVM 2)
Total cost for an always-on private AI agent: less than most SaaS subscriptions.
Switching Models
Change your model without losing anything:
```bash
nano .env
# Update ANTHROPIC_MODEL= or OPENAI_MODEL=
docker compose restart openclaw
# or
pm2 restart openclaw
```
Your memory database, SOUL.md, and all conversation history are completely separate from the LLM config. The switch is instant.
If moving from OpenAI to Anthropic (or vice versa), the memory and SOUL.md transfer seamlessly — OpenClaw's abstraction layer handles the provider differences.
The Decision Framework
Do you want zero cost and full privacy?
YES → Ollama + Llama 3.1 8B
Do you want the best instruction-following for complex SOUL.md rules?
YES → Claude Sonnet
Do you want the best tool use for custom skills?
YES → GPT-4o
Do you want the cheapest cloud option?
YES → GPT-4o Mini or Claude Haiku
Do you want the fastest response latency?
YES → Gemini Flash (with the Google caveats in mind)
Most people: start with gpt-4o-mini or claude-haiku, upgrade if you notice quality gaps.
