Every conversation you have with a cloud AI goes through someone else's servers. Whether the provider is OpenAI, Anthropic, or Google, your personal tasks, business details, and private questions are all processed remotely.
Local LLMs via Ollama change that. Your model runs on your own machine. No internet required, no API costs, no data leaving your device. OpenClaw's support for Ollama makes this a practical setup for everyday use.
This guide covers installing Ollama, choosing the right model, and connecting it to OpenClaw.
Why Local LLMs Make Sense for OpenClaw
OpenClaw is already privacy-focused — your memory and data stay on your machine. But if you connect it to the OpenAI or Anthropic API, the actual conversation still leaves your device.
Running Ollama as the LLM backend closes that gap completely:
- Zero API costs — models run locally, no tokens billed
- Complete privacy — nothing leaves your machine
- Works offline — flights, bad internet, corporate networks with restrictions
- No rate limits — use it as much as you want
- Your data, fully yours — especially important for sensitive business or personal tasks
The tradeoff: local models need local hardware, and they're generally less capable than GPT-4o or Claude Opus for complex reasoning tasks.
Hardware Reality Check
Before installing, know what you're working with:
| Your Hardware | Best Model | Quality Level |
|---|---|---|
| 8 GB RAM (no GPU) | Phi-3 Mini (3.8B) | Basic assistant tasks |
| 16 GB RAM | Llama 3.1 8B | Good everyday assistant |
| 32 GB RAM | Llama 3.1 70B (heavily quantised, Q2/Q3) | Near GPT-3.5 quality |
| GPU 8 GB VRAM | Llama 3.1 8B (GPU) | Fast, good quality |
| GPU 16–24 GB VRAM | Llama 3.1 70B (partial GPU offload), Mixtral 8x7B | Excellent |
| Mac with Apple Silicon (M1/M2/M3) | Llama 3.1 70B | Excellent (unified memory) |
Apple Silicon Macs are particularly good for local LLMs: unified memory lets the GPU address the full RAM pool, so the 48 GB and 64 GB configurations run quantised 70B models comfortably.
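The table above can be condensed into a rough picker. This is an illustrative sketch, not an official tool: `suggest_model` is a hypothetical helper, and the thresholds assume Q4 quantisation for the 8B model, heavy (Q2/Q3) quantisation for 70B, and a few GB of headroom.

```shell
# Hypothetical helper: map available RAM (in GB) to a model from the table.
# Assumes Q4 for 8B, Q2/Q3 for 70B, plus headroom for context and the OS.
suggest_model() {
  if [ "$1" -ge 32 ]; then echo "llama3.1:70b"      # heavily quantised
  elif [ "$1" -ge 16 ]; then echo "llama3.1:8b"
  else echo "phi3:mini"
  fi
}

suggest_model 16   # prints llama3.1:8b
```

GPU VRAM changes the picture (an 8B model that fits entirely in VRAM will outrun a 70B spilling to disk), so treat this as a starting point, not a rule.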
Step 1: Install Ollama
macOS:
brew install ollama
# or download from https://ollama.ai
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Verify:
ollama --version
Start the Ollama server:
ollama serve
# Runs on http://localhost:11434 by default
On macOS, Ollama runs as a menu bar app after installation and starts automatically.
Step 2: Download a Model
# Good starting point — 5 GB download, runs on 8 GB RAM
ollama pull llama3.1:8b
# Faster, lighter — good for quick tasks
ollama pull phi3:mini
# Best quality if you have 32+ GB RAM (heavily quantised)
ollama pull llama3.1:70b
# Strong coding model
ollama pull codellama:13b
# Good balance for general use
ollama pull mistral:7b
Test the model works:
ollama run llama3.1:8b
# Type a message, press Enter, type /bye to exit
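Besides the interactive CLI, Ollama exposes an HTTP API on port 11434, which is what OpenClaw talks to in the next step. A quick sketch of the documented /api/generate call; the curl line assumes the server from Step 1 is running, so it is left commented here:

```shell
# Request body for Ollama's /api/generate endpoint
# ("stream": false returns one JSON object instead of a token stream)
payload='{"model":"llama3.1:8b","prompt":"Say hello in one word.","stream":false}'

# With the server running:
# curl -s http://localhost:11434/api/generate -d "$payload"

echo "$payload"
```

If the curl call returns a JSON object with a `response` field, the HTTP side is working and OpenClaw will be able to connect.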
Step 3: Connect Ollama to OpenClaw
Edit your .env file:
# Remove or comment out cloud API keys
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# Add Ollama config
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
# Tell OpenClaw to use Ollama as the LLM provider
LLM_PROVIDER=ollama
Restart OpenClaw:
docker compose restart openclaw
# or
pm2 restart openclaw
Test: send a message through the web UI at localhost:3000. OpenClaw should respond using the local model.
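If the restart doesn't pick up the config, a quick sanity check of the three keys used in this guide can save debugging time. `check_env` is a hypothetical helper, demonstrated here against a throwaway file rather than your real .env:

```shell
# Hypothetical helper: verify the .env keys this guide sets are present.
check_env() {
  grep -q '^LLM_PROVIDER=ollama' "$1" || { echo "missing LLM_PROVIDER"; return 1; }
  grep -q '^OLLAMA_BASE_URL='    "$1" || { echo "missing OLLAMA_BASE_URL"; return 1; }
  grep -q '^OLLAMA_MODEL='       "$1" || { echo "missing OLLAMA_MODEL"; return 1; }
  echo "ok"
}

# Demo against a throwaway file
printf 'LLM_PROVIDER=ollama\nOLLAMA_BASE_URL=http://localhost:11434\nOLLAMA_MODEL=llama3.1:8b\n' > /tmp/openclaw-env-test
check_env /tmp/openclaw-env-test   # prints ok
```

Run it against your actual .env path; a leftover `LLM_PROVIDER=openai` is the most common reason OpenClaw keeps calling the cloud API.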
Model Recommendations by Use Case
For general assistant tasks (reminders, scheduling, file management)
ollama pull llama3.1:8b
llama3.1:8b handles everyday tasks well — following instructions, remembering context, taking actions. Response time: 2–10 seconds on a modern laptop.
For coding help
ollama pull codellama:13b
# or
ollama pull deepseek-coder:6.7b
Code-specific models produce significantly better code than general models at the same parameter count.
For the fastest possible responses (mobile-level hardware or Raspberry Pi)
ollama pull phi3:mini
Phi-3 Mini runs on 4 GB RAM with reasonable quality for short tasks. Response time under 2 seconds on most hardware.
For best quality (if you have the RAM)
ollama pull llama3.1:70b
On Apple Silicon with 48 GB or more of unified memory, a quantised 70B runs at comfortable speeds. Quality approaches GPT-3.5 on most tasks.
Hybrid Setup: Local for Routine, Cloud for Complex
The most practical configuration: use a local model as the default, but switch to a cloud model for tasks that need higher capability.
In .env:
# Default: local Ollama
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
# Fallback: OpenAI for complex requests
OPENAI_API_KEY=sk-...
OPENAI_FALLBACK_MODEL=gpt-4o
In your SOUL.md, add a rule:
## LLM Usage
- For simple tasks (reminders, file searches, short summaries): use the local model
- When I prefix a message with "think:" or "deep:", use the cloud model for better reasoning
- Always use the cloud model for: code review, long-document analysis, complex planning
This gives you zero-cost routine tasks and cloud quality on-demand.
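The prefix rule above can be sketched as a tiny router. This is illustrative only: `route_model` is a hypothetical helper, and the model names are the ones from this guide's .env, not a fixed OpenClaw API.

```shell
# Hypothetical helper: pick a model based on the message prefix,
# mirroring the SOUL.md rule ("think:" / "deep:" escalate to the cloud model).
route_model() {
  case "$1" in
    think:*|deep:*) echo "gpt-4o" ;;        # cloud model for heavier reasoning
    *)              echo "llama3.1:8b" ;;   # local default
  esac
}

route_model "think: plan my quarter"   # prints gpt-4o
route_model "remind me at 5pm"         # prints llama3.1:8b
```

The real routing happens inside OpenClaw based on the SOUL.md instruction, but the logic it needs to express is no more complicated than this.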
Performance Optimisation
Enable GPU acceleration (Linux with NVIDIA)
# Install CUDA support for Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Ollama auto-detects CUDA if your drivers are installed
# Verify GPU is being used
ollama run llama3.1:8b
# Check nvidia-smi in another terminal — should show GPU memory usage
Use quantised models for less RAM
Ollama's default downloads are already quantised (Q4_K_M), but you can go lower:
# Q2 quantisation — uses less RAM, lower quality
ollama pull llama3.1:8b-instruct-q2_K
# Compare download sizes of installed quantisations
ollama list
# Check memory use of the currently loaded model
ollama ps
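A back-of-envelope way to judge whether a quantisation will fit: memory is roughly billions of parameters times bits per weight, divided by 8, plus a couple of GB of overhead. `est_gb` is a hypothetical helper for that arithmetic; real usage varies with context length and quantisation scheme.

```shell
# Rough memory estimate: params_billions * bits_per_weight / 8, plus ~2 GB
# overhead for context and runtime. Illustrative only, integer arithmetic.
est_gb() { echo "$(( $1 * $2 / 8 + 2 )) GB"; }

est_gb 8 4    # 8B at Q4,  prints 6 GB
est_gb 70 4   # 70B at Q4, prints 37 GB
est_gb 70 2   # 70B at Q2, prints 19 GB
```

This is why the 70B model needs Q2/Q3 to squeeze into a 32 GB machine, and why Q4 wants 48 GB or more.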
Parallel requests
If multiple platforms are connected to OpenClaw, requests may queue behind each other with a local model. Set Ollama's concurrency:
# In Ollama startup (add to /etc/systemd/system/ollama.service):
Environment="OLLAMA_NUM_PARALLEL=2"
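On a systemd install, put that environment line in a drop-in rather than editing the unit file directly, so package upgrades don't overwrite it. Run `sudo systemctl edit ollama`, which creates the override file, and add:

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_NUM_PARALLEL=2"
```

Then `sudo systemctl restart ollama`. Note that parallel requests share the same RAM and compute, so on modest hardware two concurrent generations each run slower than one alone.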
Running Ollama on a VPS
If you run OpenClaw on a Hostinger VPS, you can also run Ollama there. The KVM 2 plan (8 GB RAM) handles phi3:mini and llama3.1:8b at Q4 quantisation.
# On your VPS
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:8b
# Start Ollama as a service
sudo systemctl enable ollama
sudo systemctl start ollama
In OpenClaw's .env on the same VPS:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
This gives you a fully private, always-on AI agent with zero LLM API costs — just the VPS cost (~₹600–900/month on Hostinger KVM 2).
Troubleshooting
OpenClaw can't connect to Ollama:
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Should return a JSON list of your models
# If OpenClaw runs in Docker, localhost inside the container is not the host:
OLLAMA_BASE_URL=http://host.docker.internal:11434
# On Linux, host.docker.internal needs mapping, e.g. extra_hosts:
#   "host.docker.internal:host-gateway" in docker-compose.yml
Responses are very slow:
- Check model size vs available RAM — if the model doesn't fit in RAM, it swaps to disk (very slow)
- Try a smaller model: phi3:mini, or llama3.1:8b at Q2 quantisation
- On Linux, check free -h and ollama ps
Model output is poor quality:
- For tasks requiring reasoning or coding: upgrade to a larger model or use a cloud fallback
- For instruction-following issues: try a different model family (Mistral is often better at following precise instructions than Llama for some tasks)
Out of memory crash:
# Keep only one model resident in memory at a time
OLLAMA_MAX_LOADED_MODELS=1