Every conversation you have with a cloud AI goes through someone else's servers. Whether the provider is OpenAI, Anthropic, or Google, your personal tasks, business details, and private questions are all processed remotely.
Local LLMs via Ollama change that. Your model runs on your own machine. No internet required, no API costs, no data leaving your device. OpenClaw's support for Ollama makes this a practical setup for everyday use.
This guide covers installing Ollama, choosing the right model, and connecting it to OpenClaw.
Why Local LLMs Make Sense for OpenClaw
OpenClaw is already privacy-focused — your memory and data stay on your machine. But if you connect it to the OpenAI or Anthropic API, the actual conversation still leaves your device.
Running Ollama as the LLM backend closes that gap completely:
- Zero API costs — models run locally, no tokens billed
- Complete privacy — nothing leaves your machine
- Works offline — flights, bad internet, corporate networks with restrictions
- No rate limits — use it as much as you want
- Your data, fully yours — especially important for sensitive business or personal tasks
The tradeoff: local models need local hardware, and they're generally less capable than GPT-4o or Claude Opus for complex reasoning tasks.
Hardware Reality Check
Before installing, know what you're working with:
| Your Hardware | Best Model | Quality Level |
|---|---|---|
| 8 GB RAM (no GPU) | Phi-3 Mini (3.8B) | Basic assistant tasks |
| 16 GB RAM | Llama 3.1 8B | Good everyday assistant |
| 32 GB RAM | Llama 3.1 70B (heavily quantised, Q2/Q3) | Near GPT-3.5 quality |
| GPU 8 GB VRAM | Llama 3.1 8B (GPU) | Fast, good quality |
| GPU 16–24 GB VRAM | Llama 3.1 70B (partial GPU offload), Mixtral 8x7B | Excellent |
| Mac with Apple Silicon (M1/M2/M3) | Llama 3.1 70B | Excellent (unified memory) |
Apple Silicon Macs are particularly good for local LLMs: unified memory lets the GPU address the full RAM pool, so the 48 GB and 64 GB configurations run quantised 70B models comfortably.
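The table above can be condensed into a rough picker. This is an illustrative sketch, not an official tool: `suggest_model` is a hypothetical helper, and the thresholds assume Q4 quantisation for the 8B model, heavy (Q2/Q3) quantisation for 70B, and a few GB of headroom.

```shell
# Hypothetical helper: map available RAM (in GB) to a model from the table.
# Assumes Q4 for 8B, Q2/Q3 for 70B, plus headroom for context and the OS.
suggest_model() {
  if [ "$1" -ge 32 ]; then echo "llama3.1:70b"      # heavily quantised
  elif [ "$1" -ge 16 ]; then echo "llama3.1:8b"
  else echo "phi3:mini"
  fi
}

suggest_model 16   # prints llama3.1:8b
```

GPU VRAM changes the picture (an 8B model that fits entirely in VRAM will outrun a 70B spilling to disk), so treat this as a starting point, not a rule.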
Step 1: Install Ollama
macOS:
brew install ollama
# or download from https://ollama.ai
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Verify:
ollama --version
Start the Ollama server:
ollama serve
# Runs on http://localhost:11434 by default
On macOS, Ollama runs as a menu bar app after installation and starts automatically.
Step 2: Download a Model
# Good starting point — 5 GB download, runs on 8 GB RAM
ollama pull llama3.1:8b
# Faster, lighter — good for quick tasks
ollama pull phi3:mini
# Best quality if you have 32+ GB RAM (heavily quantised)
ollama pull llama3.1:70b
# Strong coding model
ollama pull codellama:13b
# Good balance for general use
ollama pull mistral:7b
Test the model works:
ollama run llama3.1:8b
# Type a message, press Enter, type /bye to exit
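Besides the interactive CLI, Ollama exposes an HTTP API on port 11434, which is what OpenClaw talks to in the next step. A quick sketch of the documented /api/generate call; the curl line assumes the server from Step 1 is running, so it is left commented here:

```shell
# Request body for Ollama's /api/generate endpoint
# ("stream": false returns one JSON object instead of a token stream)
payload='{"model":"llama3.1:8b","prompt":"Say hello in one word.","stream":false}'

# With the server running:
# curl -s http://localhost:11434/api/generate -d "$payload"

echo "$payload"
```

If the curl call returns a JSON object with a `response` field, the HTTP side is working and OpenClaw will be able to connect.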
Step 3: Connect Ollama to OpenClaw
Edit your .env file:
# Remove or comment out cloud API keys
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# Add Ollama config
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
# Tell OpenClaw to use Ollama as the LLM provider
LLM_PROVIDER=ollama
Restart OpenClaw:
docker compose restart openclaw
# or
pm2 restart openclaw
Test: send a message through the web UI at localhost:3000. OpenClaw should respond using the local model.
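If the restart doesn't pick up the config, a quick sanity check of the three keys used in this guide can save debugging time. `check_env` is a hypothetical helper, demonstrated here against a throwaway file rather than your real .env:

```shell
# Hypothetical helper: verify the .env keys this guide sets are present.
check_env() {
  grep -q '^LLM_PROVIDER=ollama' "$1" || { echo "missing LLM_PROVIDER"; return 1; }
  grep -q '^OLLAMA_BASE_URL='    "$1" || { echo "missing OLLAMA_BASE_URL"; return 1; }
  grep -q '^OLLAMA_MODEL='       "$1" || { echo "missing OLLAMA_MODEL"; return 1; }
  echo "ok"
}

# Demo against a throwaway file
printf 'LLM_PROVIDER=ollama\nOLLAMA_BASE_URL=http://localhost:11434\nOLLAMA_MODEL=llama3.1:8b\n' > /tmp/openclaw-env-test
check_env /tmp/openclaw-env-test   # prints ok
```

Run it against your actual .env path; a leftover `LLM_PROVIDER=openai` is the most common reason OpenClaw keeps calling the cloud API.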
Model Recommendations by Use Case
For general assistant tasks (reminders, scheduling, file management)
ollama pull llama3.1:8b
llama3.1:8b handles everyday tasks well — following instructions, remembering context, taking actions. Response time: 2–10 seconds on a modern laptop.
For coding help
ollama pull codellama:13b
# or
ollama pull deepseek-coder:6.7b
Code-specific models produce significantly better code than general models at the same parameter count.
For the fastest possible responses (mobile-level hardware or Raspberry Pi)
ollama pull phi3:mini
Phi-3 Mini runs on 4 GB RAM with reasonable quality for short tasks. Response time under 2 seconds on most hardware.
For best quality (if you have the RAM)
ollama pull llama3.1:70b
On Apple Silicon with 48 GB or more of unified memory, a quantised 70B runs at comfortable speeds. Quality approaches GPT-3.5 on most tasks.
Hybrid Setup: Local for Routine, Cloud for Complex
The most practical configuration: use a local model as the default, but switch to a cloud model for tasks that need higher capability.
In .env:
# Default: local Ollama
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.1:8b
# Fallback: OpenAI for complex requests
OPENAI_API_KEY=sk-...
OPENAI_FALLBACK_MODEL=gpt-4o
In your SOUL.md, add a rule:
## LLM Usage
- For simple tasks (reminders, file searches, short summaries): use the local model
- When I prefix a message with "think:" or "deep:", use the cloud model for better reasoning
- Always use the cloud model for: code review, long-document analysis, complex planning
This gives you zero-cost routine tasks and cloud quality on-demand.
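The prefix rule above can be sketched as a tiny router. This is illustrative only: `route_model` is a hypothetical helper, and the model names are the ones from this guide's .env, not a fixed OpenClaw API.

```shell
# Hypothetical helper: pick a model based on the message prefix,
# mirroring the SOUL.md rule ("think:" / "deep:" escalate to the cloud model).
route_model() {
  case "$1" in
    think:*|deep:*) echo "gpt-4o" ;;        # cloud model for heavier reasoning
    *)              echo "llama3.1:8b" ;;   # local default
  esac
}

route_model "think: plan my quarter"   # prints gpt-4o
route_model "remind me at 5pm"         # prints llama3.1:8b
```

The real routing happens inside OpenClaw based on the SOUL.md instruction, but the logic it needs to express is no more complicated than this.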
Performance Optimisation
Enable GPU acceleration (Linux with NVIDIA)
# Install CUDA support for Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Ollama auto-detects CUDA if your drivers are installed
# Verify GPU is being used
ollama run llama3.1:8b
# Check nvidia-smi in another terminal — should show GPU memory usage
Use quantised models for less RAM
Ollama's default downloads are already quantised (Q4_K_M), but you can go lower:
# Q2 quantisation — uses less RAM, lower quality
ollama pull llama3.1:8b-instruct-q2_K
# Compare download sizes of installed quantisations
ollama list
# Check memory use of the currently loaded model
ollama ps
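A back-of-envelope way to judge whether a quantisation will fit: memory is roughly billions of parameters times bits per weight, divided by 8, plus a couple of GB of overhead. `est_gb` is a hypothetical helper for that arithmetic; real usage varies with context length and quantisation scheme.

```shell
# Rough memory estimate: params_billions * bits_per_weight / 8, plus ~2 GB
# overhead for context and runtime. Illustrative only, integer arithmetic.
est_gb() { echo "$(( $1 * $2 / 8 + 2 )) GB"; }

est_gb 8 4    # 8B at Q4,  prints 6 GB
est_gb 70 4   # 70B at Q4, prints 37 GB
est_gb 70 2   # 70B at Q2, prints 19 GB
```

This is why the 70B model needs Q2/Q3 to squeeze into a 32 GB machine, and why Q4 wants 48 GB or more.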
Parallel requests
If multiple platforms are connected to OpenClaw, requests may queue behind each other with a local model. Set Ollama's concurrency:
# In Ollama startup (add to /etc/systemd/system/ollama.service):
Environment="OLLAMA_NUM_PARALLEL=2"
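On a systemd install, put that environment line in a drop-in rather than editing the unit file directly, so package upgrades don't overwrite it. Run `sudo systemctl edit ollama`, which creates the override file, and add:

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_NUM_PARALLEL=2"
```

Then `sudo systemctl restart ollama`. Note that parallel requests share the same RAM and compute, so on modest hardware two concurrent generations each run slower than one alone.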
Running Ollama on a VPS
If you run OpenClaw on a Hostinger VPS, you can also run Ollama there. The KVM 2 plan (8 GB RAM) handles phi3:mini and llama3.1:8b at Q4 quantisation.
# On your VPS
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:8b
# Start Ollama as a service
sudo systemctl enable ollama
sudo systemctl start ollama
In OpenClaw's .env on the same VPS:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
This gives you a fully private, always-on AI agent with zero LLM API costs — just the VPS cost (~₹600–900/month on Hostinger KVM 2).
Troubleshooting
OpenClaw can't connect to Ollama:
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Should return a JSON list of your models
# If OpenClaw runs in Docker, localhost inside the container is not the host:
OLLAMA_BASE_URL=http://host.docker.internal:11434
# On Linux, host.docker.internal needs mapping, e.g. extra_hosts:
#   "host.docker.internal:host-gateway" in docker-compose.yml
Responses are very slow:
- Check model size vs available RAM — if the model doesn't fit in RAM, it swaps to disk (very slow)
- Try a smaller model: phi3:mini, or llama3.1:8b at Q2 quantisation
- On Linux, check free -h and ollama ps
Model output is poor quality:
- For tasks requiring reasoning or coding: upgrade to a larger model or use a cloud fallback
- For instruction-following issues: try a different model family (Mistral is often better at following precise instructions than Llama for some tasks)
Out of memory crash:
# Keep only one model resident in memory at a time
OLLAMA_MAX_LOADED_MODELS=1