Running OpenClaw with LM Studio means zero API costs, full privacy, and complete offline operation. The tradeoff is hardware requirements and somewhat lower response quality compared to frontier cloud models.
If you've already set up OpenClaw with Ollama, LM Studio works the same way — it's the GUI alternative for the same local inference approach.
Why LM Studio Over Ollama?
Both tools run local models and expose an OpenAI-compatible API. The differences:
| | LM Studio | Ollama |
|---|---|---|
| Interface | GUI desktop app | CLI only |
| Model discovery | Built-in browser (Hugging Face) | Manual download or ollama pull |
| Headless server | Possible but not its strength | Native, ideal |
| GPU support | Excellent (CUDA, Metal, Vulkan) | Good |
| Windows support | Excellent | Good |
| Model formats | GGUF | GGUF |
Choose LM Studio if you're on a Windows or macOS desktop and want a graphical way to manage models. Choose Ollama if you're deploying to a headless Linux VPS.
Step 1: Install LM Studio
Download LM Studio from lmstudio.ai. Builds are available for macOS (Apple Silicon and Intel), Windows, and Linux.
Install and launch it. On first run, it will ask to download a model — you can skip this and do it in the next step.
Step 2: Download a Model
In LM Studio, click Search in the left sidebar. Browse available models. Good starting choices for OpenClaw use:
| Model | Size | RAM Required | Notes |
|---|---|---|---|
| Llama 3.2 3B (Q4) | ~2GB | 6GB | Very fast, lightweight tasks |
| Llama 3.1 8B (Q4) | ~5GB | 8GB | Good balance of speed and capability |
| Mistral 7B Instruct (Q4) | ~4GB | 8GB | Strong instruction-following |
| Llama 3.1 13B (Q4) | ~8GB | 16GB | Noticeably better for complex tasks |
| Qwen 2.5 14B (Q4) | ~9GB | 16GB | Excellent for coding tasks |
Search for the model name, click the version with Q4_K_M quantisation (a good quality-to-size ratio), and click Download.
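The file sizes in the table follow directly from the quantisation level: a Q4_K_M GGUF averages roughly 4.5 bits per weight, so the file size is approximately parameters × bits ÷ 8. A minimal sketch of that back-of-the-envelope calculation (the 4.5 bits/weight figure is an approximation, not an exact GGUF property):

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate GGUF file size in GB: parameter count times average bits per weight.

    Q4_K_M averages roughly 4.5 bits/weight (assumed figure; actual files
    vary slightly by architecture and metadata).
    """
    return params_billion * bits_per_weight / 8


# An 8B model at Q4_K_M:
print(round(gguf_size_gb(8), 1))  # 4.5 — close to the ~5 GB listed in the table
```

The same arithmetic explains why the table's RAM requirements sit a few GB above the file size: the weights are loaded fully into memory, plus KV cache and OS headroom.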
Step 3: Start the Local Server
- Click Local Server in the left sidebar (the `<->` icon)
- Select your downloaded model in the dropdown
- Click Start Server
- Note the server address — typically `http://localhost:1234`
The server exposes an OpenAI-compatible API. Leave LM Studio running.
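Before wiring up OpenClaw, you can verify the server responds to OpenAI-style requests. A minimal sketch using only the standard library — the model name is whatever you loaded in the dropdown, and the bearer token can be any string since LM Studio doesn't authenticate:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


payload = build_chat_request("llama-3.1-8b-instruct", "Say hello in one sentence.")
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer lm-studio",  # any string works locally
    },
)
try:
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
except OSError:
    print("LM Studio server not reachable — is Local Server running?")
```

If the server is up, you should get a short completion back; a connection error here means the Local Server tab isn't running or is bound to a different port.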
Step 4: Configure OpenClaw
Add LM Studio as a provider in `~/.openclaw/config/providers.yml`:

```yaml
providers:
  lmstudio:
    api_key: "lm-studio"  # Any string — LM Studio doesn't validate API keys
    base_url: "http://localhost:1234/v1"
    default_model: "llama-3.1-8b-instruct"
    models:
      - id: "llama-3.1-8b-instruct"
        max_tokens: 4096
```
The api_key field can be any non-empty string — LM Studio's local server doesn't authenticate requests.
Set it as the active provider in `config.yml`:

```yaml
llm:
  active_provider: "lmstudio"
  active_model: "llama-3.1-8b-instruct"
```
Restart OpenClaw and test.
Performance Expectations
Running inference locally is slower than cloud APIs. Typical generation speeds on consumer hardware:
| Hardware | Model | Tokens/sec |
|---|---|---|
| M1 Mac Mini (8GB) | Llama 3.1 8B Q4 | 25–40 tok/s |
| M2 MacBook Pro (16GB) | Llama 3.1 13B Q4 | 30–50 tok/s |
| RTX 3080 (10GB VRAM) | Mistral 7B Q4 | 60–100 tok/s |
| RTX 4090 (24GB VRAM) | Llama 3.1 13B Q4 | 80–130 tok/s |
| CPU only (16GB RAM) | Llama 3.1 8B Q4 | 3–8 tok/s |
A 200-token response at 40 tokens/sec takes ~5 seconds. That's acceptable for WhatsApp messages but noticeably slower than cloud APIs (typically 1–3 seconds).
CPU-only inference is usable for low-frequency tasks but too slow for a conversational AI agent.
Practical Tips
Keep LM Studio's server running: If you close LM Studio, OpenClaw loses its LLM. Either keep the window open or configure LM Studio to start the server on launch.
Model switching without restart: LM Studio lets you swap models without stopping the server. Change the model in the dropdown, wait for it to load, and OpenClaw's next request will use the new model automatically.
Use a dedicated SOUL.md preamble for local models: Local models are less reliable at following complex SOUL.md instruction sets than frontier models. Add a simplified version of your key rules at the start of each conversation context.
Hybrid approach: Use LM Studio/local models for routine, low-stakes tasks. Keep an OpenAI or Anthropic key configured as a fallback for complex requests:
```yaml
llm:
  active_provider: "lmstudio"
  fallback_provider: "anthropic"  # Used when local model returns errors or low confidence
```
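OpenClaw's internal fallback routing isn't shown here, but the pattern itself is simple: try the local provider, and on failure re-issue the request to the cloud provider. A minimal sketch with hypothetical provider callables (these are illustrative stand-ins, not OpenClaw APIs):

```python
def complete_with_fallback(prompt: str, primary, fallback) -> str:
    """Try the primary (local) provider first; route to the fallback on failure.

    `primary` and `fallback` are hypothetical callables taking a prompt string
    and returning a completion string.
    """
    try:
        return primary(prompt)
    except Exception:
        # Local model errored (server down, context overflow, etc.) —
        # retry against the configured cloud fallback.
        return fallback(prompt)


# Example with stub providers:
def flaky_local(prompt: str) -> str:
    raise RuntimeError("model overloaded")  # simulate a local failure


def cloud(prompt: str) -> str:
    return "cloud reply"


print(complete_with_fallback("hi", flaky_local, cloud))  # cloud reply
```

Catching a broad exception keeps the sketch short; a real router would distinguish connection errors from model-level failures and might also trigger the fallback on low-confidence responses, as the config comment suggests.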
Related reading: