OpenClaw's LLM API costs are variable — they depend on your message volume, model choice, and how much context is passed per request. With the right configuration, you can control costs precisely without sacrificing quality where it matters.
## Understanding Where Tokens Are Spent
Every OpenClaw request sends:
- System prompt — Your SOUL.md content (constant per request)
- Memory context — Relevant snippets from your conversation history
- Integration context — Data from connected tools (if relevant)
- Your message — The actual input
- Response — The AI's reply (output tokens)
For a typical request:
- System prompt: 200–600 tokens
- Memory context: 200–1000 tokens
- Your message: 20–200 tokens
- Response: 100–500 tokens
Total: roughly 600–2,300 tokens per request. At GPT-4o pricing ($5/M input, $15/M output), that works out to roughly $0.005–0.02 per message.
At 50 messages/day: $7.50–30/month on GPT-4o. The same 50 messages/day on Gemini 2.0 Flash: roughly $0.10–0.45/month.
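To make the arithmetic concrete, here is a small Python sketch of the per-message estimate (the `message_cost` helper is illustrative, not part of OpenClaw):

```python
# Illustrative cost arithmetic only -- not an OpenClaw API.
def message_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 5.0,
                 output_price_per_m: float = 15.0) -> float:
    """USD cost of one request at the given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A mid-sized request at GPT-4o rates: ~1,200 input + ~300 output tokens
per_message = message_cost(1200, 300)   # ≈ $0.0105
per_month = per_message * 50 * 30       # 50 messages/day ≈ $15.75/month
```

Swapping in Gemini Flash rates (`message_cost(1200, 300, 0.075, 0.30)`) drops the same message to about $0.0002, which is where the two-orders-of-magnitude monthly difference comes from.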
## Monitoring Current Usage
```bash
# View token usage by day
openclaw stats --period 7d

# View usage by model
openclaw stats --by-model

# View top token-consuming conversations
openclaw stats --top-conversations
```
Set up a weekly usage report:
```yaml
scheduled_tasks:
  - name: "weekly-cost-report"
    cron: "0 9 * * 1"  # Monday 9am
    action: "internal"
    function: "generate_usage_report"
    format: "weekly"
    send_to: "whatsapp"  # or "telegram", "slack"
```
## Model Selection: The Biggest Lever
The model you use is the single largest cost driver, and the differences are dramatic:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Relative Cost |
|---|---|---|---|
| GPT-4o | ~$5 | ~$15 | 1x (baseline) |
| Claude Sonnet | ~$3 | ~$15 | ~0.8x |
| GPT-4o-mini | ~$0.15 | ~$0.60 | ~0.04x |
| Gemini 2.0 Flash | ~$0.075 | ~$0.30 | ~0.02x |
| Claude Haiku | ~$0.25 | ~$1.25 | ~0.05x |
| Local (Ollama/LM Studio) | $0 | $0 | 0x |
Switching from GPT-4o to GPT-4o-mini for everyday messages reduces API costs by ~96% on those messages, with only modest quality reduction for routine tasks.
## Smart Routing by Task Type
```yaml
llm:
  routing:
    - pattern: "^(analyse|research|compare|review|explain in depth|write long)"
      model: "gpt-4o"  # Premium for demanding tasks
    - pattern: "^(remind|check|what time|quick|summarise briefly)"
      model: "gpt-4o-mini"  # Budget for quick tasks
    - default:
      model: "gemini-2.0-flash"  # Cheapest for everything else
```
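This routing is first-match-wins: patterns are tried in order and the default catches everything else. A hypothetical Python sketch of that logic (not OpenClaw's internals):

```python
import re

# Routes mirror the config above: first matching pattern wins.
ROUTES = [
    (r"^(analyse|research|compare|review|explain in depth|write long)", "gpt-4o"),
    (r"^(remind|check|what time|quick|summarise briefly)", "gpt-4o-mini"),
]
DEFAULT_MODEL = "gemini-2.0-flash"

def route(message: str) -> str:
    """Return the model for a message: first pattern match, else default."""
    for pattern, model in ROUTES:
        if re.match(pattern, message, re.IGNORECASE):
            return model
    return DEFAULT_MODEL
```

Anchoring patterns with `^` keeps routing cheap and predictable, but a demanding request phrased without a trigger word ("can you dig into...") falls through to the cheapest model, so review your routing stats occasionally.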
## Reducing Context Size
Every token of context costs money. Trimming what gets sent per request reduces costs without changing which model you use.
### SOUL.md Optimisation
Your SOUL.md is sent with every request. A 1000-token SOUL.md adds $0.005 to every GPT-4o call. At 50 messages/day: ~$7.50/month just for the SOUL.md.
Audit your SOUL.md for:
- Redundant instructions ("be helpful" — already the default)
- Instructions that only apply to rare situations (move to per-message)
- Verbose phrasing that can be shortened
A well-tuned SOUL.md of 200–400 tokens costs $1.50–3/month vs $7.50+ for a bloated 1000-token version.
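The overhead scales linearly with both prompt size and message volume, which a quick sketch makes obvious (hypothetical helper; GPT-4o input rate assumed):

```python
def soul_overhead_per_month(soul_tokens: int, messages_per_day: int = 50,
                            input_price_per_m: float = 5.0) -> float:
    # The system prompt is resent with every request, so its cost
    # scales with both its size and your message volume.
    return soul_tokens * messages_per_day * 30 * input_price_per_m / 1_000_000

bloated = soul_overhead_per_month(1000)  # $7.50/month
lean = soul_overhead_per_month(300)      # $2.25/month
```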
### Memory Context Limits
```yaml
memory:
  max_context_tokens: 500    # Limit memory snippets per request
  relevance_threshold: 0.7   # Only include highly relevant memories
  max_memories_per_request: 5
```
Reducing from unlimited memory context to a 500-token cap can cut 30–50% off token costs, with minimal practical impact on response quality.
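One plausible way the three limits combine, sketched in Python (assumes memories arrive as `(relevance, token_count, snippet)` tuples; not OpenClaw's actual implementation):

```python
def select_memories(scored, relevance_threshold=0.7,
                    max_memories=5, max_context_tokens=500):
    """Pick memory snippets: most relevant first, subject to all three caps."""
    picked, total = [], 0
    for relevance, tokens, text in sorted(scored, reverse=True):
        if relevance < relevance_threshold or len(picked) >= max_memories:
            break  # below threshold (sorted desc) or already at the cap
        if total + tokens > max_context_tokens:
            continue  # too big for the token budget; a smaller one may fit
        picked.append(text)
        total += tokens
    return picked
```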
### Context Window Trimming
For long conversations, OpenClaw maintains a rolling context window. Configure the window size:
```yaml
llm:
  context_window:
    max_conversation_turns: 10   # Only last 10 turns in context
    max_tokens: 4000             # Hard token limit on conversation history
    trimming_strategy: "sliding" # Keep the most recent turns
```
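A sliding-window trim like this can be sketched as follows (the 4-characters-per-token count is a rough stand-in for a real tokenizer):

```python
def trim_history(turns, max_turns=10, max_tokens=4000):
    """Keep the most recent turns that fit both the turn and token caps."""
    def count_tokens(text):
        return len(text) // 4  # rough estimate: ~4 characters per token

    kept, total = [], 0
    for turn in reversed(turns[-max_turns:]):   # newest first
        tokens = count_tokens(turn)
        if total + tokens > max_tokens:
            break  # adding an older turn would blow the budget
        kept.append(turn)
        total += tokens
    return list(reversed(kept))                 # restore chronological order
```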
## Budget Controls
### Per-Provider Monthly Limits
```yaml
providers:
  openai:
    api_key: "sk-..."
    monthly_token_budget: 1000000  # 1M tokens/month
    warn_at_percent: 80            # Alert at 80% usage
    action_at_limit: "fallback"    # Switch to fallback provider
    fallback_provider: "gemini"
  gemini:
    api_key: "AIza-..."
    monthly_token_budget: 5000000  # Generous Gemini budget
```
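The warn-then-fallback behaviour configured above can be sketched as (hypothetical class, not OpenClaw's internals):

```python
class ProviderBudget:
    """Warn at a usage threshold; hand off to a fallback provider at the cap."""

    def __init__(self, name, monthly_token_budget,
                 warn_at_percent=80, fallback_provider=None):
        self.name = name
        self.budget = monthly_token_budget
        self.warn_at = monthly_token_budget * warn_at_percent // 100
        self.fallback = fallback_provider
        self.used = 0

    def record(self, tokens):
        """Record usage; return (status, provider_to_use)."""
        self.used += tokens
        if self.used >= self.budget and self.fallback:
            return ("fallback", self.fallback)
        if self.used >= self.warn_at:
            return ("warn", self.name)
        return ("ok", self.name)
```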
### Also Set Limits at the Provider Level
OpenClaw's internal limits are a safety net, but always set spending limits directly in:
- OpenAI: Platform → Billing → Usage limits
- Anthropic: Console → Billing → Set limit
- Google: Cloud Console → Billing → Budgets & alerts
A dual-layer limit means even if OpenClaw's config has a bug, your API provider won't charge you beyond your set cap.
## Local Models: Zero API Cost
For users with adequate hardware, running local models via Ollama or LM Studio eliminates API costs entirely:
```yaml
providers:
  ollama:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    default_model: "llama3.1:8b-instruct-q4_K_M"
```
Use a hybrid approach — local for routine tasks, cloud LLM for complex ones:
```yaml
llm:
  routing:
    - pattern: "^(analyse|research|code review|draft)"
      provider: "anthropic"
      model: "claude-sonnet-4-5"
    - default:
      provider: "ollama"
      model: "llama3.1:8b-instruct-q4_K_M"
```
This gives you zero-cost everyday operation with premium quality on demand.
## Caching Repeated Requests
For automations that repeatedly ask similar questions (daily briefings, status checks), enable response caching:
```yaml
llm:
  cache:
    enabled: true
    ttl: 3600             # Cache responses for 1 hour
    max_size: 100         # Store up to 100 cached responses
    cache_threshold: 0.95 # Only cache near-identical requests
```
A cached response costs zero tokens. For scheduled tasks that run hourly, this can reduce costs by 80%+ on those tasks.
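A minimal sketch of TTL-based response caching (exact-match hashing stands in for the `cache_threshold` similarity check; not OpenClaw's actual cache):

```python
import hashlib
import time

class ResponseCache:
    """TTL cache for responses to near-identical scheduled prompts."""

    def __init__(self, ttl=3600, max_size=100):
        self.ttl = ttl
        self.max_size = max_size
        self.store = {}  # key -> (timestamp, response)

    def _key(self, prompt):
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(self._key(prompt))
        if entry and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: zero tokens spent
        return None

    def put(self, prompt, response, now=None):
        now = time.time() if now is None else now
        if len(self.store) >= self.max_size:
            oldest = min(self.store, key=lambda k: self.store[k][0])
            del self.store[oldest]  # evict the oldest entry
        self.store[self._key(prompt)] = (now, response)
```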
## Monthly Cost Targets by Profile
| User Type | Daily Messages | Recommended Setup | Monthly Cost |
|---|---|---|---|
| Light personal | 10–20 | Gemini Flash default | <$1 |
| Active personal | 30–50 | Gemini Flash + GPT-4o for complex | $3–10 |
| Heavy personal | 100+ | GPT-4o-mini default, GPT-4o for complex | $10–25 |
| Local-first | Any | Ollama default, cloud fallback | $0–5 |
| Developer/power | 200+ | Claude Sonnet default | $20–50 |
Related reading: