Keeping OpenClaw Costs Under Control

Practical strategies to reduce your OpenClaw bill by 50-90%. From model selection to context management, learn the knobs to turn and when to turn them.

OpenClaw itself is free under the MIT license. But running it incurs costs from AI model API calls, server infrastructure, and optional features like TTS and image generation. This guide shows you how to keep those costs under control without sacrificing functionality.

Where Your Money Goes

Understanding the cost structure is the first step to optimization:

  • LLM API calls: The primary cost driver. Every request includes your message plus full conversation history, memory content, tool definitions, and system prompts.
  • Hidden costs: TTS, image generation, web search API calls, and embeddings for the memory system all add up.
  • Continuous operation: Frequent cron jobs and long conversational sessions can lead to significant monthly API costs.
  • Multi-agent coordination: Context duplication across specialist agents increases token consumption.

Model Selection and Routing

This is the most impactful cost-saving measure:

Switch to Cheaper Models

Avoid using expensive models like Claude Opus or GPT-4o for routine tasks. Switch to cheaper, faster models:

  • Claude Haiku: Fast, affordable, good for simple tasks
  • GPT-4o-mini: Cost-effective for basic operations
  • Gemini Flash: Often free or very low cost

This alone can reduce API costs by 50-80% for 80% of daily use cases.

Intelligent Model Routing

Implement a tiered approach where OpenClaw automatically selects the appropriate model:

  • Simple tasks (status checks, basic questions, formatting): Route to cheap models
  • Complex reasoning: Escalate to premium models only when needed
  • Sub-agents: Always specify the cheapest appropriate model for their specific tasks
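A tiered setup could be sketched in configuration along these lines. Note that the `routing` and `tiers` key names here are illustrative assumptions for this guide, not documented OpenClaw options:

```yaml
# Hypothetical tiered routing config (key names are illustrative)
routing:
  tiers:
    simple:             # status checks, basic questions, formatting
      model: gpt-4o-mini
    complex:            # multi-step reasoning, code review
      model: gpt-4o
  default_tier: simple  # escalate only when a task is flagged complex
agents:
  - name: research-subagent
    model: claude-haiku # pin sub-agents to the cheapest adequate model
```

The key idea is that the default path is cheap, and the premium model is an explicit escalation rather than the baseline.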

Failover Chains

Configure failover chains where cheaper models are attempted first:

model: gpt-4o-mini
fallback:
  - model: claude-haiku
  - model: gpt-4o  # Only if cheaper options fail

Context and Prompt Management

Long sessions resend the entire conversation history with each turn, so per-message cost grows with every turn and total session cost grows roughly quadratically with conversation length:

Shorten System Prompts

Your SOUL.md, USER.md, and other context files are sent with every message. Keep them concise:

  • Aim for under 500 words total for all context files
  • Archive older, non-essential information
  • Treat memory like RAM, not long-term storage

Enable QMD (Quick Memory Database)

OpenClaw v2026.2.2+ includes QMD for semantic search-based context retrieval:

  • Instead of sending full history, QMD searches for and sends only relevant snippets
  • This can save 60-97% on history context tokens
  • Enable in your gateway configuration
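Enabling QMD in the gateway might look like the sketch below. The `enabled` toggle reflects the text above, but the surrounding key names and defaults are assumptions made for illustration:

```yaml
# Hypothetical gateway settings for QMD (key names are assumptions)
memory:
  qmd:
    enabled: true
    top_k: 8                # relevant history snippets retrieved per turn
    max_snippet_tokens: 512 # cap the size of each retrieved snippet
```

Tuning `top_k` down is the main lever: fewer retrieved snippets means fewer context tokens per request.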

Session Compaction

Regularly reset or compact sessions to prevent excessive token accumulation:

  • Use /new to start fresh sessions before heavy tasks
  • Enable safeguard compaction mode for proactive chunked summarization
  • Set reserveTokensFloor to prevent context-limit errors that cause costly retries
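As a sketch, the compaction settings above might be expressed like this. `reserveTokensFloor` comes from the text; the enclosing `session` block and the `compaction` key are illustrative assumptions:

```yaml
# Hypothetical session settings (reserveTokensFloor is named in the text above;
# the surrounding keys are illustrative)
session:
  compaction: safeguard      # proactive chunked summarization
  reserveTokensFloor: 4000   # keep headroom below the context limit to avoid
                             # context-limit errors and costly retries
```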

Feature and Resource Management

Audit Cron Jobs

Cron jobs run regardless of active use. Optimize them:

  • Reduce heartbeat frequency
  • Route heartbeats to the cheapest available model
  • Remove unnecessary scheduled tasks
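The three optimizations above could be combined in a cron definition like the following sketch; the field names are assumptions for illustration, not confirmed OpenClaw syntax:

```yaml
# Hypothetical cron config: fewer heartbeats, routed to a cheap model
cron:
  - name: heartbeat
    schedule: "0 */6 * * *"  # every 6 hours instead of every few minutes
    model: gemini-flash      # cheapest available model for routine pings
  # unnecessary scheduled tasks simply removed from this list
```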

Disable Unused Features

Turn off features you are not actively using:

  • Text-to-Speech (TTS) if voice is not required
  • Speech-to-Text (STT)
  • Image generation
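A minimal sketch of what these toggles might look like in configuration (the `features` block and key names are illustrative assumptions):

```yaml
# Hypothetical feature toggles (key names are illustrative)
features:
  tts: false               # no per-request speech synthesis charges
  stt: false               # no transcription charges
  image_generation: false  # no image model charges
```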

Tool Definition Optimization

Tool definitions are sent with every prompt. Use per-agent tool allowlists to ensure agents only have access to the tools they need:

# Agent focused on email doesn't need calendar schema
agent:
  name: email-assistant
  tools:
    - email-send
    - email-read
    # Don't include: calendar-* , file-manager-*

Infrastructure Optimization

Right-size Your Server

Your server specs directly influence your monthly bill:

  • Light personal use: 1-2 vCPU, 2-4 GB RAM
  • Small teams: 2-4 vCPU, 8 GB RAM
  • Avoid overpaying for unnecessary resources

Free and Low-Cost Hosting Options

  • Free-tier cloud: Oracle Cloud free tier + Gemini free tier = $0/month
  • Budget VPS: $5-10/month for personal projects
  • Self-hosting: Run on existing hardware (Mac Mini, old PC) to eliminate hosting fees
  • Raspberry Pi: ~$80 for the board plus roughly $1/month in electricity; cloud API usage is billed separately

Monitoring and Limits

Set Hard Limits

Establish spend limits per agent to prevent unexpected bill spikes:

  • Configure spending alerts with your AI provider
  • Set hard limits directly in provider dashboards
  • Prefer a "pause" behavior over a hard stop when limits are reached, so in-flight work resumes cleanly
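A per-agent budget could be sketched like this; the `budgets` and `alerts` blocks are hypothetical key names chosen for illustration, and real hard limits should also be set in your provider's dashboard as noted above:

```yaml
# Hypothetical per-agent budget config (key names are illustrative)
budgets:
  - agent: email-assistant
    monthly_usd: 10
    on_limit: pause      # pause rather than hard-stop mid-task
alerts:
  - threshold_usd: 8     # warn before the limit is actually hit
    notify: email
```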

Monitor Usage

Regularly audit provider logs to understand which models are handling requests:

  • Use gateway logs and dashboard metrics
  • Check provider dashboards for token consumption
  • Identify areas of high cost

Advanced Optimizations

Semantic Caching

For repetitive calls, semantic caching can significantly reduce costs:

  • Costs for frequent, repetitive calls such as heartbeat checks can be cut by 70-90%
  • The cache recognizes semantically similar queries and serves a stored response instead of making a new API call
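A semantic cache might be configured along these lines; the key names and default values are assumptions for illustration:

```yaml
# Hypothetical semantic cache settings (key names are illustrative)
cache:
  semantic:
    enabled: true
    similarity_threshold: 0.92  # reuse a response when queries are this similar
    ttl_minutes: 60             # expire cached answers after an hour
```

The threshold is the main trade-off: set it too low and users get stale or mismatched answers; too high and the cache rarely hits.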

Proxy Services

Services like laozhang.ai provide more stable connections and may offer access to domestic models at lower prices, reducing extra token consumption from retries and timeouts.

FAQ

What is the biggest cost driver in OpenClaw?

AI model API calls are the largest expense. Every message includes your input, conversation history, memory content, tool definitions, and system prompts. Long sessions multiply costs quickly.

How can I reduce API costs by 50-80%?

Switch from expensive models (Claude Opus, GPT-4o) to cheaper alternatives (Claude Haiku, GPT-4o-mini) for routine tasks. Route complex reasoning to premium models only when needed.

What is prompt caching?

Prompt caching (supported by Anthropic models) recognizes and serves cached versions of frequently repeated content like system prompts and tool schemas. This can save up to 90% on cached input tokens.

How does QMD reduce costs?

Quick Memory Database (QMD) uses local semantic search to retrieve only relevant history snippets instead of sending the full conversation. This can save 60-97% on history context tokens.

Can I run OpenClaw for free?

Yes, using free-tier servers (like Oracle Cloud) combined with free AI model tiers (like Gemini free tier) can achieve $0/month costs for personal use.

Need help from people who already do this?

Join My AI Agent Profit Lab: discussions about cost optimization, model routing, and budget management from people who do this daily.