Advanced topics

11 min read

Local Models

Run OpenClaw on your own hardware with Ollama, LM Studio, or another local server. The upside is real. So are the tradeoffs. Here is the version without wishful thinking.

Running a local model feels a bit like owning a workshop generator. You get privacy, control, and freedom from the grid. You also inherit the noise, the maintenance, and the hard limit of what your hardware can actually power.

That tradeoff matters in OpenClaw more than in a simple chat app. An agent turn is not just a question and an answer. It carries system rules, tool schemas, memory, channel context, and sometimes images or documents. A weak local model does not just get slower. It gets confused.

This guide covers when local models make sense, which setup paths OpenClaw supports, and the hybrid patterns that save most people from self-inflicted pain.

What local models are good at

Local models are strongest when you care about one or more of these:

  • Privacy: sensitive prompts and documents stay on your hardware
  • Predictable cost: no per-token bill for every long conversation
  • Low-latency local loops: great for desk-side tools, home servers, and internal workflows
  • Control: you choose the model, the server, the timeout, and the fallback plan

There is a familiar pattern here. Early teams used to run their own mail servers because control mattered more than convenience. Then cloud convenience won for the average case. Local models are the same story in reverse. If your data, jurisdiction, or offline requirements matter enough, owning the stack is worth it.

The honest downside

OpenClaw's own local-model docs are unusually direct: small cards and aggressively quantized models often truncate context and weaken prompt-injection defenses. Official guidance recommends at least a 64k context window for local use, and the higher-end local-model guide goes further by warning that serious setups need much more headroom than hobbyists expect.

In plain English: local can be excellent, but only if the model is strong enough for agent work.

  • Small models struggle with tool use: they may describe a tool call instead of making one
  • Long prompts expose weak context handling: instructions get dropped, repeated, or mangled
  • Vision and PDF work add pressure: multimodal turns need more memory and better model support
  • Troubleshooting shifts to you: timeouts, GPU memory, model loading, and proxy compatibility become your problem

Recommended setup paths

OpenClaw currently points most users toward two local-first options: Ollama and LM Studio.

Option 1: Ollama

Ollama is the practical choice if you want a clean CLI flow, remote host support, and easy model pulls. OpenClaw integrates with Ollama's native API and explicitly warns against using the OpenAI-compatible /v1 URL for Ollama because tool calling breaks there.

# Start OpenClaw onboarding
openclaw onboard

# Choose Ollama, then local-only or hybrid mode
# Later, inspect the available models
openclaw models list --provider ollama

According to the current provider docs, OpenClaw can use Ollama in three modes: cloud only, local only, or a hybrid cloud plus local setup through a reachable Ollama host. That hybrid option is more useful than it sounds because it lets you keep one operational path while mixing hosted and local models.

Option 2: LM Studio

LM Studio is friendlier if you want a GUI, Apple Silicon support, or a simpler way to inspect loaded models. OpenClaw's current docs describe it as the best low-friction local stack for many users, especially when you want a large model behind a local server.

# Start LM Studio's local server first
lms server start --port 1234

# Then onboard OpenClaw
openclaw onboard

# Set a specific LM Studio model later if needed
openclaw models set lmstudio/qwen/qwen3.5-9b

LM Studio is also easier to reason about if you think visually. Is the model loaded? Is the server up? Is the endpoint returning models? You can usually answer those questions without spelunking through logs.

Option 3: Your own OpenAI-compatible local server

vLLM, SGLang, llama.cpp, MLX, LiteLLM, and similar proxies can work too. This route gives you the most flexibility and the most ways to waste an afternoon. Use it when you already know why you need it.

{
  agents: {
    defaults: {
      model: { primary: "local/my-local-model" },
    },
  },
  models: {
    mode: "merge",
    providers: {
      local: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "sk-local",
        api: "openai-completions",
      },
    },
  },
}

The smartest pattern for most people: hybrid routing

If you are tempted to go all-local on day one, slow down. The better pattern is usually hosted primary, local fallback.

  • Use a strong hosted model for complex reasoning, long planning, and tool-heavy turns
  • Use your local model for private drafts, internal notes, or low-risk repetitive work
  • Keep models.mode set to "merge" so local and hosted providers can coexist

This is basically kaizen applied to infrastructure: improve the system in small, reversible steps. Do not throw away the reliable path before the private path proves itself.

{
  agents: {
    defaults: {
      model: {
        primary: "anthropic/claude-sonnet-4-6",
        fallbacks: ["lmstudio/my-local-model", "anthropic/claude-opus-4-6"],
      },
    },
  },
  models: {
    mode: "merge",
  },
}

What recent OpenClaw updates tell you

Recent releases quietly show where local setups tend to hurt. The OpenClaw 2026.4.15 breakdown introduced an experimental localModelLean mode that drops heavyweight default tools for weaker local-model setups. Translation: prompt size itself was becoming a problem in the real world.

The 2026.4.26 release also shipped Ollama memory-search improvements, including model-specific retrieval query prefixes for several embedding models. That is a useful signal. Local stacks are getting better, but they still reward careful tuning instead of blind optimism.

Troubleshooting patterns that show up a lot

Problem: the model answers, but tools never actually run

This usually means the backend is compatible enough for text, but not for structured tool calls. With Ollama, check that you are using the native API endpoint and not /v1. With custom proxies, verify the chat template and tool-call support.

Problem: local turns time out

Increase provider-level timeouts before you start stretching global agent timeouts. Slow local inference is normal. Broken routing is not.

Problem: the model forgets instructions halfway through

That is usually a context-window or model-quality problem, not a clever-prompt problem. Move to a larger model, reduce attached tool load, or keep the local model as fallback only.

Problem: WSL2 or GPU setup becomes unstable

OpenClaw's local-model and Ollama docs both call out WSL2 pain points, including restart loops and memory pinning on some NVIDIA plus CUDA setups. If your machine starts acting haunted, believe the boring explanation first. It is usually driver, service, or model autoload behavior.

When local models are the right call

  • You handle private documents or internal data that should stay in-house
  • You want predictable operating costs for long-running workflows
  • You have enough hardware for real agent turns, not just short chats
  • You are comfortable owning the operational complexity

When cloud models are still the better choice

  • You need the best reasoning quality with the least setup friction
  • You rely heavily on browser, PDF, or multimodal workflows
  • You do not want to debug model servers and GPU memory at 11:40 PM
  • You care more about uptime than sovereignty

FAQ

Can I run OpenClaw fully offline with local models?

Mostly yes, if your model server, memory setup, and tools stay local. The catch is that many useful workflows still depend on outside services like messaging channels, web search, or cloud APIs. Local models protect the model layer, not every connected tool.

Which is easier for beginners: Ollama or LM Studio?

Ollama is usually the faster path for terminal-first users and remote boxes. LM Studio is friendlier if you want a GUI and an easier way to inspect what model is loaded. Both work with OpenClaw.

Do small local models work well in OpenClaw?

For light chats and narrow workflows, sometimes. For long prompts, tool-heavy turns, and prompt-injection resistance, small or heavily quantized models fall apart faster than people expect. Bigger context and stronger models matter here.

Should local models be my primary model or fallback?

Start with hosted primary and local fallback unless privacy is your top requirement. That gives you reliability first, then cost control and privacy where it helps most.

Why does OpenClaw documentation keep warning about context window size?

Because OpenClaw is not a tiny one-shot prompt. It carries instructions, tools, memory, and user context. Official docs recommend at least 64k context, and the local-model guide is even more blunt that weak setups will truncate context and behave unsafely.

Summary

Local models in OpenClaw are not a gimmick. They are a serious option for privacy-first and cost-aware operators. But they reward honesty. Bigger context wins. Stronger models win. Hybrid routing wins most of all.

Start with Ollama or LM Studio, keep hosted fallbacks available, and treat local-first as an operational discipline rather than a purity test.

Need help from people who already use this stuff?

Want a local-model setup that does not turn into a weekend project?

Join My AI Agent Profit Lab for working configs, hardware notes, and real examples from operators running OpenClaw beyond the default cloud path.