Not all AI tasks are equal. A simple greeting needs less cognitive power than a complex code review. A quick fact check differs from architectural planning. Yet many users pay premium prices for every interaction, burning through API budgets on tasks that cheaper models handle just as well.
Model routing solves this. It is the art of matching each task to the right model at the right price. OpenClaw gives you sophisticated tools for this: primary and thinking models, multi-tier configurations, and automatic escalation. This guide shows you how to use them to cut costs by 50-90% without sacrificing quality where it matters.
Understanding Model Tiers
Modern AI providers offer models at different price and capability points. Understanding these tiers is essential for effective routing.
Tier 1: Fast and Cheap
These models handle routine work efficiently:
- Claude Haiku: Fast responses, good for simple Q&A, formatting, basic extraction
- GPT-3.5 Turbo: Reliable general-purpose model at low cost
- GLM-4.7 Flash: Open model with excellent speed-to-cost ratio
- MiniMax M2.5: Cost-effective for straightforward tasks
Use for: FAQs, simple lookups, formatting, summarization of short texts, basic code completion.
Tier 2: Capable and Balanced
The workhorses for most production tasks:
- Claude Sonnet 4.6: Excellent reasoning at moderate cost
- GPT-4: Strong general capabilities, good for mixed workloads
- GPT-4.5: Improved reasoning and instruction following
- DeepSeek V3: Strong open model with competitive pricing
Use for: Complex queries, multi-step tasks, code review, content creation, analysis.
Tier 3: Maximum Capability
Reserve these for the hardest problems:
- Claude Opus 4.6: Best-in-class reasoning for complex problems
- GPT-4 Turbo: Largest context, strongest performance
- GPT-5: Cutting-edge capabilities for demanding tasks
Use for: Complex debugging, architectural decisions, deep research, creative writing, and the hardest reasoning problems.
Basic Configuration
OpenClaw's model configuration lives in your openclaw.json file. Here is how to set it up.
Single Model Setup
The simplest configuration uses one model for everything:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6-20260514"
      }
    }
  }
}
```

This works fine for development and testing. For production, you will want more sophisticated routing.
Two-Tier Setup: Primary + Thinking
The most common production configuration separates routine work from deep reasoning:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6-20260514",
        "thinking": "anthropic/claude-opus-4-6-20260514"
      }
    }
  }
}
```

With this setup:
- Routine tasks use Sonnet (cheaper, faster)
- Complex reasoning automatically escalates to Opus (more capable, more expensive)
- You pay premium prices only when the task demands it
How Automatic Escalation Works
OpenClaw can automatically detect when to use the thinking model. Understanding this helps you optimize your routing.
Signals for Escalation
OpenClaw monitors these indicators:
- Task complexity keywords: "debug," "architect," "design," "analyze deeply," "explain why"
- Multi-step indicators: Lists of tasks, "and then," "after that," sequences
- Code complexity: Multiple files, refactoring requests, algorithm design
- Context length: Very long prompts that suggest complex analysis
- Error recovery: Failed attempts that need deeper investigation
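OpenClaw's internal detector is not something you configure line by line, but the signals above can be approximated in a few lines. A minimal sketch of a keyword-and-length heuristic; the keyword lists and length threshold are illustrative, not OpenClaw's actual rules:

```typescript
// Rough complexity heuristic resembling the escalation signals above.
// Keywords and the length cutoff are illustrative assumptions.
const COMPLEXITY_KEYWORDS = ["debug", "architect", "design", "analyze deeply", "explain why"];
const MULTI_STEP_MARKERS = ["and then", "after that"];

function shouldEscalate(prompt: string): boolean {
  const text = prompt.toLowerCase();
  const keywordHit = COMPLEXITY_KEYWORDS.some((k) => text.includes(k));
  const multiStep = MULTI_STEP_MARKERS.some((m) => text.includes(m));
  const veryLong = prompt.length > 4000; // very long prompts suggest deep analysis
  return keywordHit || multiStep || veryLong;
}
```

With this sketch, `shouldEscalate("What time is it?")` stays on the primary model, while `shouldEscalate("Debug this race condition")` routes to the thinking model.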
Manual Override
You can force thinking mode with trigger phrases:
- "Use thinking mode"
- "Think deeply about this"
- "/thinking on"
Or switch models mid-session:
```
# Switch to thinking model for this task
/model thinking

# Switch back to primary
/model primary
```

Advanced Multi-Tier Routing
For sophisticated cost optimization, configure multiple tiers with explicit routing rules.
Three-Tier Configuration
```json
{
  "agents": {
    "defaults": {
      "models": {
        "catalog": [
          "anthropic/claude-haiku-4-6-20260514",
          "anthropic/claude-sonnet-4-6-20260514",
          "anthropic/claude-opus-4-6-20260514",
          "openai/gpt-3.5-turbo",
          "openai/gpt-4"
        ],
        "routing": {
          "tier1": {
            "models": ["anthropic/claude-haiku-4-6-20260514", "openai/gpt-3.5-turbo"],
            "triggers": ["greeting", "simple_query", "format", "summarize_short"]
          },
          "tier2": {
            "models": ["anthropic/claude-sonnet-4-6-20260514", "openai/gpt-4"],
            "triggers": ["code_review", "analysis", "writing", "multi_step"]
          },
          "tier3": {
            "models": ["anthropic/claude-opus-4-6-20260514"],
            "triggers": ["debug", "architect", "complex_reasoning", "creative"]
          }
        }
      }
    }
  }
}
```

This configuration creates explicit routing rules based on task classification.
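To see what that routing table does at runtime, here is a small sketch that resolves a classified trigger to a model ID. The table mirrors the config; the fallback-to-tier2 behavior is an assumption, not documented OpenClaw behavior:

```typescript
interface TierRule {
  models: string[];
  triggers: string[];
}

// Routing table shaped like the JSON config above.
const routing: Record<string, TierRule> = {
  tier1: {
    models: ["anthropic/claude-haiku-4-6-20260514", "openai/gpt-3.5-turbo"],
    triggers: ["greeting", "simple_query", "format", "summarize_short"],
  },
  tier2: {
    models: ["anthropic/claude-sonnet-4-6-20260514", "openai/gpt-4"],
    triggers: ["code_review", "analysis", "writing", "multi_step"],
  },
  tier3: {
    models: ["anthropic/claude-opus-4-6-20260514"],
    triggers: ["debug", "architect", "complex_reasoning", "creative"],
  },
};

// Resolve a classified trigger to a model; unmatched triggers fall back
// to the balanced tier (an assumption for this sketch).
function resolveModel(trigger: string): string {
  for (const rule of Object.values(routing)) {
    if (rule.triggers.includes(trigger)) return rule.models[0];
  }
  return routing.tier2.models[0];
}
```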
Routing Strategies
Different workflows benefit from different routing approaches.
Strategy 1: Simple Two-Tier
Best for: Most users getting started with routing
Setup: Primary model + thinking model
Logic: Automatic escalation on complexity signals
Savings: Typically 40-60%
Strategy 2: Cost-First
Best for: High-volume, cost-sensitive applications
Setup: Aggressive use of Tier 1 models, minimal Tier 3
Logic: Only escalate when Tier 1 fails or user explicitly requests
Savings: Up to 80-90%
Trade-off: Some complex tasks may need manual retry with better models
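The cost-first pattern (escalate only on failure) can be sketched as a ladder of tiers. This is a synchronous toy version; `callModel` and the model names are placeholders for your real (async) provider calls, and the quality check is supplied by you:

```typescript
// Stand-in for a real provider call; real code would be async and may throw.
function callModel(model: string, prompt: string): string {
  return `[${model}] answer to: ${prompt}`;
}

const LADDER = ["tier1-model", "tier2-model", "tier3-model"]; // cheap → premium

// Try the cheapest tier first; climb only when the result is inadequate.
function costFirst(prompt: string, isGoodEnough: (r: string) => boolean): string {
  let last = "";
  for (const model of LADDER) {
    last = callModel(model, prompt);
    if (isGoodEnough(last)) return last; // stop at the cheapest adequate tier
  }
  return last; // best effort from the top tier
}
```

The design choice worth noting: a cheap failed attempt plus a premium retry often still costs less than sending every request to the premium tier.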
Strategy 3: Quality-First
Best for: Critical applications where errors are expensive
Setup: Conservative Tier 1 usage, generous Tier 3 escalation
Logic: Escalate early and often
Savings: 20-40%
Benefit: Maximum quality, fewer errors
Strategy 4: Workflow-Based
Best for: Multi-step processes with known requirements
Setup: Different models for different workflow stages
Logic: Data extraction → cheap model; Analysis → medium; Final review → premium
Savings: 50-70%
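A quick back-of-the-envelope check of the savings ranges quoted in these strategies. The request mix and per-request prices below are made-up illustrative numbers, not real provider rates:

```typescript
// Illustrative per-request prices for each tier (not real provider rates).
const PRICE: Record<string, number> = { tier1: 0.001, tier2: 0.01, tier3: 0.06 };

// Blended cost for a given traffic mix (shares should sum to 1).
function blendedCost(mix: Record<string, number>): number {
  return Object.entries(mix).reduce((sum, [tier, share]) => sum + share * PRICE[tier], 0);
}

// Example mix: 70% of requests on tier 1, 25% on tier 2, 5% on tier 3.
const routed = blendedCost({ tier1: 0.7, tier2: 0.25, tier3: 0.05 });
const savings = 1 - routed / PRICE.tier3; // vs. sending everything to tier 3
```

With these illustrative prices the blended cost works out to about a tenth of the all-premium cost, i.e. savings near 90%, the top of the ranges above.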
Per-Agent Model Configuration
Different agents in your setup can use different models. This enables sophisticated specialization.
Example: Specialized Agents
```json
{
  "agents": {
    "entries": [
      {
        "name": "quick-helper",
        "model": {
          "primary": "anthropic/claude-haiku-4-6-20260514"
        },
        "description": "Fast, cheap responses for simple queries"
      },
      {
        "name": "code-reviewer",
        "model": {
          "primary": "anthropic/claude-sonnet-4-6-20260514",
          "thinking": "anthropic/claude-opus-4-6-20260514"
        },
        "description": "Code review with escalation for complex issues"
      },
      {
        "name": "architect",
        "model": {
          "primary": "anthropic/claude-opus-4-6-20260514"
        },
        "description": "Always premium for architectural decisions"
      }
    ]
  }
}
```

Route messages to the appropriate agent based on intent, and each one gets the right model for its job.
Subagent Model Routing
Subagents inherit parent model settings by default, but you can override per subagent:
```javascript
// Parent uses an expensive model; the subagent uses a cheap one
const runId = await sessions_spawn({
  task: "Summarize these 50 articles",
  model: "anthropic/claude-haiku-4-6-20260514" // cheap model for a simple task
});

// Multiple subagents with different models
const researchRun = await sessions_spawn({
  task: "Research this topic deeply",
  model: "anthropic/claude-sonnet-4-6-20260514"
});

const draftRun = await sessions_spawn({
  task: "Write first draft",
  model: "anthropic/claude-sonnet-4-6-20260514"
});

const polishRun = await sessions_spawn({
  task: "Polish and finalize",
  model: "anthropic/claude-opus-4-6-20260514" // premium model for final quality
});
```

This approach gives you fine-grained control over model selection at each step of a workflow.
Measuring and Optimizing
You cannot optimize what you do not measure. Track these metrics:
Key Metrics
- Cost per conversation: Average API spend per session
- Model distribution: Percentage of requests to each tier
- Escalation rate: How often tasks escalate to higher tiers
- Quality scores: User satisfaction or error rates by tier
- Response times: Latency by model tier
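Most of these metrics fall out of a simple per-request log. A sketch of the aggregation, assuming a hypothetical log record shape (OpenClaw's actual logging format may differ):

```typescript
// Hypothetical per-request log record; adapt to whatever your setup emits.
interface LogEntry {
  tier: "tier1" | "tier2" | "tier3";
  costUsd: number;
  escalated: boolean;
}

// Compute average cost, escalation rate, and model distribution from a log.
function summarize(log: LogEntry[]) {
  const total = log.length;
  const byTier = new Map<string, number>();
  let cost = 0;
  let escalations = 0;
  for (const e of log) {
    byTier.set(e.tier, (byTier.get(e.tier) ?? 0) + 1);
    cost += e.costUsd;
    if (e.escalated) escalations += 1;
  }
  return {
    avgCostUsd: cost / total,
    escalationRate: escalations / total,
    distribution: Object.fromEntries([...byTier].map(([t, n]) => [t, n / total])),
  };
}
```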
Optimization Cycle
- Baseline: Run with current routing for a week
- Analyze: Identify expensive tasks that might use cheaper models
- Adjust: Tune routing rules or triggers
- Validate: Check that quality did not suffer
- Repeat: Iterate monthly
Common Routing Mistakes
Avoid these pitfalls:
Over-Escalation
Sending simple tasks to expensive models wastes money. If Tier 1 handles 90% of tasks well, use it for 90% of tasks. Do not default to premium "just to be safe."
Under-Escalation
Refusing to escalate when tasks actually need deep reasoning produces poor results and frustrated users. Balance cost savings with quality requirements.
Ignoring Context Costs
Long conversations become expensive even with cheap models. Implement context window management: summarize old context, truncate where appropriate, or start fresh sessions.
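One simple form of context window management is to keep only the most recent turns under a token budget. A minimal sketch, using the crude chars-divided-by-four token estimate; a production setup might summarize the dropped turns instead of discarding them:

```typescript
interface Turn {
  role: "user" | "assistant" | "system";
  content: string;
}

// Keep the newest turns that fit within a rough token budget.
function trimContext(turns: Turn[], maxTokens = 8000): Turn[] {
  const approxTokens = (t: Turn) => Math.ceil(t.content.length / 4); // crude estimate
  const kept: Turn[] = [];
  let budget = maxTokens;
  // Walk from newest to oldest, keeping turns while they fit.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = approxTokens(turns[i]);
    if (cost > budget) break;
    kept.unshift(turns[i]);
    budget -= cost;
  }
  return kept;
}
```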
Static Configuration
Model capabilities and prices change. Review your routing configuration quarterly. New cheaper models may handle tasks previously requiring premium tiers.
Provider-Specific Tips
Different providers have different strengths:
Anthropic (Claude)
- Haiku: Extremely fast, great for simple tasks
- Sonnet: Best cost-performance ratio for most tasks
- Opus: Unmatched reasoning for complex problems
- Thinking feature: Built-in reasoning escalation
OpenAI (GPT)
- GPT-3.5: Reliable, widely compatible
- GPT-4: Strong all-around performance
- GPT-4 Turbo: Best for very long contexts
- Function calling: Excellent for tool use
Open Models
- GLM-4.7 Flash: Excellent speed, competitive with GPT-3.5
- DeepSeek V3: Strong reasoning, lower cost than Claude/GPT
- Qwen 2.5: Good for coding tasks
- Llama 3: Flexible, self-hostable for maximum control
FAQ
What is model routing in OpenClaw?
Model routing is the process of selecting which AI model handles each task. OpenClaw supports multiple routing strategies: manual switching, primary/thinking tiering, and multi-tier complexity-based routing. This lets you optimize the cost-quality trade-off for every interaction.
How much can I save with proper model routing?
Typical savings range from 50% to 90% on API costs. By using cheaper models for simple tasks and reserving expensive models for complex reasoning, you pay premium prices only when necessary. Many users cut costs by 70% or more.
What is the difference between primary and thinking models?
The primary model handles routine work (conversation, simple tasks). The thinking model activates for complex reasoning, debugging, or multi-step logic. OpenClaw can automatically escalate to the thinking model when extended reasoning is detected, keeping costs low for everyday interactions.
Can I use different providers in the same setup?
Yes. OpenClaw supports multiple providers simultaneously. You can configure Claude from Anthropic, GPT from OpenAI, and open models from other providers all in the same setup. Each agent or subagent can use a different provider.
Does model routing affect response quality?
When configured correctly, routing improves effective quality per dollar. Simple tasks get fast, cheap responses. Complex tasks get the heavy reasoning they need. The key is accurate task classification and appropriate routing rules.
Getting Started Checklist
Implement model routing step by step:
- Start with primary + thinking two-tier setup
- Run for a week and measure current costs
- Identify your most expensive request types
- Add a third tier if those requests are simple
- Adjust escalation triggers based on results
- Consider per-agent specializations
- Review and optimize monthly
Next Steps
Model routing is a foundation for cost-effective AI operations. Combine it with other optimization strategies:
- Subagents: Use cheap models for parallel subtasks
- Context management: Keep prompts lean to reduce token costs
- Caching: Reuse model outputs when appropriate
- Prompt optimization: Shorter, clearer prompts cost less
Start simple. A basic two-tier setup delivers most of the benefits. Add complexity only when you have data showing it is needed.
Need help choosing or routing models?
Join My AI Agent Profit Lab for practical help, faster answers, and real-world model routing examples from the community.