Not all AI tasks are equal. A simple greeting needs less cognitive power than a complex code review. A quick fact check differs from architectural planning. Yet many users pay premium prices for every interaction, burning through API budgets on tasks that cheaper models handle just as well.
Model routing solves this. It is the art of matching each task to the right model at the right price. OpenClaw gives you sophisticated tools for this: primary and thinking models, multi-tier configurations, and automatic escalation. This guide shows you how to use them to cut costs by 50-90% without sacrificing quality where it matters.
Understanding Model Tiers
Modern AI providers offer models at different price and capability points. Understanding these tiers is essential for effective routing.
Tier 1: Fast and Cheap
These models handle routine work efficiently:
- Claude Haiku: Fast responses, good for simple Q&A, formatting, basic extraction
- GPT-3.5 Turbo: Reliable general-purpose model at low cost
- GLM-4.7 Flash: Open model with excellent speed-to-cost ratio
- MiniMax M2.5: Cost-effective for straightforward tasks
Use for: FAQs, simple lookups, formatting, summarization of short texts, basic code completion.
Tier 2: Capable and Balanced
The workhorses for most production tasks:
- Claude Sonnet 4.6: Excellent reasoning at moderate cost
- GPT-4: Strong general capabilities, good for mixed workloads
- GPT-4.5: Improved reasoning and instruction following
- DeepSeek V3: Strong open model with competitive pricing
Use for: Complex queries, multi-step tasks, code review, content creation, analysis.
Tier 3: Maximum Capability
Reserve these for the hardest problems:
- Claude Opus 4.6: Best-in-class reasoning for complex problems
- GPT-4 Turbo: Largest context, strongest performance
- GPT-5: Cutting-edge capabilities for demanding tasks
Use for: Complex debugging, architectural decisions, deep research, creative writing, and the hardest reasoning problems.
Basic Configuration
OpenClaw's model configuration lives in your openclaw.json file. Here is how to set it up.
Single Model Setup
The simplest configuration uses one model for everything:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6-20260514"
      }
    }
  }
}
```

This works fine for development and testing. For production, you will want more sophisticated routing.
Two-Tier Setup: Primary + Thinking
The most common production configuration separates routine work from deep reasoning:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6-20260514",
        "thinking": "anthropic/claude-opus-4-6-20260514"
      }
    }
  }
}
```

With this setup:
- Routine tasks use Sonnet (cheaper, faster)
- Complex reasoning automatically escalates to Opus (more capable, more expensive)
- You pay premium prices only when the task demands it
How Automatic Escalation Works
OpenClaw can automatically detect when to use the thinking model. Understanding this helps you optimize your routing.
Signals for Escalation
OpenClaw monitors these indicators:
- Task complexity keywords: "debug," "architect," "design," "analyze deeply," "explain why"
- Multi-step indicators: Lists of tasks, "and then," "after that," sequences
- Code complexity: Multiple files, refactoring requests, algorithm design
- Context length: Very long prompts that suggest complex analysis
- Error recovery: Failed attempts that need deeper investigation
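OpenClaw's internal detector is not something you configure line by line, but the signals above can be approximated in a few lines. A minimal sketch of a keyword-and-length heuristic; the keyword lists and length threshold are illustrative, not OpenClaw's actual rules:

```typescript
// Rough complexity heuristic resembling the escalation signals above.
// Keywords and the length cutoff are illustrative assumptions.
const COMPLEXITY_KEYWORDS = ["debug", "architect", "design", "analyze deeply", "explain why"];
const MULTI_STEP_MARKERS = ["and then", "after that"];

function shouldEscalate(prompt: string): boolean {
  const text = prompt.toLowerCase();
  const keywordHit = COMPLEXITY_KEYWORDS.some((k) => text.includes(k));
  const multiStep = MULTI_STEP_MARKERS.some((m) => text.includes(m));
  const veryLong = prompt.length > 4000; // very long prompts suggest deep analysis
  return keywordHit || multiStep || veryLong;
}
```

With this sketch, `shouldEscalate("What time is it?")` stays on the primary model, while `shouldEscalate("Debug this race condition")` routes to the thinking model.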
Manual Override
You can force thinking mode with trigger phrases:
- "Use thinking mode"
- "Think deeply about this"
- "/thinking on"
Or switch models mid-session:
```
# Switch to thinking model for this task
/model thinking

# Switch back to primary
/model primary
```

Advanced Multi-Tier Routing
For sophisticated cost optimization, configure multiple tiers with explicit routing rules.
Three-Tier Configuration
```json
{
  "agents": {
    "defaults": {
      "models": {
        "catalog": [
          "anthropic/claude-haiku-4-6-20260514",
          "anthropic/claude-sonnet-4-6-20260514",
          "anthropic/claude-opus-4-6-20260514",
          "openai/gpt-3.5-turbo",
          "openai/gpt-4"
        ],
        "routing": {
          "tier1": {
            "models": ["anthropic/claude-haiku-4-6-20260514", "openai/gpt-3.5-turbo"],
            "triggers": ["greeting", "simple_query", "format", "summarize_short"]
          },
          "tier2": {
            "models": ["anthropic/claude-sonnet-4-6-20260514", "openai/gpt-4"],
            "triggers": ["code_review", "analysis", "writing", "multi_step"]
          },
          "tier3": {
            "models": ["anthropic/claude-opus-4-6-20260514"],
            "triggers": ["debug", "architect", "complex_reasoning", "creative"]
          }
        }
      }
    }
  }
}
```

This configuration creates explicit routing rules based on task classification.
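To see what that routing table does at runtime, here is a small sketch that resolves a classified trigger to a model ID. The table mirrors the config; the fallback-to-tier2 behavior is an assumption, not documented OpenClaw behavior:

```typescript
interface TierRule {
  models: string[];
  triggers: string[];
}

// Routing table shaped like the JSON config above.
const routing: Record<string, TierRule> = {
  tier1: {
    models: ["anthropic/claude-haiku-4-6-20260514", "openai/gpt-3.5-turbo"],
    triggers: ["greeting", "simple_query", "format", "summarize_short"],
  },
  tier2: {
    models: ["anthropic/claude-sonnet-4-6-20260514", "openai/gpt-4"],
    triggers: ["code_review", "analysis", "writing", "multi_step"],
  },
  tier3: {
    models: ["anthropic/claude-opus-4-6-20260514"],
    triggers: ["debug", "architect", "complex_reasoning", "creative"],
  },
};

// Resolve a classified trigger to a model; unmatched triggers fall back
// to the balanced tier (an assumption for this sketch).
function resolveModel(trigger: string): string {
  for (const rule of Object.values(routing)) {
    if (rule.triggers.includes(trigger)) return rule.models[0];
  }
  return routing.tier2.models[0];
}
```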
Routing Strategies
Different workflows benefit from different routing approaches.
Strategy 1: Simple Two-Tier
Best for: Most users getting started with routing
Setup: Primary model + thinking model
Logic: Automatic escalation on complexity signals
Savings: Typically 40-60%
Strategy 2: Cost-First
Best for: High-volume, cost-sensitive applications
Setup: Aggressive use of Tier 1 models, minimal Tier 3
Logic: Only escalate when Tier 1 fails or user explicitly requests
Savings: Up to 80-90%
Trade-off: Some complex tasks may need manual retry with better models
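The cost-first pattern (escalate only on failure) can be sketched as a ladder of tiers. This is a synchronous toy version; `callModel` and the model names are placeholders for your real (async) provider calls, and the quality check is supplied by you:

```typescript
// Stand-in for a real provider call; real code would be async and may throw.
function callModel(model: string, prompt: string): string {
  return `[${model}] answer to: ${prompt}`;
}

const LADDER = ["tier1-model", "tier2-model", "tier3-model"]; // cheap → premium

// Try the cheapest tier first; climb only when the result is inadequate.
function costFirst(prompt: string, isGoodEnough: (r: string) => boolean): string {
  let last = "";
  for (const model of LADDER) {
    last = callModel(model, prompt);
    if (isGoodEnough(last)) return last; // stop at the cheapest adequate tier
  }
  return last; // best effort from the top tier
}
```

The design choice worth noting: a cheap failed attempt plus a premium retry often still costs less than sending every request to the premium tier.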
Strategy 3: Quality-First
Best for: Critical applications where errors are expensive
Setup: Conservative Tier 1 usage, generous Tier 3 escalation
Logic: Escalate early and often
Savings: 20-40%
Benefit: Maximum quality, fewer errors
Strategy 4: Workflow-Based
Best for: Multi-step processes with known requirements
Setup: Different models for different workflow stages
Logic: Data extraction → cheap model; Analysis → medium; Final review → premium
Savings: 50-70%
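A quick back-of-the-envelope check of the savings ranges quoted in these strategies. The request mix and per-request prices below are made-up illustrative numbers, not real provider rates:

```typescript
// Illustrative per-request prices for each tier (not real provider rates).
const PRICE: Record<string, number> = { tier1: 0.001, tier2: 0.01, tier3: 0.06 };

// Blended cost for a given traffic mix (shares should sum to 1).
function blendedCost(mix: Record<string, number>): number {
  return Object.entries(mix).reduce((sum, [tier, share]) => sum + share * PRICE[tier], 0);
}

// Example mix: 70% of requests on tier 1, 25% on tier 2, 5% on tier 3.
const routed = blendedCost({ tier1: 0.7, tier2: 0.25, tier3: 0.05 });
const savings = 1 - routed / PRICE.tier3; // vs. sending everything to tier 3
```

With these illustrative prices the blended cost works out to about a tenth of the all-premium cost, i.e. savings near 90%, the top of the ranges above.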
Per-Agent Model Configuration
Different agents in your setup can use different models. This enables sophisticated specialization.
Example: Specialized Agents
```json
{
  "agents": {
    "entries": [
      {
        "name": "quick-helper",
        "model": {
          "primary": "anthropic/claude-haiku-4-6-20260514"
        },
        "description": "Fast, cheap responses for simple queries"
      },
      {
        "name": "code-reviewer",
        "model": {
          "primary": "anthropic/claude-sonnet-4-6-20260514",
          "thinking": "anthropic/claude-opus-4-6-20260514"
        },
        "description": "Code review with escalation for complex issues"
      },
      {
        "name": "architect",
        "model": {
          "primary": "anthropic/claude-opus-4-6-20260514"
        },
        "description": "Always premium for architectural decisions"
      }
    ]
  }
}
```

Route messages to the appropriate agent based on intent, and each one gets the right model for its job.
Subagent Model Routing
Subagents inherit parent model settings by default, but you can override per subagent:
```javascript
// Parent uses an expensive model; the subagent uses a cheap one
const runId = await sessions_spawn({
  task: "Summarize these 50 articles",
  model: "anthropic/claude-haiku-4-6-20260514" // cheap model for a simple task
});

// Multiple subagents with different models
const researchRun = await sessions_spawn({
  task: "Research this topic deeply",
  model: "anthropic/claude-sonnet-4-6-20260514"
});

const draftRun = await sessions_spawn({
  task: "Write first draft",
  model: "anthropic/claude-sonnet-4-6-20260514"
});

const polishRun = await sessions_spawn({
  task: "Polish and finalize",
  model: "anthropic/claude-opus-4-6-20260514" // premium model for final quality
});
```

This approach gives you fine-grained control over model selection at each step of a workflow.
Measuring and Optimizing
You cannot optimize what you do not measure. Track these metrics:
Key Metrics
- Cost per conversation: Average API spend per session
- Model distribution: Percentage of requests to each tier
- Escalation rate: How often tasks escalate to higher tiers
- Quality scores: User satisfaction or error rates by tier
- Response times: Latency by model tier
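Most of these metrics fall out of a simple per-request log. A sketch of the aggregation, assuming a hypothetical log record shape (OpenClaw's actual logging format may differ):

```typescript
// Hypothetical per-request log record; adapt to whatever your setup emits.
interface LogEntry {
  tier: "tier1" | "tier2" | "tier3";
  costUsd: number;
  escalated: boolean;
}

// Compute average cost, escalation rate, and model distribution from a log.
function summarize(log: LogEntry[]) {
  const total = log.length;
  const byTier = new Map<string, number>();
  let cost = 0;
  let escalations = 0;
  for (const e of log) {
    byTier.set(e.tier, (byTier.get(e.tier) ?? 0) + 1);
    cost += e.costUsd;
    if (e.escalated) escalations += 1;
  }
  return {
    avgCostUsd: cost / total,
    escalationRate: escalations / total,
    distribution: Object.fromEntries([...byTier].map(([t, n]) => [t, n / total])),
  };
}
```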
Optimization Cycle
- Baseline: Run with current routing for a week
- Analyze: Identify expensive tasks that might use cheaper models
- Adjust: Tune routing rules or triggers
- Validate: Check that quality did not suffer
- Repeat: Iterate monthly
Common Routing Mistakes
Avoid these pitfalls:
Over-Escalation
Sending simple tasks to expensive models wastes money. If Tier 1 handles 90% of tasks well, use it for 90% of tasks. Do not default to premium "just to be safe."
Under-Escalation
Refusing to escalate when tasks actually need deep reasoning produces poor results and frustrated users. Balance cost savings with quality requirements.
Ignoring Context Costs
Long conversations become expensive even with cheap models. Implement context window management: summarize old context, truncate where appropriate, or start fresh sessions.
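One simple form of context window management is to keep only the most recent turns under a token budget. A minimal sketch, using the crude chars-divided-by-four token estimate; a production setup might summarize the dropped turns instead of discarding them:

```typescript
interface Turn {
  role: "user" | "assistant" | "system";
  content: string;
}

// Keep the newest turns that fit within a rough token budget.
function trimContext(turns: Turn[], maxTokens = 8000): Turn[] {
  const approxTokens = (t: Turn) => Math.ceil(t.content.length / 4); // crude estimate
  const kept: Turn[] = [];
  let budget = maxTokens;
  // Walk from newest to oldest, keeping turns while they fit.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = approxTokens(turns[i]);
    if (cost > budget) break;
    kept.unshift(turns[i]);
    budget -= cost;
  }
  return kept;
}
```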
Static Configuration
Model capabilities and prices change. Review your routing configuration quarterly. New cheaper models may handle tasks previously requiring premium tiers.
Provider-Specific Tips
Different providers have different strengths:
Anthropic (Claude)
- Haiku: Extremely fast, great for simple tasks
- Sonnet: Best cost-performance ratio for most tasks
- Opus: Unmatched reasoning for complex problems
- Thinking feature: Built-in reasoning escalation
OpenAI (GPT)
- GPT-3.5: Reliable, widely compatible
- GPT-4: Strong all-around performance
- GPT-4 Turbo: Best for very long contexts
- Function calling: Excellent for tool use
Open Models
- GLM-4.7 Flash: Excellent speed, competitive with GPT-3.5
- DeepSeek V3: Strong reasoning, lower cost than Claude/GPT
- Qwen 2.5: Good for coding tasks
- Llama 3: Flexible, self-hostable for maximum control
FAQ
What is model routing in OpenClaw?
Model routing is the process of selecting which AI model handles each task. OpenClaw supports multiple routing strategies: manual switching, primary/thinking tiering, and multi-tier complexity-based routing. This lets you optimize the cost-quality trade-off for every interaction.
How much can I save with proper model routing?
Typical savings range from 50% to 90% on API costs. By using cheaper models for simple tasks and reserving expensive models for complex reasoning, you pay premium prices only when necessary. Many users cut costs by 70% or more.
What is the difference between primary and thinking models?
The primary model handles routine work (conversation, simple tasks). The thinking model activates for complex reasoning, debugging, or multi-step logic. OpenClaw can automatically escalate to the thinking model when extended reasoning is detected, keeping costs low for everyday interactions.
Can I use different providers in the same setup?
Yes. OpenClaw supports multiple providers simultaneously. You can configure Claude from Anthropic, GPT from OpenAI, and open models from other providers all in the same setup. Each agent or subagent can use a different provider.
Does model routing affect response quality?
When configured correctly, routing improves effective quality per dollar. Simple tasks get fast, cheap responses. Complex tasks get the heavy reasoning they need. The key is accurate task classification and appropriate routing rules.
Getting Started Checklist
Implement model routing step by step:
- Start with primary + thinking two-tier setup
- Run for a week and measure current costs
- Identify your most expensive request types
- Add a third tier if those requests are simple
- Adjust escalation triggers based on results
- Consider per-agent specializations
- Review and optimize monthly
Next Steps
Model routing is a foundation for cost-effective AI operations. Combine it with other optimization strategies:
- Subagents: Use cheap models for parallel subtasks
- Context management: Keep prompts lean to reduce token costs
- Caching: Reuse model outputs when appropriate
- Prompt optimization: Shorter, clearer prompts cost less
Start simple. A basic two-tier setup delivers most of the benefits. Add complexity only when you have data showing it is needed.
Need help choosing or routing models?
Join My AI Agent Profit Lab for practical help, faster answers, and real-world model routing examples from the community.