

Usage Tracking & Cost Visibility

Most OpenClaw cost problems do not start with one giant mistake. They start with not seeing what is happening until it is too late. If you cannot tell which model ran, how many tokens were burned, whether caching helped, or which provider window is shrinking, you are operating blind. OpenClaw gives you the instruments. You just need to know which dial answers which question.

Running agents without usage visibility is like managing a delivery fleet with no fuel gauge, no odometer, and no map of which driver took the long way home.

You can keep moving for a while. Then the bill arrives and suddenly everyone becomes interested in instrumentation.

The official docs for concepts/usage-tracking, reference/token-use, and reference/prompt-caching lay out the important distinction: OpenClaw shows several different kinds of usage signals, and they are not interchangeable. Some tell you about the current session. Some tell you about provider quota windows. Some tell you about locally estimated cost. Mix them together and you get fake confidence.

The short version

  • /status shows the quickest live picture for the current session
  • /usage full adds per-response token detail
  • /usage cost summarizes locally tracked cost from session logs
  • openclaw status --usage shows provider usage windows, not just one chat session
  • model routing is a cost control system, not just a quality setting

If you remember one thing, remember this: not every usage number answers the same question.

Where usage visibility lives

The first mistake beginners make is hunting for one magical cost screen. OpenClaw does not work that way because it tracks usage at several layers: the current session, the provider's quota, and your local cost estimate.

/status for the current session

The usage-tracking docs position /status and session_status as the fast session snapshot. It shows the active model, context usage, recent token counts, and estimated cost when local pricing exists for that model. It can also recover counters from the latest transcript usage entry if the live snapshot is sparse.

That last detail matters more than it sounds. It means your status card can stay useful even when the runtime state is incomplete, instead of shrugging and pretending nothing happened.

/usage full for per-response detail

If /status is your dashboard, /usage full is the trip receipt. It appends usage data to each reply so you can see what a single answer cost in tokens and, when pricing is configured, estimated dollars.

This is especially helpful while tuning prompts, tools, or agent instructions. Small wording changes can shift token use a lot faster than most people expect.
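To make the receipt concrete, here is a minimal Python sketch of how an estimate like this can be computed. Each token bucket is multiplied by a per-million-token rate. The `Pricing` and `Usage` names, the four buckets, and every price below are hypothetical placeholders. They are not OpenClaw's pricing schema or any provider's real rates.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical rates in USD per 1 million tokens (not real provider prices).
    input: float
    output: float
    cache_read: float
    cache_write: float


@dataclass
class Usage:
    # Token counts for one response, as a per-response usage footer might report them.
    input: int
    output: int
    cache_read: int = 0
    cache_write: int = 0


def estimate_cost(usage: Usage, price: Pricing) -> float:
    """Estimated USD for one response: each token bucket times its per-million rate."""
    return (
        usage.input * price.input
        + usage.output * price.output
        + usage.cache_read * price.cache_read
        + usage.cache_write * price.cache_write
    ) / 1_000_000


price = Pricing(input=3.00, output=15.00, cache_read=0.30, cache_write=3.75)
turn = Usage(input=2_000, output=800, cache_read=40_000)
print(f"${estimate_cost(turn, price):.4f}")  # $0.0300
```

With these made-up rates, the 40,000 cached tokens cost exactly as much as the 800 output tokens. That per-bucket split is what makes a per-response receipt worth reading.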

/usage cost for the local money picture

/usage cost pulls from OpenClaw session logs. That makes it useful when you want the local running total instead of the latest turn only.

Think of it as your internal accounting view. It is not the same thing as the provider's quota dashboard, and that distinction saves a lot of confusion.
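/usage cost does the summing for you from real session logs. The sketch below only illustrates the idea. It totals a few log lines and keeps a separate count of turns it could not price. The JSON-lines format, model ids, and prices are all hypothetical, and OpenClaw's actual session log format may look nothing like this.

```python
import json

# Hypothetical JSON-lines log; OpenClaw's real session log format may differ.
LOG = """\
{"model": "cheap-model", "usage": {"input": 1200, "output": 300}}
{"model": "premium-model", "usage": {"input": 5000, "output": 1500}}
{"model": "cheap-model", "usage": {"input": 900, "output": 250}}
{"model": "mystery-model", "usage": {"input": 700, "output": 200}}
"""

# Hypothetical USD rates per 1M tokens; "mystery-model" deliberately has none.
PRICES = {
    "cheap-model": {"input": 0.15, "output": 0.60},
    "premium-model": {"input": 3.00, "output": 15.00},
}


def running_total(log_text: str, prices: dict) -> tuple[float, int]:
    """Sum estimated cost over logged turns and count turns that could not be priced."""
    total, unpriced = 0.0, 0
    for line in log_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        rates = prices.get(entry["model"])
        if rates is None:
            unpriced += 1  # no pricing configured: no estimate, which is not the same as zero cost
            continue
        usage = entry["usage"]
        total += (usage["input"] * rates["input"] + usage["output"] * rates["output"]) / 1_000_000
    return total, unpriced


total, unpriced = running_total(LOG, PRICES)
print(f"${total:.6f} estimated, {unpriced} turn(s) unpriced")
```

Counting unpriced turns separately is the design choice that matters here. Treating them as $0 would make the running total look cheaper than it really is.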

CLI usage views for provider windows

The docs also call out openclaw status --usage and openclaw channels list. These normalize provider windows into a human-readable X% left view when the provider exposes usage endpoints and usable auth is available.

That is not just pretty formatting. Different providers report quota in different shapes. OpenClaw smooths that mess into one readable surface so you can tell which bucket is shrinking first.
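The normalization idea fits in a few lines of Python. The three payload shapes below (`used`/`limit`, `remaining_fraction`, `used_percent`) are invented stand-ins for "different shapes". They are not real provider schemas, and they are not what OpenClaw's CLI parses internally.

```python
from typing import Optional


def percent_left(window: dict) -> Optional[float]:
    """Normalize a quota payload to percent remaining, or None for an unknown shape."""
    if window.get("limit", 0) > 0 and "used" in window:
        return max(0.0, 100.0 * (1 - window["used"] / window["limit"]))
    if "remaining_fraction" in window:
        return 100.0 * window["remaining_fraction"]
    if "used_percent" in window:
        return max(0.0, 100.0 - window["used_percent"])
    return None  # better to show nothing than a guessed number


# Hypothetical payloads from three providers with three different reporting styles.
windows = {
    "provider-a (5h window)": {"used": 320, "limit": 400},
    "provider-b (daily)": {"remaining_fraction": 0.62},
    "provider-c (weekly)": {"used_percent": 91},
}

left = {name: percent_left(w) for name, w in windows.items()}
known = {name: pct for name, pct in left.items() if pct is not None}
# Lowest first, so the emptiest bucket is at the top of the list.
for name, pct in sorted(known.items(), key=lambda kv: kv[1]):
    print(f"{name}: {pct:.0f}% left")
```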

What to monitor first

You do not need a spreadsheet on day one. You do need four basic signals.

1. Session tokens

Watch input, output, and cache counters in /status or /usage full. If one answer suddenly balloons, do not start by blaming the model. Check whether the prompt, attached files, tool output, or bloated context did it.
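One simple way to make "suddenly balloons" operational is to compare each turn against the median of the turns before it. The token counts below are made up. In practice you would read them off /status or /usage full. The 3x threshold and the three-turn warm-up are arbitrary starting points, not OpenClaw defaults.

```python
from statistics import median


def flag_spikes(turn_tokens: list[int], factor: float = 3.0, warmup: int = 3) -> list[int]:
    """Indexes of turns whose tokens exceed `factor` times the median of all earlier turns."""
    flagged = []
    for i in range(warmup, len(turn_tokens)):
        baseline = median(turn_tokens[:i])
        if turn_tokens[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Hypothetical total tokens per turn, oldest first.
tokens = [1800, 2100, 1900, 2000, 9500, 2200]
print(flag_spikes(tokens))  # [4]
```

A flagged index tells you which turn to inspect. It does not tell you the cause, so the prompt, attachments, tool output, and context are still the first things to check.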

2. Pricing coverage

The token-use docs are blunt here: cost estimates only show when model pricing is configured and usage metadata is available. No pricing, no meaningful estimate.

So before you complain that OpenClaw is hiding cost, make sure you have actually given it the numbers it needs.
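Pricing gaps are easy to miss when they hide in a fallback or in a rarely used agent. This sketch walks a routing table and lists the models that have no pricing entry. Both tables are hypothetical and only loosely shaped like a routing setup. They are not OpenClaw's configuration schema.

```python
# Hypothetical routing chains (primary first, then fallbacks) and the set of priced models.
ROUTES = {
    "default": ["cheap-model", "mid-model"],
    "writer": ["premium-model", "mid-model"],
    "watchdog": ["tiny-model"],
}
PRICED = {"cheap-model", "mid-model", "premium-model"}


def unpriced_models(routes: dict[str, list[str]], priced: set[str]) -> dict[str, list[str]]:
    """Map each agent to the models in its chain that have no pricing configured."""
    gaps = {}
    for agent, chain in routes.items():
        missing = [model for model in chain if model not in priced]
        if missing:
            gaps[agent] = missing
    return gaps


print(unpriced_models(ROUTES, PRICED))  # {'watchdog': ['tiny-model']}
```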

3. Provider quota windows

Provider usage tracking is different from local session cost. One tells you how much provider allowance remains. The other tells you what your local logs think you spent. Both matter. They just answer different questions.

This is where operators often get tripped up. A session can look cheap locally and still chew through a scarce provider window. This happens, for example, when the plan caps requests or tokens per time window rather than dollars.

4. Cache behavior

The prompt-caching docs explain why cacheRead and cacheWrite matter. Reused prompt prefixes can cut repeated work. Idle sessions that miss the cache TTL can force expensive re-caching later.

That is why cache behavior deserves a place in your cost conversation. If you ignore it, long-lived agent sessions can become needlessly expensive even when the visible prompts look stable.
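A toy simulation shows how idle gaps turn cache reads into cache writes. It makes three assumptions:

  • the first turn writes the cached prefix
  • each later turn within the TTL reads it and restarts the timer
  • any longer gap forces a fresh write

The TTL, prefix size, and rates are hypothetical. Real TTLs and cache pricing depend on the provider and on your cache retention settings.

```python
def cache_pattern(gaps_s: list[float], ttl_s: float) -> tuple[int, int]:
    """Count (writes, reads) of a stable prompt prefix across a run of turns.

    Assumes the first turn writes the cache, a turn within ttl_s of the previous
    one reads it (restarting the timer), and a longer gap forces a rewrite.
    """
    writes, reads = 1, 0
    for gap in gaps_s:
        if gap <= ttl_s:
            reads += 1
        else:
            writes += 1
    return writes, reads


prefix_tokens = 30_000
write_rate, read_rate = 3.75, 0.30  # hypothetical USD per 1M tokens
gaps = [60, 120, 900, 30, 1800]  # hypothetical seconds between consecutive turns
writes, reads = cache_pattern(gaps, ttl_s=300)
cost = prefix_tokens * (writes * write_rate + reads * read_rate) / 1_000_000
print(writes, reads, round(cost, 4))
```

Two long pauses doubled the number of writes in this toy run. At these made-up rates, each write costs 12.5 times as much as a read.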

Provider usage is not the same as local cost

This distinction is the heart of the whole topic.

The usage-tracking docs say provider usage comes from provider usage or quota endpoints. The token-use docs say estimated cost comes from your local pricing config plus returned usage metadata. Those are related, but they are not the same measurement path.

  • provider usage window answers: how much allowance is left?
  • local estimated cost answers: what does this activity likely cost under my configured pricing?
  • session token view answers: what happened in this chat, right now or recently?

Once you separate those layers, weird-looking dashboards stop feeling weird.

How model routing affects cost

Model routing is not just about quality. It decides which brain gets paid for which job.

If your default route points every casual turn, every tool loop, and every background check at a premium model, you are paying restaurant prices for vending-machine work.

Defaults set the floor

Your default model is the base tax on every ordinary interaction. A sensible default keeps routine turns cheap enough that you can afford to save the expensive stuff for when it matters.
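A little arithmetic shows why the default matters so much. The workload and both price points below are hypothetical. Swap in your own numbers from /usage cost and your pricing config.

```python
def monthly_cost(turns: int, in_tok: int, out_tok: int, in_rate: float, out_rate: float) -> float:
    """USD for `turns` similar turns, with rates given in USD per 1M tokens."""
    return turns * (in_tok * in_rate + out_tok * out_rate) / 1_000_000


# Hypothetical month: 3,000 routine turns, about 4,000 input and 500 output tokens each.
premium = monthly_cost(3_000, 4_000, 500, in_rate=3.00, out_rate=15.00)
light = monthly_cost(3_000, 4_000, 500, in_rate=0.15, out_rate=0.60)
print(f"premium default: ${premium:.2f}, light default: ${light:.2f}")  # $58.50 vs $2.70
```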

Per-agent overrides control specialization

Agent-specific model choices are where cost discipline starts looking intentional. A lightweight watchdog, cron worker, or routing helper usually does not need the same model budget as a writing agent, debugger, or sales-page editor.

That sounds obvious, but many setups still run everything through one premium lane because it was the easy choice for a weekend. Then the weekend turns into a monthly habit.
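Conceptually, an override is just "use this agent's model if one is set, otherwise use the default." The dictionary below is a hypothetical stand-in for that lookup. It is not OpenClaw's agent config format.

```python
# Hypothetical routing table; OpenClaw's real agent/model configuration may differ.
DEFAULT_MODEL = "light-model"
AGENT_OVERRIDES = {
    "writer": "premium-model",
    "debugger": "premium-model",
}


def model_for(agent: str) -> str:
    """Use the agent's override if one exists, otherwise the shared default."""
    return AGENT_OVERRIDES.get(agent, DEFAULT_MODEL)


for agent in ["writer", "watchdog", "cron-worker", "debugger"]:
    print(agent, "->", model_for(agent))
```

The useful property is that every new agent starts on the cheap default. Premium spend happens only where someone deliberately added an override.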

Fallbacks can quietly change spend

Status surfaces show fallback chains for a reason. If your primary model is in cooldown, hits a rate limit, or becomes unavailable, the fallback can turn out cheaper or more expensive than you expected. If you never check the chain, your routing policy may be spending money behind your back.
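The sketch below picks the first available model in a chain and reports how its price compares with the primary's. Model names, blended per-million-token prices, and the availability set are all hypothetical.

```python
from typing import Optional

# Hypothetical blended prices in USD per 1M tokens.
PRICE = {"primary-model": 3.0, "fallback-a": 10.0, "fallback-b": 0.5}


def active_model(chain: list[str], unavailable: set[str]) -> Optional[str]:
    """First model in the chain that is not cooling down, rate-limited, or down."""
    for model in chain:
        if model not in unavailable:
            return model
    return None


chain = ["primary-model", "fallback-a", "fallback-b"]
chosen = active_model(chain, unavailable={"primary-model"})
ratio = PRICE[chosen] / PRICE[chain[0]]
print(chosen, f"{ratio:.1f}x primary price")  # fallback-a 3.3x primary price
```

Here the first fallback is the expensive one. Reordering the chain, or at least knowing the order, is the whole point of checking it.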

Tool-heavy turns multiply model calls

The token-use docs note that provider usage accounting can include multiple tool-loop model calls. That means one "answer" may not be one model request. Planning, tool selection, tool result handling, and follow-up synthesis can all add up.

So when a workflow feels expensive, do not just ask, "Which model?" Ask, "How many times did we hit it for this one task?"
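Here is what that can look like for a single answer that ran a two-tool loop. The per-call numbers are invented. The growth pattern is the point: each call resends the conversation so far, including earlier tool results, so input tokens pile up much faster than output.

```python
# Hypothetical per-call usage for ONE user-visible answer that ran a tool loop.
calls = [
    {"step": "plan", "input": 6_000, "output": 400},
    {"step": "after tool result 1", "input": 9_500, "output": 300},
    {"step": "after tool result 2", "input": 14_000, "output": 350},
    {"step": "final synthesis", "input": 15_200, "output": 900},
]

total_in = sum(call["input"] for call in calls)
total_out = sum(call["output"] for call in calls)
print(f"{len(calls)} model calls, {total_in} input + {total_out} output tokens")
# 4 model calls, 44700 input + 1950 output tokens
```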

Healthy cost discipline looks boring on purpose

The best operators are not dramatic about cost. They are methodical.

Keep usage visible while tuning

Turn on /usage full while you are changing prompts, tools, or context rules. Otherwise you are editing blind and hoping the bill will explain your architecture later.

Give cheap work a cheap path

Use lighter defaults and reserve premium models for higher-stakes agents or tougher tasks. Routing is one of the biggest levers because it changes cost before any token is generated.

Keep context from swelling for no reason

Large bootstrap files, noisy tool results, and oversized attachments all eat into token spend. Cost discipline is partly a question of which model you pick. It is also a cleanliness question.
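For a rough sense of scale, this sketch estimates tokens from character counts using the common rule of thumb of about 4 characters per token for English text. That ratio is a heuristic, not OpenClaw's or any tokenizer's exact count. The context sizes and the price are hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers differ


def estimated_tokens(char_count: int) -> int:
    """Approximate token count, rounded up."""
    return -(-char_count // CHARS_PER_TOKEN)


# Hypothetical context sent on every turn, measured in characters.
context = {"bootstrap files": 48_000, "tool output": 20_000, "user message": 600}
per_turn = {name: estimated_tokens(chars) for name, chars in context.items()}
tokens_per_turn = sum(per_turn.values())

# Hypothetical input rate of $3 per 1M tokens, over 1,000 turns, ignoring caching.
cost = tokens_per_turn * 1_000 * 3.00 / 1_000_000
print(per_turn)
print(f"{tokens_per_turn} tokens/turn, about ${cost:.2f} per 1,000 turns")
```

The user's actual message is under 1% of what gets sent. If the bootstrap portion is a stable, cached prefix, caching can cut much of that cost, which is one more reason to keep it stable.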

Use caching on purpose

Prompt caching is not a magical coupon, but it is real leverage. Stable prefixes, sensible cache retention, and heartbeat timing that respects cache TTL can reduce repeated cache writes and keep long-running sessions more predictable.
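The heartbeat trade-off can be sketched as a comparison between paying for periodic cache reads during an idle stretch and paying for one cache rewrite when work resumes. This is a simplified model under stated assumptions:

  • every cache read restarts the TTL
  • heartbeat output tokens and uncached input are ignored
  • the TTL, prefix size, and rates are hypothetical

```python
from typing import Optional


def idle_prefix_cost(idle_s: int, ttl_s: int, heartbeat_s: Optional[int],
                     prefix_tokens: int, read_rate: float, write_rate: float) -> float:
    """Prefix-cache cost (USD) for one idle stretch plus the turn that resumes after it.

    Simplified model: ignores heartbeat output tokens and any uncached input,
    and assumes every cache read restarts the TTL.
    """
    if heartbeat_s is not None and 0 < heartbeat_s <= ttl_s:
        beats = idle_s // heartbeat_s  # each heartbeat reads the prefix, keeping it warm
        return (beats + 1) * prefix_tokens * read_rate / 1_000_000
    if idle_s <= ttl_s:
        return prefix_tokens * read_rate / 1_000_000  # still warm when work resumes
    return prefix_tokens * write_rate / 1_000_000  # expired: the resuming turn rewrites it


# Hypothetical: 3 h idle, 1 h TTL, 30k-token prefix, $0.30 read / $3.75 write per 1M tokens.
args = dict(idle_s=3 * 3600, ttl_s=3600, prefix_tokens=30_000, read_rate=0.30, write_rate=3.75)
warm = idle_prefix_cost(heartbeat_s=55 * 60, **args)
cold = idle_prefix_cost(heartbeat_s=None, **args)
print(f"heartbeat every 55 min: ${warm:.4f}, no heartbeat: ${cold:.4f}")
```

The heartbeats win in this toy model. A real heartbeat also pays for its own output and any uncached input, though, so the margin can shrink or flip. Check it against your own /usage full numbers.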

Check the boring commands regularly

/status
/usage full
/usage cost
openclaw status --usage
openclaw channels list

Those commands are not glamorous. They are how you stop turning finance into folklore.

A practical mental model

Think of OpenClaw cost visibility as three stacked instruments:

  • session view tells you what this conversation is doing
  • provider usage view tells you what the upstream quota looks like
  • local cost estimation tells you what your configured pricing suggests this work is costing

If you read only one of those, you can still make decent decisions. If you read all three together, you can make disciplined ones.

The real payoff

Good usage tracking changes behavior. You start catching bad defaults, noisy prompts, oversized tool loops, and expensive fallback chains before they become normal.

That is the point. Not pretty dashboards. Fewer surprises, cleaner routing, and an OpenClaw setup that stays useful without acting like your budget is someone else's problem.

Need help from people who already use this stuff?

Want help tightening model routing, prompt weight, and usage visibility before your OpenClaw setup gets expensive by accident?

Join My AI Agent Profit Lab if you want practical help tuning cost discipline, caching, and session design without gutting what makes your agents useful.

FAQ

Where should I look first for OpenClaw usage and cost information?

Start with /status for the current session, then /usage full if you want per-response token details, and /usage cost if you want the local session cost summary from logs. For provider quota windows, use openclaw status --usage or openclaw channels list.

Does OpenClaw show exact provider billing?

Not always. The usage-tracking docs distinguish provider-reported usage windows from local estimated costs. Provider windows come from provider usage endpoints. Estimated cost depends on local model pricing being configured and usage metadata being available.

Why does model routing affect cost so much?

Because routing decides which model handles the turn, which fallback is used under pressure, and whether expensive models are reserved for the hard work instead of every casual message. One bad default can make every turn cost more than it should.

What should I monitor before I start optimizing?

Monitor four things first: current session token usage, whether pricing is configured, which provider quota window is shrinking, and whether prompt caching is helping. If you skip those basics, optimization turns into guesswork.

Can prompt caching really change OpenClaw costs?

Yes. The prompt-caching docs explain that cacheRead and cacheWrite can materially change long-running session costs, especially when system prompts and stable context are reused. Good caching does not fix everything, but it can make repeated work much cheaper.