CLAUDE BATTERY

How to Monitor and Reduce Claude API Token Usage with Claude Battery

May 2026 · 10 min read

If you're building with the Claude API, tokens are your AWS bill. A poorly designed agent loop can burn 100,000+ tokens per user query. A well-designed one with prompt caching can do the same job for the price of roughly 10,000 uncached tokens, a 10x cost reduction, because cached input is billed at a tenth of the normal rate. Claude Battery monitors your token usage across models, prompt cache, and projects so you can see where the money goes and reduce it systematically. This article covers what to track and how to optimise.

Tutorial: what Claude Battery tracks

Four token types per API call:

  • Input tokens: the prompt sent to Claude. Cost: 1x base rate. This is usually the largest token count for RAG and long-context applications.
  • Output tokens: Claude's response. Cost: 5x base rate (output is more expensive than input on all models). Short responses are cheap; long structured outputs (JSON, code) add up fast.
  • Cache creation tokens: charged when a block you marked as cacheable is written to the cache. Cost: 1.25x base rate. You pay this on the first call and again whenever the cache expires and has to be rewritten.
  • Cache read tokens: charged when a subsequent call hits the cache. Cost: 0.1x base rate. This is where prompt caching pays off, with a 90% discount on reused content.

Breakdown by model (typical 2026 prices, USD per 1M tokens):

  • Claude Haiku 4.5 — 1 input / 5 output. The cheap and fast tier — use for high-volume classification, extraction, routing.
  • Claude Sonnet 4.6 — 3 input / 15 output. The workhorse — most production traffic should run here.
  • Claude Opus 4.7 — 15 input / 75 output. Premium reasoning — use only when Sonnet struggles.
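The multipliers and prices above combine into a simple per-call cost formula. A minimal sketch, assuming the approximate prices listed here; the model keys are illustrative labels rather than API model IDs, and the 1.25x write rate is the one for the default 5-minute cache:

```python
# Approximate 2026 prices in USD per 1M tokens, from the list above.
# Keys are illustrative labels, not API model IDs.
PRICES = {
    "haiku-4.5": {"input": 1.0, "output": 5.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "opus-4.7": {"input": 15.0, "output": 75.0},
}

CACHE_WRITE_MULT = 1.25  # cache creation, default 5-minute cache
CACHE_READ_MULT = 0.10   # cache hit


def call_cost(model: str, input_tokens: int = 0, output_tokens: int = 0,
              cache_creation_tokens: int = 0, cache_read_tokens: int = 0) -> float:
    """Return the USD cost of one API call from its four token counts."""
    price = PRICES[model]
    per_input = price["input"] / 1_000_000    # base rate per input token
    per_output = price["output"] / 1_000_000  # output rate per token
    return (
        input_tokens * per_input
        + output_tokens * per_output
        + cache_creation_tokens * per_input * CACHE_WRITE_MULT
        + cache_read_tokens * per_input * CACHE_READ_MULT
    )
```

With these numbers, `call_cost("sonnet-4.6", input_tokens=50_000, output_tokens=5_000)` comes to about 0.225 USD.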

Breakdown by API key. Claude Battery groups usage by Anthropic API key (you can use multiple keys for different projects). The dashboard shows daily/weekly/monthly cost per key, alerts when a key approaches budget thresholds, and identifies the top 10 most expensive request signatures so you can target optimisation.
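Per-key budget alerts come down to summing cost by key and comparing each total to a limit. The sketch below is a hypothetical illustration, not Claude Battery's implementation: the record fields (`api_key`, `cost_usd`) and the 80% warning fraction are assumptions.

```python
from collections import defaultdict


def spend_by_key(records):
    """Sum cost per API key. Each record is a dict with 'api_key' and 'cost_usd'."""
    totals = defaultdict(float)
    for record in records:
        totals[record["api_key"]] += record["cost_usd"]
    return dict(totals)


def budget_alerts(totals, budgets, warn_fraction=0.8):
    """Return (key, spent, budget) for keys at or above warn_fraction of budget.

    Keys without a configured budget are skipped.
    """
    alerts = []
    for key, spent in totals.items():
        budget = budgets.get(key)
        if budget is not None and spent >= warn_fraction * budget:
            alerts.append((key, spent, budget))
    return sorted(alerts)
```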

Reducing token usage: the high-impact levers

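1. Prompt caching for repeated context (50-90% savings). The single biggest lever. If your application has a long system prompt, a code base loaded as context, or a multi-turn conversation, put the static portion first and mark its last block as cacheable; the cache covers the whole prompt prefix up to and including that block. Subsequent calls only pay 0.1x for that portion. Typical agent loops with 50k-token system prompts can drop from around 5 USD per query toward 0.50 USD: on Opus that prompt costs 0.75 USD per call uncached and 0.075 USD per cache read, plus about 0.94 USD for the cache write on the first call, so the savings approach 90% as the loop runs more iterations.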

{
  "model": "claude-haiku-4-5",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "Big static context here...",
       "cache_control": {"type": "ephemeral"}},
      {"type": "text", "text": "User's actual question"}
    ]}
  ]
}
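To see why the savings depend on how often the prefix is reused, here is a minimal cost sketch for a static prefix re-sent on every call of an agent loop. It assumes every call lands within the cache lifetime and uses the 1.25x write and 0.1x read multipliers; `prefix_cost` is a hypothetical helper.

```python
def prefix_cost(prefix_tokens: int, calls: int, price_per_m: float, cached: bool) -> float:
    """USD cost of re-sending the same static prefix on every call of a loop."""
    base = prefix_tokens * price_per_m / 1_000_000  # uncached cost of one send
    if not cached or calls == 0:
        return base * calls
    # First call writes the cache at 1.25x; later calls read it at 0.1x.
    # Assumes every call arrives before the cache expires.
    return base * 1.25 + base * 0.10 * (calls - 1)


for n in (1, 10, 50):
    plain = prefix_cost(50_000, n, 15.0, cached=False)
    cached = prefix_cost(50_000, n, 15.0, cached=True)
    print(f"{n:>3} calls: {plain:6.2f} USD uncached, {cached:6.2f} USD cached")
```

For a 50k-token prefix on Opus (15 USD/M), 10 calls cost 7.50 USD uncached versus about 1.61 USD cached, and the cached share keeps falling toward 10% as the loop gets longer. With a single call, caching costs slightly more because of the write premium; from the second call on it is cheaper.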
   

2. Right-size your model (30-80% savings). Most teams default to Sonnet or Opus for everything. But Haiku handles 70% of typical tasks just as well, at one third the cost of Sonnet and one fifteenth the cost of Opus. A common pattern: a fast Haiku classifier decides if the query needs Sonnet or Opus, then routes accordingly. Save the expensive models for the queries that actually need them.
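The routing pattern can be sketched as a small function. This is a hedged illustration: `classify` stands in for a Haiku call that labels the query, and the tier labels and model names in `MODEL_FOR_TIER` are hypothetical placeholders you would map to real model IDs.

```python
from typing import Callable

# Hypothetical tier labels and model names; map them to real model IDs.
MODEL_FOR_TIER = {"simple": "haiku", "standard": "sonnet", "hard": "opus"}


def route(query: str, classify: Callable[[str], str]) -> str:
    """Pick a model for a query using a cheap classifier (e.g. a Haiku call).

    Unknown labels fall back to the standard tier, so a classifier glitch
    never silently escalates to the most expensive model.
    """
    tier = classify(query).strip().lower()
    return MODEL_FOR_TIER.get(tier, MODEL_FOR_TIER["standard"])


# Stand-in classifier for illustration; in production this would wrap a
# short Haiku prompt that answers with one of the tier labels.
def keyword_classifier(query: str) -> str:
    if "prove" in query.lower() or "architecture" in query.lower():
        return "hard"
    return "simple" if len(query) < 80 else "standard"
```

Falling back to the middle tier on an unexpected label is a design choice: a misbehaving classifier then costs Sonnet prices rather than Opus prices.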

3. Batch processing for bulk jobs (50% savings). The Batch API processes requests asynchronously with a 24-hour SLA at 50% off the per-token cost. Perfect for: overnight data labelling, bulk summarisation, historical analysis. Not useful for: real-time user queries, interactive agents.
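A bulk job is submitted as a list of requests, each with a `custom_id` for matching results back to inputs and the same `params` a normal Messages call takes. A minimal sketch that only builds that list; the summarisation prompt and the `summary-` ID prefix are assumptions. With the official `anthropic` Python SDK, the list would then be passed to `client.messages.batches.create(requests=...)`.

```python
def build_batch_requests(documents, model, max_tokens=512):
    """Turn (doc_id, text) pairs into Message Batches API request entries.

    custom_id lets you match each result back to its input; params holds the
    same fields a normal Messages call takes.
    """
    requests = []
    for doc_id, text in documents:
        requests.append({
            "custom_id": f"summary-{doc_id}",  # hypothetical ID scheme
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user",
                     "content": f"Summarise this document in 3 bullet points:\n\n{text}"},
                ],
            },
        })
    return requests
```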

4. Shorter outputs with structured response formats (10-30% savings). If you only need a JSON object, ask for JSON with no commentary: "Return only the JSON, no preamble." A 200-token preamble per call adds up fast: over a million Sonnet calls that is 200M output tokens, about 3,000 USD at 15 USD/M. For structured outputs, use tool use with tool_choice forcing a specific tool: Claude then returns the object as the tool's input and does not write conversational text before it.
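A forced tool call can be expressed as an ordinary Messages API request body. A sketch, assuming an invoice-extraction task; the tool name `record_invoice` and its schema are hypothetical:

```python
def extraction_request(text: str, model: str) -> dict:
    """Messages API request body that forces Claude to answer via one tool.

    Because tool_choice names a specific tool, the structured result comes
    back as the tool_use block's input, with no conversational preamble.
    """
    tool_name = "record_invoice"  # hypothetical tool for this example
    return {
        "model": model,
        "max_tokens": 300,  # caps output cost; must still fit the full object
        "tools": [{
            "name": tool_name,
            "description": "Record the fields extracted from an invoice.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "total": {"type": "number"},
                    "currency": {"type": "string"},
                },
                "required": ["vendor", "total", "currency"],
            },
        }],
        "tool_choice": {"type": "tool", "name": tool_name},
        "messages": [{"role": "user", "content": text}],
    }
```

The structured object comes back in the `input` field of the response's `tool_use` block. Keep `max_tokens` large enough for the full object, since a truncated response cuts off the tool input too.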

5. Pre-summarise long context (40-70% savings). If you're feeding 100k tokens of context per call, run an initial Haiku call to summarise it into 5-10k tokens, then pass the summary to Sonnet/Opus. Yes, you pay for the summary call (Haiku input at 1 USD/M plus the summary itself as output at 5 USD/M), but that is far cheaper than sending the same 100k tokens to Opus at 15 USD/M, which costs 1.50 USD per call in input alone.
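The comparison is easy to check with a few lines of arithmetic. A sketch using the prices listed above; it counts only the cost of handling the context and leaves out the final answer's output tokens, which are the same either way:

```python
def usd(tokens: int, price_per_m: float) -> float:
    """Cost of a token count at a USD-per-1M-tokens price."""
    return tokens * price_per_m / 1_000_000


def direct_cost(context_tokens: int) -> float:
    # Opus reads the full context (15 USD/M input).
    return usd(context_tokens, 15.0)


def summarised_cost(context_tokens: int, summary_tokens: int) -> float:
    # Haiku reads the context (1 USD/M input) and writes the summary
    # (5 USD/M output); Opus then reads only the summary (15 USD/M input).
    return (usd(context_tokens, 1.0)
            + usd(summary_tokens, 5.0)
            + usd(summary_tokens, 15.0))
```

For 100k tokens of context and a 10k-token summary this gives 1.50 USD direct versus 0.30 USD summarised.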

6. Conversation history pruning (20-40% savings on long chats). Every call resends the full conversation, so each turn costs more than the last and the total cost of a chat grows roughly with the square of its length. Truncate older messages once they're no longer relevant (typically keep the last 10-15 turns). Better: summarise old turns into a compact memory block.
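Pruning to the last N turns is a list slice plus one fix-up. A minimal sketch, assuming plain-text user/assistant messages in API order; `prune_history` is a hypothetical helper, and histories containing tool_use/tool_result blocks need extra care so a tool result never loses the tool call it answers:

```python
def prune_history(messages: list[dict], keep_turns: int = 10) -> list[dict]:
    """Keep roughly the last keep_turns user/assistant exchanges.

    messages is a list of {"role": ..., "content": ...} dicts in API order.
    """
    if keep_turns <= 0:
        return []
    kept = messages[-2 * keep_turns:]
    # Start the kept slice on a user message so the user/assistant
    # alternation is preserved after the cut.
    while kept and kept[0]["role"] != "user":
        kept = kept[1:]
    return kept
```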

Claude Battery dashboard: what to look at

Daily spend trend. Spot anomalies fast. A 10x spike often means an infinite loop, a misconfigured cron job, or a runaway agent. Set alert thresholds at 2x your usual daily spend.
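A 2x threshold needs a notion of "usual" spend. One hedged option, not necessarily what Claude Battery does, is the median of the previous two weeks, which a single past spike cannot drag upward; `spend_alert` and its parameters are illustrative:

```python
from statistics import median


def spend_alert(daily_spend: list[float], today: float,
                factor: float = 2.0, window: int = 14) -> bool:
    """True if today's spend exceeds factor x the usual daily spend.

    daily_spend holds previous days' totals in USD, oldest first. "Usual" is
    the median of the last `window` days, so one past spike does not inflate
    the baseline the way a mean would.
    """
    history = daily_spend[-window:]
    if not history:
        return False  # no baseline yet
    return today > factor * median(history)
```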

Cost per request signature. Claude Battery groups requests by approximate prompt signature (first 200 chars hash). The top 10 expensive signatures are usually the biggest optimisation opportunities — they're the requests where caching or smaller models would save most.
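Grouping by prompt signature can be approximated in a few lines. This sketch follows the description above (a hash of the first 200 characters); SHA-256 and the 12-character truncation are assumptions, and Claude Battery's exact method may differ:

```python
import hashlib
from collections import defaultdict


def signature(prompt: str, prefix_chars: int = 200) -> str:
    """Short hash of the first prefix_chars characters of a prompt."""
    return hashlib.sha256(prompt[:prefix_chars].encode("utf-8")).hexdigest()[:12]


def top_signatures(requests, n: int = 10):
    """requests: iterable of (prompt_text, cost_usd) pairs.

    Returns up to n (signature, total_cost, request_count) tuples,
    most expensive first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for prompt, cost in requests:
        sig = signature(prompt)
        totals[sig] += cost
        counts[sig] += 1
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [(sig, totals[sig], counts[sig]) for sig in ranked[:n]]
```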

Cache hit rate per project. If cache creation tokens are high and cache read tokens are low (cache_creation_input_tokens and cache_read_input_tokens in the API's usage object), your cache is being thrown away: calls arrive further apart than the cache lifetime, the cached prefix changes between calls, or entries are evicted on Anthropic's side. Aim for a cache read / cache write ratio of 5x or higher.
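The read/write ratio can be computed straight from the usage objects the API returns. A minimal sketch, assuming each usage is a dict carrying the API's `cache_creation_input_tokens` and `cache_read_input_tokens` fields (missing or null values are treated as zero):

```python
def cache_read_write_ratio(usages: list[dict]) -> float:
    """Cache read tokens per cache write token across a set of API usages.

    Missing or null fields count as zero.
    """
    written = sum(u.get("cache_creation_input_tokens") or 0 for u in usages)
    read = sum(u.get("cache_read_input_tokens") or 0 for u in usages)
    if written == 0:
        return float("inf") if read else 0.0
    return read / written
```

Since a write costs 1.25x and each read costs 0.1x instead of 1x, caching already breaks even at roughly one read for every three to four writes; a ratio of 5x or more indicates the cache is being reused heavily.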

Model mix. If 95% of your tokens go to Opus, you're probably overpaying. A healthy mix for most products: Haiku 40%, Sonnet 50%, Opus 10%. Heavy code/reasoning products skew Sonnet 70% / Opus 30%.
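Checking the mix is a matter of summing tokens per model. A sketch, assuming usage records that carry a `model` field alongside the API's token counts; the model labels are illustrative:

```python
from collections import Counter

TOKEN_FIELDS = ("input_tokens", "output_tokens",
                "cache_creation_input_tokens", "cache_read_input_tokens")


def model_mix(usages: list[dict]) -> dict[str, float]:
    """Fraction of all tokens consumed by each model (fractions sum to 1)."""
    tokens = Counter()
    for u in usages:
        tokens[u["model"]] += sum(u.get(field) or 0 for field in TOKEN_FIELDS)
    total = sum(tokens.values())
    return {model: count / total for model, count in tokens.items()} if total else {}
```

Counting tokens rather than dollars matches the rule of thumb above; weighting by price would make an Opus-heavy mix look even more lopsided.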

Comparison: cost optimisation techniques

Technique              | Savings | Implementation effort | Use case
Prompt caching         | 50-90%  | Low                   | RAG, agents, long prompts
Right-size model       | 30-80%  | Medium                | Variable complexity workflows
Batch API              | 50%     | Low                   | Async bulk jobs
Pre-summarise context  | 40-70%  | Medium                | Very long context inputs

Frequently asked questions

What tokens does Claude Battery track?

Claude Battery tracks four token types per API call: input tokens (prompt), output tokens (response), cache creation tokens (1.25x cost, charged when a cacheable prefix is written to the cache), and cache read tokens (0.1x cost when a cached prompt is reused). It also tracks tokens by model (Opus, Sonnet, Haiku across versions) so you can see which model consumes most of your budget.

How much do Claude tokens cost in 2026?

Approximate prices per million tokens (USD): Claude Haiku 4.5: 1 input / 5 output. Claude Sonnet 4.6: 3 input / 15 output. Claude Opus 4.7: 15 input / 75 output. Prompt caching: 1.25x for cache write, 0.1x for cache read. A typical 50k input + 5k output Sonnet call costs about 0.225 USD (0.15 for input plus 0.075 for output). Long-context calls on Opus get expensive quickly: 200k input tokens alone cost 3 USD, before any output at 75 USD/M.

How can prompt caching reduce costs?

Prompt caching stores a frequently-reused prefix (system prompt, RAG context, code base, conversation history) on Anthropic's servers. Subsequent calls that include the same prefix only pay 0.1x for the cached portion instead of 1x. Typical savings: 50-90% on agent loops, RAG applications, and long-running conversations. Cache lifetime: 5 minutes default, refreshed on each use.

Which model should I use for which task?

Haiku for high-volume simple tasks (classification, extraction, routing, simple summarisation). Sonnet for most production workflows (coding, analysis, multi-step reasoning, document Q&A). Opus only for hard reasoning, complex coding, or when Sonnet gets stuck. A common pattern: route simple queries to Haiku, complex ones to Sonnet, escalate to Opus only when Sonnet fails — this saves 60-80% vs always using Opus.

Monitor your Claude spend in real time

Claude Battery shows token usage by model, project and time. Catch runaway costs before they hit your monthly invoice.

Open Claude Battery

