Optimization

8 Token-Saving Tactics for Claude Code with Many Agents

May 2026 · 6 min read

When you stack subagents, skills, and MCP tools, the failure mode is silent. Every loaded skill, tool schema, and agent summary eats context until you're stuck restating your priorities for /compact instead of doing the work.

Here are eight tactics I've validated across hundreds of sessions running Claude Code as the orchestration layer for a portfolio of iOS apps, a Slack‑Jira bot, and a stack of consulting work.

1. Treat subagents as context firewalls

A subagent gets its own context window. If you dispatch a 20‑file codebase search or a multi‑step audit to a subagent, the main session only sees the final summary — not the raw greps, reads, or dead ends.

// Bad: do the search inline
// Every grep result + file read stays in main context

// Good: dispatch to a subagent
Agent({
  description: "Find all auth middleware",
  subagent_type: "Explore",
  prompt: "Find every file that imports or calls authMiddleware. Report file:line for each. Under 100 words."
})
// Main context only sees the final report

Same logic for code review, planning, and research. Reserve your main context for synthesis and decisions, not exploration.

2. Defer tool schemas

This is the biggest single win available right now. Many agent harnesses load every available tool's full JSON Schema up-front. Claude Code's newer harness exposes only tool names in the system prompt; the parameter schema is fetched on demand.

// Tool name is known, schema is not loaded
// Calling it directly would fail with InputValidationError

// Load the schema first:
ToolSearch({ query: "select:mcp__slack__slack_post_message", max_results: 1 })

// Now the tool is callable

With a few dozen MCP servers connected (Atlassian, Slack, Playwright, Figma, Canva, computer-use, Sentry, monday.com, and so on), deferring schemas keeps the parameter definitions of every tool you don't call out of each turn's context, before you've done anything. The pattern is "name index up-front, schema on demand."
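To make "name index up-front, schema on demand" concrete, here is a minimal Python sketch of the idea. It is not how Claude Code implements ToolSearch; `LazyToolRegistry` and the `fetch_schema` callback are hypothetical names I'm using for illustration.

```python
from typing import Callable, Dict, Iterable, List


class LazyToolRegistry:
    """Keeps tool names resident and fetches a schema only on first use."""

    def __init__(self, names: Iterable[str], fetch_schema: Callable[[str], dict]):
        self._names = set(names)
        self._fetch_schema = fetch_schema  # hypothetical loader, e.g. one MCP round trip
        self._loaded: Dict[str, dict] = {}

    def index(self) -> str:
        """The only text that is always in context: one tool name per line."""
        return "\n".join(sorted(self._names))

    def schema(self, name: str) -> dict:
        """Return the schema for a tool, fetching it the first time it is requested."""
        if name not in self._names:
            raise KeyError(f"unknown tool: {name}")
        if name not in self._loaded:
            self._loaded[name] = self._fetch_schema(name)
        return self._loaded[name]

    def loaded(self) -> List[str]:
        """Names whose schemas are currently occupying context."""
        return sorted(self._loaded)
```

The design choice is that `index()` is cheap and constant, while `schema()` is the only path that grows context, and it grows it once per tool actually used.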

3. Skills on demand, not on load

Don't dump every skill's content into the system prompt. Use a discovery layer where each skill's index entry is one line and the body only enters context when triggered.

The skill ecosystem only works economically if loading is cheap. If your /commands directory has 80 skills and every one of them is fully expanded at session start, you're paying for content you'll never use that turn.
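A rough sketch of what a discovery layer boils down to, assuming each skill can be split into a one-line summary and a body. The `Skill` type and both function names are mine, not part of any skill format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Skill:
    name: str
    summary: str  # one-line description shown in the index
    body: str     # full instructions, loaded only when the skill is triggered


def build_index(skills: Iterable[Skill]) -> str:
    """One line per skill; this is the part that is always in context."""
    return "\n".join(f"- {s.name}: {s.summary}" for s in skills)


def context_for_turn(skills: List[Skill], triggered: Set[str]) -> str:
    """Index for every skill plus the body of the triggered ones only."""
    parts = [build_index(skills)]
    parts += [s.body for s in skills if s.name in triggered]
    return "\n\n".join(parts)
```

With 80 skills and one triggered, the turn carries 80 index lines and one body instead of 80 bodies.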

4. Memory as index, not archive

A MEMORY.md that's truly an index — one line per entry, under 200 characters — costs almost nothing per turn. The moment you start dumping content into it, you're paying for it on every single response.

# Good (index entry)
- [feedback/no-creds-in-saves.md](feedback/no-creds-in-saves.md) — NEVER put passwords/tokens in context saves — reference location only

# Bad (content in the index)
- Never put passwords or tokens in context saves. Last quarter I leaked
 an API key into a public Gist by accident. The rule is: reference
 the location of the secret, never the value itself. ...

Push detail to topic files and let the index just point. If your memory file is over a couple hundred lines, you're leaking budget on every turn — even the turns where memory isn't relevant.
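If you want to enforce this rather than remember it, a small lint script helps. The sketch below assumes index entries are markdown bullets that start with a link, as in the good example above; the 200-line cap is my own reading of "a couple hundred lines", so adjust it to taste.

```python
import re
from typing import List

MAX_ENTRY_CHARS = 200  # per-entry budget from the rule above
MAX_LINES = 200        # assumption: one reading of "a couple hundred lines"

# An index entry is a bullet that begins with a markdown link to a topic file.
LINK_ENTRY = re.compile(r"^- \[[^\]]+\]\([^)]+\)")


def lint_memory(text: str) -> List[str]:
    """Return human-readable problems found in a MEMORY.md-style index."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"file has {len(lines)} lines (budget {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if not line.startswith("- "):
            continue  # headings, blanks, and continuation lines are not entries
        if len(line) > MAX_ENTRY_CHARS:
            problems.append(f"line {number}: entry is {len(line)} chars")
        if not LINK_ENTRY.match(line):
            problems.append(f"line {number}: entry does not start with a link to a topic file")
    return problems
```

Run it over the file's text in a pre-commit hook or by hand; an empty list means the index is still an index.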

5. Read with offset and limit

Don't read the whole 800‑line file when you need lines 200–220.

// Bad
Read({ file_path: "/path/to/huge.ts" })  // loads 800 lines

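// Good
// Ask for matching lines rather than just file names, so you get a line number back
Grep({ pattern: "handleAuthCallback", path: "src/auth.ts", output_mode: "content" })
// Output includes the matching line number (204 here)
Read({ file_path: "src/auth.ts", offset: 195, limit: 30 })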

Same logic for git log: git log --oneline --since="2 hours ago" beats dumping the full history.
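The offset and limit arithmetic is easy to get off by one, so here is a small helper sketch. It assumes grep hits look like path:line (optionally followed by :content) and that offset is a 1-based starting line; `parse_hit` and `read_window` are hypothetical helper names.

```python
from typing import Dict, Tuple


def parse_hit(hit: str) -> Tuple[str, int]:
    """Split a 'path:line' or 'path:line:content' grep hit into (path, line)."""
    parts = hit.split(":", 2)
    return parts[0], int(parts[1])


def read_window(hit_line: int, before: int = 9, after: int = 20) -> Dict[str, int]:
    """Turn a 1-based hit line into offset/limit arguments for a partial read."""
    offset = max(1, hit_line - before)          # never start before line 1
    limit = (hit_line + after) - offset + 1     # inclusive span of lines
    return {"offset": offset, "limit": limit}


path, line = parse_hit("src/auth.ts:204")
print(path, read_window(line))  # src/auth.ts {'offset': 195, 'limit': 30}
```

The defaults reproduce the window used in the example above: 30 lines starting at 195. The parser does not handle paths that themselves contain a colon.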

6. Run parallel, not sequential

When tasks are independent, fan out subagents in a single message. Wall-clock time is roughly that of the slowest agent, and each agent's intermediate work stays in its own context.

For three to five independent reviews, audits, or refactors, this is close to free leverage for your main context. The harness runs them concurrently, and the main session takes on only the summary each one returns; the subagents still spend tokens in their own windows.
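The shape of the fan-out is the same as any concurrent gather. The sketch below uses Python's asyncio to show it; `run_subagent` is a hypothetical stand-in that sleeps instead of calling a real agent, so treat it as an analogy for what the harness does, not its implementation.

```python
import asyncio
from typing import List


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a subagent call; it returns only a short summary."""
    await asyncio.sleep(0.1)  # simulated work inside the subagent's own context
    return f"summary of {task}"


async def fan_out(tasks: List[str]) -> List[str]:
    # gather starts all coroutines concurrently and returns results in input order
    return await asyncio.gather(*(run_subagent(t) for t in tasks))


def review_all(tasks: List[str]) -> List[str]:
    """Run every task concurrently and collect only the summaries."""
    return asyncio.run(fan_out(tasks))
```

Three tasks take about as long as one here, and the caller only ever holds three short strings.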

7. Reset context between batch items

For ten‑app portfolio operations, don't try to keep all ten in one session. Use an atomic loop: pick app → implement → commit → fresh context. Each app gets a clean window.

Trying to hold the whole portfolio in one conversation is how you end up with /compact halfway through item four, losing the thread on items one through three.
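One way to get a fresh context per item is to drive the loop from outside the session. This sketch assumes the claude CLI's non-interactive -p (print) mode, one directory per app, and permissions already configured so the run can edit and commit; the function names and the prompt are placeholders of mine.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def batch_commands(apps: Iterable[str], prompt: str) -> List[Tuple[Path, List[str]]]:
    """One headless invocation per app, so no app inherits another's context."""
    return [(Path(app), ["claude", "-p", prompt]) for app in apps]


def run_batch(apps: Iterable[str], prompt: str, runner: Callable = subprocess.run) -> None:
    """Run the same prompt in each app directory, one new process at a time."""
    for workdir, command in batch_commands(apps, prompt):
        # Each process starts a new session; nothing carries over from the previous app.
        runner(command, cwd=workdir, check=True)
```

With check=True the loop stops at the first app that fails, which keeps a bad run from cascading through the rest of the portfolio. The runner parameter is there so you can dry-run the loop without invoking the CLI.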

8. /btw for side questions

Quick "what's the syntax for X" detours shouldn't persist in your working context. Move them to a side channel so they don't compete with the actual task for tokens.

The Pattern Under All Eight

Spend tokens where they buy reasoning, not where they buy memory.

Loading a skill you might use, a tool schema you might call, or a memory entry you might reference: each feels free in the moment and expensive in aggregate. The harness that gets this right is the one where each cost is paid on demand, not at session start.
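A back-of-the-envelope model shows why the aggregate matters. It assumes that anything in context is carried on every later turn, and the token counts below are made-up illustrations, not measurements.

```python
from typing import Dict, Tuple


def session_cost(item_tokens: Dict[str, int], first_use: Dict[str, int], turns: int) -> Tuple[int, int]:
    """Return (upfront, on_demand) totals in token-turns.

    first_use maps an item name to the 0-based turn where it is first needed;
    items missing from first_use are never loaded in the on-demand case.
    """
    upfront = sum(item_tokens.values()) * turns
    on_demand = sum(item_tokens[name] * (turns - turn) for name, turn in first_use.items())
    return upfront, on_demand


# Hypothetical sizes: three schemas, only one of which is used, starting at turn 3 of 10.
sizes = {"slack_schema": 600, "jira_schema": 900, "figma_schema": 1200}
print(session_cost(sizes, {"slack_schema": 3}, turns=10))  # (27000, 4200)
```

The gap widens with every item you load and never touch, which is the whole argument in one line of arithmetic.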

If you're running Claude Code as a one‑off coding assistant, none of this matters. If you're orchestrating multi‑agent workflows across a portfolio — sprints across many apps, batch ASO updates, Slack‑Jira routing, cross‑cutting refactors — these eight tactics are the difference between sessions that ship and sessions that stall.


Evgeny Goncharov - Founder of TechConcepts, ex-Big 4 Advisory

Evgeny Goncharov

Founder, TechConcepts

I build automation tools and custom software for businesses. Previously at a major search platform and Big 4 Advisory. Based in Madrid.
