Our AI-powered email monitor was burning through $100/month in API costs. After one debugging session: $2/month. Here's the fix.
## The Problem
The monitor processed 400+ tickets every 5 minutes. Four AI functions—all using Claude Opus (the most expensive model). Every ticket, every run, full AI analysis.
At $15 per million input tokens and $75 per million output tokens, costs added up fast. Most of the processing was simple binary classification: "Is this a new ticket or an update?"
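To make the arithmetic concrete, here is a minimal per-call cost sketch. The token counts (1,000 in, 200 out) are illustrative assumptions, not measured values from the monitor; the rates are the ones quoted above.

```python
def call_cost(input_tokens: int, output_tokens: int,
              in_rate: float, out_rate: float) -> float:
    """Dollar cost of one API call, given per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical classification call: ~1,000 tokens in, ~200 out.
opus = call_cost(1_000, 200, 15.00, 75.00)   # Opus rates
haiku = call_cost(1_000, 200, 0.25, 1.25)    # Haiku rates
print(f"Opus: ${opus:.4f}  Haiku: ${haiku:.4f}  ratio: {opus / haiku:.0f}x")
# → Opus: $0.0300  Haiku: $0.0005  ratio: 60x
```

A three-cent call sounds cheap until it runs on every ticket, every five minutes, all month.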
## The Fix
Two changes. That's it.
```python
# Before: Opus for everything ($15/$75 per million tokens)
model = "claude-opus-4"

# After: Haiku for simple tasks ($0.25/$1.25 per million)
model = "claude-3-5-haiku"

# Plus: cache already-processed tickets
if ticket_id in processed_cache:
    return cached_result
```
Change 1: Model selection. Simple binary classifications don't need the most powerful model. Haiku is 60x cheaper and handles yes/no decisions just as accurately.
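One way to apply this is a small per-task routing table instead of a single hard-coded model. This is a sketch, not the monitor's actual code; the task names and the `model_for` helper are hypothetical.

```python
# Map each task to the cheapest model that handles it well.
# Task names here are illustrative, not from the real monitor.
TASK_MODELS = {
    "classify_ticket": "claude-3-5-haiku",  # binary new-vs-update decision
    "extract_fields": "claude-3-5-haiku",   # simple structured extraction
    "draft_reply": "claude-opus-4",         # keep the big model where quality matters
}

def model_for(task: str) -> str:
    """Pick the model for a task; default to the cheap one."""
    return TASK_MODELS.get(task, "claude-3-5-haiku")
```

The point of the table is that downgrading is a one-line config change per task, so you can keep Opus only where output quality is actually worth 60x the price.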
Change 2: Caching. The monitor was re-analyzing tickets it had already processed. A simple in-memory cache eliminated 80% of API calls entirely.
## Results

API costs dropped from roughly $100/month to $2/month. Caching alone eliminated about 80% of the calls, and switching the simple classifications to Haiku cut the price of the calls that remained by 60x.
## Key Lessons
- Not every task needs the biggest model. Simple yes/no classifications work fine with smaller models.
- Cache before calling. If you've already done the work, don't do it again.
- Check your logs. The fix was obvious once we saw the call patterns.
The takeaway: before optimizing prompts or architectures, check if you're using the right model for each task. Most LLM cost problems are model selection problems.