98% Cost Reduction: Optimizing AI API Calls

December 2025 · 3 min read

Our AI-powered email monitor was burning through $100/month in API costs. After one debugging session: $2/month. Here's the fix.

The Problem

The monitor processed 400+ tickets every 5 minutes. It called four AI functions, all using Claude Opus (the most expensive model). Every ticket, every run, full AI analysis.

At $15 per million input tokens and $75 per million output tokens, costs added up fast. Most of the processing was simple binary classification: "Is this a new ticket or an update?"
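To make those rates concrete, here is a minimal sketch of the per-call arithmetic. Only the $15/$75 rates come from the figures above; the token counts are illustrative assumptions, not measurements from the monitor.

```python
# Rates quoted above, in USD per million tokens.
OPUS_INPUT_RATE = 15.00
OPUS_OUTPUT_RATE = 75.00


def call_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Cost of one API call given token counts and per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical ticket: 1,000 input tokens and 100 output tokens (assumed, not measured).
per_call = call_cost_usd(1_000, 100, OPUS_INPUT_RATE, OPUS_OUTPUT_RATE)
print(f"${per_call:.4f} per call")
```

Plugging in your own average token counts per call gives a quick estimate of what each function costs per ticket.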

The Fix

Two changes. That's it.

# Before: Opus for everything ($15/$75 per million tokens)
model = "claude-opus-4"

# After: Haiku for simple tasks ($0.25/$1.25 per million)
model = "claude-3-5-haiku"

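# Plus: cache already-processed tickets
processed_cache = {}  # ticket_id -> result of an earlier analysis

def analyze_ticket(ticket_id, ticket_text):
    if ticket_id in processed_cache:
        return processed_cache[ticket_id]  # already analyzed: no API call
    # run_ai_analysis is a placeholder name for the existing API call
    result = run_ai_analysis(model, ticket_text)
    processed_cache[ticket_id] = result
    return result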

Change 1: Model selection. Simple binary classifications don't need the most powerful model. At the rates above, Haiku is 60x cheaper per token ($15 vs. $0.25 for input, $75 vs. $1.25 for output), and it handled our yes/no decisions just as accurately.
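One way to apply this is to route each task to a model by its type. The sketch below is an assumption about how that could look, not the monitor's actual code: the model names are the ones from the snippet above, while the task labels and the `pick_model` helper are hypothetical.

```python
# Model names as used in the snippet above.
CHEAP_MODEL = "claude-3-5-haiku"
STRONG_MODEL = "claude-opus-4"

# Tasks assumed simple enough for the cheaper model (illustrative labels).
SIMPLE_TASKS = {"new_or_update", "is_spam"}


def pick_model(task):
    """Return the cheap model for simple classifications, the strong one otherwise."""
    return CHEAP_MODEL if task in SIMPLE_TASKS else STRONG_MODEL


print(pick_model("new_or_update"))
print(pick_model("draft_reply"))
```

Defaulting unknown tasks to the stronger model keeps quality safe while you move tasks to the cheaper model one at a time.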

Change 2: Caching. The monitor was re-analyzing tickets it had already processed. A simple in-memory cache eliminated 80% of API calls entirely.
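A minimal, runnable sketch of the idea follows. The model call is replaced by a stub (`classify_with_model`, a hypothetical stand-in) and the ticket data is made up, so the only thing it demonstrates is that repeated runs over the same tickets stop triggering calls.

```python
processed_cache = {}  # ticket_id -> stored result
api_calls = 0         # counts how often the (stubbed) model is actually called


def classify_with_model(ticket_text):
    """Stand-in for the real API call; hypothetical, uses a trivial rule."""
    global api_calls
    api_calls += 1
    return "update" if ticket_text.startswith("Re:") else "new"


def classify_ticket(ticket_id, ticket_text):
    """Return the cached result if present, otherwise call the model once."""
    if ticket_id in processed_cache:
        return processed_cache[ticket_id]
    result = classify_with_model(ticket_text)
    processed_cache[ticket_id] = result
    return result


# Two monitor runs over the same three (made-up) tickets.
tickets = {
    "T-1": "Invoice question",
    "T-2": "Re: Invoice question",
    "T-3": "Login issue",
}
for _ in range(2):
    for ticket_id, text in tickets.items():
        classify_ticket(ticket_id, text)

print(api_calls)  # the second run is served entirely from the cache
```

An in-memory dictionary is lost on restart and never expires entries. If a ticket's content can change after it is first seen, keying the cache on the ticket ID plus a last-modified value would be one way to avoid stale results.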

Results

  • 98% cost reduction
  • 60x cheaper model
  • 80% fewer API calls
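As a rough sanity check on how the two changes combine, the sketch below multiplies the effects. It assumes, hypothetically, that every remaining call moved to the cheaper model, which gives a best case. The reported $2/month sits above that bound, as you would expect if some calls stayed on a more expensive model.

```python
baseline_monthly_usd = 100.0   # from the article
price_ratio = 60               # Opus rate / Haiku rate ($15 / $0.25)
calls_remaining = 1 - 0.80     # 80% of calls removed by the cache

# Best case: every remaining call runs on the cheaper model (an assumption).
best_case_usd = baseline_monthly_usd * calls_remaining / price_ratio
best_case_reduction = 1 - best_case_usd / baseline_monthly_usd

# Reported outcome from the article.
reported_usd = 2.0
reported_reduction = 1 - reported_usd / baseline_monthly_usd

print(f"best case: ${best_case_usd:.2f}/month ({best_case_reduction:.1%} reduction)")
print(f"reported:  ${reported_usd:.2f}/month ({reported_reduction:.1%} reduction)")
```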

Key Lessons

  • Not every task needs the biggest model. Simple yes/no classifications work fine with smaller models.
  • Cache before calling. If you've already done the work, don't do it again.
  • Check your logs. The fix was obvious once we saw the call patterns.
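For the log check, counting calls per ticket is often enough to expose repeated work. The sketch below assumes a hypothetical log format with a `ticket=<id>` field on each API call line; the sample lines are invented for illustration.

```python
from collections import Counter

# Hypothetical log lines; the "ticket=<id>" field is an assumed format.
log_lines = [
    "2025-12-01T10:00:00 api_call model=claude-opus-4 ticket=T-1",
    "2025-12-01T10:00:01 api_call model=claude-opus-4 ticket=T-2",
    "2025-12-01T10:05:00 api_call model=claude-opus-4 ticket=T-1",
    "2025-12-01T10:10:00 api_call model=claude-opus-4 ticket=T-1",
]


def calls_per_ticket(lines):
    """Count API calls per ticket ID from 'ticket=<id>' fields."""
    counts = Counter()
    for line in lines:
        for field in line.split():
            if field.startswith("ticket="):
                counts[field.split("=", 1)[1]] += 1
    return counts


counts = calls_per_ticket(log_lines)
# Tickets analyzed more than once are candidates for caching.
repeated = {ticket: n for ticket, n in counts.items() if n > 1}
print(repeated)
```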

The takeaway: before optimizing prompts or architectures, check if you're using the right model for each task. Most LLM cost problems are model selection problems.

Evgeny Goncharov

Founder, TechConcepts

I build automation tools and custom software for businesses. Previously at Yandex (Search) and EY (Advisory). Darden MBA. Based in Madrid.
