Batch API vs Real-Time API
OpenAI's Batch API is 50% off real-time pricing. So is Anthropic's. But "cheaper" only matters if your workload can tolerate a 24-hour SLA. This isn't a breakeven calculator. It's a workload decision tool. Plug in your constraints, see all four strategies side by side, get a recommendation.
[Interactive calculator: set your workload constraints to compare cost across the 4 strategies.]
The Batch API on most providers carries a 24-hour SLA, with no guarantee of when within that window your job actually runs. Don't use batch for anything user-facing or streaming. Stacking the cache and batch discounts is provider-specific: OpenAI supports it, Anthropic doesn't (as of latest verification).
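To make the comparison concrete, here's a minimal Python sketch of what the calculator computes per strategy. The function name, prices, and discount rates are illustrative assumptions, not quoted list prices; the stacking flag mirrors the provider difference noted above.

```python
# Hypothetical helper: effective $/1M input tokens under four strategies.
# All numbers passed in are assumptions supplied by the caller.
def cost_per_1m_input(
    base_price: float,            # real-time input price, $/1M tokens
    cache_hit_rate: float,        # fraction of input tokens read from cache
    cache_read_discount: float,   # 0.9 means cache reads cost 10% of base
    batch_discount: float = 0.5,  # both providers currently offer 50% off
    stacks_batch_and_cache: bool = True,  # OpenAI: True, Anthropic: False
) -> dict:
    # Blended price when some tokens hit the cache and the rest pay full rate.
    cached = base_price * (
        cache_hit_rate * (1 - cache_read_discount) + (1 - cache_hit_rate)
    )
    return {
        "real-time": base_price,
        "real-time + cache": cached,
        "batch": base_price * (1 - batch_discount),
        # None where the provider doesn't stack; rendered as "n/a".
        "batch + cache": cached * (1 - batch_discount)
        if stacks_batch_and_cache
        else None,
    }
```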
When batch fits
- Overnight bulk processing (embeddings, summarizations, classifications)
- Backfill jobs running through historical data
- Periodic aggregation (daily reports, weekly digests)
- Internal tools where async is fine
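For the overnight bulk case, a minimal sketch of the OpenAI Batch API flow (the file path, model name, custom_id scheme, and documents are placeholders):

```python
import json
from openai import OpenAI

client = OpenAI()

# One JSONL line per request; custom_id lets you join results back to inputs.
with open("requests.jsonl", "w") as f:
    for i, doc in enumerate(["first document ...", "second document ..."]):
        f.write(json.dumps({
            "custom_id": f"summarize-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",  # placeholder model
                "messages": [{"role": "user", "content": f"Summarize: {doc}"}],
            },
        }) + "\n")

batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the 24-hour SLA window
)
print(batch.id, batch.status)  # poll later via client.batches.retrieve(batch.id)
```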
When batch DOESN'T fit
- Anything user-facing that waits for a response
- Streaming UIs (chat, autocomplete)
- Time-sensitive workflows (fraud scoring, alerting)
- Pipelines where downstream steps need the output within minutes
When caching beats batch
If your workload has a long, stable prefix (system prompt, tool definitions, retrieval context) that you send many times, synchronous calls with caching often beat batch alone. Cache reads can be 10× cheaper than full-price input, while batch is only 2× cheaper. With a high cache hit rate, cached real-time can undercut uncached batch and still serve user-facing traffic.
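To put numbers on that, an illustrative comparison (the $3.00 base price, 90% cache-read discount, and 80% hit rate are assumptions, not quoted rates):

```python
# Assumed: $3.00/1M input tokens real-time, cache reads at 10% of base,
# and 80% of each request's input is a stable cached prefix.
base = 3.00
cached_sync = base * (0.8 * 0.1 + 0.2)  # blended rate = $0.84/1M
batch_no_cache = base * 0.5             # 50% off = $1.50/1M

# Cached real-time comes out ~44% cheaper than batch here, with no 24h wait.
print(f"cached sync:     ${cached_sync:.2f}/1M")
print(f"batch, no cache: ${batch_no_cache:.2f}/1M")
```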
When stacking matters
OpenAI lets you stack the batch and cache discounts. Anthropic doesn't: cache reads get no further batch discount. For Claude, you usually pick one: caching for online traffic or batch for offline jobs, not both. The calculator shows "n/a" for batch + cache where stacking isn't supported.
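Using the hypothetical cost_per_1m_input sketch from above with stacking turned off, the batch + cache entry comes back empty, which is exactly what the calculator renders as "n/a":

```python
# Anthropic-style settings: same illustrative prices, stacking disabled.
anthropic = cost_per_1m_input(
    base_price=3.00,
    cache_hit_rate=0.8,
    cache_read_discount=0.9,
    stacks_batch_and_cache=False,
)
for strategy, price in anthropic.items():
    print(strategy, "n/a" if price is None else f"${price:.2f}/1M")
```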