AI API pricing feels opaque until you understand the underlying economics. Once you do, the batch discount makes complete sense — and knowing how it works helps you plan jobs and estimate costs before you run them.
Why batch processing costs less
When you call an AI model in real time, through a chat interface or a standard API, the infrastructure needs to respond immediately. The compute capacity is reserved, the response begins in milliseconds, and you pay a premium for that immediacy.
Batch processing works differently. You submit a job and it runs in background compute windows — the spare capacity that would otherwise sit idle between peak demand periods. Google (and other cloud AI providers) can run these jobs efficiently and pass on the cost saving as a lower per-token price.
The trade-off is time. A batch job doesn't respond in seconds — it might take minutes or hours depending on dataset size and current demand. For most production use cases (preparing a dataset to import, processing a monthly export, running a campaign build), that delay is completely acceptable.
How PromptMax prices jobs
PromptMax charges for the tokens your job uses — tokens being the units of text the model reads and writes. Input tokens are the text you send (your batch instructions + the row data). Output tokens are the responses the model generates.
All three providers PromptMax supports — Google, Anthropic, and OpenAI — offer a 50% discount on batch processing vs their standard API rates. PromptMax charges a service fee on top of the batch rate, so you pay around 25% less than standard API rates. You get the cost benefit of batch infrastructure without needing your own API keys or pipeline.
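To make the relationship between the three rates concrete, here is a minimal arithmetic sketch. The standard rate of 1.00 is a made-up figure for illustration, not a real provider price, and the exact PromptMax fee is not stated here; the fee below is only what the "around 25% less" figure implies.

```python
# Illustrative only: standard_rate is a hypothetical figure, not a real price.
standard_rate = 1.00                     # standard API price per 1M tokens
batch_rate = standard_rate * 0.50        # providers' 50% batch discount
promptmax_rate = standard_rate * 0.75    # "around 25% less than standard"

# The service fee implied by those two figures (an inference, not a quoted fee).
implied_fee = promptmax_rate - batch_rate

print(f"batch rate:     {batch_rate:.2f}")
print(f"PromptMax rate: {promptmax_rate:.2f}")
print(f"implied fee:    {implied_fee:.2f} "
      f"({implied_fee / batch_rate:.0%} of the batch rate)")
```

Under these assumptions, the fee works out to about a quarter of the standard rate, or about half of the batch rate.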
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|---|
| Gemini 2.5 Flash Lite | Google | £0.08 | £0.32 | Simple classifications, short structured outputs |
| GPT 5.4 nano | OpenAI | £0.08 | £0.49 | High volume, GPT-style output |
| Gemini 2.5 Flash | Google | £0.24 | £1.97 | Most tasks: fast, capable, cost-effective |
| GPT 5.4 mini | OpenAI | £0.30 | £1.78 | Mid-range GPT tasks |
| Claude Haiku 4.5 | Anthropic | £0.40 | £1.98 | High volume writing tasks |
| Gemini 2.5 Pro | Google | £0.99 | £3.95 | Complex content, long-form output |
| Gemini 3.1 Pro | Google | £0.79 | £4.74 | Latest Gemini, complex reasoning |
| Claude Sonnet 4.6 | Anthropic | £1.19 | £5.93 | Nuanced writing, tone-sensitive tasks |
| GPT 5.5 | OpenAI | £1.98 | £11.85 | Highest-capability GPT tasks |
| Claude Opus 4.7 | Anthropic | £1.98 | £9.88 | Highest-stakes writing and analysis |
Prices are converted to GBP at approximate exchange rates. A £0.10 minimum charge applies per job.
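As a rough sketch of how the per-token rates and the minimum charge might combine, the snippet below prices a job from its token totals. The function name `job_cost` is hypothetical, only a few table rows are included, and treating the £0.10 minimum as a floor on the job total is an assumption about how the charge is applied.

```python
# (input, output) price in GBP per 1M tokens, copied from the table (subset).
PRICES = {
    "Gemini 2.5 Flash Lite": (0.08, 0.32),
    "Gemini 2.5 Flash": (0.24, 1.97),
    "Claude Haiku 4.5": (0.40, 1.98),
    "Gemini 2.5 Pro": (0.99, 3.95),
}
MIN_CHARGE = 0.10  # GBP per job


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate a job's cost in GBP from its total token counts."""
    in_rate, out_rate = PRICES[model]
    raw = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    # Assumption: the minimum charge acts as a floor on the job total.
    return max(raw, MIN_CHARGE)


# A very small job falls below the floor and is billed at the minimum.
print(f"£{job_cost('Gemini 2.5 Flash Lite', 50_000, 10_000):.2f}")
```

Small jobs on the cheapest models are the ones most likely to hit the minimum, so the per-token rate matters less for them than for large runs.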
What a typical job costs
The cost of a batch job depends on three things: how many rows, how much data per row, and how long the output is. Here are three worked examples at different scales:
500 product descriptions (Gemini 2.5 Flash)
Input: ~200 tokens per row (instructions + product data) = 100,000 input tokens
Output: ~250 tokens per row (180-word description) = 125,000 output tokens
Estimated cost: ~£0.27
2,400 SEO meta tags (Gemini 2.5 Flash)
Input: ~120 tokens per row = 288,000 input tokens
Output: ~40 tokens per row (title + meta) = 96,000 output tokens
Estimated cost: ~£0.26
5,000 survey classifications (Gemini 2.5 Pro)
Input: ~180 tokens per row = 900,000 input tokens
Output: ~30 tokens per row (category + sentiment + summary) = 150,000 output tokens
Estimated cost: ~£1.48
These are estimates — actual costs depend on your specific data. PromptMax shows a cost estimate before you submit each job based on your CSV and model selection, so you know what to expect before committing.
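The three examples above follow the same pattern: rows times tokens per row, priced at the model's input and output rates. A minimal sketch of that calculation, using the table rates for the two models involved (the `estimate` helper is hypothetical and is not PromptMax's own estimator):

```python
# (input, output) price in GBP per 1M tokens, from the pricing table.
RATES = {
    "Gemini 2.5 Flash": (0.24, 1.97),
    "Gemini 2.5 Pro": (0.99, 3.95),
}


def estimate(model: str, rows: int, in_per_row: int, out_per_row: int) -> float:
    """Rows x tokens-per-row, priced at the model's per-1M-token rates."""
    in_rate, out_rate = RATES[model]
    input_tokens = rows * in_per_row
    output_tokens = rows * out_per_row
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


jobs = [
    ("500 product descriptions", "Gemini 2.5 Flash", 500, 200, 250),
    ("2,400 SEO meta tags", "Gemini 2.5 Flash", 2400, 120, 40),
    ("5,000 survey classifications", "Gemini 2.5 Pro", 5000, 180, 30),
]
for name, model, rows, in_tok, out_tok in jobs:
    print(f"{name}: ~£{estimate(model, rows, in_tok, out_tok):.2f}")
```

With the rounded table rates, these land within a penny of the figures quoted above. Swapping in your own row count and per-row token sizes gives a quick ballpark before you upload anything.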
Choosing the right model for your budget
For a given dataset, model choice is the biggest lever on cost. At the rates above, Gemini 2.5 Pro costs roughly 2–4× more than Gemini 2.5 Flash for the same job, and roughly 12× more than Flash Lite. Claude Sonnet costs more than Haiku but produces noticeably better prose. GPT 5.5 is OpenAI's top tier, but GPT 5.4 mini handles most tasks well at a fraction of the price.
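One way to see the effect of model choice is to price a single job on every model in the table. The sketch below does that for a job of 100,000 input and 125,000 output tokens, the size of the 500-description example. The helper names are hypothetical, and the figures are token costs before any minimum charge.

```python
# (input, output) price in GBP per 1M tokens, copied from the pricing table.
PRICES = {
    "Gemini 2.5 Flash Lite": (0.08, 0.32),
    "GPT 5.4 nano": (0.08, 0.49),
    "Gemini 2.5 Flash": (0.24, 1.97),
    "GPT 5.4 mini": (0.30, 1.78),
    "Claude Haiku 4.5": (0.40, 1.98),
    "Gemini 2.5 Pro": (0.99, 3.95),
    "Gemini 3.1 Pro": (0.79, 4.74),
    "Claude Sonnet 4.6": (1.19, 5.93),
    "GPT 5.5": (1.98, 11.85),
    "Claude Opus 4.7": (1.98, 9.88),
}


def token_cost(model, input_tokens, output_tokens):
    """Token cost in GBP, before any minimum charge."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def compare(input_tokens, output_tokens, baseline="Gemini 2.5 Flash"):
    """Return (model, cost, multiple of baseline) sorted from cheapest up."""
    base = token_cost(baseline, input_tokens, output_tokens)
    costs = [(m, token_cost(m, input_tokens, output_tokens)) for m in PRICES]
    costs.sort(key=lambda pair: pair[1])
    return [(m, c, c / base) for m, c in costs]


for model, cost, multiple in compare(100_000, 125_000):
    print(f"{model:<22} £{cost:5.2f}  {multiple:4.1f}x Flash")
```

For this particular job, Gemini 2.5 Pro comes out at roughly 2.2× the cost of Gemini 2.5 Flash and roughly 12× the cost of Flash Lite. The exact multiples shift with the balance of input and output tokens.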
For most tasks, Gemini Flash is the best starting point — it's fast, capable, and produces clean output for structured or format-constrained jobs. Use a Pro-tier model when the task requires genuine depth: complex content, nuanced tone, or reasoning across long inputs. Switch to Claude when writing quality and natural prose matter more than cost. Try GPT when output consistency with ChatGPT is important.
A practical approach: run your test batch of 20–30 rows on a cheaper model first and compare the output quality honestly. If it's good enough, use it for the full run. If a more capable model produces noticeably better results for your specific task, the absolute cost difference is usually still small — a few pounds for most jobs.
The honest trade-off
The lower cost comes with one real limitation: batch jobs aren't instant. For large datasets, a job might take an hour or two to complete. PromptMax notifies you when results are ready — you don't need to wait at your computer.
For anyone planning production use cases — building a campaign, refreshing a product catalogue, processing a monthly data export — this is a non-issue. The cost saving and elimination of manual work far outweigh the wait.