cost per task

What it actually costs to finish the job

List prices lie by omission. We run a fixed battery of real tasks across the core models, count every token actually consumed — input, output, and hidden reasoning — and publish the true cost to complete each one. The cheapest headline rate is rarely the cheapest finished job.

last measured run

6 models · 4 tasks · 2026-07-26

total measured spend, this run

$0.0591

Within each task, models are listed from cheapest to priciest to finish. Cost = real tokens × current price.

Summarize

6 models

Condense a passage to three bullet points.

model	tokens	latency*	cost to finish
Gemini 2.5 Flash	66 in · 43 out	668ms	$0.00013
GPT-5.4 mini	77 in · 65 out	1.1s	$0.00035
Claude Haiku 4.5	79 in · 58 out	1.2s	$0.00037
GPT-5.5	77 in · 74 out	2.3s	$0.0026
Claude Opus 4.8	112 in · 89 out	2.4s	$0.0028
Gemini 3.1 Pro	66 in · 296 out	3.9s	$0.0037

Extract

6 models

Pull structured JSON (name, email, company) from text.

model	tokens	latency*	cost to finish
Gemini 2.5 Flash	38 in · 40 out	568ms	$0.00011
GPT-5.4 mini	45 in · 25 out	477ms	$0.00015
Claude Haiku 4.5	47 in · 46 out	859ms	$0.00028
Claude Opus 4.8	67 in · 48 out	1.4s	$0.0015
GPT-5.5	45 in · 48 out	3.6s	$0.0017
Gemini 3.1 Pro	38 in · 196 out	2.8s	$0.0024

Code

6 models

Write an iterative Fibonacci function.

model	tokens	latency*	cost to finish
GPT-5.4 mini	30 in · 77 out	1.2s	$0.00037
Gemini 2.5 Flash	19 in · 170 out	995ms	$0.00043
Claude Haiku 4.5	28 in · 200 out	1.5s	$0.0010
GPT-5.5	30 in · 75 out	1.4s	$0.0024
Claude Opus 4.8	40 in · 119 out	1.9s	$0.0032
Gemini 3.1 Pro	19 in · 496 out	5.1s	$0.0060

Reason

6 models

Solve a multi-step word problem step by step.

model	tokens	latency*	cost to finish
GPT-5.4 mini	63 in · 142 out	1.7s	$0.00069
Gemini 2.5 Flash	57 in · 282 out	1.5s	$0.00072
Claude Haiku 4.5	64 in · 421 out	3.5s	$0.0022
GPT-5.5	63 in · 131 out	3.0s	$0.0042
Gemini 3.1 Pro	57 in · 796 out	6.6s	$0.0097
Claude Opus 4.8	75 in · 471 out	5.6s	$0.0121

Cost is the headline figure: real token usage × the model's current price. *Latency is a snapshot from the last run, not a live or averaged benchmark. Raw data as JSON: /cost-per-task.
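As a sanity check on the headline total, the sketch below re-tallies the 24 cost figures from the four tables, per task and per model. The numbers are copied by hand from the tables and are the displayed, rounded values; the sketch does not read the /cost-per-task JSON, whose schema isn't described on this page. The `COSTS` dictionary and `tally` function are illustrative names only.

```python
from collections import defaultdict

# Cost to finish (USD) per task and model, copied from the tables above.
# These are the displayed, rounded figures, not the raw measurements.
COSTS = {
    "Summarize": {"Gemini 2.5 Flash": 0.00013, "GPT-5.4 mini": 0.00035,
                  "Claude Haiku 4.5": 0.00037, "GPT-5.5": 0.0026,
                  "Claude Opus 4.8": 0.0028, "Gemini 3.1 Pro": 0.0037},
    "Extract": {"Gemini 2.5 Flash": 0.00011, "GPT-5.4 mini": 0.00015,
                "Claude Haiku 4.5": 0.00028, "Claude Opus 4.8": 0.0015,
                "GPT-5.5": 0.0017, "Gemini 3.1 Pro": 0.0024},
    "Code": {"GPT-5.4 mini": 0.00037, "Gemini 2.5 Flash": 0.00043,
             "Claude Haiku 4.5": 0.0010, "GPT-5.5": 0.0024,
             "Claude Opus 4.8": 0.0032, "Gemini 3.1 Pro": 0.0060},
    "Reason": {"GPT-5.4 mini": 0.00069, "Gemini 2.5 Flash": 0.00072,
               "Claude Haiku 4.5": 0.0022, "GPT-5.5": 0.0042,
               "Gemini 3.1 Pro": 0.0097, "Claude Opus 4.8": 0.0121},
}


def tally(costs):
    """Return (grand total, per-task totals, per-model totals)."""
    per_task = {task: sum(row.values()) for task, row in costs.items()}
    per_model = defaultdict(float)
    for row in costs.values():
        for model, cost in row.items():
            per_model[model] += cost
    return sum(per_task.values()), per_task, dict(per_model)


total, per_task, per_model = tally(COSTS)
print(f"total: ${total:.4f}")
# Cheapest model across the whole battery first.
for model, cost in sorted(per_model.items(), key=lambda kv: kv[1]):
    print(f"{model:<18} ${cost:.5f}")
```

The rounded figures sum to the same $0.0591 shown as total measured spend. Summed across all four tasks, Gemini 2.5 Flash is cheapest to finish the battery (about $0.0014) and Gemini 3.1 Pro the priciest (about $0.022).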

Why measured, not list price

Two models at the same headline rate can cost wildly different amounts to finish the same task, because they don't generate the same number of tokens. A terse model answers in 120 tokens; a reasoning model thinks for 800 before it starts. List pricing hides that entirely. The only honest way to compare is to run the task and count what was actually consumed — which is what this page does.
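To put numbers on the 120-versus-800 example, write the cost of one run as input tokens times input price plus billed output tokens times output price. The assumptions here are ours, not the page's: both models see the same prompt and charge the same per-token prices, the reasoning model also answers in 120 tokens after its 800 thinking tokens, and thinking tokens are billed at the output rate. Under those assumptions the output side of the bill grows by roughly 7.7×, even though the headline rates are identical.

```latex
\begin{aligned}
C &= t_{\mathrm{in}}\,p_{\mathrm{in}} + \left(t_{\mathrm{out}} + t_{\mathrm{reason}}\right) p_{\mathrm{out}} \\
\frac{C_{\mathrm{reasoning}} - t_{\mathrm{in}}\,p_{\mathrm{in}}}{C_{\mathrm{terse}} - t_{\mathrm{in}}\,p_{\mathrm{in}}}
  &= \frac{(120 + 800)\,p_{\mathrm{out}}}{(120 + 0)\,p_{\mathrm{out}}}
   = \frac{920}{120} \approx 7.7
\end{aligned}
```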

How it works

A fixed battery — summarize, extract, code, reason — runs across a small core set of models on a schedule. For each run we record the real input, output, and reasoning tokens, then compute cost from the model's current price in our catalog. No estimates, no token guesses. The battery and model set start small and grow; every figure is a real measurement with a date. See the methodology for the full approach.
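A minimal sketch of the cost step described above, assuming prices are stored as USD per million tokens and assuming hidden reasoning tokens are billed at the output rate. `Price`, `Measurement`, `cost_to_finish` and `CATALOG` are hypothetical names, not this site's code, and the catalog rates are illustrative assumptions rather than figures published on this page. With the assumed Gemini 2.5 Flash rates ($0.30 in, $2.50 out per million), the Summarize row's 66 in · 43 out comes to about $0.00013, which matches the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical catalog entry: USD per million tokens."""
    input_per_m: float
    output_per_m: float


@dataclass(frozen=True)
class Measurement:
    """Token counts recorded for one model on one task (hypothetical schema)."""
    model: str
    task: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0  # hidden thinking tokens, if reported separately


def cost_to_finish(m: Measurement, price: Price) -> float:
    """Real tokens x current price; assumes reasoning bills at the output rate."""
    billed_output = m.output_tokens + m.reasoning_tokens
    return (m.input_tokens * price.input_per_m
            + billed_output * price.output_per_m) / 1_000_000


# Illustrative catalog; these rates are assumptions, not taken from this page.
CATALOG = {"Gemini 2.5 Flash": Price(input_per_m=0.30, output_per_m=2.50)}

run = Measurement("Gemini 2.5 Flash", "Summarize", input_tokens=66, output_tokens=43)
print(f"${cost_to_finish(run, CATALOG[run.model]):.5f}")
```

Keeping token counts separate from prices is what allows cost to be computed from the model's current catalog price, as the paragraph above describes, rather than being frozen at measurement time.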

Frequently asked questions

How is this different from the cost calculator?

The calculator multiplies list prices by token counts you guess. This page uses the tokens models actually consume — we run the task and count every input, output, and hidden reasoning token. A model with a cheap headline rate that thinks for 800 tokens before answering shows its true cost here.

Which models and tasks are measured?

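A small core set of models (six in the last run: Gemini 2.5 Flash, GPT-5.4 mini, Claude Haiku 4.5, GPT-5.5, Claude Opus 4.8 and Gemini 3.1 Pro), run on a schedule across a fixed battery of four tasks: summarize, extract structured data, write code, and reason through a problem. The set is deliberately small to start and expands as traffic justifies; the measurement is real every time, never a list-price estimate.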

Is the latency figure live?

No. Latency is a snapshot recorded during the last run — a rough indicator of responsiveness on that run, not a live or averaged benchmark. Network conditions, routing and load all move it. Cost is the figure to trust here; latency is context.

Related

API cost calculator: list-price estimate for your workload.
Price history: how prices move over time.
Model comparison: current prices and sources.
