methodology

How we measure

Our whole value is being right about what things cost. So here is exactly where every number comes from, how we reconcile sources, and what we don't claim. Last snapshot: 298 models, of which 96 are priced from first-party lists and 202 from the routed market.

Two sources, on purpose

No single source is both broad and authoritative. So we use two and tell you which you're looking at on every row.

How we reconcile them

For every premier model we compare the OpenRouter price against the curated first-party list price on each run, using a deterministic model-id map (no fuzzy name matching).

Routine catalog growth never floods the human review queue: only premier price changes, large premier divergences, and outright fetch failures are flagged for a human.
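As a rough illustration of the reconciliation rules above, here is a minimal Python sketch. It is not our production code. The function name `reconcile`, the `Flag` type, the example model ids, and the 10% divergence threshold are all assumptions made for the example, and prices are taken to be USD per million tokens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Flag:
    """One item for human review (hypothetical structure)."""
    model_id: str
    reason: str


def reconcile(
    routed_prices: Optional[Dict[str, float]],
    list_prices: Optional[Dict[str, float]],
    previous_list_prices: Dict[str, float],
    id_map: Dict[str, str],
    divergence_threshold: float = 0.10,  # assumed value, not from the text
) -> List[Flag]:
    """Compare routed and first-party prices for premier models only.

    id_map is an explicit routed-id -> first-party-id table. Lookups are
    exact key matches, so there is no fuzzy name matching anywhere.
    """
    # A source that could not be fetched at all is an outright failure.
    if routed_prices is None or list_prices is None:
        return [Flag("*", "fetch_failure")]

    flags: List[Flag] = []
    for routed_id, list_id in sorted(id_map.items()):
        routed = routed_prices.get(routed_id)
        listed = list_prices.get(list_id)
        # Assumption: a premier model missing from either source is
        # treated like a fetch failure for that model.
        if routed is None or listed is None:
            flags.append(Flag(list_id, "fetch_failure"))
            continue

        previous = previous_list_prices.get(list_id)
        if previous is not None and listed != previous:
            flags.append(Flag(list_id, "price_change"))

        if listed > 0 and abs(routed - listed) / listed > divergence_threshold:
            flags.append(Flag(list_id, "large_divergence"))

    # Models absent from id_map (routine catalog growth) are never flagged.
    return flags


if __name__ == "__main__":
    id_map = {"acme/alpha": "alpha-1", "acme/beta": "beta-1"}
    routed = {"acme/alpha": 3.0, "acme/beta": 1.5, "acme/new-model": 0.2}
    listed = {"alpha-1": 3.0, "beta-1": 1.0}
    previous = {"alpha-1": 2.5, "beta-1": 1.0}
    for flag in reconcile(routed, listed, previous, id_map):
        print(flag)
```

Only models present in the explicit map are ever checked, so a new entry such as `acme/new-model` passes through without creating review work.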

How often, and the history

Prices are refreshed on an alternate-day cadence. Each refresh appends a dated snapshot to an append-only history table — rows can be added but never edited or deleted, enforced at the database level. That archive is the point: it gets more valuable every day it runs, and it lets us show how the cost of intelligence falls over time. Because the cadence is every other day, the history has gaps by design — it is a time series of capture dates, not a guaranteed row per calendar day.
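One way to enforce an append-only table at the database level is with triggers that reject every UPDATE and DELETE. The sketch below uses SQLite through Python's standard `sqlite3` module purely for illustration. The database engine, the `price_history` table name, its columns, and the helper names are assumptions, not a description of our actual schema.

```python
import sqlite3

# Hypothetical schema: one row per (capture date, model, source).
SCHEMA = """
CREATE TABLE price_history (
    captured_on        TEXT NOT NULL,  -- ISO date of the snapshot
    model_id           TEXT NOT NULL,
    source             TEXT NOT NULL,  -- e.g. 'first_party' or 'routed'
    input_usd_per_mtok REAL NOT NULL,
    PRIMARY KEY (captured_on, model_id, source)
);

-- Reject any attempt to edit an existing row.
CREATE TRIGGER price_history_no_update
BEFORE UPDATE ON price_history
BEGIN
    SELECT RAISE(ABORT, 'price_history is append-only');
END;

-- Reject any attempt to delete a row.
CREATE TRIGGER price_history_no_delete
BEFORE DELETE ON price_history
BEGIN
    SELECT RAISE(ABORT, 'price_history is append-only');
END;
"""


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the append-only history table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_snapshot(conn, captured_on, rows):
    """Insert one dated snapshot; rows are (model_id, source, price)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO price_history VALUES (?, ?, ?, ?)",
            [(captured_on, model_id, source, price)
             for model_id, source, price in rows],
        )


def is_rejected(conn, sql: str) -> bool:
    """Return True if the database refuses to run the statement."""
    try:
        with conn:
            conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False


if __name__ == "__main__":
    conn = open_history()
    # Capture dates two days apart, matching an every-other-day cadence.
    append_snapshot(conn, "2025-01-01", [("alpha-1", "first_party", 3.0)])
    append_snapshot(conn, "2025-01-03", [("alpha-1", "first_party", 2.5)])
    print(is_rejected(conn, "UPDATE price_history SET input_usd_per_mtok = 0"))
    print(is_rejected(conn, "DELETE FROM price_history WHERE model_id = 'alpha-1'"))
```

Because the rule lives in the database rather than in application code, a buggy script or a manual query cannot silently rewrite history. The primary key on the capture date also means that dates with no refresh have no rows.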

What we don't claim

Measured cost, not just list price

List prices are only half the story — a model that costs less per token can cost more to finish a task if it generates more tokens. That's why we also run a fixed task battery across the core models and publish the real cost-to-complete on the cost-per-task index. Sourced list prices tell you the rate; measured cost-per-task tells you the bill.
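A small worked example makes the gap between rate and bill concrete. The two models, their prices, and their token counts below are invented for illustration and are not measured results. The helper name `task_cost_usd` is likewise hypothetical.

```python
def task_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Cost of one task, given token counts and per-million-token rates."""
    total = (input_tokens * input_usd_per_mtok
             + output_tokens * output_usd_per_mtok)
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative model A: half the per-token price, but verbose output.
    cheap_rate = task_cost_usd(
        input_tokens=2_000, output_tokens=12_000,
        input_usd_per_mtok=1.0, output_usd_per_mtok=4.0,
    )
    # Illustrative model B: double the per-token price, but concise output.
    pricey_rate = task_cost_usd(
        input_tokens=2_000, output_tokens=3_000,
        input_usd_per_mtok=2.0, output_usd_per_mtok=8.0,
    )
    print(f"model A: ${cheap_rate:.3f} per task")   # $0.050
    print(f"model B: ${pricey_rate:.3f} per task")  # $0.028
```

In this made-up case the model with half the list price costs nearly twice as much to finish the task, because it generates four times as many output tokens.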
