Compare LLMs head-to-head
156 side-by-side comparisons across 24 leading models, covering token pricing, context window, modality, and the modelled monthly cost for a typical workload. Pick a matchup, or start from a model.
Flagship matchups
Frontier models: quality-first, output price ≥ $15 per 1M tokens.
- GPT-5.5 vs GPT-5.4
- GPT-5.5 vs GPT-5.4 mini
- GPT-5.5 vs GPT-5.4 nano
- GPT-5.5 vs gpt-oss-120b
- Claude Fable 5 vs GPT-5.5
- GPT-5.5 vs Claude Opus 4.8
- GPT-5.5 vs Claude Sonnet 4.6
- GPT-5.5 vs Gemini 3.1 Pro
- GPT-5.5 vs Gemini 2.5 Pro
- GPT-5.5 vs Mistral Medium 3.5
- GPT-5.4 vs GPT-5.4 mini
- GPT-5.4 vs GPT-5.4 nano
- GPT-5.4 vs gpt-oss-120b
- Claude Fable 5 vs GPT-5.4
- Claude Opus 4.8 vs GPT-5.4
- GPT-5.4 vs Claude Sonnet 4.6
- GPT-5.4 vs Claude Haiku 4.5
- GPT-5.4 vs Gemini 3.1 Pro
- GPT-5.4 vs Gemini 2.5 Pro
- GPT-5.4 vs Mistral Medium 3.5
- GPT-5.4 vs Qwen3 Max
- Claude Sonnet 4.6 vs GPT-5.4 mini
- Claude Fable 5 vs Claude Opus 4.8
- Claude Fable 5 vs Claude Sonnet 4.6
- Claude Fable 5 vs Claude Haiku 4.5
- Claude Fable 5 vs Gemini 3.1 Pro
- Claude Fable 5 vs Gemini 2.5 Pro
- Claude Opus 4.8 vs Claude Sonnet 4.6
- Claude Opus 4.8 vs Claude Haiku 4.5
- Claude Opus 4.8 vs Gemini 3.1 Pro
- Claude Opus 4.8 vs Gemini 2.5 Pro
- Claude Opus 4.8 vs Mistral Medium 3.5
- Claude Sonnet 4.6 vs Claude Haiku 4.5
- Claude Sonnet 4.6 vs Gemini 3.1 Pro
- Claude Sonnet 4.6 vs Gemini 2.5 Pro
- Claude Sonnet 4.6 vs Mistral Medium 3.5
- Claude Sonnet 4.6 vs Qwen3 Max
Mid-range matchups
The workhorse tier: output price from $2 to under $15 per 1M tokens.
- GPT-5.4 mini vs GPT-5.4 nano
- GPT-5.4 mini vs gpt-oss-120b
- Claude Haiku 4.5 vs GPT-5.4 mini
- Gemini 3.1 Pro vs GPT-5.4 mini
- Gemini 2.5 Pro vs GPT-5.4 mini
- GPT-5.4 mini vs Gemini 2.5 Flash
- GPT-5.4 mini vs Grok 4.3
- GPT-5.4 mini vs Grok 4.20
- Mistral Medium 3.5 vs GPT-5.4 mini
- GPT-5.4 mini vs Mistral Large 3
- GPT-5.4 mini vs Qwen3 Max
- GPT-5.4 mini vs GLM 4.6
- GPT-5.4 mini vs Kimi K2 Thinking
- GPT-5.4 mini vs MiniMax M2.5
- Claude Haiku 4.5 vs GPT-5.4 nano
- Gemini 2.5 Flash vs GPT-5.4 nano
- Grok 4.3 vs GPT-5.4 nano
- Grok 4.20 vs GPT-5.4 nano
- Qwen3 Max vs GPT-5.4 nano
- Kimi K2 Thinking vs GPT-5.4 nano
- Gemini 3.1 Pro vs Claude Haiku 4.5
- Gemini 2.5 Pro vs Claude Haiku 4.5
- Claude Haiku 4.5 vs Gemini 2.5 Flash
- Claude Haiku 4.5 vs Grok 4.3
- Claude Haiku 4.5 vs Grok 4.20
- Mistral Medium 3.5 vs Claude Haiku 4.5
- Claude Haiku 4.5 vs Mistral Large 3
- Claude Haiku 4.5 vs Qwen3 Max
- Claude Haiku 4.5 vs GLM 4.6
- Claude Haiku 4.5 vs Kimi K2 Thinking
- Gemini 3.1 Pro vs Gemini 2.5 Pro
- Gemini 3.1 Pro vs Gemini 2.5 Flash
- Gemini 3.1 Pro vs Gemini 2.5 Flash-Lite
- Gemini 3.1 Pro vs Grok 4.3
- Gemini 3.1 Pro vs Grok 4.20
- Gemini 3.1 Pro vs Mistral Medium 3.5
- Gemini 3.1 Pro vs Qwen3 Max
- Gemini 3.1 Pro vs Kimi K2 Thinking
- Gemini 2.5 Pro vs Gemini 2.5 Flash
- Gemini 2.5 Pro vs Gemini 2.5 Flash-Lite
- Gemini 2.5 Pro vs Grok 4.3
- Gemini 2.5 Pro vs Grok 4.20
- Gemini 2.5 Pro vs Mistral Medium 3.5
- Gemini 2.5 Pro vs Qwen3 Max
- Gemini 2.5 Pro vs Kimi K2 Thinking
- Gemini 2.5 Flash vs Gemini 2.5 Flash-Lite
- Gemini 2.5 Flash vs Grok 4.3
- Gemini 2.5 Flash vs Grok 4.20
- Gemini 2.5 Flash vs DeepSeek V4 Pro
- Mistral Medium 3.5 vs Gemini 2.5 Flash
- Gemini 2.5 Flash vs Mistral Large 3
- Gemini 2.5 Flash vs Llama 4 Maverick
- Qwen3 Max vs Gemini 2.5 Flash
- Gemini 2.5 Flash vs GLM 4.6
- Gemini 2.5 Flash vs Kimi K2 Thinking
- Gemini 2.5 Flash vs MiniMax M2.5
- Grok 4.3 vs Grok 4.20
- Grok 4.3 vs DeepSeek V4 Pro
- Mistral Medium 3.5 vs Grok 4.3
- Grok 4.3 vs Mistral Large 3
- Grok 4.3 vs Llama 4 Maverick
- Qwen3 Max vs Grok 4.3
- Grok 4.3 vs GLM 4.6
- Grok 4.3 vs Kimi K2 Thinking
- Grok 4.3 vs MiniMax M2.5
- Grok 4.20 vs DeepSeek V4 Pro
- Mistral Medium 3.5 vs Grok 4.20
- Grok 4.20 vs Mistral Large 3
- Grok 4.20 vs Llama 4 Maverick
- Qwen3 Max vs Grok 4.20
- Grok 4.20 vs GLM 4.6
- Grok 4.20 vs Kimi K2 Thinking
- Grok 4.20 vs MiniMax M2.5
- Qwen3 Max vs DeepSeek V4 Pro
- Kimi K2 Thinking vs DeepSeek V4 Pro
- Mistral Medium 3.5 vs Mistral Large 3
- Mistral Medium 3.5 vs Qwen3 Max
- Mistral Medium 3.5 vs GLM 4.6
- Mistral Medium 3.5 vs Kimi K2 Thinking
- Qwen3 Max vs Mistral Large 3
- Kimi K2 Thinking vs Mistral Large 3
- Kimi K2 Thinking vs Llama 4 Maverick
- Qwen3 Max vs GLM 4.6
- Qwen3 Max vs Kimi K2 Thinking
- Qwen3 Max vs MiniMax M2.5
- Kimi K2 Thinking vs GLM 4.6
- Kimi K2 Thinking vs MiniMax M2.5
Budget matchups
High-volume, cost-sensitive: output price under $2 per 1M tokens.
- GPT-5.4 nano vs gpt-oss-120b
- GPT-5.4 nano vs Gemini 2.5 Flash-Lite
- GPT-5.4 nano vs DeepSeek V4 Pro
- GPT-5.4 nano vs DeepSeek V4 Flash
- Mistral Large 3 vs GPT-5.4 nano
- GPT-5.4 nano vs Llama 4 Maverick
- GLM 4.6 vs GPT-5.4 nano
- GPT-5.4 nano vs MiniMax M2.5
- Gemini 2.5 Flash-Lite vs gpt-oss-120b
- DeepSeek V4 Pro vs gpt-oss-120b
- DeepSeek V4 Flash vs gpt-oss-120b
- Llama 4 Maverick vs gpt-oss-120b
- MiniMax M2.5 vs gpt-oss-120b
- DeepSeek V4 Pro vs Gemini 2.5 Flash-Lite
- Gemini 2.5 Flash-Lite vs DeepSeek V4 Flash
- Mistral Large 3 vs Gemini 2.5 Flash-Lite
- Llama 4 Maverick vs Gemini 2.5 Flash-Lite
- GLM 4.6 vs Gemini 2.5 Flash-Lite
- MiniMax M2.5 vs Gemini 2.5 Flash-Lite
- DeepSeek V4 Pro vs DeepSeek V4 Flash
- Mistral Large 3 vs DeepSeek V4 Pro
- DeepSeek V4 Pro vs Llama 4 Maverick
- GLM 4.6 vs DeepSeek V4 Pro
- MiniMax M2.5 vs DeepSeek V4 Pro
- Llama 4 Maverick vs DeepSeek V4 Flash
- MiniMax M2.5 vs DeepSeek V4 Flash
- Mistral Large 3 vs Llama 4 Maverick
- GLM 4.6 vs Mistral Large 3
- Mistral Large 3 vs MiniMax M2.5
- GLM 4.6 vs Llama 4 Maverick
- MiniMax M2.5 vs Llama 4 Maverick
- GLM 4.6 vs MiniMax M2.5
How these comparisons work
Every comparison is generated from the same sourced catalog the rest of the site uses, with no hand-picked numbers. Each page shows input and output token prices, the context window, accepted modalities, the price source, and the modelled monthly cost at a typical workload, with the cheaper option marked. Prices are either list prices or routed market prices, each recorded with the date it was captured. Which model comes out cheaper can change with your own mix of input and output tokens, which is why every page links to the cost calculator.
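To see why the cheaper option depends on the token mix, here is a minimal sketch of the arithmetic in Python. It assumes the monthly cost is simply input tokens times the input price plus output tokens times the output price, both priced per 1M tokens; the site's actual workload model is not described here and may differ. The names `ModelPrice`, `monthly_cost` and `cheaper`, the two models, and all prices and volumes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


def monthly_cost(model: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for the given token volumes (assumed linear pricing)."""
    input_cost = (input_tokens / 1_000_000) * model.input_per_1m
    output_cost = (output_tokens / 1_000_000) * model.output_per_1m
    return input_cost + output_cost


def cheaper(a: ModelPrice, b: ModelPrice, input_tokens: int, output_tokens: int) -> ModelPrice:
    """Return whichever model costs less for this workload."""
    return min((a, b), key=lambda m: monthly_cost(m, input_tokens, output_tokens))


# Hypothetical prices, not taken from any real catalog.
model_a = ModelPrice("model-a", input_per_1m=1.00, output_per_1m=10.00)
model_b = ModelPrice("model-b", input_per_1m=3.00, output_per_1m=6.00)

# Input-heavy workload: 100M input tokens, 10M output tokens per month.
print(cheaper(model_a, model_b, 100_000_000, 10_000_000).name)

# Output-heavy workload: 10M input tokens, 100M output tokens per month.
print(cheaper(model_a, model_b, 10_000_000, 100_000_000).name)
```

With these made-up numbers, the model with the lower input price wins the input-heavy workload and loses the output-heavy one, so the ranking flips without any price changing.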
Related
- Model comparison table — all tracked models, sortable and filterable.
- Real cost-per-task — measured cost to finish a job, not just list price.
- Price history — how these prices move over time.