
Guides for building with LLMs

The reference explanations behind the tools: pricing mechanics, quantization, model choice, and local inference. Precise, opinionated where it helps, and written to be quoted. Every guide links to the tool that puts it into practice.

LLM pricing explained
Input vs output vs cached vs reasoning tokens, and why the headline price per million misleads. 8 min read

Token costs in practice
Four real workloads costed end to end, and the lever that moves each monthly bill. 9 min read

Choosing a model
A decision framework that starts from your constraint and defaults to the cheapest model that passes. 8 min read

Quantization explained
Q4 vs Q5 vs Q8 vs FP16: what each costs in quality and VRAM, and which to actually use. 7 min read

Running models locally
The hardware reality: how much VRAM you need, what fits on consumer GPUs, and how fast. 9 min read

Start with the tools

Each guide pairs with a tool: the cost calculator and token counter for pricing, the model comparison and head-to-head pages for choosing, and the VRAM calculator and run-locally guides for local inference. For measured rather than list-price numbers, see cost-per-task and the price-history archive.