Guides for building with LLMs
Reference explanations behind the tools: pricing mechanics, quantization, model choice, and local inference. Precise, opinionated where it helps, and written to be quoted. Every guide links to the tool that puts it into practice.
LLM pricing explained: Input vs output vs cached vs reasoning tokens, and why the headline price per million misleads.
Token costs in practice: Four real workloads costed end to end, and the lever that moves each monthly bill.
Choosing a model: A decision framework that starts from your constraint and defaults to the cheapest model that passes.
Quantization explained: Q4 vs Q5 vs Q8 vs FP16, what each costs in quality and VRAM, and which to actually use.
Running models locally: The hardware reality of how much VRAM you need, what fits on consumer GPUs, and how fast.
Start with the tools
Each guide pairs with a tool: the cost calculator and token counter for pricing, the model comparison and head-to-head pages for choosing, and the VRAM calculator and run-locally guides for local inference. For measured rather than list-price numbers, see cost-per-task and the price-history archive.