Run Command R 35B on RTX 4080

35B parameters on a 16 GB card. It won't fit in this card's memory at any usable quantization — here's the math, and what will run it.

verdict · Command R 35B on RTX 4080

Won't fit

even Q3_K_M needs 18.1 GB · only 15.2 GB usable

too big for this card — run it in the cloud

Smallest GPU that fits at Q4_K_M: A100 40GB


VRAM needed and fit for Command R 35B on RTX 4080 by quantization.
Quantization	VRAM needed	Fits 16 GB?	Tokens/sec	Quality
FP16 (full)	84.0 GB	✗ no	—	Reference quality — no quantization loss.
Q8_0	44.5 GB	✗ no	—	Near-lossless; rarely worth the extra space over Q6.
Q6_K	34.4 GB	✗ no	—	Virtually indistinguishable from full precision.
Q5_K_M	29.0 GB	✗ no	—	Minor loss; an excellent quality-vs-size balance.
Q4_K_M	23.5 GB	✗ no	—	Small but measurable loss; the popular default.
Q3_K_M	18.1 GB	✗ no	—	Noticeable degradation; only when you're tight on VRAM.

VRAM needed = params × bytes/weight, plus 20% for KV cache and runtime overhead; usable VRAM is 95% of nameplate (15.2 GB on a 16 GB card). Tokens/sec is a memory-bandwidth ceiling (717 GB/s on the RTX 4080); real throughput is lower, especially with long context.
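The sizing rule in the note above can be sketched in Python. This is a minimal sketch, not the calculator's actual code: the function names are hypothetical, 1 GB is taken as 10^9 bytes, and the only bytes-per-weight value assumed is 2 for FP16. For the other quantizations the sketch reuses the totals from the table rather than guessing a bytes-per-weight figure.

```python
# Constants taken from the note under the table.
OVERHEAD = 1.20          # +20% for KV cache and runtime
USABLE_FRACTION = 0.95   # usable VRAM as a share of nameplate

def vram_needed_gb(params_billions, bytes_per_weight):
    """Weights (params x bytes/weight) plus the 20% overhead, in GB."""
    return params_billions * bytes_per_weight * OVERHEAD

def usable_vram_gb(nameplate_gb):
    """VRAM actually available to the model on a card of the given size."""
    return nameplate_gb * USABLE_FRACTION

def fits(needed_gb, nameplate_gb):
    """True if the requirement is within the card's usable VRAM."""
    return needed_gb <= usable_vram_gb(nameplate_gb)

# Totals copied from the table above (GB, overhead already included).
TABLE_GB = {
    "FP16": 84.0,
    "Q8_0": 44.5,
    "Q6_K": 34.4,
    "Q5_K_M": 29.0,
    "Q4_K_M": 23.5,
    "Q3_K_M": 18.1,
}

if __name__ == "__main__":
    # FP16 stores 2 bytes per weight: 35 x 2 x 1.2
    print(f"FP16 estimate: {vram_needed_gb(35, 2.0):.1f} GB")
    print(f"Usable on a 16 GB card: {usable_vram_gb(16):.1f} GB")
    for quant, needed in TABLE_GB.items():
        verdict = "fits" if fits(needed, 16) else "does not fit"
        print(f"{quant}: {needed} GB {verdict} in 16 GB")
```

Calling `fits(TABLE_GB["Q4_K_M"], 40)` applies the same check to a 40 GB card, which is how the A100 40GB recommendation falls out of the rule.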

Why Command R 35B won't fit an RTX 4080

Command R 35B has 35 billion parameters. Even quantized hard to Q3_K_M, its weights plus KV-cache and runtime overhead come to roughly 18.1 GB, well past the RTX 4080's 15.2 GB of usable VRAM. Spilling the overflow to system RAM (CPU offload) works but can cut throughput tenfold. To run it at the popular Q4_K_M default you'd want an A100 40GB or larger, which you can rent by the hour rather than buy.

Quantization is the lever

Each step down in precision shrinks the model: FP16 needs about 84 GB, Q4_K_M about 24 GB, a 72% reduction for a small, usually acceptable quality cost. Q5_K_M and Q6_K lose even less if you have the headroom; drop to Q3 only when you're genuinely out of VRAM. The quantization guide covers the tradeoffs in detail.
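The reduction figure can be checked directly against the table. The snippet below is a small illustrative sketch (the helper name `reduction_pct` is hypothetical) that computes how much each quantization saves relative to FP16, using the table's totals.

```python
def reduction_pct(baseline_gb, quantized_gb):
    """Percentage of VRAM saved relative to the baseline."""
    return (1 - quantized_gb / baseline_gb) * 100

FP16_GB = 84.0
# Totals copied from the per-quant table (GB).
QUANT_GB = {
    "Q8_0": 44.5,
    "Q6_K": 34.4,
    "Q5_K_M": 29.0,
    "Q4_K_M": 23.5,
    "Q3_K_M": 18.1,
}

if __name__ == "__main__":
    for quant, gb in QUANT_GB.items():
        # Q4_K_M comes out at about 72%, matching the text.
        print(f"{quant}: {reduction_pct(FP16_GB, gb):.0f}% smaller than FP16")
```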

Frequently asked questions

Can an RTX 4080 run Command R 35B?

Not in VRAM. Even at the smallest practical quantization (Q3_K_M), Command R 35B needs about 18 GB versus the RTX 4080's 15 GB usable. You'd need a larger GPU: an A100 40GB fits it at Q4_K_M, and you can rent one by the hour.

How much VRAM does Command R 35B need?

Command R 35B is a 35B-parameter model. At FP16 that's about 84 GB; at Q4_K_M (the popular default) about 24 GB, including ~20% for the KV cache and runtime. Quantization is the main lever — see the per-quant table above.

Other combinations

Command R 35B on other GPUs: RTX 3060 12GB, RTX 4070 Ti, RTX 3090, RTX 4090, NVIDIA L4, RTX A6000

Other models on the RTX 4080: Llama 3.1 8B, Gemma 2 9B, Mistral 7B, Qwen2.5 7B, Phi-3 Medium 14B, Gemma 2 27B

VRAM calculator — any model, quant and GPU.
Running models locally — the hardware reality.