
Run Gemma 2 27B on RTX 3060 12GB

27.2B parameters on a 12 GB card. It won't fit in this card's memory at any usable quantization — here's the math, and what will run it.

verdict · Gemma 2 27B on RTX 3060 12GB
Won't fit
even Q3_K_M needs 14.0 GB · only 11.4 GB usable
too big for this card — run it in the cloud
Smallest GPU that fits at Q4_K_M: RTX 3090
VRAM needed and fit for Gemma 2 27B on RTX 3060 12GB, by quantization.

Quantization   VRAM needed   Fits 12 GB?   Tokens/sec   Quality
FP16 (full)    65.3 GB       ✗ no          n/a          Reference quality; no quantization loss.
Q8_0           34.6 GB       ✗ no          n/a          Near-lossless; rarely worth the extra space over Q6.
Q6_K           26.8 GB       ✗ no          n/a          Virtually indistinguishable from full precision.
Q5_K_M         22.5 GB       ✗ no          n/a          Minor loss; an excellent quality-vs-size balance.
Q4_K_M         18.3 GB       ✗ no          n/a          Small but measurable loss; the popular default.
Q3_K_M         14.0 GB       ✗ no          n/a          Noticeable degradation; only when you're tight on VRAM.

How these numbers are estimated: VRAM needed = params × bytes/weight, plus 20% for KV cache and runtime. Usable VRAM is taken as 95% of nameplate, which is 11.4 GB on a 12 GB card. Tokens/sec is a memory-bandwidth ceiling (360 GB/s on this card); real throughput is lower with long context.
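The estimation rule above can be sketched in a few lines of Python. This is a minimal sketch, not the page's actual calculator: the function names are made up for illustration, and only the FP16 row is recomputed (assuming 2 bytes per weight), because the page doesn't state the bytes-per-weight figures it uses for the quantized rows.

```python
PARAMS_BILLION = 27.2    # Gemma 2 27B parameter count, from this page
OVERHEAD = 0.20          # +20% for KV cache and runtime, from this page
USABLE_FRACTION = 0.95   # usable VRAM as a share of nameplate, from this page


def vram_needed_gb(params_billion: float, bytes_per_weight: float) -> float:
    """Weights (params x bytes/weight) plus the 20% overhead, in GB."""
    return params_billion * bytes_per_weight * (1 + OVERHEAD)


def usable_vram_gb(nameplate_gb: float) -> float:
    """VRAM assumed to be available to the model on a card of this size."""
    return nameplate_gb * USABLE_FRACTION


def fits(needed_gb: float, nameplate_gb: float) -> bool:
    """True if the estimate fits within the card's usable VRAM."""
    return needed_gb <= usable_vram_gb(nameplate_gb)


# FP16 stores each weight in 2 bytes (assumption stated in the lead-in).
fp16_gb = vram_needed_gb(PARAMS_BILLION, 2.0)
print(f"FP16 estimate: {fp16_gb:.1f} GB")                     # 65.3 GB
print(f"Usable on a 12 GB card: {usable_vram_gb(12):.1f} GB")  # 11.4 GB
print(f"Q3_K_M (14.0 GB) fits 12 GB card: {fits(14.0, 12)}")   # False
```

Feeding the table's Q3_K_M figure (14.0 GB) into `fits` reproduces the verdict: it exceeds the 11.4 GB usable on this card.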

Why Gemma 2 27B won't fit an RTX 3060 12GB

Gemma 2 27B has 27.2 billion parameters. Even quantized hard to Q3_K_M, its weights plus KV-cache overhead come to roughly 14 GB, well past the RTX 3060 12GB's 11.4 GB of usable VRAM. Spilling the overflow to system RAM (CPU offload) works but can cut throughput tenfold. To run it at the popular Q4_K_M default you'd want an RTX 3090 or larger, and you can rent one by the hour rather than buy.

Quantization is the lever

Each step down in precision shrinks the model: FP16 needs about 65 GB, Q4_K_M about 18 GB — a 72% reduction for a small, usually acceptable quality cost. Q5_K_M and Q6_K are near-lossless if you have the headroom; drop to Q3 only when you're genuinely out of VRAM. The quantization guide covers the tradeoffs in detail.
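The 72% figure is simple arithmetic on the table's numbers. As a quick check, here is a small sketch that computes the saving for every quantization level relative to FP16; the dictionary just copies the VRAM estimates from this page, and `reduction_pct` is a hypothetical helper name.

```python
# VRAM estimates in GB, copied from the per-quantization table on this page.
VRAM_GB = {
    "FP16": 65.3,
    "Q8_0": 34.6,
    "Q6_K": 26.8,
    "Q5_K_M": 22.5,
    "Q4_K_M": 18.3,
    "Q3_K_M": 14.0,
}


def reduction_pct(baseline_gb: float, quantized_gb: float) -> float:
    """Percentage of the baseline footprint saved by quantizing."""
    return (1 - quantized_gb / baseline_gb) * 100


baseline = VRAM_GB["FP16"]
for name, gb in VRAM_GB.items():
    saved = reduction_pct(baseline, gb)
    print(f"{name:8s} {gb:5.1f} GB  {saved:5.1f}% smaller than FP16")
```

For Q4_K_M this gives (1 - 18.3 / 65.3) × 100, which rounds to 72%.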

Frequently asked questions

Can an RTX 3060 12GB run Gemma 2 27B?

Not in VRAM. Even at the smallest practical quantization (Q3_K_M), Gemma 2 27B needs about 14 GB versus the RTX 3060 12GB's 11.4 GB usable. You'd need a larger GPU: an RTX 3090 fits it at Q4_K_M, and you can rent one by the hour instead of buying.
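Picking the smallest card that fits is the same check run across several GPUs. The sketch below is illustrative only: the GPU names come from this page's list of other combinations, but the nameplate VRAM sizes are the commonly published specs rather than figures from this page, and `smallest_gpu_that_fits` is a hypothetical helper. It reuses the page's 95% usable-VRAM rule.

```python
from typing import Optional

USABLE_FRACTION = 0.95  # usable VRAM as a share of nameplate, from this page

# ASSUMPTION: nameplate VRAM in GB from published specs, not from this page.
GPU_VRAM_GB = {
    "RTX 4070 Ti": 12,
    "RTX 4080": 16,
    "RTX 3090": 24,
    "RTX 4090": 24,
    "NVIDIA L4": 24,
    "RTX A6000": 48,
}


def smallest_gpu_that_fits(needed_gb: float) -> Optional[str]:
    """Return the GPU with the least VRAM whose usable VRAM covers needed_gb.

    Ties on VRAM size go to whichever card is listed first above.
    Returns None when no listed card is large enough.
    """
    candidates = [
        name for name, vram in GPU_VRAM_GB.items()
        if vram * USABLE_FRACTION >= needed_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: GPU_VRAM_GB[name])


print(smallest_gpu_that_fits(18.3))  # Q4_K_M estimate -> RTX 3090
print(smallest_gpu_that_fits(65.3))  # FP16 estimate -> None (no single card)
```

The ranking is by VRAM size only; price and availability are not modeled.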

How much VRAM does Gemma 2 27B need?

Gemma 2 27B is a 27.2B-parameter model. At FP16 that's about 65 GB; at Q4_K_M (the popular default) about 18 GB, including ~20% for the KV cache and runtime. Quantization is the main lever — see the per-quant table above.

Other combinations

Gemma 2 27B on other GPUs: RTX 4070 Ti, RTX 4080, RTX 3090, RTX 4090, NVIDIA L4, RTX A6000

Other models on the RTX 3060 12GB: Llama 3.1 8B, Gemma 2 9B, Mistral 7B, Qwen2.5 7B, Phi-3 Medium 14B, Qwen2.5 32B
