Comparison

Llama 3.2 3B vs Gemma 2 2B

Side-by-side specs, benchmarks and hosted-inference pricing.

Side A

Meta · Llama

Pocket-sized Llama 3 variant for edge deployment. Surprising chat quality after instruction tuning makes it competitive with much larger models from a previous generation. At Q4 it fits in ~2 GB of VRAM and runs on consumer GPUs and recent Apple Silicon. A strong default for on-device chat, summarisation, and structured extraction tasks where the workload doesn't need frontier reasoning quality.

Side B

Gemma 2 2B

Google DeepMind · Gemma

Compact Gemma variant designed for on-device inference. Trained with knowledge distillation from larger Gemma 2 teachers. Runs comfortably on a phone at Q4.

Specs

Parameters	3B	2.6B
Context length	128K	8K
Modality	text	text
Released	2024-09-25	2024-07-31
License	Llama 3 Community License	Gemma Terms of Use
Commercial use	Yes	Yes
VRAM fp16	6 GB	5.2 GB
VRAM Q4	1.8 GB	1.6 GB

Benchmarks

HumanEval	51.5	17.7
IFEval	77.4	—
MATH	48.0	11.8
MMLU	63.4	51.3

Cheapest hosted pricing

Llama 3.2 3B

groq: $0.06 in / $0.06 out per 1M tokens

Gemma 2 2B

Self-host only — no listed hosted pricing.

Highlighted cells indicate the better value for that row (higher score, larger context, lower VRAM).