Comparison
Llama 3.2 3B vs Gemma 2 2B
Side-by-side specs, benchmarks and hosted-inference pricing.
Side A
Llama 3.2 3B
Meta · Llama
Pocket-sized Llama 3 variant for edge deployment. After instruction tuning, its chat quality is competitive with much larger models from the previous generation. At Q4 it fits in about 2 GB of VRAM and runs on consumer GPUs and recent Apple Silicon. A strong default for on-device chat, summarisation, and structured extraction when the workload doesn't need frontier-level reasoning.
Side B
Gemma 2 2B
Google DeepMind · Gemma
Compact Gemma variant designed for on-device inference. Trained with knowledge distillation from larger Gemma 2 teachers. Runs comfortably on a phone at Q4.
Specs
| Spec | Llama 3.2 3B | Gemma 2 2B |
| --- | --- | --- |
| Parameters | 3B | 2.6B |
| Context length | 128K | 8K |
| Modality | text | text |
| Released | 2024-09-25 | 2024-07-31 |
| License | Llama 3.2 Community License | Gemma Terms of Use |
| Commercial use | Yes | Yes |
| VRAM fp16 | 6 GB | 5.2 GB |
| VRAM Q4 | 1.8 GB | 1.6 GB |
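The VRAM figures are consistent with a simple weights-only estimate: parameter count multiplied by bytes per weight. The sketch below reproduces them under two assumptions: 2 bytes per weight for fp16, and roughly 0.6 bytes per weight for Q4 (a value picked because it matches the table; real Q4 formats differ, and KV cache and activation memory are not included). The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Weights-only memory estimate in GB, taking 1 GB as 1e9 bytes."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_weight


FP16_BYTES = 2.0  # 16-bit floats
Q4_BYTES = 0.6    # assumption: ~4.8 bits per weight including quantisation overhead

# Parameter counts as listed in the Specs table.
models = {"Llama 3.2 3B": 3.0, "Gemma 2 2B": 2.6}

for name, params in models.items():
    fp16 = weight_memory_gb(params, FP16_BYTES)
    q4 = weight_memory_gb(params, Q4_BYTES)
    print(f"{name}: fp16 ~{fp16:.1f} GB, Q4 ~{q4:.1f} GB")
```

Treat these as lower bounds: long contexts add KV cache memory on top of the weights, which matters more for the 128K-context model.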
Benchmarks
| Benchmark | Llama 3.2 3B | Gemma 2 2B |
| --- | --- | --- |
| HumanEval | 51.5 | 17.7 |
| MATH | 48.0 | 11.8 |
| MMLU | 63.4 | 51.3 |
Cheapest hosted pricing
Llama 3.2 3B
Groq: $0.06 in / $0.06 out per 1M tokens
Gemma 2 2B
Self-host only; no hosted pricing listed.
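To turn the per-1M-token rate into a per-request figure, multiply each token count by its rate and divide by one million. A minimal sketch using the Groq rate listed above for Llama 3.2 3B; the function name `request_cost_usd` and the example token counts are hypothetical, and actual billing may differ.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float = 0.06,   # USD per 1M input tokens (rate listed above)
    out_price_per_m: float = 0.06,  # USD per 1M output tokens (rate listed above)
) -> float:
    """Estimated cost of one request in USD at per-1M-token pricing."""
    total = input_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


# Hypothetical workload: a 2,000-token prompt with a 500-token reply.
cost = request_cost_usd(2_000, 500)
print(f"${cost:.6f} per request")
print(f"${cost * 1_000_000:.2f} per million requests")
```

For Gemma 2 2B there is no hosted rate to plug in, so a comparable estimate would have to come from your own hardware and utilisation numbers.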