OSAIM
Open Source AI Models

Comparison

Llama 3.2 3B vs Gemma 2 2B

Side-by-side specs, benchmarks and hosted-inference pricing.

Side A
Llama 3.2 3B
Meta · Llama

Pocket-sized Llama 3 variant for edge deployment. Surprising chat quality after instruction tuning makes it competitive with much larger models from a previous generation. At Q4 it fits in ~2 GB of VRAM and runs on consumer GPUs and recent Apple Silicon. A strong default for on-device chat, summarisation, and structured extraction tasks where the workload doesn't need frontier reasoning quality.

Side B
Gemma 2 2B
Google DeepMind · Gemma

Compact Gemma variant designed for on-device inference. Trained with knowledge distillation from larger Gemma 2 teachers. Runs comfortably on a phone at Q4.

Specs

Parameters3B2.6B
Context length128K8K
Modalitytexttext
Released2024-09-252024-07-31
LicenseLlama 3 Community LicenseGemma Terms of Use
Commercial useYesYes
VRAM fp166 GB5.2 GB
VRAM Q41.8 GB1.6 GB

Benchmarks

HumanEval51.517.7
MATH48.011.8
MMLU63.451.3

Cheapest hosted pricing

Llama 3.2 3B
groq: $0.06 in / $0.06 out per 1M tokens
Gemma 2 2B
Self-host only — no listed hosted pricing.
Highlighted cells indicate the better value for that row (higher score, larger context, lower VRAM).