OSAIM
Open Source AI Models

Comparison

Gemma 2 27B vs Qwen2.5 32B Instruct

Side-by-side specs, benchmarks and hosted-inference pricing.

Side A
Gemma 2 27B
Google DeepMind · Gemma

Flagship Gemma 2 release. Uses logit-distillation from a larger teacher model, which is how Google delivers near-70B quality from a 27B student. A solid choice when the Llama community licence doesn't fit and you need quality at the 27B–40B size range.

Side B
Qwen2.5 32B Instruct
Alibaba · Qwen

32B sweet-spot model: strong reasoning, fits on one H100 in fp16, on a 4090 at Q4. The 32B size in particular hits a quality/cost knee — quality scales with parameters faster than cost up to ~32B, and slower afterwards. Favoured for production chat where 7B isn't sharp enough and where 70B+ would over-spec the hardware budget. Apache 2.0 licence.

Specs

Parameters27B32B
Context length8K128K
Modalitytexttext
Released2024-06-272024-09-18
LicenseGemma Terms of UseApache 2.0
Commercial useYesYes
VRAM fp1654 GB64 GB
VRAM Q416.2 GB19.2 GB

Benchmarks

HumanEval51.888.4
MATH42.383.1
MMLU75.283.3

Cheapest hosted pricing

Gemma 2 27B
together: $0.30 in / $0.30 out per 1M tokens
Qwen2.5 32B Instruct
deepinfra: $0.25 in / $0.40 out per 1M tokens
Highlighted cells indicate the better value for that row (higher score, larger context, lower VRAM).