Google DeepMind
Gemma
Google's open-weights model family derived from the same research as Gemini. Strong performance at small scales.
History & context
Gemma is Google DeepMind's open-weights model family, derived from the same research lineage as Gemini. Gemma 1 (February 2024) introduced the family; Gemma 2 (June 2024) delivered the major quality jump, particularly at 9B and 27B, where logit distillation from a larger teacher closed much of the gap with closed frontier models.
The Gemma Terms of Use permit commercial use subject to Google's Prohibited Use Policy. The licence is source-available rather than OSI open source: there are field-of-use restrictions, but no monthly-active-user cap, which makes Gemma the preferred choice for some commercial users who can't meet Llama's terms.
Gemma 2 2B is one of the strongest on-device models in its size class, largely because it inherits quality from a larger teacher through distillation.
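As a rough illustration of what logit distillation means, the sketch below computes a per-token loss that pushes a student's output distribution toward a teacher's. It is the generic textbook formulation (KL divergence between softmax distributions), not Google's training recipe, which this page does not describe. The function names, the temperature parameter, and the toy logits are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over one token's vocabulary distribution.

    Hypothetical helper: a generic distillation objective, not a
    reproduction of any specific Gemma training setup.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy three-token vocabulary: the loss is zero when the student matches
# the teacher exactly and grows as the two distributions diverge.
teacher = [1.0, 2.0, 3.0]
print(distillation_loss(teacher, [1.0, 2.0, 3.0]))
print(distillation_loss(teacher, [3.0, 2.0, 1.0]))
```

In a real training loop this quantity would be averaged over every token position in a batch and minimized with respect to the student's weights. The student then learns from the teacher's full probability distribution rather than from a single correct next token.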
Flagship model
3 models in this family
Flagship Gemma 2 release. Uses logit distillation from a larger teacher model, which is how Google delivers near-70B quality from a 27B student. A solid choice when the Llama community licence doesn't fit and you need quality in the 27B–40B size range.
- Context: 8K
- License: gemma
- VRAM Q4: 16.2 GB
Mid-tier Gemma. Strong general-purpose chat model at small scale. The Gemma Terms of Use permit commercial use subject to Google's prohibited-use policy.
- Context: 8K
- License: gemma
- VRAM Q4: 5.4 GB
Compact Gemma variant designed for on-device inference. Trained with knowledge distillation from larger Gemma 2 teachers. Runs comfortably on a phone at Q4.
- Context: 8K
- License: gemma
- VRAM Q4: 1.6 GB
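The "VRAM Q4" figures above can be sanity-checked with a back-of-the-envelope estimate: parameter count times bits per weight, divided by eight. The sketch below assumes 4.8 bits per weight. That number is not given on this page; it is an assumption in the range typical of 4-bit formats that store per-block scales alongside the weights. The estimate covers weights only and ignores the KV cache and activations, so real usage will be higher.

```python
def q4_weight_gb(params_billions, bits_per_weight=4.8):
    """Estimate weight memory in GB (10^9 bytes) for a quantized model.

    bits_per_weight=4.8 is an assumed average for 4-bit quantization
    with per-block scales; it is not a figure stated by the source.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal sizes of the two larger Gemma 2 models named in the text.
for size in (27, 9):
    print(f"{size}B -> {q4_weight_gb(size):.1f} GB")
```

Under that assumption the 27B estimate comes out at about 16.2 GB, in line with the flagship figure listed above. A different quantization scheme or a longer context would shift the number.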