Size
Under 3B
Two open-weight models in this size bucket.
Gemma 2 2B
2.6B
Compact Gemma variant designed for on-device inference. Trained with knowledge distillation from larger Gemma 2 teachers. At Q4 quantization the weights take about 1.6 GB, small enough to run on a recent phone.
- Context: 8K
- License: gemma
- VRAM Q4: 1.6 GB
Llama 3.2 1B
1B
The smallest Llama 3 release, designed for on-device inference on phones and laptops. The 1B model runs comfortably in <2 GB of RAM at Q4 quantization and is fast enough for real-time chat on a modern smartphone. Useful for edge inference, on-device assistants where round-tripping to a server is undesirable, and as a draft model for speculative decoding in front of a larger Llama 3 variant.
- Context: 128K
- License: llama-3
- VRAM Q4: 0.6 GB
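
As a rough sanity check on the VRAM Q4 figures listed for these models, the weight footprint can be estimated from the parameter count alone. The sketch below is illustrative rather than the method used to produce the listed numbers: the function name `q4_weight_gb` is hypothetical, and the default of 4.5 bits per weight is an assumption (4-bit values plus one 16-bit scale per block of 32 weights). Real usage is typically somewhat higher because of the KV cache, activations, and any tensors kept at higher precision.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GB (10^9 bytes) for a quantized model.

    bits_per_weight=4.5 is an assumed average for a 4-bit block format;
    it covers weights only, not KV cache or runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts taken from the listing above.
for name, params in [("Gemma 2 2B", 2.6), ("Llama 3.2 1B", 1.0)]:
    print(f"{name}: ~{q4_weight_gb(params):.2f} GB of weights at Q4")
```

Both estimates come out slightly below the listed figures, which is what one would expect if the listed values include some runtime overhead on top of the weights.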