All models
3 of 40 open-source models (filtered).
32B sweet-spot model: strong reasoning, fits on one H100 in fp16, on a 4090 at Q4. The 32B size in particular hits a quality/cost knee — quality scales with parameters faster than cost up to ~32B, and slower afterwards. Favoured for production chat where 7B isn't sharp enough and where 70B+ would over-spec the hardware budget. Apache 2.0 licence.
- Context
- 128K
- License
- apache-2-0
- VRAM Q4
- 19.2 GB
Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.
- Context
- 128K
- License
- apache-2-0
- VRAM Q4
- 8.4 GB
Apache-2.0-licensed 7B model with surprisingly strong reasoning and multilingual chops. Qwen 2.5 trains on a larger and more carefully filtered corpus than the original Qwen series, and the 7B variant punches well above its weight on coding and math benchmarks. A strong default for cost-sensitive chat workloads and for fine-tuning experiments where the Apache licence simplifies downstream redistribution.
- Context
- 128K
- License
- apache-2-0
- VRAM Q4
- 4.2 GB