All models
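A scaled-up Mixtral with eight 22B-parameter experts: ~39B active parameters out of 141B total. Only the routed experts run for each token, so per-token compute is closer to that of a 39B dense model, but all 141B parameters must sit in memory, which is why the Q4 footprint is so large. Strong long-context performance and competitive coding scores. The Apache 2.0 licence makes it attractive for self-hosting where Llama 3's licence terms are a non-starter.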
- Context: 64K tokens
- License: Apache 2.0
- VRAM (Q4): 84.6 GB
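The gap between the 84.6 GB footprint and the ~39B active parameters comes from how mixture-of-experts models work: every expert must be resident, but each token only passes through the routed ones. A rough sizing sketch, assuming Q4 weights average about 4.8 bits per parameter (an inference from the listed figures, not a published spec) and the common approximation of 2 FLOPs per active parameter per generated token:

```python
# Rough mixture-of-experts sizing sketch. Assumptions (not from an official tool):
#   - Q4 weights average about 4.8 bits per parameter (inferred from the listed figures)
#   - a forward pass costs about 2 FLOPs per *active* parameter per generated token
BITS_PER_PARAM_Q4 = 4.8


def weight_memory_gb(total_params_b: float, bits_per_param: float = BITS_PER_PARAM_Q4) -> float:
    """GB needed to hold every weight (KV cache and activations excluded)."""
    # billions of params * bytes per param = GB, using 1 GB = 1e9 bytes
    return round(total_params_b * bits_per_param / 8, 1)


def gflops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass compute per generated token, in GFLOPs."""
    return 2 * active_params_b


total_b, active_b = 141, 39  # Mixtral figures quoted above
print(weight_memory_gb(total_b))   # → 84.6  (all experts must be resident)
print(gflops_per_token(active_b))  # → 78    (only routed experts do work)
print(gflops_per_token(total_b))   # → 282   (a dense 141B model, for contrast)
```

Memory is driven by the total parameter count, compute by the active count; that split is the whole appeal and the whole cost of the design.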
Qwen's reasoning-focused 'thinking' model. It generates long chains of thought before answering, in the same vein as OpenAI's o1 and DeepSeek-R1. Optimised for math and competition-style problem solving. The Preview tag signals that Qwen is iterating quickly, so later versions may supersede this one. Useful today for math-heavy workloads where a slow, careful answer is preferred to a fast wrong one.
- Context: 32K tokens
- License: Apache 2.0
- VRAM (Q4): 19.2 GB
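Because the chain of thought is generated inside the same window as the prompt, long reasoning both eats into the context budget and dominates latency. A back-of-the-envelope sketch, assuming a 32,768-token window; the prompt size, answer reserve and 30 tokens/s decode rate are hypothetical values chosen for illustration:

```python
# Context-budget sketch for a long chain-of-thought model.
# Assumption: a 32,768-token window; prompt size, answer reserve and decode
# speed below are hypothetical illustration values, not measurements.
CONTEXT_TOKENS = 32_768


def reasoning_budget(prompt_tokens: int, answer_reserve: int,
                     context: int = CONTEXT_TOKENS) -> int:
    """Tokens left for thinking before the window is exhausted."""
    return max(0, context - prompt_tokens - answer_reserve)


def decode_seconds(generated_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate tokens at a steady decode rate."""
    return generated_tokens / tokens_per_second


budget = reasoning_budget(prompt_tokens=1_500, answer_reserve=1_000)
print(budget)                                    # → 30268
print(round(decode_seconds(budget, 30.0) / 60))  # minutes at 30 tok/s → 17
```

If the model uses its whole thinking budget, the answer arrives in minutes rather than seconds, which is the slow-but-careful trade-off described above.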
32B sweet-spot model: strong reasoning, and it fits on a single 80 GB H100 in fp16 (about 64 GB of weights) or on a 24 GB RTX 4090 at Q4. The 32B size sits near a quality/cost knee: up to roughly 32B, quality improves faster than hardware cost as parameters are added, and beyond that it improves more slowly than cost. Favoured for production chat where 7B isn't sharp enough and 70B+ would exceed the hardware budget. Apache 2.0 licence.
- Context: 128K tokens
- License: Apache 2.0
- VRAM (Q4): 19.2 GB
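The hardware claims come down to bytes per parameter times parameter count. A weights-only check, assuming 2 bytes per parameter for fp16, about 0.6 bytes per parameter at Q4 (matching the 19.2 GB listed), an 80 GB H100 and a 24 GB RTX 4090; the 10% headroom for KV cache and runtime overhead is a hypothetical allowance:

```python
# Weights-only fit check. Assumptions: fp16 = 2 bytes/param, Q4 ~= 0.6 bytes/param
# (about 4.8 bits), 80 GB H100, 24 GB RTX 4090, and a hypothetical 10% headroom
# for KV cache and runtime overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "q4": 0.6}
GPU_VRAM_GB = {"H100": 80, "RTX 4090": 24}


def weights_gb(params_b: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_b * BYTES_PER_PARAM[precision]


def fits(params_b: float, precision: str, gpu: str, headroom: float = 0.1) -> bool:
    """True if the weights plus headroom fit in a single GPU's VRAM."""
    return weights_gb(params_b, precision) * (1 + headroom) <= GPU_VRAM_GB[gpu]


for params_b, precision, gpu in [(32, "fp16", "H100"), (32, "q4", "RTX 4090"),
                                 (32, "fp16", "RTX 4090"), (72, "fp16", "H100")]:
    print(params_b, precision, gpu, fits(params_b, precision, gpu))
# → 32 fp16 H100 True
#   32 q4 RTX 4090 True
#   32 fp16 RTX 4090 False
#   72 fp16 H100 False
```

The last line shows why a 70B-class model in fp16 needs more than one card, while 32B still fits on one.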
Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.
- Context: 128K tokens
- License: Apache 2.0
- VRAM (Q4): 8.4 GB
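With the Q4 figures above, choosing a model for a given GPU is a simple filter over this list. A sketch that returns every listed model whose Q4 weights fit a VRAM budget, largest first; the labels paraphrase the descriptions above, and the 20% headroom for KV cache and runtime overhead is an assumption rather than a measured number:

```python
# Pick from this list by VRAM budget. The labels paraphrase the descriptions
# above; the 20% headroom for KV cache and runtime overhead is an assumption.
LISTED_Q4_GB = {
    "Mixtral (22B experts)": 84.6,
    "Qwen reasoning preview": 19.2,
    "32B sweet-spot model": 19.2,
    "Mid-size Qwen2.5": 8.4,
}


def fitting_models(budget_gb: float, headroom: float = 0.2) -> list[str]:
    """Names whose Q4 weights plus headroom fit the budget, largest first."""
    fit = [name for name, gb in LISTED_Q4_GB.items() if gb * (1 + headroom) <= budget_gb]
    # sorted() is stable, so equal-size models keep the order they appear in
    return sorted(fit, key=LISTED_Q4_GB.__getitem__, reverse=True)


print(fitting_models(24))  # a 24 GB card
print(fitting_models(12))  # a 12 GB card
```

With a 24 GB card this returns the two 32B-class models ahead of the mid-size Qwen2.5; with 12 GB only the mid-size model remains.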