All models

8 of 55 open-source models (filtered).

Sort:

xAI's second open-weights release, Apache 2.0. ~300B mixture-of-experts. xAI's pattern of open-sourcing the previous frontier when a new one ships continues from Grok 1. Competitive with GPT-4-class chat quality at release; today useful mainly as a research artefact given the compute needed to run it.

Context: 131K
License: apache-2-0
VRAM Q4: 180 GB

Qwen 3 235B (A22B)

235B

The flagship Qwen 3 release: a 235B-total MoE with 22B active parameters per token. Competitive with DeepSeek V3 and Llama 4 Maverick on reasoning benchmarks while being smaller total. Apache 2.0 — one of the most permissively licenced frontier-class models.

Context: 128K
License: apache-2-0
VRAM Q4: 141 GB

Mixtral 8×22B Instruct

141B

Scaled-up Mixtral with 22B-parameter experts. ~39B active parameters out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Context: 66K
License: apache-2-0
VRAM Q4: 84.6 GB

Qwen 3 32B

32B

32B sweet-spot Qwen 3, Apache 2.0. Reasoning-mode toggle inherited from smaller siblings; strong on math, code and agentic tool use. Fits on a single H100 in fp16 and on a 4090 at Q4.

Context: 33K
License: apache-2-0
VRAM Q4: 19.2 GB

Qwen2.5 32B Instruct

32B

32B sweet-spot model: strong reasoning, fits on one H100 in fp16, on a 4090 at Q4. The 32B size in particular hits a quality/cost knee — quality scales with parameters faster than cost up to ~32B, and slower afterwards. Favoured for production chat where 7B isn't sharp enough and where 70B+ would over-spec the hardware budget. Apache 2.0 licence.

Context: 128K
License: apache-2-0
VRAM Q4: 19.2 GB

QwQ 32B Preview

32B

Qwen's reasoning-focused 'thinking' model. Generates long chains-of-thought before answering, similar to OpenAI's o1 and DeepSeek R1 lineage. Optimised for math and competition-style problem solving. The Preview tag means Qwen is iterating quickly; later versions may obsolete this one. Useful today for math-heavy workloads where a slow, careful answer is preferred to a fast wrong one.

Context: 33K
License: apache-2-0
VRAM Q4: 19.2 GB

Qwen2.5 14B Instruct

14B

Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.

Context: 128K
License: apache-2-0
VRAM Q4: 8.4 GB

Qwen 3 8B

The April 2025 refresh of Qwen at 8B. Native mixed-mode reasoning: the model can 'think' before answering when triggered, or answer directly for simple queries — configurable per request. Apache 2.0. A strong upgrade over Qwen 2.5 7B on math and code, with much better instruction following.

Context: 33K
License: apache-2-0
VRAM Q4: 4.8 GB