All models

4 of 55 open-source models (filtered).

Mixtral 8×22B

141B

Scaled-up Mixtral with 22B-parameter experts. ~39B active parameters out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Context: 66K
License: apache-2-0
VRAM Q4: 84.6 GB
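
To make the active/total split concrete, here is a minimal Python sketch that uses only the two parameter counts quoted in the description above. The function name is illustrative. The reading that memory scales with total parameters while per-token compute scales with active parameters is the usual rule of thumb for mixture-of-experts models, not a measurement.

```python
def active_share(active_b: float, total_b: float) -> float:
    """Fraction of the weights that take part in processing one token."""
    return active_b / total_b

# Figures from the listing: ~39B active out of 141B total.
share = active_share(39, 141)
print(f"active per token: {share:.0%}")      # compute is sized by these weights
print(f"idle per token:   {1 - share:.0%}")  # still resident in memory
```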

Mixtral 8×7B Instruct

46.7B

The mixture-of-experts release that introduced 8 experts of nominally 7B each, 2 active per token. ~13B active parameters out of 47B total (the experts share the attention layers, so the total is below 8 × 7B = 56B), which makes per-token inference roughly as fast as a 13B dense model while approaching 70B dense quality. Apache 2.0 weights mean it's still a popular self-hosting choice. Memory footprint is the main constraint: all 47B parameters must be loaded even though only about a quarter are active per token.

Context: 33K
License: apache-2-0
VRAM Q4: 28 GB
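
The "2 active per token" behaviour described above can be sketched as generic top-k routing. This is a toy illustration in plain Python, not Mixtral's actual implementation: the function names, the scalar "experts", and the example logits are all hypothetical, and a real model routes vectors through feed-forward blocks inside every layer.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the k selected experts for one token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy setup: 8 scalar "experts" that just scale their input (hypothetical stand-ins).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.3, 1.0, 0.0, -0.5, 0.2]

routes = top_k_route(logits)          # experts 1 and 4 win for this token
output = moe_layer(1.0, experts, logits)
print(routes, output)
```

Only the two selected experts are evaluated, which is why per-token compute tracks the active parameter count. All eight still have to be resident in memory, because a different pair may be chosen for the next token.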

Mistral Small 3

24B

24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at about a third of the parameter count. Apache 2.0 licensed and small enough to run on a single RTX 4090 (24 GB) at Q4. A good pick when you want Llama-3.3-70B-class chat quality on a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.

Context: 33K
License: apache-2-0
VRAM Q4: 14.4 GB
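
As a rough way to check the "fits on a single 4090" claim, the sketch below filters this page's VRAM Q4 figures against a GPU's memory. The 24 GB capacity of an RTX 4090 is a known spec. The `headroom_gb` default is an assumption added here for context cache and runtime buffers; the page does not say what its VRAM figures include. The first entry is keyed by its description.

```python
# VRAM Q4 figures copied from this page's listings, in GB.
VRAM_Q4_GB = {
    "Mixtral (22B experts, 141B total)": 84.6,
    "Mixtral 8x7B Instruct": 28.0,
    "Mistral Small 3": 14.4,
    "Mistral Nemo 12B": 7.2,
}

def models_that_fit(gpu_vram_gb, headroom_gb=2.0):
    """Names of models whose Q4 weights plus headroom fit in one GPU.

    headroom_gb is a hypothetical safety margin, not a figure from the page.
    """
    return [name for name, need in VRAM_Q4_GB.items()
            if need + headroom_gb <= gpu_vram_gb]

print(models_that_fit(24.0))  # single RTX 4090
```

With these numbers, Mistral Small 3 and Mistral Nemo 12B fit on a 24 GB card, while both Mixtral models need multiple GPUs or offloading.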

Mistral Nemo 12B

12B

Joint Mistral × NVIDIA model with 128K context, designed as a drop-in upgrade to Mistral 7B. Trained with NVIDIA's Megatron stack and released under Apache 2.0. Strong multilingual coverage thanks to the Tekken tokenizer.

Context: 128K
License: apache-2-0
VRAM Q4: 7.2 GB
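
The VRAM Q4 figures on this page are consistent with a simple rule of about 0.6 GB per billion parameters. That constant is inferred here from the four listed values and is not stated by the page. Pure 4-bit weights would need 0.5 GB per billion, so the remainder plausibly covers quantisation metadata and runtime overhead. A minimal sketch, with hypothetical names:

```python
# Assumption: inferred from the listed figures, not documented by the page.
GB_PER_BILLION_PARAMS_Q4 = 0.6

def estimate_vram_q4_gb(params_billion):
    """Rule-of-thumb Q4 memory estimate in GB for a model of the given size."""
    return params_billion * GB_PER_BILLION_PARAMS_Q4

# (total parameters in billions, VRAM Q4 as listed on this page)
LISTED = [(141, 84.6), (46.7, 28.0), (24, 14.4), (12, 7.2)]

for params, listed in LISTED:
    est = estimate_vram_q4_gb(params)
    print(f"{params:>6}B  estimated {est:5.1f} GB  listed {listed:5.1f} GB")
```

Note that the mixture-of-experts entries are sized by total parameters, not active ones, which matches the point that every expert must be loaded.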