Mistral AI
Mistral
Paris-based lab whose dense and mixture-of-experts open-weights models popularised sliding-window attention and efficient inference at modest parameter counts.
History & context
Mistral AI was founded in April 2023 with a single thesis: that European AI research could compete with Silicon Valley labs on efficiency and openness. Mistral 7B (September 2023) was the breakout. At release it outperformed Llama 2 13B across the benchmarks Mistral reported, and it shipped under Apache 2.0 with no usage restrictions.
Mixtral 8×7B (December 2023) introduced the open community to mixture-of-experts at scale: about 47B total parameters but only ~13B active per token, giving roughly the per-token compute of a 13B dense model with quality in the range of 70B dense models. Mixtral 8×22B (April 2024) scaled the same recipe up.
Mistral's commercial strategy has evolved: the company now ships flagship 'Large' models under a paid licence while continuing to release smaller models (Mistral 7B, Mistral Small 3, Mistral Nemo) under Apache 2.0. This dual-licence model (open weights for the community, a paid hosted API for production) is increasingly common across the industry.
Flagship model
5 models in this family
Mixtral 8×22B: a scaled-up Mixtral built from larger (nominally 22B-parameter) experts, with ~39B active parameters per token out of 141B total. Strong long-context performance and competitive coding scores. The Apache 2.0 licence makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.
- Context: 64K
- License: Apache 2.0
- VRAM (Q4): 84.6 GB
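The gap between active and total parameters can be reproduced with a little arithmetic. The sketch below assumes 8 experts with 2 routed per token and lumps everything that is not an expert (attention, embeddings) into one shared bucket; the function name and the two-bucket split are illustrative simplifications, not taken from Mistral's code.

```python
def split_moe_params(total_b, active_b, n_experts=8, n_active=2):
    """Solve the two-equation model (all values in billions of parameters):
         total  = shared + n_experts * per_expert
         active = shared + n_active  * per_expert
    """
    per_expert = (total_b - active_b) / (n_experts - n_active)
    shared = total_b - n_experts * per_expert
    return shared, per_expert


# Figures quoted on this page for the 141B-total / 39B-active model.
shared, per_expert = split_moe_params(total_b=141, active_b=39)
print(f"shared ~ {shared:.0f}B, per expert ~ {per_expert:.0f}B")
```

Under this rough model each expert holds about 17B parameters and about 5B are shared, which suggests the "22B" in the name should be read as a nominal size rather than a literal per-expert count.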
Mixtral 8×7B: the mixture-of-experts release with 8 experts per layer, 2 of which are active for each token. Because only the feed-forward blocks are replicated, the total comes to about 47B parameters rather than 8 × 7B = 56B, with ~13B active per token. That makes per-token inference roughly as fast as a 13B dense model while approaching 70B dense quality. Apache 2.0 weights mean it's still a popular self-hosting choice. Memory footprint is the main constraint: all 47B parameters must be loaded even though only a little over a quarter are active for any given token.
- Context: 32K
- License: Apache 2.0
- VRAM (Q4): 28 GB
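To make "2 active per token" concrete, here is a minimal pure-Python sketch of top-2 routing: a router scores all 8 experts, the two highest scores are kept, and a softmax over just those two gives the mixing weights. The toy experts and logits are made up for illustration, and the real model applies this per layer to vectors rather than scalars.

```python
import math


def top2_route(router_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    # Softmax restricted to the two selected logits (max-subtracted for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(chosen, exps)]


def moe_layer(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs; the rest are never run."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))


# Toy experts: expert k multiplies its input by k (purely illustrative).
experts = [lambda x, k=k: k * x for k in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

print(top2_route(logits))               # [(1, 0.5), (4, 0.5)]
print(moe_layer(1.0, logits, experts))  # 2.5
```

Only two expert functions are called per token, which is where the compute saving comes from; all eight still have to be resident in memory because any of them may be selected for the next token.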
Mistral Small 3: a 24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at a third of the parameter count. Apache 2.0 licensed and small enough to run on a single 24 GB RTX 4090 at Q4. A good pick when you want Llama-3.3-70B-class chat quality on a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.
- Context: 32K
- License: Apache 2.0
- VRAM (Q4): 14.4 GB
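The Q4 figures on this page are consistent with roughly 0.6 GB per billion parameters (24B × 0.6 ≈ 14.4 GB). The sketch below uses that ratio as an assumption inferred from the listed numbers, and adds a hypothetical fixed headroom for KV cache and activations; real usage depends on the quantization format, context length, and runtime.

```python
def q4_weight_gb(params_b, gb_per_billion=0.6):
    """Approximate Q4 weight size in GB; the 0.6 ratio is an inferred assumption."""
    return params_b * gb_per_billion


def fits(params_b, vram_gb, headroom_gb=2.0):
    """True if the weights plus an assumed headroom fit in the given VRAM."""
    return q4_weight_gb(params_b) + headroom_gb <= vram_gb


# Total parameter counts (in billions) as quoted on this page.
for name, params_b in [("24B dense", 24), ("47B MoE", 47), ("141B MoE", 141)]:
    size = q4_weight_gb(params_b)
    verdict = "fits" if fits(params_b, vram_gb=24) else "does not fit"
    print(f"{name}: ~{size:.1f} GB at Q4, {verdict} in 24 GB")
```

Note that the mixture-of-experts models are sized by total parameters, not active ones, since every expert has to be loaded.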
Mistral Nemo: a 12B model built jointly by Mistral and NVIDIA with 128K context, designed as a drop-in upgrade to Mistral 7B. Trained with NVIDIA's Megatron stack and released under Apache 2.0. Strong multilingual coverage thanks to the Tekken tokenizer.
- Context: 128K
- License: Apache 2.0
- VRAM (Q4): 7.2 GB
Mistral 7B: a refresh of the original Mistral 7B with 32K context and an extended vocabulary. Permissive Apache 2.0 weights, from the model line that was the first widely deployed to use sliding-window attention. Still useful in 2026 for very-low-cost inference and as a baseline for fine-tuning experiments.
- Context: 32K
- License: Apache 2.0
- VRAM (Q4): 4.2 GB
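Sliding-window attention restricts each token to a fixed number of recent positions instead of the whole prefix, which bounds the per-layer attention cost. A minimal sketch of the mask follows; the tiny window of 3 is for display only (the Mistral 7B paper describes a window of 4096), and counting the current token as part of the window is a convention chosen here, not a claim about any particular implementation.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal (j <= i) and limited to the `window` most recent positions,
    with the current token counted as one of them.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Each row has at most `window` allowed positions, so attention cost per token stays constant as the sequence grows. Information from older tokens can still propagate indirectly through stacked layers.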