Mistral AI

Mistral

Paris-based lab whose dense and mixture-of-experts open-weights models popularised sliding-window attention and set a high bar for inference efficiency.


History & context

Mistral AI launched in May 2023 with a singular thesis: that European AI research could compete with Silicon Valley labs on efficiency and openness. Mistral 7B (September 2023) was the breakout: Mistral reported that it outperformed Llama 2 13B across its evaluated benchmarks, and it shipped under Apache 2.0 with no additional usage restrictions.

Mixtral 8×7B (December 2023) introduced the open community to mixture-of-experts at scale: 47B total parameters but only ~13B active per token, giving roughly the per-token inference cost of a 13B dense model with quality approaching 70B-class dense models. Mixtral 8×22B (April 2024) scaled the same recipe up.

Mistral's commercial strategy has evolved: the company now ships flagship 'Large' models under a paid licence while continuing to release smaller models (Mistral 7B, Mistral Small 3, Mistral Nemo) under Apache 2.0. The dual-licence model — open weights for community, paid hosted API for production — is increasingly common across the industry.

Flagship model

Mixtral 8×22B Instruct

141B

Scaled-up Mixtral with 22B-parameter experts. ~39B active parameters out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Context: 64K
License: Apache 2.0
VRAM Q4: 84.6 GB

5 models in this family

Mixtral 8×22B Instruct

141B

Context: 64K
License: Apache 2.0
VRAM Q4: 84.6 GB

Mixtral 8×7B Instruct

46.7B

The mixture-of-experts release that brought 8 experts per layer, with 2 active per token, to open weights. Because the experts share the attention layers, the total is about 47B parameters rather than 8 × 7B = 56B, with ~13B active per token. That makes per-token inference roughly as fast as a 13B dense model while approaching 70B dense quality. Apache 2.0 weights mean it's still a popular self-hosting choice. Memory footprint is the main constraint: all 47B parameters must be loaded even though only about a quarter are active per token.

Context: 32K
License: Apache 2.0
VRAM Q4: 28 GB
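
The "2 of 8 experts active per token" idea can be illustrated with a small routing sketch. This is a toy, pure-Python illustration of top-2 gating as it is commonly described, not Mistral's implementation: the scalar "experts", the router scores, and the function names are all hypothetical stand-ins.

```python
import math

def top2_route(router_logits):
    """Pick the two highest-scoring experts and softmax-normalise their weights."""
    # Rank expert indices by router score, highest first, and keep two.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax over the two selected logits only, so the weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs for one token."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))

# Hypothetical toy experts: scalar functions standing in for feed-forward blocks.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
# Made-up router scores for a single token, one per expert.
logits = [0.1, 2.0, -1.0, 0.5, 1.2, 0.0, -0.3, 1.0]

routes = top2_route(logits)               # experts 1 and 4 are selected
output = moe_layer(3.0, logits, experts)  # only those two experts are evaluated
```

Only the two selected experts run for this token, which is where the compute saving comes from. All eight still have to be resident in memory, because a different token may be routed to any of them.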

Mistral Small 3

24B

24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at a third of the parameter count. Apache 2.0 licensed and small enough to run on a single RTX 4090 (24 GB) at Q4. Good pick when you want Llama-3.3-70B-class chat quality on a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.

Context: 32K
License: Apache 2.0
VRAM Q4: 14.4 GB
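
The Q4 VRAM figures on this page are consistent with roughly 0.6 bytes per parameter (for example, 24B × 0.6 = 14.4 GB). That constant is inferred from the listed numbers, not a formula the page states, and it covers weights only: KV cache and activations need extra headroom. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
# Assumption: ~0.6 bytes per parameter at Q4, inferred from the listed figures.
BYTES_PER_PARAM_Q4 = 0.6

def q4_vram_gb(params_billion, bytes_per_param=BYTES_PER_PARAM_Q4):
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

def fits(params_billion, gpu_vram_gb):
    """True if the Q4 weights alone fit in the given amount of VRAM."""
    return q4_vram_gb(params_billion) <= gpu_vram_gb

# Parameter counts (in billions) as listed on this page.
models = {
    "Mixtral 8x22B": 141,
    "Mixtral 8x7B": 46.7,
    "Mistral Small 3": 24,
    "Mistral Nemo 12B": 12,
}
estimates = {name: round(q4_vram_gb(p), 1) for name, p in models.items()}

# A 24 GB card holds Mistral Small 3 at Q4, but not Mixtral 8x7B.
small3_fits_24gb = fits(24, 24)
mixtral_fits_24gb = fits(46.7, 24)
```

For mixture-of-experts models the estimate uses total parameters, not active ones, since every expert must be loaded.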

Mistral Nemo 12B

12B

Joint Mistral × NVIDIA model with 128K context, designed as a drop-in upgrade to Mistral 7B. Trained with NVIDIA's Megatron stack and released under Apache 2.0. Strong multilingual coverage thanks to the Tekken tokenizer.

Context: 128K
License: Apache 2.0
VRAM Q4: 7.2 GB

Mistral 7B v0.3

7B

A refresh of the original Mistral 7B with 32K context and an extended vocabulary. It ships with permissive Apache 2.0 weights, and the Mistral 7B line was the first widely deployed model family to use sliding-window attention. Still useful in 2026 for very-low-cost inference and as a baseline for fine-tuning experiments.

Context: 32K
License: Apache 2.0
VRAM Q4: 4.2 GB
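
Sliding-window attention restricts each token to attending over a fixed number of recent positions instead of the whole prefix, so attention cost and cache size stop growing with sequence length. The sketch below builds such a causal mask with a deliberately tiny, illustrative window. The function name is hypothetical, and exact boundary conventions vary between implementations.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    A position sees itself and the (window - 1) positions before it,
    and never any future position.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Each row has at most `window` allowed positions, which is what bounds the per-token cost.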
