Comparison

Mixtral 8×22B Instruct vs Qwen2.5 72B Instruct

Side-by-side specs, benchmarks and hosted-inference pricing.

Side A

Mixtral 8×22B Instruct

Mistral AI · Mistral

Scaled-up Mixtral: a sparse mixture-of-experts model with 22B-parameter experts. ~39B parameters are active per token out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Side B

Qwen2.5 72B Instruct

Alibaba · Qwen

The flagship Qwen2.5 release. Competes with Llama 3.1 405B on many benchmarks at under one-fifth the parameter count (72B vs 405B). Note that the 72B model specifically uses the Qwen License, which permits commercial use up to 100M monthly active users; most of the other Qwen2.5 sizes are Apache 2.0.

Specs

	Mixtral 8×22B Instruct	Qwen2.5 72B Instruct
Parameters	141B	72B
Context length	64K	128K
Modality	text	text
Released	2024-04-17	2024-09-18
License	Apache 2.0	Qwen License
Commercial use	Yes	Yes
VRAM fp16	282 GB	144 GB
VRAM Q4	84.6 GB	43.2 GB
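
The VRAM rows above are consistent with a weights-only estimate: total parameter count multiplied by bytes per parameter. The sketch below reproduces them under two assumptions: 2 bytes per parameter for fp16, and about 0.6 bytes per parameter for Q4. The 0.6 figure is inferred from the table itself (84.6 / 141), not a published constant, and the function name is illustrative. KV cache and activation memory are ignored, so real requirements will be higher. For Mixtral the full 141B is used rather than the ~39B active parameters, because every expert has to be resident in memory.

```python
# Weights-only memory estimate: parameter count times bytes per parameter.
# Both per-parameter sizes are assumptions inferred from the table, not measurements.
BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit weights
    "q4": 0.6,    # assumed: 4-bit values plus quantization overhead
}


def weight_vram_gb(params_billion: float, precision: str) -> float:
    """Approximate VRAM in GB (10^9 bytes) needed for the weights alone."""
    return round(params_billion * BYTES_PER_PARAM[precision], 1)


for name, params in [("Mixtral 8x22B Instruct", 141), ("Qwen2.5 72B Instruct", 72)]:
    print(name, weight_vram_gb(params, "fp16"), weight_vram_gb(params, "q4"))
```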

Benchmarks

	Mixtral 8×22B Instruct	Qwen2.5 72B Instruct
ArenaHard	—	81.2
HumanEval	76.0	86.6
IFEval	—	84.1
MATH	41.8	83.1
MMLU	77.8	86.1

Cheapest hosted pricing

Mixtral 8×22B Instruct

Together AI: $1.20 input / $1.20 output per 1M tokens

Qwen2.5 72B Instruct

Together AI: $1.20 input / $1.20 output per 1M tokens
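
Per-million-token prices translate into a per-request cost as sketched below. The token counts are hypothetical and the function name is illustrative. Because the two models use different tokenizers, the same text may produce different token counts on each, so identical list prices do not guarantee identical bills.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost of one request given prices quoted per 1M tokens."""
    return (input_tokens / 1_000_000 * in_price_per_m
            + output_tokens / 1_000_000 * out_price_per_m)


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens
# at the $1.20 in / $1.20 out rate listed above.
cost = request_cost_usd(2_000, 500, 1.20, 1.20)
print(f"${cost:.4f}")
```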

Highlighted cells indicate the better value for that row (higher score, larger context, lower VRAM).
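
The highlighting rule can be expressed as a small comparison. This is a sketch, not the page's actual logic: the handling of missing values and ties (no highlight) is an assumption, and the row labels and function name are illustrative.

```python
from typing import Optional

# Rows where the smaller number wins; every other row treats larger as better.
LOWER_IS_BETTER = {"VRAM fp16 (GB)", "VRAM Q4 (GB)"}


def better_side(row: str, a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or None when the row cannot be decided."""
    if a is None or b is None or a == b:
        return None  # assumption: no highlight for missing values or ties
    a_wins = a < b if row in LOWER_IS_BETTER else a > b
    return "A" if a_wins else "B"


# A few rows from the tables above (Side A = Mixtral, Side B = Qwen2.5).
rows = {
    "MMLU": (77.8, 86.1),
    "MATH": (41.8, 83.1),
    "ArenaHard": (None, 81.2),
    "VRAM fp16 (GB)": (282, 144),
}
for row, (a, b) in rows.items():
    print(row, better_side(row, a, b))
```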