Comparison
Mixtral 8×22B Instruct vs Qwen2.5 72B Instruct
Side-by-side specs, benchmarks and hosted-inference pricing.
Side A
Mixtral 8×22B Instruct
Mistral AI · Mistral
A scaled-up Mixtral: a sparse mixture-of-experts model with 22B-parameter experts, in which roughly 39B of the 141B total parameters are active per token. Strong long-context performance and competitive coding scores. The Apache 2.0 license makes it attractive for self-hosting where the license terms of Llama 3 are a non-starter.
Side B
Qwen2.5 72B Instruct
Alibaba · Qwen
The flagship Qwen2.5 release. Competes with Llama 3.1 405B on many benchmarks at roughly one-fifth the parameter count. Note that the 72B model specifically uses the Qwen License, which permits commercial use up to 100M monthly active users; most of the smaller Qwen2.5 sizes are Apache 2.0.
Specs
| Spec | Mixtral 8×22B Instruct | Qwen2.5 72B Instruct |
| Parameters | 141B | 72B |
| Context length | 64K | 128K |
| Modality | text | text |
| Released | 2024-04-17 | 2024-09-18 |
| License | Apache 2.0 | Qwen License |
| Commercial use | Yes | Yes |
| VRAM fp16 | 282 GB | 144 GB |
| VRAM Q4 | 84.6 GB | 43.2 GB |
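The VRAM rows appear to be weights-only estimates: parameter count multiplied by bytes per parameter. The sketch below reproduces them under that assumption. The 0.6 bytes-per-parameter figure for Q4 is inferred from the table itself (84.6 / 141 and 43.2 / 72), not taken from a published constant, and the function and dictionary names are illustrative.

```python
# Weights-only memory estimate: parameters (in billions) x bytes per parameter.
# ASSUMPTION: the Q4 factor of 0.6 bytes/param is inferred from the table above
# (84.6 / 141 = 0.6), and 1 GB is taken as 1e9 bytes.
BYTES_PER_PARAM = {"fp16": 2.0, "q4": 0.6}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory needed to hold the model weights, in GB."""
    return round(params_billions * BYTES_PER_PARAM[precision], 1)


MODELS = {
    "Mixtral 8x22B Instruct": 141,  # total parameters, not the ~39B active ones
    "Qwen2.5 72B Instruct": 72,
}

for name, params in MODELS.items():
    fp16 = weight_memory_gb(params, "fp16")
    q4 = weight_memory_gb(params, "q4")
    print(f"{name}: fp16 ~{fp16} GB, Q4 ~{q4} GB")
```

Two caveats follow from the assumptions. The estimate excludes the KV cache and activations, so real usage at long context will be higher. And for Mixtral the total parameter count is used, because a mixture-of-experts model keeps all experts resident in memory even though only about 39B parameters are active per token.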
Benchmarks
| Benchmark | Mixtral 8×22B Instruct | Qwen2.5 72B Instruct |
| HumanEval | 76.0 | 86.6 |
| MATH | 41.8 | 83.1 |
| MMLU | 77.8 | 86.1 |
Cheapest hosted pricing
Mixtral 8×22B Instruct
together: $1.20 in / $1.20 out per 1M tokens
Qwen2.5 72B Instruct
together: $1.20 in / $1.20 out per 1M tokens
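Per-million-token prices translate into a per-request cost as a simple weighted sum. A minimal sketch follows, assuming the listed $1.20 in / $1.20 out rates; the function name and the token counts in the usage line are hypothetical.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float = 1.20,   # USD per 1M input tokens, from the listing above
    out_price_per_m: float = 1.20,  # USD per 1M output tokens, from the listing above
) -> float:
    """Estimated cost in USD of one request at per-million-token pricing."""
    total = input_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


# Hypothetical workload: a 3,000-token prompt with a 500-token completion.
cost = request_cost_usd(3_000, 500)
print(f"Estimated cost per request: ${cost:.4f}")
```

Because both models are listed at the same rate, cost per token does not separate them here; the choice comes down to the benchmark, context and license differences above.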