OSAIM
Open Source AI Models

Comparison

Mixtral 8×22B Instruct vs Qwen2.5 72B Instruct

Side-by-side specs, benchmarks and hosted-inference pricing.

Side A
Mixtral 8×22B Instruct
Mistral AI · Mistral

A scaled-up Mixtral mixture-of-experts model with eight 22B-parameter experts: about 39B of its 141B total parameters are active per token. Strong long-context performance and competitive coding scores. The Apache 2.0 license makes it attractive for self-hosting where the license terms of Llama 3 are a non-starter.

Side B
Qwen2.5 72B Instruct
Alibaba · Qwen

The flagship Qwen2.5 release. Competes with Llama 3.1 405B on many benchmarks at roughly one-fifth the parameter count. Note that the 72B model specifically uses the Qwen License (commercial use permitted up to 100M monthly active users), whereas most of the smaller Qwen2.5 sizes are Apache 2.0.

Specs

Spec             Mixtral 8×22B Instruct   Qwen2.5 72B Instruct
Parameters       141B                     72B
Context length   66K                      128K
Modality         text                     text
Released         2024-04-17               2024-09-18
License          Apache 2.0               Qwen License
Commercial use   Yes                      Yes
VRAM fp16        282 GB                   144 GB
VRAM Q4          84.6 GB                  43.2 GB
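
The VRAM rows appear to follow a simple weights-only rule: parameter count multiplied by bytes per parameter. The sketch below reproduces the table's figures under that assumption. The 2.0 bytes/parameter for fp16 and 0.6 bytes/parameter for Q4 are inferred from the numbers above rather than stated by the source, and the function name `weight_vram_gb` is hypothetical. Real deployments also need memory for the KV cache and activations, which this estimate ignores.

```python
# Assumed bytes per parameter, inferred from the table (not stated by the source):
# 282 GB / 141B = 2.0 for fp16, and 84.6 GB / 141B = 0.6 for Q4.
BYTES_PER_PARAM = {"fp16": 2.0, "q4": 0.6}


def weight_vram_gb(params_billions: float, precision: str) -> float:
    """Estimate weight-only memory in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and framework overhead.
    """
    return round(params_billions * BYTES_PER_PARAM[precision], 1)


if __name__ == "__main__":
    models = [("Mixtral 8x22B Instruct", 141), ("Qwen2.5 72B Instruct", 72)]
    for name, params in models:
        fp16 = weight_vram_gb(params, "fp16")
        q4 = weight_vram_gb(params, "q4")
        print(f"{name}: fp16 ~{fp16} GB, Q4 ~{q4} GB")
```

Note that the estimate uses total parameters, not active ones: a mixture-of-experts model such as Mixtral must keep all experts in memory even though only a subset runs per token.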

Benchmarks

Benchmark   Mixtral 8×22B Instruct   Qwen2.5 72B Instruct
HumanEval   76.0                     86.6
MATH        41.8                     83.1
MMLU        77.8                     86.1

Cheapest hosted pricing

Mixtral 8×22B Instruct
Together: $1.20 input / $1.20 output per 1M tokens
Qwen2.5 72B Instruct
Together: $1.20 input / $1.20 output per 1M tokens
In each row, the better value is the higher score, the larger context, or the lower VRAM.
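
To turn the per-1M-token prices into a per-request figure, multiply each token count by its rate and divide by one million. The following is a minimal sketch that defaults to the $1.20 input and $1.20 output rates listed above for both models. The function name `request_cost_usd` and the token counts in the example are illustrative assumptions, not values from the source.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float = 1.20,   # USD per 1M input tokens (listed rate)
    out_price_per_m: float = 1.20,  # USD per 1M output tokens (listed rate)
) -> float:
    """Return the estimated cost in USD of one request."""
    total = input_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
    cost = request_cost_usd(2_000, 500)
    print(f"Estimated cost: ${cost:.4f}")
```

Because both models are listed at the same rate, the cost per token is identical, and the practical difference comes from the memory footprint and benchmark scores.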