Qwen2.5 14B Instruct
Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.
- Parameters
- 14B
- Context length
- 128K
- Modality
- text
- Released
- 2024-09-18
Memory & hardware
- VRAM (fp16)
- 28 GB
- VRAM (Q4)
- 8.4 GB
- Recommended
- A100 40GB or RTX 4090 (Q4)
- Quantizations
- fp16, q8_0, q5_k_m, q4_k_m
Benchmarks
Hosted inference pricing
USD per million tokens.
| Provider | Input | Output | |
|---|---|---|---|
| togetherCheapest | $0.30 | $0.30 |
Run it yourself
Drop-in commands for the three most common open-source inference paths. The Ollama tag is a best-effort match against the registry; verify the size variant before pulling.
ollama run qwen2.5:14b
vllm serve Qwen/Qwen2.5-14B-Instruct
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-14B-Instruct", device_map="auto", torch_dtype="auto"
)Qwen/Qwen2.5-14B-Instruct Related models
Same family or similar size — useful when shopping around.
Phi-3's mid-tier model with extended 128K context. MIT licence. Strong reasoning relative to its parameter count thanks to Microsoft's heavy investment in synthetic training data.
- Context
- 128K
- License
- mit
- VRAM Q4
- 8.4 GB
14B model trained primarily on synthetic data. Punches above its weight on reasoning, especially MATH and GPQA. MIT licensed. A standout choice when you want strong reasoning quality without paying 70B-tier hardware costs. Phi-4 in particular demonstrated that careful synthetic-data curation can extract frontier-class reasoning from a relatively small dense model.
- Context
- 16K
- License
- mit
- VRAM Q4
- 8.4 GB
Larger OLMo 2 release. Same fully-open philosophy as the 7B variant. The 13B size makes it more competitive with mainstream production-grade chat models.
- Context
- 4K
- License
- apache-2-0
- VRAM Q4
- 7.8 GB
Apache-2.0-licensed 7B model with surprisingly strong reasoning and multilingual chops. Qwen 2.5 trains on a larger and more carefully filtered corpus than the original Qwen series, and the 7B variant punches well above its weight on coding and math benchmarks. A strong default for cost-sensitive chat workloads and for fine-tuning experiments where the Apache licence simplifies downstream redistribution.
- Context
- 128K
- License
- apache-2-0
- VRAM Q4
- 4.2 GB
24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at a third of the parameter count. Apache 2.0 licensed and small enough to run on a single 4090 at Q4. Good pick when you want Llama-3.3-70B-class chat quality but at a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.
- Context
- 33K
- License
- apache-2-0
- VRAM Q4
- 14.4 GB
Flagship Gemma 2 release. Uses logit-distillation from a larger teacher model, which is how Google delivers near-70B quality from a 27B student. A solid choice when the Llama community licence doesn't fit and you need quality at the 27B–40B size range.
- Context
- 8K
- License
- gemma
- VRAM Q4
- 16.2 GB