Microsoft
Phi
Microsoft's small-language-model line, focused on data-quality-driven training. Phi-4 punches well above its weight class.
History & context
Microsoft's Phi line is built on a thesis: careful curation of training data can extract frontier-class quality from much smaller models. Phi-1 (June 2023) was a 1.3B Python coder; Phi-2 (December 2023) was a 2.7B generalist; Phi-3 (April 2024) shipped in several sizes, including 3.8B (Mini) and 14B (Medium).
Phi-4 (December 2024) was the standout: a 14B model trained primarily on synthetic data, with reasoning and math performance that exceeds that of models several times its size. It showed that synthetic-data curation at scale can narrow the gap between models trained on small, curated corpora and models trained on far larger ones.
The Phi-3 and Phi-4 releases ship under the MIT licence, which is fully permissive open source. That makes them among the most permissively licensed strong small models available.
Flagship model
3 models in this family
Phi-3's mid-tier model with extended 128K context. MIT licence. Strong reasoning relative to its parameter count thanks to Microsoft's heavy investment in synthetic training data.
- Context: 128K
- License: MIT
- VRAM Q4: 8.4 GB
14B model trained primarily on synthetic data. Punches above its weight on reasoning, especially on the MATH and GPQA benchmarks. MIT licensed. A standout choice when you want strong reasoning quality without paying 70B-tier hardware costs. It demonstrated that careful synthetic-data curation can extract frontier-class reasoning from a relatively small dense model.
- Context: 16K
- License: MIT
- VRAM Q4: 8.4 GB
Microsoft's flagship small-model demonstration: GPT-3.5-class results on academic benchmarks at under 4B parameters. The 4K context-window variant is the lightest; a 128K variant ships separately. MIT licensed, and well suited to on-device assistants and structured-extraction workloads where compactness matters more than absolute quality.
- Context: 4K
- License: MIT
- VRAM Q4: 2.3 GB
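
The VRAM Q4 figures listed for these models are consistent with a simple weights-only estimate. The sketch below is an assumption rather than the page's stated method: it uses roughly 4.8 bits per weight, a value back-derived from the listed figures (plausible for 4-bit formats that also store per-block scales), and the function name `estimate_q4_vram_gb` is hypothetical. It ignores the KV cache and runtime overhead, so actual usage will be higher, especially at long context.

```python
def estimate_q4_vram_gb(params_billions: float, bits_per_weight: float = 4.8) -> float:
    """Rough weights-only memory estimate in GB (1 GB = 1e9 bytes).

    ASSUMPTION: 4.8 bits per weight is inferred from the figures on this
    page, not taken from any official specification.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Parameter counts taken from the model descriptions above.
for name, size_b in [("14B model", 14.0), ("3.8B model", 3.8)]:
    print(f"{name}: ~{estimate_q4_vram_gb(size_b):.1f} GB")
```

Under that assumption, the estimate reproduces the listed 8.4 GB for the 14B models and 2.3 GB for the 3.8B model. Treat the numbers as a floor when sizing hardware, and leave headroom for context length and the inference runtime.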