Microsoft

Phi

Microsoft's small-language-model line, focused on data-quality-driven training. Phi-4 punches well above its weight class.

History & context

Microsoft's Phi line is built on a thesis: that careful curation of training data can extract frontier-class quality from much smaller models. Phi-1 (June 2023) was a 1.3B Python coder; Phi-2 (December 2023) was a 2.7B generalist; Phi-3 (April 2024) shipped in several sizes, including 3.8B (Mini) and 14B (Medium).

Phi-4 (December 2024) was the standout: a 14B model trained primarily on synthetic data, with reasoning and math performance that exceeds that of models several times its size. It showed that carefully curated synthetic data, applied at scale, can narrow the quality gap between small models and much larger ones.

The Phi-3 and Phi-4 releases ship under the MIT license, which is fully permissive open source. That places them among the most permissively licensed strong small models available.

Flagship model

Phi-3 Medium 14B

14B

Phi-3's mid-tier model with extended 128K context. MIT license. Strong reasoning relative to its parameter count thanks to Microsoft's heavy investment in synthetic training data.

Context: 128K
License: MIT
VRAM Q4: 8.4 GB

3 models in this family

Phi-3 Medium 14B

14B

Phi-3's mid-tier model with extended 128K context. MIT license. Strong reasoning relative to its parameter count thanks to Microsoft's heavy investment in synthetic training data.

Context: 128K
License: MIT
VRAM Q4: 8.4 GB

Phi-4 14B

14B

14B dense model trained primarily on synthetic data. Punches above its weight on reasoning, especially MATH and GPQA. MIT licensed. A standout choice when you want strong reasoning quality without paying 70B-tier hardware costs.

Context: 16K
License: MIT
VRAM Q4: 8.4 GB

Phi-3 Mini 4K Instruct

3.8B

Microsoft's flagship small-model demonstration: GPT-3.5-class results on academic benchmarks at under 4B parameters. The 4K context-window variant is the lightest; a 128K variant ships separately. MIT licensed, and well suited to on-device assistants and structured-extraction workloads where compactness matters more than absolute quality.

Context: 4K
License: MIT
VRAM Q4: 2.3 GB
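
The "VRAM Q4" figures above can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not the method this page uses: it assumes about 4.8 effective bits per weight for a 4-bit quantization, a value chosen only because it reproduces the listed numbers. The function name and the constant are illustrative. The estimate covers weights only; the KV cache and activations add to real memory use, and that overhead grows with context length.

```python
# Rough weight-memory estimate for a 4-bit quantized model.
# ASSUMPTION: ~4.8 effective bits per weight. This value is picked because it
# matches the figures listed on this page; it is not a published specification.
ASSUMED_BITS_PER_WEIGHT = 4.8


def estimate_q4_weight_gb(params_billions: float,
                          bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts are taken from the model cards above.
    for name, size_b in [("Phi-3 Mini", 3.8), ("Phi-3 Medium", 14.0), ("Phi-4", 14.0)]:
        print(f"{name}: ~{estimate_q4_weight_gb(size_b):.1f} GB")
```

Under this assumption a 14B model comes out at about 8.4 GB and a 3.8B model at about 2.3 GB, in line with the cards above. Treat the result as a lower bound when sizing hardware for long-context use.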
