DeepSeek open-source AI models · Open Source AI Models

DeepSeek

Hangzhou-based lab known for highly efficient MoE training. DeepSeek V3 and R1 set new bars for open reasoning and coding.

History & context

DeepSeek's releases through 2024 and into 2025 changed the economics of frontier AI. The Hangzhou-based lab specialises in mixture-of-experts (MoE) training: its architecture and training pipeline let it train frontier-class models at a fraction of the compute cost of dense-model labs.

DeepSeek V3 (December 2024), a 671B-parameter MoE with 37B parameters active per token, was reportedly trained for around $5.6M of compute. Its quality on academic benchmarks rivals closed proprietary frontier models. DeepSeek R1 (January 2025) followed with reasoning capability trained by reinforcement learning and, unusually for a frontier-class model, its weights are MIT-licensed.

R1 also shipped a family of distilled variants (R1 Distill Qwen 7B / 14B / 32B, R1 Distill Llama 8B / 70B) that recover most of R1's reasoning quality at much smaller scales. The 70B distill, quantised to Q4 (about 42 GB of weights), is the most practical way to run R1-class reasoning on a single H100.

Flagship model

DeepSeek R1

671B

Reasoning model trained with reinforcement learning on top of DeepSeek V3-Base. The MIT licence applies to the weights themselves, making R1 one of the most permissively licensed frontier reasoning models. It generates long chains of thought before answering, trading latency for accuracy on math, code, and reasoning benchmarks. Distilled variants (e.g. R1 Distill Llama 70B) recover most of the quality at much smaller scales.

Context: 128K
License: mit
VRAM Q4: 402.6 GB
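The VRAM Q4 figures listed on this page are all consistent with roughly 0.6 GB per billion total parameters. That ratio is inferred from the listed numbers, not a documented formula, and it counts total parameters rather than the active ones in an MoE. A minimal sketch, with a hypothetical helper name, that reproduces the listed values:

```python
# Assumption: 0.6 GB per billion parameters, inferred from the figures
# listed on this page. It is not an official sizing formula.
GB_PER_BILLION_PARAMS_Q4 = 0.6


def estimate_q4_vram_gb(params_billion: float) -> float:
    """Estimate Q4 weight memory in GB from the total parameter count."""
    return round(params_billion * GB_PER_BILLION_PARAMS_Q4, 1)


# (total parameters in billions, VRAM Q4 figure listed on this page)
LISTED = {
    "DeepSeek R1": (671, 402.6),
    "DeepSeek V3": (671, 402.6),
    "DeepSeek Coder V2": (236, 141.6),
    "DeepSeek R1 Distill Llama 70B": (70, 42.0),
}

for name, (params_billion, listed_gb) in LISTED.items():
    estimate = estimate_q4_vram_gb(params_billion)
    print(f"{name}: estimated {estimate} GB, listed {listed_gb} GB")
```

The estimate covers weights only. KV cache and runtime overhead come on top and grow with context length.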

4 models in this family

DeepSeek R1

671B

Context: 128K
License: mit
VRAM Q4: 402.6 GB

DeepSeek V3

671B

671B-parameter MoE model with 37B active per token. Reportedly trained for roughly $5.6M of compute, a landmark in cost-efficient training that delivers frontier-class quality at a fraction of the cost of closed proprietary models. The DeepSeek licence permits commercial use with limited restrictions on military and unlawful applications. Running V3 yourself requires serious hardware: at fp8 the weights alone are roughly 671 GB, more than the 640 GB available on an 8× H100 80 GB node, so expect a multi-node deployment or higher-memory GPUs. Most teams will use it via the DeepSeek API or providers like Together.

Context: 128K
License: deepseek
VRAM Q4: 402.6 GB

DeepSeek Coder V2

236B

Coding-focused MoE model with 21B active parameters out of 236B total. Supports 338 programming languages, with strong performance across mainstream stacks (Python, TypeScript, Go, Rust, Java, C++) and competent results on niche languages where most open models falter. The DeepSeek licence applies: commercial use is permitted with some application restrictions.

Context: 128K
License: deepseek
VRAM Q4: 141.6 GB

DeepSeek R1 Distill Llama 70B

70B

R1 reasoning capabilities distilled into Llama 3.3 70B Instruct. The most accessible way to run R1-class reasoning locally: at Q4 the weights (about 42 GB) fit on a single 80 GB H100. The fp16 weights (roughly 140 GB) need multiple GPUs, and a single 24 GB RTX 4090 cannot hold the Q4 weights on its own. Inherits the Llama 3 community licence (commercial use permitted below 700M monthly active users). A good pick for production reasoning workloads where the full R1 is too expensive to host but quality close to o1/R1 is required.

Context: 128K
License: llama-3
VRAM Q4: 42 GB
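Whether a model fits on given hardware comes down to simple arithmetic on weight size. The sketch below is a rough weights-only check that ignores KV cache, activations, and framework overhead, so a real deployment needs more headroom. The helper names are hypothetical, the GPU capacities assume the 80 GB H100 and the 24 GB RTX 4090, and the Q4 size is the 42 GB listed above.

```python
# Weights-only fit check. Real deployments also need memory for the KV
# cache, activations, and runtime overhead, so treat "fits" as optimistic.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}  # standard storage widths

H100_GB = 80       # assumption: 80 GB H100 variant
RTX_4090_GB = 24   # RTX 4090 memory capacity


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(required_gb: float, gpu_gb: float, gpu_count: int = 1) -> bool:
    """True if the weights alone fit in the combined GPU memory."""
    return required_gb <= gpu_gb * gpu_count


DISTILL_70B_Q4_GB = 42.0  # VRAM Q4 figure listed on this page

checks = {
    "V3 (671B) fp8 on 8x H100": fits(weight_gb(671, "fp8"), H100_GB, 8),
    "Distill 70B fp16 on 1x H100": fits(weight_gb(70, "fp16"), H100_GB),
    "Distill 70B Q4 on 1x H100": fits(DISTILL_70B_Q4_GB, H100_GB),
    "Distill 70B Q4 on 1x RTX 4090": fits(DISTILL_70B_Q4_GB, RTX_4090_GB),
}

for name, ok in checks.items():
    print(f"{name}: {'fits' if ok else 'does not fit'}")
```

Under these assumptions only the Q4 distill on a single H100 passes. Offloading part of the model to CPU memory can relax the limit at a cost in speed.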
