Comparison
DeepSeek V3 vs DeepSeek R1
Side-by-side specs, benchmarks and hosted-inference pricing.
DeepSeek V3: a 671B-parameter mixture-of-experts (MoE) model with 37B parameters active per token. It was trained for roughly $5.6M of compute, a landmark in cost-efficient frontier training, and delivers frontier-class quality at a fraction of the cost of closed proprietary models. The DeepSeek licence permits commercial use, with restrictions on military and unlawful applications. Running V3 yourself requires serious hardware (8× H100 at fp8); most teams will use it via the DeepSeek API or providers like Together.
DeepSeek R1: a reasoning model trained with reinforcement learning on top of DeepSeek V3-Base. It is released under the MIT licence, which also covers the weights, making R1 one of the most permissively licensed frontier reasoning models. It generates a long chain of thought before answering, trading latency for accuracy on math, code, and reasoning benchmarks. Distilled variants (e.g. R1 Distill Llama 70B) recover most of the quality at much smaller scales.
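Because R1 emits its reasoning before the final answer, downstream code usually needs to separate the two. The sketch below assumes the raw completion wraps the reasoning in `<think>...</think>` tags, as the open-weight R1 checkpoints do; hosted APIs may instead return the reasoning in a separate field. The function name `split_reasoning` and the sample string are illustrative only.

```python
import re

# Assumption: reasoning is delimited by <think>...</think> in the raw text.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw R1-style completion."""
    match = THINK_BLOCK.search(raw)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is the user-facing answer.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer


# Hypothetical completion, for illustration only.
sample = "<think>2 + 2 is 4, and doubling gives 8.</think>\nThe answer is 8."
reasoning, answer = split_reasoning(sample)
print(answer)
```

Keeping the reasoning separate lets an application log or discard it while showing only the answer, which matters because the chain of thought can be many times longer than the answer itself.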
Specs
| Spec | DeepSeek V3 | DeepSeek R1 |
| --- | --- | --- |
| Parameters | 671B | 671B |
| Context length | 128K | 128K |
| Modality | text | text |
| Released | 2024-12-26 | 2025-01-20 |
| License | DeepSeek License | MIT |
| Commercial use | Yes | Yes |
| VRAM fp16 | 1342 GB | 1342 GB |
| VRAM Q4 | 402.6 GB | 402.6 GB |
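The VRAM rows can be reproduced with simple arithmetic on the parameter count. The sketch below is a back-of-the-envelope estimate, not the page's actual method: the fp16 figure matches 2 bytes per parameter exactly, while the Q4 figure matches 4 bits per parameter only if a 1.2× overhead factor is applied. That factor is inferred from the numbers and is an assumption, as are the function and variable names.

```python
PARAMS_BILLIONS = 671  # total parameters, from the specs table


def weight_memory_gb(params_billions: float, bits_per_param: float,
                     overhead: float = 1.0) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    bytes_per_param = bits_per_param / 8
    # Billions of parameters times bytes per parameter gives GB directly.
    return params_billions * bytes_per_param * overhead


fp16_gb = weight_memory_gb(PARAMS_BILLIONS, 16)
# Assumption: the 1.2 overhead factor is inferred from the table, not documented.
q4_gb = weight_memory_gb(PARAMS_BILLIONS, 4, overhead=1.2)

print(f"fp16: {fp16_gb:.1f} GB, Q4: {q4_gb:.1f} GB")
# prints "fp16: 1342.0 GB, Q4: 402.6 GB"
```

These estimates cover weights only. The KV cache and activations add to the total and grow with context length and batch size, so real deployments need headroom beyond these figures.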
Benchmarks
| Benchmark | DeepSeek V3 | DeepSeek R1 |
| --- | --- | --- |
| GPQA | — | 71.5 |
| HumanEval | 82.6 | — |
| MATH | 84.0 | 97.3 |
| MMLU | 88.5 | 90.8 |
| MMLU-Pro | 75.9 | 84.0 |
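To read the benchmark table as a comparison, it helps to compute the gap only where both models report a score. The sketch below copies the values from the table and skips rows with a missing entry; the names `scores` and `deltas` are illustrative, and it assumes the two columns are V3 and R1 in the order of the page title.

```python
# Scores copied from the table as (V3, R1); None marks an unreported score.
scores = {
    "GPQA": (None, 71.5),
    "HumanEval": (82.6, None),
    "MATH": (84.0, 97.3),
    "MMLU": (88.5, 90.8),
    "MMLU-Pro": (75.9, 84.0),
}


def deltas(table: dict) -> dict:
    """R1 minus V3, in points, for benchmarks where both scores exist."""
    return {
        name: round(r1 - v3, 1)
        for name, (v3, r1) in table.items()
        if v3 is not None and r1 is not None
    }


for name, gap in deltas(scores).items():
    print(f"{name}: {gap:+.1f}")
# prints:
# MATH: +13.3
# MMLU: +2.3
# MMLU-Pro: +8.1
```

On the three shared benchmarks the gap is largest on MATH and smallest on MMLU. GPQA and HumanEval cannot be compared from this table because each is reported for only one model.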