Comparison

DeepSeek V3 vs DeepSeek R1

Side-by-side specs, benchmarks and hosted-inference pricing.

Side A

DeepSeek V3

DeepSeek · DeepSeek

671B-parameter MoE model with 37B parameters active per token. Trained for roughly $5.6M of compute, a landmark in cost-efficient frontier training. It delivers frontier-class quality at a fraction of the cost of closed proprietary models. The DeepSeek licence permits commercial use but restricts military and unlawful applications. Running V3 yourself requires serious hardware (on the order of 8× H100 at fp8); most teams will use it via the DeepSeek API or providers like Together.

Side B

DeepSeek R1

DeepSeek · DeepSeek

Reasoning model trained with reinforcement learning on top of DeepSeek V3-Base. Released under the MIT licence, which covers the weights as well, making R1 one of the most permissively licensed frontier reasoning models. It generates long internal chains of thought before answering, trading latency for accuracy on math, code, and reasoning benchmarks. Distilled variants (e.g. R1 Distill Llama 70B) recover most of the quality at much smaller scales.

Specs

Spec	DeepSeek V3	DeepSeek R1
Parameters	671B	671B
Context length	128K	128K
Modality	text	text
Released	2024-12-26	2025-01-20
License	DeepSeek License	MIT
Commercial use	Yes	Yes
VRAM fp16	1342 GB	1342 GB
VRAM Q4	402.6 GB	402.6 GB
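
The VRAM rows can be roughly reproduced from the parameter count alone. The sketch below is not the page's stated method: it assumes memory is parameters times bytes per parameter, and the 1.2× overhead factor on the Q4 row is an assumption chosen only because it matches the table. The function name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate: parameters x bytes per parameter x overhead.
# ASSUMPTION: the 1.2x overhead for Q4 is inferred from the table's figure;
# the page does not state its formula. KV cache and activations are ignored.

PARAMS_BILLIONS = 671  # total parameters, from the Specs table


def weight_memory_gb(params_billions: float, bits_per_param: float,
                     overhead: float = 1.0) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead


fp16_gb = weight_memory_gb(PARAMS_BILLIONS, 16)
q4_gb = weight_memory_gb(PARAMS_BILLIONS, 4, overhead=1.2)

print(f"fp16: {fp16_gb:.1f} GB")
print(f"Q4:   {q4_gb:.1f} GB")
```

Under these assumptions the results match the table's 1342 GB and 402.6 GB. Both models share the same figures because they have the same parameter count.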

Benchmarks

Benchmark	DeepSeek V3	DeepSeek R1
ArenaHard	85.5	92.3
BFCL	82.5	—
GPQA	—	71.5
HumanEval	82.6	—
IFEval	86.1	83.3
MATH	84.0	97.3
MMLU	88.5	90.8
MMLU-Pro	75.9	84.0
SWE-bench Verified	—	49.2
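
Only five rows have a score for both models, so a direct comparison is limited to those. A minimal sketch of that comparison follows, with the scores copied from the table; the helper name `shared_deltas` is hypothetical.

```python
# Scores copied from the Benchmarks table as (V3, R1); None marks a missing entry.
SCORES = {
    "ArenaHard": (85.5, 92.3),
    "BFCL": (82.5, None),
    "GPQA": (None, 71.5),
    "HumanEval": (82.6, None),
    "IFEval": (86.1, 83.3),
    "MATH": (84.0, 97.3),
    "MMLU": (88.5, 90.8),
    "MMLU-Pro": (75.9, 84.0),
    "SWE-bench Verified": (None, 49.2),
}


def shared_deltas(scores):
    """Return {benchmark: R1 minus V3} for rows where both models have a score."""
    return {
        name: round(r1 - v3, 1)
        for name, (v3, r1) in scores.items()
        if v3 is not None and r1 is not None
    }


deltas = shared_deltas(SCORES)
# Print the largest R1 advantage first; a negative value means V3 scores higher.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {delta:+.1f}")
```

On the shared rows, R1 leads everywhere except IFEval, where V3 is ahead.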

Cheapest hosted pricing

DeepSeek V3

deepinfra: $0.49 in / $0.89 out per 1M tokens

DeepSeek R1

deepinfra: $0.55 in / $2.19 out per 1M tokens
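
Per-token prices understate the gap for reasoning workloads, because R1 typically emits many more output tokens per answer. The sketch below estimates per-request cost from the prices above. The token counts are hypothetical, and it assumes reasoning tokens are billed as output tokens; check the provider's billing rules before relying on it.

```python
# Prices in USD per 1M tokens, copied from the pricing lines above.
PRICES = {
    "DeepSeek V3": {"in": 0.49, "out": 0.89},
    "DeepSeek R1": {"in": 0.55, "out": 2.19},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    price = PRICES[model]
    total = input_tokens * price["in"] + output_tokens * price["out"]
    return total / 1_000_000


# HYPOTHETICAL workload: same 2,000-token prompt for both models.
# ASSUMPTION: R1's chain-of-thought counts as billed output tokens.
v3_cost = request_cost("DeepSeek V3", input_tokens=2_000, output_tokens=500)
r1_cost = request_cost("DeepSeek R1", input_tokens=2_000, output_tokens=4_000)

print(f"V3: ${v3_cost:.6f} per request")
print(f"R1: ${r1_cost:.6f} per request")
```

With these illustrative numbers the R1 request costs several times more than the V3 request, mostly from output tokens rather than from the higher input price.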

For each row, the better value is the higher score, the larger context, or the lower VRAM.