DeepSeek Coder V2

DeepSeek · DeepSeek236B70B+DeepSeek License code

Coding-focused MoE model with 21B active parameters out of 236B total. Supports 338 programming languages with strong performance across mainstream stacks (Python, TypeScript, Go, Rust, Java, C++) and competent results on niche languages where most open models falter. The DeepSeek licence applies — commercial use permitted with some application restrictions.

Parameters: 236B
Context length: 128K
Modality: text
Released: 2024-06-17

Memory & hardware

VRAM (fp16): 472 GB
VRAM (Q4): 141.6 GB
Recommended: 4× A100 80GB or 2× H100
Quantizations: fp16, q8_0, q4_k_m

License: DeepSeek License

SPDX: —
Commercial use: Yes
Modification: Yes
Redistribution: Yes

License detail →

Benchmarks

HumanEval

90.2

MMLU

79.2

MATH

75.7

BigCodeBench

33.7

LiveCodeBench

24.3

Benchmarks last verified 2026-07-02.

Hosted inference pricing

USD per million tokens.

Provider	Input	Output
deepinfraCheapest	$0.14	$0.28

Pricing last verified 2026-05-18. Providers update rates frequently; confirm before integrating.

Run it yourself

Drop-in commands for the three most common open-source inference paths. The Ollama tag is a best-effort match against the registry; verify the size variant before pulling.

Run DeepSeek Coder V2 locally

No official Ollama registry tag for this model — use transformers or vLLM below.

vLLM (production)

vllm serve deepseek-ai/DeepSeek-Coder-V2-Instruct

High-throughput hosted inference; one command to expose an OpenAI-compatible HTTP server.

Transformers (Python)

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Instruct", device_map="auto", torch_dtype="auto"
)

Direct PyTorch usage. Pin a torch / cuda version that matches your GPU.

Hugging Face ID: deepseek-ai/DeepSeek-Coder-V2-Instruct

Related models

Same family or similar size — useful when shopping around.

DeepSeek R1 Distill Llama 70B

70B

R1 reasoning capabilities distilled into a Llama 3.3 70B base. The most accessible way to run R1-class reasoning locally — fits on a single H100 in fp16 or on a 4090 at Q4. Inherits Llama 3's community licence (commercial use under 700M MAU). Great pick for production reasoning workloads where the full R1 is too expensive to host but o1/R1-style quality is required.

Context: 128K
License: llama-3
VRAM Q4: 42 GB

DeepSeek R1

671B

Reasoning model trained with reinforcement learning on top of DeepSeek V3-Base. MIT licence — even the weights are unrestricted, making R1 the most permissively-licensed frontier reasoning model. Generates long internal chains-of-thought before answering, trading latency for accuracy on math, code, and reasoning benchmarks. Distilled variants (e.g. R1 Distill Llama 70B) recover most of the quality at much smaller scales.

Context: 128K
License: mit
VRAM Q4: 402.6 GB

DeepSeek V3

671B

671B-parameter MoE model with 37B active per token. Trained for roughly $5.6M of compute — a landmark in cost-efficient frontier training. Frontier-class quality at a fraction of the cost of the closed proprietary frontier. The DeepSeek licence permits commercial use with limited restrictions on military and unlawful applications. Running V3 yourself requires serious hardware (8× H100 at fp8); most teams will use it via the DeepSeek API or providers like Together.

Context: 128K
License: deepseek
VRAM Q4: 402.6 GB

Qwen 3 235B (A22B)

235B

The flagship Qwen 3 release: a 235B-total MoE with 22B active parameters per token. Competitive with DeepSeek V3 and Llama 4 Maverick on reasoning benchmarks while being smaller total. Apache 2.0 — one of the most permissively licenced frontier-class models.

Context: 128K
License: apache-2-0
VRAM Q4: 141 GB

Grok 2

300B

xAI's second open-weights release, Apache 2.0. ~300B mixture-of-experts. xAI's pattern of open-sourcing the previous frontier when a new one ships continues from Grok 1. Competitive with GPT-4-class chat quality at release; today useful mainly as a research artefact given the compute needed to run it.

Context: 131K
License: apache-2-0
VRAM Q4: 180 GB

Grok 1

314B

xAI's first open-weights release: a 314B-parameter mixture-of-experts model. Apache 2.0 licensed. Largely a research artefact at this size — most users will run smaller models for production — but useful as a permissively-licensed reference for MoE research.

Context: 8K
License: apache-2-0
VRAM Q4: 188.4 GB