OLMo 2 13B
Larger OLMo 2 release. Same fully-open philosophy as the 7B variant. The 13B size makes it more competitive with mainstream production-grade chat models.
- Parameters: 13B
- Context length: 4K
- Modality: text
- Released: 2024-11-26
Memory & hardware
- VRAM (fp16): 26 GB
- VRAM (Q4): 7.8 GB
- Recommended: RTX 4090 24GB
- Quantizations: fp16, q8_0
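The figures above are consistent with a simple weights-only estimate: parameter count times bytes per weight. The sketch below is a rough illustration, not necessarily how this site computes its numbers. The 0.6 bytes-per-weight factor for Q4 is an assumption inferred from the listed 7.8 GB, and `headroom_gb` is a hypothetical allowance for KV cache and runtime overhead.

```python
# Weights-only VRAM estimate. Ignores KV cache, activations and runtime
# overhead, so treat the result as a lower bound.
BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16 bits per weight
    "q4": 0.6,    # assumption: ~4.8 bits per weight, inferred from the 7.8 GB figure
}


def estimate_vram_gb(params_billions: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return round(params_billions * BYTES_PER_PARAM[precision], 1)


def fits(params_billions: float, precision: str, gpu_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if the weights plus a hypothetical headroom fit on the GPU."""
    return estimate_vram_gb(params_billions, precision) + headroom_gb <= gpu_gb


for precision in ("fp16", "q4"):
    need = estimate_vram_gb(13, precision)
    print(f"{precision}: {need} GB, fits on a 24 GB card: {fits(13, precision, 24)}")
```

Under these assumptions the fp16 weights alone (26 GB) exceed a 24 GB RTX 4090, so the recommended card implies running a quantized build.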
Hosted inference pricing
No hosted pricing is listed; on this site the model is currently self-host only.
Run it yourself
Drop-in commands for the three most common open-source inference paths. The Ollama tag is a best-effort match against the registry; verify the size variant before pulling.
vllm serve allenai/OLMo-2-1124-13B
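Once the `vllm serve` command above is running, the model can be queried over vLLM's OpenAI-compatible HTTP API. This is a minimal sketch using only the Python standard library, assuming the server listens on vLLM's default `localhost:8000`. `build_request` and `complete` are hypothetical helper names. The plain completions endpoint is used because this checkpoint is a base model rather than an instruction-tuned one.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port
MODEL = "allenai/OLMo-2-1124-13B"


def build_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style /completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding for repeatable output
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, max_tokens), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Requires a running server, so the call is left commented out:
# print(complete("The capital of France is"))
```

Keep the prompt plus `max_tokens` within the model's 4K context window, or the server will reject the request.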
# Hugging Face Transformers; device_map="auto" needs the accelerate package installed.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-13B")
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-13B", device_map="auto", torch_dtype="auto"
)

Related models
Same family or similar size — useful when shopping around.
Phi-3's mid-tier model with extended 128K context. MIT licence. Strong reasoning relative to its parameter count thanks to Microsoft's heavy investment in synthetic training data.
- Context: 128K
- License: MIT
- VRAM (Q4): 8.4 GB
14B model trained primarily on synthetic data. Punches above its weight on reasoning, especially MATH and GPQA. MIT licensed. A standout choice when you want strong reasoning quality without paying 70B-tier hardware costs. Phi-4 in particular demonstrated that careful synthetic-data curation can extract frontier-class reasoning from a relatively small dense model.
- Context: 16K
- License: MIT
- VRAM (Q4): 8.4 GB
Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.
- Context: 128K
- License: Apache 2.0
- VRAM (Q4): 8.4 GB
Fully-open 7B model: weights, training data and code all released under permissive licences. Useful as a reference for reproducibility research and for teams that need full transparency on training data provenance.
- Context: 4K
- License: Apache 2.0
- VRAM (Q4): 4.2 GB
24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at a third of the parameter count. Apache 2.0 licensed and small enough to run on a single 4090 at Q4. Good pick when you want Llama-3.3-70B-class chat quality but at a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.
- Context: 33K
- License: Apache 2.0
- VRAM (Q4): 14.4 GB
Flagship Gemma 2 release. Google's Gemma 2 report describes knowledge distillation from a larger teacher for the smaller 2B and 9B models, while the 27B itself was trained from scratch; Google positions it as competitive with models two to three times its size. A solid choice when the Llama community licence doesn't fit and you need quality in the 27B–40B size range.
- Context: 8K
- License: Gemma
- VRAM (Q4): 16.2 GB