OSAIM
Open Source AI Models

Jamba 1.5 Large

Hybrid Mamba-Transformer-MoE model with native 256K context (effective beyond 140K). 94B active parameters out of 398B total. The state-space-model layers give it linear-time scaling with sequence length, making it interesting for very long contexts.

Licensed under AI21's open model licence, which permits most commercial use.

Parameters
398B
Context length
256K
Modality
text
Released
2024-08-22

Memory & hardware

VRAM (fp16)
796 GB
VRAM (Q4)
238.8 GB
Recommended
8× H100 80GB
Quantizations
fp16, q8_0

License: Jamba Open Model License

SPDX
Commercial use
Yes
Modification
Yes
Redistribution
Yes

Benchmarks

MMLU
81.2
HumanEval
71.3
Benchmarks last verified 2026-05-18.

Hosted inference pricing

USD per million tokens.

ProviderInputOutput
togetherCheapest$2.00$8.00
Pricing last verified 2026-05-18. Providers update rates frequently; confirm before integrating.

Run it yourself

Drop-in commands for the three most common open-source inference paths. The Ollama tag is a best-effort match against the registry; verify the size variant before pulling.

Run Jamba 1.5 Large locally
No official Ollama registry tag for this model — use transformers or vLLM below.
vLLM (production)
vllm serve ai21labs/AI21-Jamba-1.5-Large
High-throughput hosted inference; one command to expose an OpenAI-compatible HTTP server.
Transformers (Python)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-1.5-Large")
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/AI21-Jamba-1.5-Large", device_map="auto", torch_dtype="auto"
)
Direct PyTorch usage. Pin a torch / cuda version that matches your GPU.
Hugging Face ID: ai21labs/AI21-Jamba-1.5-Large

Related models

Same family or similar size — useful when shopping around.

Nemotron-4 340B Instruct
340B

NVIDIA's reward-modelling research vehicle. Trained primarily to be a synthetic-data-generation specialist rather than a chat-first model. Useful for teams building instruction-tuning datasets at scale.

Context
4K
License
llama-3
VRAM Q4
204 GB
Grok 1
314B

xAI's first open-weights release: a 314B-parameter mixture-of-experts model. Apache 2.0 licensed. Largely a research artefact at this size — most users will run smaller models for production — but useful as a permissively-licensed reference for MoE research.

Context
8K
License
apache-2-0
VRAM Q4
188.4 GB
DeepSeek Coder V2
236B

Coding-focused MoE model with 21B active parameters out of 236B total. Supports 338 programming languages with strong performance across mainstream stacks (Python, TypeScript, Go, Rust, Java, C++) and competent results on niche languages where most open models falter. The DeepSeek licence applies — commercial use permitted with some application restrictions.

Context
128K
License
deepseek
VRAM Q4
141.6 GB
Mixtral 8×22B Instruct
141B

Scaled-up Mixtral with 22B-parameter experts. ~39B active parameters out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Context
66K
License
apache-2-0
VRAM Q4
84.6 GB
DeepSeek V3
671B

671B-parameter MoE model with 37B active per token. Trained for roughly $5.6M of compute — a landmark in cost-efficient frontier training. Frontier-class quality at a fraction of the cost of the closed proprietary frontier. The DeepSeek licence permits commercial use with limited restrictions on military and unlawful applications. Running V3 yourself requires serious hardware (8× H100 at fp8); most teams will use it via the DeepSeek API or providers like Together.

Context
128K
License
deepseek
VRAM Q4
402.6 GB
DeepSeek R1
671B

Reasoning model trained with reinforcement learning on top of DeepSeek V3-Base. MIT licence — even the weights are unrestricted, making R1 the most permissively-licensed frontier reasoning model. Generates long internal chains-of-thought before answering, trading latency for accuracy on math, code, and reasoning benchmarks. Distilled variants (e.g. R1 Distill Llama 70B) recover most of the quality at much smaller scales.

Context
128K
License
mit
VRAM Q4
402.6 GB