1 of 40 open-source models (filtered).
Hybrid Mamba-Transformer-MoE model with native 256K context (effective beyond 140K). 94B active parameters out of 398B total. The state-space-model layers give it linear-time scaling with sequence length, making it interesting for very long contexts. Licensed under AI21's open model licence, which permits most commercial use.