OSAIM
Open Source AI Models

All models

16 of 40 open-source models (filtered).

Grok 1
314B

xAI's first open-weights release: a 314B-parameter mixture-of-experts model. Apache 2.0 licensed. Largely a research artefact at this size — most users will run smaller models for production — but useful as a permissively-licensed reference for MoE research.

Context: 8K
License: apache-2-0
VRAM Q4: 188.4 GB

Mixtral 8×22B Instruct
141B

Scaled-up Mixtral with 22B-parameter experts. ~39B active parameters out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Context: 64K
License: apache-2-0
VRAM Q4: 84.6 GB

Mixtral 8×7B Instruct
46.7B

Mistral's first mixture-of-experts release: 8 experts per layer, 2 active per token. Because the experts share the attention layers, the total is ~47B parameters rather than 8 × 7B = 56B, with ~13B active per token. Per-token inference is therefore roughly as fast as a 13B dense model, while quality approaches that of a 70B dense model. Apache 2.0 weights keep it a popular self-hosting choice. Memory footprint is the main constraint: all 47B parameters must be loaded even though only about a quarter are active per token.

Context: 32K
License: apache-2-0
VRAM Q4: 28 GB

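The active-versus-total arithmetic quoted for the two Mixtral models can be sketched as below. This is a simplified model, not the published architecture breakdown: it assumes every parameter is either shared by all tokens or belongs to one of 8 equally sized experts, and that 2 experts are active per token (stated above for 8×7B, assumed here for 8×22B). The function name is hypothetical.

```python
def split_moe_params(total_b, active_b, n_experts, n_active):
    """Back out per-expert and shared parameter counts (in billions).

    Simplified model (an assumption, not a published breakdown):
      total  = shared + n_experts * expert
      active = shared + n_active  * expert
    """
    expert = (total_b - active_b) / (n_experts - n_active)
    shared = active_b - n_active * expert
    return expert, shared


# Totals and active counts are the approximate figures quoted in the listing.
for name, total_b, active_b in [
    ("Mixtral 8x7B", 46.7, 13.0),
    ("Mixtral 8x22B", 141.0, 39.0),
]:
    expert, shared = split_moe_params(total_b, active_b, n_experts=8, n_active=2)
    print(
        f"{name}: ~{expert:.1f}B per expert, ~{shared:.1f}B shared, "
        f"{active_b / total_b:.0%} of weights active per token"
    )
```

Under these assumptions each expert comes out smaller than the number in the model name suggests, which is why the totals sit below 8 × 7B and 8 × 22B. Memory still has to hold the full total, since any expert may be selected for the next token.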
Yi VL 34B
34B

Vision-language variant of Yi 34B. Image-text reasoning via an MLP adapter on a CLIP encoder. Useful for bilingual EN/中 multimodal workloads where the major Western vision-language models underperform on Chinese text in images.

Context: 4K
License: apache-2-0
VRAM Q4: 20.4 GB

Yi 1.5 34B Chat
34B

Bilingual EN/中 34B chat model. Apache 2.0 licensed with strong Chinese-language performance and competitive English chat quality. Good default for bilingual production workloads.

Context: 32K
License: apache-2-0
VRAM Q4: 20.4 GB

QwQ 32B Preview
32B

Qwen's reasoning-focused 'thinking' model. Generates long chains of thought before answering, in the same lineage as OpenAI's o1 and DeepSeek R1. Optimised for math and competition-style problem solving. The Preview tag means Qwen is iterating quickly, so later versions may supersede this one. Useful today for math-heavy workloads where a slow, careful answer is preferred to a fast wrong one.

Context: 32K
License: apache-2-0
VRAM Q4: 19.2 GB

Qwen2.5 32B Instruct
32B

32B sweet-spot model: strong reasoning, and it fits on one 80 GB H100 in fp16 (about 64 GB of weights) or on a 24 GB RTX 4090 at Q4. The 32B size sits at a quality/cost knee: up to roughly 32B, quality improves faster than cost as parameters are added, and more slowly beyond that. Favoured for production chat where 7B isn't sharp enough and 70B+ would over-spec the hardware budget. Apache 2.0 licence.

Context: 128K
License: apache-2-0
VRAM Q4: 19.2 GB

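The hardware claims for Qwen2.5 32B Instruct can be sanity-checked with a rough fit test. This is a minimal sketch under stated assumptions: fp16 stores 2 bytes per parameter, the H100 is the 80 GB variant, the RTX 4090 has 24 GB, and `headroom_gb` is a hypothetical allowance for KV cache and runtime overhead rather than a measured value.

```python
BYTES_PER_PARAM_FP16 = 2  # 16-bit weights


def fp16_weights_gb(params_billion):
    """Weight memory in GB (10^9 bytes) at fp16; excludes KV cache and activations."""
    return params_billion * BYTES_PER_PARAM_FP16


def fits(weights_gb, gpu_gb, headroom_gb=2.0):
    """True if the weights fit with headroom_gb to spare (headroom is an assumption)."""
    return weights_gb + headroom_gb <= gpu_gb


fp16_gb = fp16_weights_gb(32)  # 32B parameters at 2 bytes each
q4_gb = 19.2                   # VRAM Q4 figure listed for this model

print("fp16 on an 80 GB H100:", fits(fp16_gb, gpu_gb=80))
print("Q4 on a 24 GB RTX 4090:", fits(q4_gb, gpu_gb=24))
print("fp16 on a 24 GB RTX 4090:", fits(fp16_gb, gpu_gb=24))
```

Long contexts grow the KV cache well beyond a couple of gigabytes, so the headroom should be raised when serving prompts near the 128K limit.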
Qwen2.5 Coder 32B
32B

Coding-specialised variant of Qwen2.5 32B, built by extending pre-training on additional code-heavy data. GPT-4o-class on HumanEval and BigCodeBench at the time of release. Apache 2.0. Natural pick for self-hosted coding assistants, code-review automation, and any agent loop that primarily writes code.

Context: 128K
License: apache-2-0
VRAM Q4: 19.2 GB

Mistral Small 3
24B

24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at a third of the parameter count. Apache 2.0 licensed and small enough to run on a single 4090 at Q4. Good pick when you want Llama-3.3-70B-class chat quality but at a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.

Context: 32K
License: apache-2-0
VRAM Q4: 14.4 GB

Qwen2.5 14B Instruct
14B

Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.

Context: 128K
License: apache-2-0
VRAM Q4: 8.4 GB

OLMo 2 13B
13B

Larger OLMo 2 release. Same fully open philosophy as the 7B variant. At 13B it is more competitive than the 7B with mainstream production-grade chat models.

Context: 4K
License: apache-2-0
VRAM Q4: 7.8 GB

Mistral Nemo 12B
12B

Joint Mistral × NVIDIA model with 128K context, designed as a drop-in upgrade to Mistral 7B. Trained with NVIDIA's Megatron stack and released under Apache 2.0. Strong multilingual coverage thanks to the Tekken tokenizer.

Context: 128K
License: apache-2-0
VRAM Q4: 7.2 GB

Stable LM 2 12B
12B

Stability AI's general-purpose 12B model. Apache 2.0. Useful default when you need a permissively-licensed 12B-scale model.

Context: 4K
License: apache-2-0
VRAM Q4: 7.2 GB

OLMo 2 7B
7B

Fully-open 7B model: weights, training data and code all released under permissive licences. Useful as a reference for reproducibility research and for teams that need full transparency on training data provenance.

Context: 4K
License: apache-2-0
VRAM Q4: 4.2 GB

Qwen2.5 7B Instruct
7B

Apache-2.0-licensed 7B model with surprisingly strong reasoning and multilingual chops. Qwen2.5 trains on a larger and more carefully filtered corpus than the original Qwen series, and the 7B variant punches well above its weight on coding and math benchmarks. A strong default for cost-sensitive chat workloads and for fine-tuning experiments where the Apache licence simplifies downstream redistribution.

Context: 128K
License: apache-2-0
VRAM Q4: 4.2 GB

Mistral 7B v0.3
7B

A refresh of the original Mistral 7B with 32K context and an extended vocabulary, released with permissive Apache 2.0 weights. The Mistral 7B line was the first widely deployed model family to use sliding-window attention. Still useful in 2026 for very-low-cost inference and as a baseline for fine-tuning experiments.

Context: 32K
License: apache-2-0
VRAM Q4: 4.2 GB
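The VRAM Q4 figures in this listing appear to follow a simple linear rule. The sketch below backs the factor out of a sample of the rows above; the rule is an inference from the listed numbers, not a documented formula, and `estimate_q4_vram_gb` is a hypothetical helper. It covers weights only, so real usage will be higher once KV cache and runtime overhead are included.

```python
# (model, parameters in billions, listed VRAM Q4 in GB), copied from the listing
listed = [
    ("Grok 1", 314, 188.4),
    ("Mixtral 8x22B Instruct", 141, 84.6),
    ("Yi 1.5 34B Chat", 34, 20.4),
    ("Qwen2.5 32B Instruct", 32, 19.2),
    ("Mistral Small 3", 24, 14.4),
    ("Qwen2.5 14B Instruct", 14, 8.4),
    ("OLMo 2 13B", 13, 7.8),
    ("Mistral Nemo 12B", 12, 7.2),
    ("Mistral 7B v0.3", 7, 4.2),
]

# GB of VRAM per billion parameters implied by each row
ratios = [vram_gb / params_b for _, params_b, vram_gb in listed]
gb_per_billion = sum(ratios) / len(ratios)
bits_per_param = gb_per_billion * 8  # 1 GB per billion params = 1 byte per param


def estimate_q4_vram_gb(params_billion, factor=gb_per_billion):
    """Estimate Q4 weight memory in GB using the factor inferred from the listing."""
    return round(params_billion * factor, 1)


print(f"implied factor: {gb_per_billion:.2f} GB per billion parameters "
      f"(~{bits_per_param:.1f} bits per parameter)")
print("Mixtral 8x7B estimate:", estimate_q4_vram_gb(46.7), "GB")
```

The same helper can be used to size models that are not listed, with the caveat that different 4-bit quantisation formats store different amounts of metadata per weight, so the factor is only a rough guide.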