OSAIM
Open Source AI Models

All models

8 of 40 open-source models (filtered).

Mixtral 8×22B Instruct
141B

Scaled-up Mixtral with 22B-parameter experts. ~39B active parameters out of 141B total. Strong long-context performance and competitive coding scores. Apache 2.0 makes it attractive for self-hosting where the licence terms of Llama 3 are a non-starter.

Context
66K
License
apache-2-0
VRAM Q4
84.6 GB
Mixtral 8×7B Instruct
46.7B

The mixture-of-experts release that introduced 8 experts of 7B each, 2 active per token. ~13B active parameters with 47B total, which makes per-token inference roughly as fast as a 13B dense model while approaching 70B dense quality. Apache 2.0 weights mean it's still a popular self-hosting choice. Memory footprint is the main constraint — the full 47B parameters must be loaded even though only a quarter are active per token.

Context
33K
License
apache-2-0
VRAM Q4
28 GB
Qwen2.5 32B Instruct
32B

32B sweet-spot model: strong reasoning, fits on one H100 in fp16, on a 4090 at Q4. The 32B size in particular hits a quality/cost knee — quality scales with parameters faster than cost up to ~32B, and slower afterwards. Favoured for production chat where 7B isn't sharp enough and where 70B+ would over-spec the hardware budget. Apache 2.0 licence.

Context
128K
License
apache-2-0
VRAM Q4
19.2 GB
Qwen2.5 Coder 32B
32B

Coding-specialised Qwen2.5 32B fine-tune. GPT-4o-class on HumanEval and BigCodeBench at the time of release. Trained on additional code-heavy data with extended pre-training. Apache 2.0. Natural pick for self-hosted coding assistants, code-review automation, and any agent loop that primarily writes code.

Context
128K
License
apache-2-0
VRAM Q4
19.2 GB
Mistral Small 3
24B

24B dense model from Mistral's January 2025 release that competes with Llama 3.3 70B on many tasks at a third of the parameter count. Apache 2.0 licensed and small enough to run on a single 4090 at Q4. Good pick when you want Llama-3.3-70B-class chat quality but at a friendlier hardware budget, or when the licence matters and Llama's community terms don't fit.

Context
33K
License
apache-2-0
VRAM Q4
14.4 GB
Qwen2.5 14B Instruct
14B

Mid-size Qwen2.5 with broad task coverage. The sweet spot for users who want noticeably better quality than 7B but can't justify the hardware footprint of 32B or 72B.

Context
128K
License
apache-2-0
VRAM Q4
8.4 GB
Mistral Nemo 12B
12B

Joint Mistral × NVIDIA model with 128K context, designed as a drop-in upgrade to Mistral 7B. Trained with NVIDIA's Megatron stack and released under Apache 2.0. Strong multilingual coverage thanks to the Tekken tokenizer.

Context
128K
License
apache-2-0
VRAM Q4
7.2 GB
Qwen2.5 7B Instruct
7B

Apache-2.0-licensed 7B model with surprisingly strong reasoning and multilingual chops. Qwen 2.5 trains on a larger and more carefully filtered corpus than the original Qwen series, and the 7B variant punches well above its weight on coding and math benchmarks. A strong default for cost-sensitive chat workloads and for fine-tuning experiments where the Apache licence simplifies downstream redistribution.

Context
128K
License
apache-2-0
VRAM Q4
4.2 GB