Frequently asked questions
The questions we get asked most about open-source AI models. For deeper definitions of technical terms, see the glossary.
What does 'open-source AI model' actually mean?
Strictly speaking, OSI 'open source' means no field-of-use restrictions. By that definition, only Apache 2.0 / MIT models (Mistral, DeepSeek R1, Phi-4, OLMo) qualify. Most other 'open' models — Llama 3, Gemma, Qwen 72B — are more accurately 'open weights' or 'source-available': you can download and modify the model, but their licences forbid certain uses or have monthly-active-user caps. We track all of them but mark the distinction on each licence page.
Do I need a GPU to run these models?
For models above 7B parameters, practically yes. Sub-4B models (Llama 3.2 1B/3B, Phi-3 Mini, Gemma 2 2B) run usably on modern CPUs and Apple Silicon. 7B–13B models work on consumer GPUs with 12–24 GB VRAM at Q4 quantization. 30B+ comfortably runs on a single 4090 only at Q4; fp16 needs an H100 or multiple GPUs.
What's the difference between Llama 3.3 70B and Llama 3.1 405B?
Llama 3.3 70B is Meta's December 2024 refresh that closes most of the gap with the much larger 3.1 405B at one-fifth the parameter count. For most workloads, 3.3 70B is the better choice — it's nearly as good and dramatically cheaper to host. The 405B remains relevant for the most demanding reasoning and long-context tasks.
Which open-source model is best overall right now?
There isn't a single answer. For frontier reasoning, DeepSeek R1 (or its distilled variants if you can't run 671B yourself). For 70B-tier production chat, Llama 3.3 70B or Qwen 2.5 72B. For coding, Qwen 2.5 Coder 32B or DeepSeek Coder V2. For local inference, Llama 3.1 8B or Qwen 2.5 7B. See our use-case picks for the full breakdown.
What's quantization and which level should I use?
Quantization stores weights at lower precision to save memory. The mainstream local-inference default is Q4_K_M: 4 bits per weight, 4× smaller than fp16, with roughly 1–3% quality loss on most benchmarks. Q5_K_M is a slightly higher-quality option. Q8_0 is near-lossless but takes 2× the memory of Q4. Production hosted inference typically uses fp8 or fp16.
How accurate are the benchmark numbers on this site?
Scores are taken from each model's official Hugging Face card, paper, or release blog post. They're useful for cross-model comparison but should be treated as guidance, not gospel — labs use slightly different evaluation harnesses and prompt formats, so direct comparisons across families can over-state real-world differences. Each benchmark row carries a 'last verified' date.
Why is VRAM higher than 'parameters × 2 bytes'?
The headline VRAM figures on this site are weight-storage estimates: params × 2 for fp16, params × 0.6 for Q4. Real inference also needs memory for the KV cache (grows with context length), activations during forward pass, and CUDA / system overhead. Plan for 1.5–2× the headline figure at long contexts.
Can I use these models commercially?
Depends on the licence. Apache 2.0 and MIT models are unconditional. Llama 3, Gemma, Qwen 72B and DeepSeek allow commercial use with some restrictions (Llama's 700M-MAU cap; usage policies forbidding certain applications). Command R / R+ open weights are non-commercial — production deployment requires Cohere's hosted API. Check the licence detail page for each model.
Where do I actually run these models?
For local inference: Ollama (easiest), LM Studio (GUI), or llama.cpp directly. For self-hosted production: vLLM, TGI, or SGLang. For hosted serverless: Together, DeepInfra, Fireworks, Groq, Replicate. Every model detail page on this site lists 'Run it yourself' commands and links to providers that host that specific model.
How often is this directory updated?
Model entries are reviewed manually as new releases ship — typically within a week or two of a major release. Inference pricing is volatile and is re-verified periodically; each pricing row carries a 'last verified' date. If you spot stale data, the site is open-source: contributions welcome.
Have a question that should be on this list? The site source is at github.com/bryanflowers/opensourceaimodels. — Open Source AI Models.