OSAIM
Open Source AI Models

Best open-source AI models for edge / on-device inference

Edge inference is where the model meets the user: no network round-trip, no server bills, no data leaving the device. The current crop of sub-4B models is good enough for many real workloads.

What we optimise for

We're optimising for a memory footprint (VRAM or unified memory) under 4 GB at Q4 (4-bit) quantisation, usable CPU-only performance, and broad ecosystem support (Ollama, llama.cpp, MLX, NPU-accelerated runtimes).
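As a rough sanity check on the 4 GB budget, weight memory can be estimated from parameter count and bits per weight. The sketch below assumes about 4.5 effective bits per weight for a Q4-style format (4-bit values plus per-block scales; the exact figure varies by quantisation scheme). It ignores the KV cache, activations, and runtime overhead, so treat the results as lower bounds. The function name and the default figure are illustrative assumptions, not measurements.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float = 4.5) -> float:
    """Approximate memory for model weights only, in GB (10**9 bytes).

    bits_per_weight=4.5 is an assumed effective rate for a Q4-style format;
    KV cache and runtime overhead are NOT included.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Weight-only estimates for a few sub-4B parameter counts.
for billions in (1, 3, 4):
    gb = weight_memory_gb(billions * 1e9)
    print(f"{billions}B params -> ~{gb:.2f} GB of weights")
```

Under these assumptions even a 4B-parameter model leaves some headroom below 4 GB for context and runtime buffers. How much depends on context length and the runtime in use.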

Why it matters

On-device inference enables fundamentally different products: offline assistants, latency-sensitive UX, private inference with zero cloud dependency.

Our picks

  1. Runs on a modern smartphone. The de facto on-device default.

  2. Step up in quality; still fits comfortably on a laptop or recent phone.

  3. GPT-3.5-class on academic benchmarks at <4B params.

  4. Compact Gemma. Strong default if the Llama community licence doesn't fit.

Things to watch out for

  • Smaller models hallucinate more. When correctness matters, constrain them to a strict structured-output format (for example, a JSON schema) and validate the result before using it.
  • On-device fine-tuning is now practical with LoRA at this scale. Consider personalising a 1–3B model on the user's own data, without that data leaving the device.
  • Battery and thermal budget matter as much as raw model quality on mobile.
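The structured-output advice above can be sketched as a validation step between the model and the rest of the app. This is a minimal, stdlib-only illustration: the field names in EXPECTED_FIELDS and the function parse_model_output are hypothetical, and a real project would more likely use a full JSON Schema validator or constrained decoding where the runtime supports it. Format checks catch malformed or off-schema output. They do not catch a well-formed answer that is factually wrong.

```python
import json

# Hypothetical expected shape: field name -> required Python type.
EXPECTED_FIELDS = {"intent": str, "entities": list}


def parse_model_output(raw: str) -> dict:
    """Parse raw model text as JSON and check it against EXPECTED_FIELDS.

    Raises ValueError if the text is not valid JSON, is not a JSON object,
    or has missing, extra, or wrongly typed fields.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError(f"unexpected field set: {sorted(data)}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
    return data


# Example: a well-formed reply passes; anything else raises ValueError,
# at which point the caller can re-prompt or fall back to a safe default.
reply = '{"intent": "set_timer", "entities": ["10 minutes"]}'
print(parse_model_output(reply)["intent"])
```

Rejecting extra fields as well as missing ones is a deliberately strict choice here. Loosen it if the model is allowed to return optional keys.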

All picks at a glance

Last reviewed 2026-06-08. We refresh these picks as new models ship. See the full directory at /models.