All models

4 of 55 open-source models (filtered).

Filters: Llama 3 Community License, Code

Llama 3.1 405B, Meta's July 2024 flagship and the first open-weights model at the 405B-parameter scale. Trained on roughly 15T tokens with a 128K context window. It rivals GPT-4o on many academic benchmarks and set the ceiling for open-weights quality for most of 2024. Self-hosting requires serious hardware (8× H100 at fp8, or multiple nodes at fp16), so most users run it through a hosted provider (Together, Groq, Fireworks). Llama 3.3 70B closed most of the practical gap at a fraction of the cost, so 405B is now most useful when a workload specifically hits the 70B model's ceiling.

Context: 128K
License: llama-3
VRAM Q4: 243 GB
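
The VRAM figures on this page are consistent with a simple rule of thumb: parameter count times bytes per weight. The sketch below is a rough estimate only. It assumes about 0.6 bytes per weight for Q4 (4-bit weights plus quantization metadata), a factor inferred from the listed numbers rather than stated by the page, and it ignores KV cache and activation memory. The function name `estimate_weight_gb` is hypothetical.

```python
# Rough weight-memory estimate: parameters x bytes per weight.
# Assumption: Q4 costs ~0.6 bytes/weight (inferred from the listed figures).
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0, "q4": 0.6}


def estimate_weight_gb(params_billions: float, precision: str) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return round(params_billions * BYTES_PER_WEIGHT[precision], 1)


if __name__ == "__main__":
    for name, size in [("405B", 405), ("70B", 70), ("8B", 8)]:
        row = {p: estimate_weight_gb(size, p) for p in BYTES_PER_WEIGHT}
        print(name, row)
```

Under these assumptions the 405B weights come to about 405 GB at fp8, which fits across eight 80 GB H100s (640 GB total), and about 810 GB at fp16, which does not. This matches the multi-node requirement in the description.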

Hermes 3 Llama 3.1 70B

The larger Hermes 3 variant, fine-tuned on the Llama 3.1 70B base. Commonly chosen for agent-heavy workloads that need strong tool use and function calls that reliably match their declared schemas.

Context: 128K
License: llama-3
VRAM Q4: 42 GB
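
To make "function-calling schema" concrete: a tool is typically declared with a name and a JSON-Schema-style parameter description, and a call emitted by the model is checked against that declaration before it is executed. The sketch below is a generic illustration in plain Python. The tool `get_weather`, the helper `validate_tool_call`, and the call layout (`name` plus `arguments`) are hypothetical and are not Hermes 3's exact prompt format.

```python
import json

# Hypothetical tool declaration in a JSON-Schema-like shape.
TOOL = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
    },
}

# Only the two types used by the example tool are mapped here.
PY_TYPES = {"string": str, "integer": int}


def validate_tool_call(raw: str, tool: dict) -> list[str]:
    """Return a list of problems with a model-emitted call; empty means valid."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(call, dict):
        return ["call is not a JSON object"]

    errors = []
    if call.get("name") != tool["name"]:
        errors.append("unknown tool")

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return errors + ["arguments is not a JSON object"]

    props = tool["parameters"]["properties"]
    for key in tool["parameters"]["required"]:
        if key not in args:
            errors.append(f"missing argument: {key}")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected argument: {key}")
        elif not isinstance(value, PY_TYPES[props[key]["type"]]):
            errors.append(f"wrong type for {key}")
    return errors
```

A model that is reliable at function calling should produce calls for which this kind of check returns an empty list nearly every time, so the agent loop rarely needs a retry.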

Hermes 3 Llama 3.1 8B

NousResearch's community-driven fine-tune of the Llama 3.1 8B base, tuned for strong tool use, function calling, and steerable persona behaviour. It inherits the Llama community license and the 128K context of its Llama 3.1 base.

Context: 128K
License: llama-3
VRAM Q4: 4.8 GB

Llama 3.1 8B Instruct

The workhorse 8B instruction-tuned model. It offers an excellent quality-to-cost ratio and some of the broadest ecosystem support of any open-weights model: most major inference engines, fine-tuning libraries, and quantization toolchains ship a 3.1 8B preset. At fp16 the weights take about 16 GB, so it fits on a 24 GB GPU; at Q4, budget roughly 5 to 6 GB once runtime overhead is included. A strong default for production chat where 70B is overkill, for fine-tuning on a specialist task, and for any workload that needs a known-good baseline.

Context: 128K
License: llama-3
VRAM Q4: 4.8 GB
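
The four entries above can be treated as a small dataset for picking a model that fits a given GPU. This is a minimal sketch: the `Model` record, the `fits` helper, and the 20% headroom reserved for KV cache and runtime overhead are all assumptions made for illustration, and the first entry's name is inferred from its description because the listing does not show it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    context_k: int
    license: str
    vram_q4_gb: float


CATALOG = [
    Model("Llama 3.1 405B", 128, "llama-3", 243.0),  # name inferred from description
    Model("Hermes 3 Llama 3.1 70B", 128, "llama-3", 42.0),
    Model("Hermes 3 Llama 3.1 8B", 128, "llama-3", 4.8),
    Model("Llama 3.1 8B Instruct", 128, "llama-3", 4.8),
]


def fits(catalog, budget_gb, headroom=0.2):
    """Return models whose Q4 weights fit in budget_gb, largest first.

    `headroom` is the fraction of VRAM kept free for KV cache and
    runtime overhead (an assumed value, not a measured one).
    """
    usable = budget_gb * (1 - headroom)
    candidates = (m for m in catalog if m.vram_q4_gb <= usable)
    return sorted(candidates, key=lambda m: m.vram_q4_gb, reverse=True)


if __name__ == "__main__":
    for budget in (24, 80):
        print(budget, [m.name for m in fits(CATALOG, budget)])
```

With these numbers a single 24 GB card selects only the two 8B models, while an 80 GB card also admits the 70B Hermes 3 variant at Q4.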