| Model | Score | Status | Released | Price in/out | Context | |
|---|---|---|---|---|---|---|
| Llama 4 Maverick self-hosted multimodal workhorse with no vendor lock-in | 7.7 | GA | 2025-04-04 | $0.20 / $0.85 | 1M | Review → |
| Llama 4 Scout single-GPU long-context open-weights deploy | 7.5 | GA | 2025-04-04 | $0.10 / $0.34 | 10M | Review → |
| Llama 3.3 70B operationally-mature text-only open default | 7.4 | GA | 2024-12-05 | $0.12 / $0.40 | 128K | Review → |
| Llama 3.1 8B on-device and high-volume open-weights workhorse | 7.4 | GA | 2024-07-22 | $0.05 / $0.08 | 128K | Review → |
| Llama 3.1 70B 70B base checkpoint for continued pretraining | 6.7 | GA | 2024-07-22 | $0.40 / $0.59 | 128K | Review → |
| Llama 3.1 405B largest open base for from-scratch fine-tuning | 6.3 | GA | 2024-07-22 | $3 / $3 | 128K | Review → |
Llama 4 Maverick is Meta's flagship open-weights model, released April 5, 2025 as the first Llama to ship as a Mixture-of-Experts. It pairs a 400-billion-parameter knowledge pool with only 17 billion active parameters per token, is natively multimodal (text + image from pre-training), and serves a 1M-token context window. The one-sentence buyer takeaway: it is not the smartest model on any leaderboard, but it is the strongest open-weights workhorse you can self-host and run on every major inference provider for roughly a tenth the price of a closed frontier API — making it the default hedge against vendor lock-in. - Provider: Meta - Release: 2025-04-05 (GA, open weights) - Status: GA, latest in its tier (no successor shipped as of May 2026) - Context: 1,000,000 tokens (256K native pre-training, extended via iRoPE) - Max output: 8,192 tokens (provider-dependent) - Modalities: text + image in, text out - Knowledge cutoff: August 2024 - Headline price: ~$0.20 in / ~$0.85 out per 1M tokens (representative across providers)
Full review →Llama 4 Scout is the small, deployable member of Meta's Llama 4 herd, released April 5, 2025. It is a 109B-total / 17B-active Mixture-of-Experts model (16 experts), natively multimodal, with a headline 10,000,000-token context window — the largest of any openly available model at release. The one-sentence buyer takeaway: it is the only model in 2026 that combines a 10M context, native vision, and single-GPU deployability, making it the obvious open-weights pick when context size and on-prem economics matter more than peak intelligence. - Provider: Meta - Release: 2025-04-05 (GA, open weights) - Status: GA, latest in its tier (no successor shipped as of May 2026) - Context: 10,000,000 tokens (256K native pre-training, extended via iRoPE) - Max output: 8,192 tokens (provider-dependent) - Modalities: text + image in, text out - Knowledge cutoff: August 2024 - Headline price: ~$0.08–$0.11 in / ~$0.30–$0.34 out per 1M tokens
Full review →