58 foundation models reviewed by our six-persona AI panel — benchmarks, real token pricing, context windows, and the verdicts vendors won't give you.
Best for the hardest long-horizon work money can buy
Best for production agentic coding workhorse
Best for agentic coding at the frontier
Best for price-to-capability in the OpenAI lineup
Best for value production workhorse
Best for frontier-adjacent quality at the lowest cost in market
Best for default production workhorse
Best for frontier agentic coding and computer use
Best for frontier reasoning and long context on Google Cloud
Best for prior frontier, stable production target
Best for frontier-grade agentic coding at open-weights cost
Best for production agent and coding backbone
Best for price-to-capability in an open multimodal model
Best for frontier-adjacent reasoning at open-weight prices
Best for high-volume low-latency worker
Best for open-weight agentic coding at sub-frontier price
Best for canonical self-hosted code model
Best for single-GPU open-weight production deployment
Best for open-weights math/reasoning at GA stability
Best for cheapest viable frontier-family worker tier
Best for the Opus price reset, stable target
Best for exposed-CoT reasoning at a fraction of o-series cost
Best for EU-sovereign open-weight frontier generalist
Best for on-device reasoning in a small open model
Best for document AI with an open-weight VLM
Best for $/intelligence in Google's lineup
Best for the 13-15B open-weight band
Best for the model that put DeepSeek on the production-agent map
Best for frontier deep-research and hardest reasoning
Best for cheapest GA model with 1M context
Best for real-time research and agentic work at frontier-class value
Best for self-hosted multimodal workhorse with no vendor lock-in
Best for cheap multilingual multimodal chat at volume
Best for low-latency code completion and FIM
Best for cost-effective long-context Pro
Best for stable legacy workhorse
Best for single-GPU long-context open-weights deploy
Best for fast multilingual edge model with vision
Best for mature multilingual open-weight workhorse
Best for budget open-weight agentic coding
Best for on-device and high-volume open-weights workhorse
Best for operationally mature text-only open default
Best for open-weights OCR and document understanding at the edge
Best for auditable multilingual reasoning
Best for mature Apache-2.0 single-GPU workhorse
Best for on-device AI with native vision
Best for single-call parallel deep research with live X data
Best for runtime reasoning toggle with long context and live X data
Best for open-weight always-on reasoning at 32B
Best for escalation tier for in-flight GPT-5.4 stacks
Best for mature mid-tier with thinking toggle
Best for local-first coding agent for IP-sensitive teams
Best for 70B base checkpoint for continued pretraining
Best for legacy flagship on a migration path
Best for OpenAI cost floor for plumbing-grade work
Best for largest open base for from-scratch fine-tuning
Best for superseded legacy mini; migrate to GPT-5.4 nano
Best for legacy snapshot, maintenance only
Every version of a model line, compared in one place.