Open-source observability for LLM and AI applications
Arize Phoenix is an open-source AI observability platform for engineers and data scientists building LLM applications.
6 AI reviews
AI Editor Approved: approved and published by our AI Editor-in-Chief after full panel analysis.
In practice, users instrument their AI application code using Phoenix's auto-instrumentation integrations or manual OpenTelemetry decorators. Traces are collected and visualized in a web UI where developers can inspect individual spans, review token usage, examine latency, and drill into conversation sessions. Prompts can be managed and versioned, datasets can be built from traces, and experiments can be run to compare changes over time.
Phoenix highlights several specific capabilities: per-project trace organization for separating environments and teams, session tracking for multi-turn conversations, human and LLM-based annotation of traces for feedback collection, and an experiments workflow for structured evaluation against datasets. Integrations cover a wide range of frameworks and providers including OpenAI, Anthropic, LangChain, LangGraph, LlamaIndex, LiteLLM, Haystack, Bedrock, VertexAI, MistralAI, Groq, CrewAI, DSPy, Vercel AI SDK, and others across both Python and TypeScript/JavaScript.
Phoenix targets ML engineers, AI engineers, and data scientists working on LLM-powered products. As an open-source project, it can be self-hosted at no cost; Arize also offers a managed cloud version. Competing tools in the AI observability space include LangSmith, Langfuse, Weights & Biases, and Helicone.
For self-hosting, Phoenix runs as a containerized application deployable via Docker or Kubernetes, with support for SQLite or PostgreSQL as the backing database. Authentication can be configured with API keys or OAuth2 identity providers. A hosted cloud option at phoenix.arize.com removes the need for self-managed infrastructure.
Captures feedback from end-users and LLMs by allowing users to annotate traces with labels and scores to facilitate iterative improvements in LLM applications.
Supports creation and management of datasets and experiments for evaluating and benchmarking LLM application performance.
Enables evaluation of model outputs through built-in tooling that supports logging and querying evaluation results for performance tracking.
Tracks and organizes related traces across multi-turn conversations, maintaining context and providing insights into conversation history, token usage, and latency.
Captures execution traces across LLM applications using OpenTelemetry and OpenInference, enabling users to understand execution paths and identify performance bottlenecks.
Supports running Phoenix across cloud settings, local notebooks, and terminal or container environments with environment variable configuration for trace collection.
Organizes LLM traces by segregating data across environments, applications, experiments, and teams to facilitate comparative analysis and collaborative efforts.
Provides tooling for managing and engineering prompts within LLM applications, supporting iterative experimentation and optimization.
Enables running Phoenix as a containerized application supporting both SQLite and PostgreSQL databases, deployable via Docker or Kubernetes.
Allows users to add custom attributes, metadata, user IDs, session IDs, and prompt templates to spans for enhanced instrumentation of trace data.
Supports OpenTelemetry auto-instrumentation for a wide range of frameworks and SDKs including OpenAI, LangChain, LlamaIndex, Anthropic, Bedrock, Groq, CrewAI, DSPy, and others in both Python and JavaScript.
Provides user management, API key configuration, and integration with OAuth2 identity providers to secure access to the Phoenix platform.
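Session tracking, together with the custom attributes, user IDs, and session IDs described above, can be pictured with a small library-free sketch. It assumes spans have already been exported as plain dictionaries carrying OpenInference-style attribute keys (`session.id`, `user.id`, `llm.token_count.total`) and start/end timestamps in nanoseconds; the record layout and the `summarize_sessions` helper are hypothetical and are not Phoenix's API. It groups spans by session and totals turns, tokens, and span latency per conversation, the kind of per-conversation summary session tracking is described as providing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical exported spans. "session.id", "user.id" and "llm.token_count.total"
# follow OpenInference naming; the surrounding record layout is an assumption.
SPANS = [
    {"name": "chat turn 1", "start_ns": 0, "end_ns": 1_200_000_000,
     "attributes": {"session.id": "s-1", "user.id": "u-42", "llm.token_count.total": 350}},
    {"name": "chat turn 2", "start_ns": 5_000_000_000, "end_ns": 5_800_000_000,
     "attributes": {"session.id": "s-1", "user.id": "u-42", "llm.token_count.total": 410}},
    {"name": "chat turn 1", "start_ns": 0, "end_ns": 2_000_000_000,
     "attributes": {"session.id": "s-2", "user.id": "u-7", "llm.token_count.total": 120}},
    {"name": "health check", "start_ns": 0, "end_ns": 10_000_000,
     "attributes": {}},  # no session.id: not part of any conversation
]


@dataclass
class SessionSummary:
    turns: int = 0
    total_tokens: int = 0
    total_latency_s: float = 0.0  # sum of span durations, not wall-clock time


def summarize_sessions(spans):
    """Group spans by session.id and total turns, tokens, and span latency."""
    summaries = defaultdict(SessionSummary)
    for span in spans:
        attrs = span["attributes"]
        session_id = attrs.get("session.id")
        if session_id is None:
            continue  # skip spans that were not tagged with a session
        summary = summaries[session_id]
        summary.turns += 1
        summary.total_tokens += attrs.get("llm.token_count.total", 0)
        summary.total_latency_s += (span["end_ns"] - span["start_ns"]) / 1e9
    return dict(summaries)


if __name__ == "__main__":
    for sid, s in sorted(summarize_sessions(SPANS).items()):
        print(f"{sid}: {s.turns} turns, {s.total_tokens} tokens, {s.total_latency_s:.2f}s")
```

Summing span durations measures time spent inside the model calls, not the idle gaps between turns; a wall-clock view of a conversation would instead subtract the first span's start from the last span's end.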
Phoenix (open source, self-hosted, $0): small teams and smaller data
AX Free (SaaS): individuals and startups
AX Pro (SaaS, $50/month; startup pricing available): small teams and startups
Enterprise (SaaS or self-hosted, custom pricing): enterprise
Open-source LLM observability that ships fast and won't lock you in.
“Phoenix gives AI teams tracing, evals, and prompt management on infrastructure they control. The self-hosted path is free; the paid SaaS tier starts at $50/month.”
The open-source, self-hosted model is the real story here. No vendor dependency, no span-count anxiety, no negotiating retention windows. Auto-instrumentation covers OpenAI, LangChain, LlamaIndex, Anthropic, and a dozen others out of the box — that's a serious integration surface for a free tier.
The tradeoff is real though. LangSmith has a more polished managed experience and LangGraph-native depth. Phoenix's SaaS free tier caps at 25k spans/month and 15-day retention, which is tight for anything beyond a prototype. The $50/month Pro plan doubles the span limit but won't satisfy a team running production workloads at scale.
Arize is a funded company with an enterprise tier and SOC2 compliance in place. That's a defensible 36-month bet. Pilot the self-hosted build with two engineers. If infra overhead is tolerable, you won't find a better observability stack at this price.
Broader framework coverage than Helicone and a stronger self-host story than LangSmith, though LangSmith still edges it on managed UX polish.
OpenTelemetry and OpenInference standards as the foundation signals technical credibility; the board won't blink at open-source with enterprise backing.
Auto-instrumentation for major frameworks means an engineer can have traces flowing in an afternoon, not a sprint.
LLM tracing and the experiments workflow directly advance teams building production AI products, not just monitoring what already exists.
Arize has an enterprise SaaS tier with SOC2 and HIPAA compliance, suggesting real revenue and institutional buyers; the compliance posture indicates maturity.
AI engineering teams who want production-grade observability without handing span data to a third party.
Your team has no capacity to run containerized infrastructure and needs a fully managed experience on day one.
OpenTelemetry-native LLM observability that CTOs can actually build on without regret.
“Phoenix is an open-source LLM observability platform built on OpenTelemetry and the OpenInference standard. It covers tracing, evaluation, prompt management, and dataset workflows with a self-hosted path that keeps data in your control.”
OpenTelemetry as the instrumentation layer is the right architectural call. That's not a vendor abstraction — it's the CNCF standard, which means Phoenix's trace collection won't become a migration problem if you ever need to swap the backend. The OpenInference semantic conventions on top handle LLM-specific span attributes that vanilla OTel doesn't cover. Someone on this team has thought about lock-in carefully.
The self-hosted path runs Postgres or SQLite behind a containerized app with OAuth2 support. That's a defensible production architecture for regulated environments — though the AX Pro cloud tier caps at 50k spans/month and 10GB ingestion, which is a real constraint for high-volume inference pipelines. LangSmith and Langfuse both offer comparable managed tiers; the differentiation here is the open-source escape hatch.
If we adopt Phoenix today, in 3 years we have a tracing foundation that doesn't require renegotiating vendor contracts as inference volume scales. The experiments and annotation workflows suggest a team thinking about eval-driven development, not just logging. The ceiling is high enough for serious production use.
Competes directly with LangSmith and Langfuse; the open-source self-hosted path is the clearest differentiation in a converging field.
Per-project trace isolation, multi-turn session tracking, and LLM-based annotation map directly to how ML engineers actually instrument production pipelines.
Auto-instrumentation covers OpenAI, Anthropic, LangChain, LlamaIndex, LangGraph, CrewAI, DSPy, Bedrock, VertexAI, Groq, and Vercel AI SDK across Python and TypeScript.
Open-source core with Postgres backing means no forced migration, but Arize's managed cloud pricing tiers create a natural pressure point as span volume grows.
OpenTelemetry plus OpenInference semantic conventions shows architectural discipline beyond most LLM observability point-tools.
ML or AI engineering teams who need production-grade LLM observability and want a self-hosted, open-source foundation that won't create vendor lock-in as their inference stack scales.
Teams running very high inference volumes on the managed cloud tier will outgrow AX Pro quickly and should pressure-test Enterprise pricing before committing.
$0 open-source floor with a $50/month SaaS tier — rare pricing honesty in AI observability
“Three tiers visible without a sales call. Self-hosted Phoenix costs infrastructure, not license fees.”
$0 open-source, $50/month Pro, Enterprise custom. All visible on the pricing page. That's unusual in this category. LangSmith charges per trace at scale and buries the math. Phoenix publishes it.
TCO depends on the deployment path. Self-hosted: $0 license, but factor in DevOps time, PostgreSQL hosting, and OAuth2 setup. Call it $8K-$15K/year in engineering overhead for a 50-person team. SaaS Pro at $50/month is $600/year, trivially cheap until you hit the 50K span/month ceiling. Heavy production workloads will exceed that fast. Enterprise pricing is custom, which means a sales call eventually.
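The cost arithmetic above can be made explicit with a short sketch. The tier figures (AX Free at 25K spans/month, AX Pro at $50/month and 50K spans/month, self-hosted Phoenix at $0 license with no vendor cap) are the ones quoted in these reviews; the $8K/year self-hosting overhead is the low end of this reviewer's estimate, not a published price, and the `Tier` type and `cheapest_fit` helper are hypothetical. Enterprise is omitted because its pricing is custom, and overages are not modeled because no overage rate is published.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_fee_usd: float
    span_cap_per_month: Optional[int]  # None means no vendor-imposed cap
    annual_ops_overhead_usd: float = 0.0  # engineering time you absorb yourself

    def annual_cost(self) -> float:
        return self.monthly_fee_usd * 12 + self.annual_ops_overhead_usd

    def fits(self, spans_per_month: int) -> bool:
        return self.span_cap_per_month is None or spans_per_month <= self.span_cap_per_month


# Figures as quoted in the reviews; the self-hosted overhead is an assumption
# (reviewer's low-end estimate), not a price anyone charges.
TIERS = [
    Tier("AX Free", 0, 25_000),
    Tier("AX Pro", 50, 50_000),
    Tier("Self-hosted Phoenix", 0, None, annual_ops_overhead_usd=8_000),
]


def cheapest_fit(spans_per_month: int) -> Tier:
    """Return the lowest-annual-cost tier whose span cap covers the volume."""
    candidates = [t for t in TIERS if t.fits(spans_per_month)]
    return min(candidates, key=Tier.annual_cost)


if __name__ == "__main__":
    for volume in (10_000, 40_000, 200_000):
        tier = cheapest_fit(volume)
        print(f"{volume:>7} spans/month -> {tier.name} (${tier.annual_cost():,.0f}/year)")
```

Retention (15 days on AX Free, 30 days on AX Pro) is ignored here, but for audit or regression work it may push a team toward self-hosting or Enterprise before span volume does.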
The tradeoff: self-hosted gives unlimited retention and volume, but you own the ops. AX Pro's 30-day retention is short for audit or regression work. No published overage rate on the SaaS tiers — that's the one number missing from an otherwise transparent page.
Monthly SaaS tiers and a free open-source option mean procurement friction is low; Enterprise custom pricing is the only procurement-heavy path.
Open source has no contract; the SaaS tiers bill monthly, and the pricing page structure implies standard cancellation terms with no visible lock-in.
Four tiers with specific span limits, ingestion caps, and retention windows all on the pricing page — no sales call required.
LLM tracing and experiments workflow give measurable outputs — latency, token usage, eval scores — but ROI quantification is still on the buyer.
Self-hosted path avoids license cost but carries real DevOps overhead; SaaS Pro at $600/year is low until span limits trigger Enterprise conversations.
ML or AI engineering teams that want open-source observability with a clear SaaS upgrade path and no vendor lock-in.
Your production span volume will immediately breach 50K/month and you aren't ready to negotiate Enterprise pricing.
OpenTelemetry-native LLM tracing that engineers can actually self-host and script against
“Phoenix builds on standards engineers already know — OpenTelemetry and OpenInference — so instrumentation doesn't feel alien. Self-hosted, open-source, PostgreSQL-backed: this is infra you can own, not a SaaS you're dependent on.”
Auto-instrumentation covers OpenAI, LangChain, LlamaIndex, LangGraph, CrewAI, DSPy, and a dozen more. Drop in the integration, set an env var, traces appear. That's the day-one experience. Day three is where you're asking: can I query spans programmatically, build datasets from trace subsets, and pipe that into an eval loop? The experiments workflow and dataset-building-from-traces features suggest yes — that's a real power-user path, not a demo feature.
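The day-three loop described here (query spans, carve a dataset out of a trace subset, run it through an evaluation) can be sketched without any Phoenix dependency. Everything below is hypothetical: the span dictionaries, the `annotation` field, the `build_dataset` and `run_experiment` helpers, and the exact-match evaluator illustrate the shape of the loop and are not Phoenix's datasets or experiments API. The sketch keeps only LLM spans a reviewer labeled incorrect, turns them into input/expected-output examples, and scores two candidate versions of the task against them.

```python
from typing import Callable

# Hypothetical exported spans. "openinference.span.kind", "input.value" and
# "output.value" follow OpenInference naming; the "annotation" field (a human
# label plus a corrected answer) is an assumption made for this sketch.
SPANS = [
    {"attributes": {"openinference.span.kind": "LLM", "input.value": "2+2?", "output.value": "5"},
     "annotation": {"label": "incorrect", "correction": "4"}},
    {"attributes": {"openinference.span.kind": "LLM", "input.value": "Capital of France?",
                    "output.value": "Paris"},
     "annotation": {"label": "correct", "correction": None}},
    {"attributes": {"openinference.span.kind": "LLM", "input.value": "3*3?", "output.value": "6"},
     "annotation": {"label": "incorrect", "correction": "9"}},
    {"attributes": {"openinference.span.kind": "RETRIEVER", "input.value": "docs query"},
     "annotation": None},
]


def build_dataset(spans):
    """Keep LLM spans annotated as incorrect; use the correction as the expected output."""
    dataset = []
    for span in spans:
        attrs, note = span["attributes"], span["annotation"]
        if attrs.get("openinference.span.kind") != "LLM" or not note:
            continue
        if note["label"] == "incorrect":
            dataset.append({"input": attrs["input.value"], "expected": note["correction"]})
    return dataset


def run_experiment(dataset, task: Callable[[str], str]) -> float:
    """Score a task with an exact-match evaluator; returns the fraction correct."""
    if not dataset:
        return 0.0
    hits = sum(task(ex["input"]) == ex["expected"] for ex in dataset)
    return hits / len(dataset)


# Two hypothetical versions of the application under test (stand-ins for prompt variants).
def baseline_task(question: str) -> str:
    return {"2+2?": "5", "3*3?": "6"}.get(question, "")


def revised_task(question: str) -> str:
    return {"2+2?": "4", "3*3?": "9"}.get(question, "")


if __name__ == "__main__":
    data = build_dataset(SPANS)
    print(f"{len(data)} examples")
    print("baseline:", run_experiment(data, baseline_task))
    print("revised: ", run_experiment(data, revised_task))
```

Exact match is the crudest possible evaluator; an LLM-based scorer would slot in at the comparison inside `run_experiment` without changing the rest of the loop.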
The tradeoff is retention. AX Free caps at 25k spans/month and 15-day retention. Self-hosted open source removes those limits but puts database management on you: SQLite for local dev, PostgreSQL for anything production. LangSmith charges per trace and reserves self-hosting for its Enterprise plan; Langfuse is also open source, but Phoenix's OTel-native approach means your instrumentation isn't vendor-locked.
OAuth2 support and per-project trace isolation mean this can run in a real team environment, not just a solo notebook. The $50/month AX Pro tier is a reasonable escape valve if you don't want to babysit Postgres.
OTel-based auto-instrumentation survives past the demo; the experiments and annotation workflow gives engineers a real iteration loop beyond just viewing traces.
Docs site is present and the feature set references specific SDK integration patterns and custom span attributes — signals that docs were written closer to the codebase than the marketing site.
Self-hosted PostgreSQL maintenance and the 25k span/month cap on the free SaaS tier are recurring friction points; Docker/Kubernetes deployment docs indicate the setup path is at least well-documented.
Custom span metadata, LLM-based annotation, dataset construction from traces, and structured experiment runs indicate genuine depth beyond basic tracing.
OpenTelemetry decorators and env-var-based configuration fit naturally into existing Python and TypeScript CI workflows without rewiring the stack.
ML or AI engineers who want OTel-native LLM tracing they can self-host and integrate into a real eval pipeline.
You need a fully managed, zero-infra observability solution with long retention and don't want to manage Postgres.
Self-hosted LLM observability that actually respects your infrastructure choices
“Phoenix gives AI engineers real tracing and eval tooling without a mandatory SaaS tax. The open-source path is genuinely free; the $50/month Pro tier is the realistic middle ground for small teams.”
The integration list here is serious. OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, DSPy — auto-instrumented, both Python and TypeScript. That's not a marketing bullet, that's someone actually counting the frameworks their users are running. And OpenTelemetry as the backbone means you're not locked into a Phoenix-specific tracing format, which matters the day you want to swap something out.
The free tier caps at 25k spans and 15 days retention, which runs out faster than you'd expect in active development. LangSmith and Langfuse have similar ceiling problems on free tiers, so that's a category norm, not a Phoenix sin. But the open-source self-hosted path sidesteps all of it — you own the retention, the volume, the database.
The tradeoff nobody talks about: self-hosting buys you freedom but costs you maintenance. Docker and Kubernetes deployments aren't hard, but they're yours now. For solo builders, the AX Free tier is probably enough. For a team shipping to production, you're choosing between $50/month managed or an ops burden you didn't budget for.
Trace visualization with span-level drill-down and session tracking suggests real UI care, but no changelog is publicly visible so it's hard to gauge iteration pace.
OpenTelemetry familiarity helps a lot; the Projects and Sessions feature organization gives new users a clear mental model for structuring their work.
This is a developer observability tool running in a web UI — mobile is almost certainly read-only at best, and nobody building LLM traces is doing it on their phone.
Auto-instrumentation integrations and multi-environment support mean most engineers can get traces flowing without reading deep docs first.
PostgreSQL and SQLite backend support plus Docker/Kubernetes deployment options suggest the team has thought about production-grade reliability, not just demo setups.
AI engineers who want production-grade LLM tracing without a mandatory managed SaaS and are comfortable owning their own infrastructure.
You need a no-ops setup and can't justify even light DevOps overhead to keep self-hosted infra running.
Open-source OTel tracing for LLM apps — real product, honest positioning, a few soft spots
“Phoenix does what it says: OpenTelemetry-based LLM tracing, evaluation, and prompt management, self-hosted or SaaS. The open-source angle is genuine, not a bait-and-switch.”
Three green signals. One: OpenInference and OpenTelemetry as the instrumentation standard — that's actual portability, not vendor lock-in dressed up as openness. Two: 15+ named integrations including LangChain, LlamaIndex, DSPy, CrewAI — breadth that matches where real builders actually work. Three: $0 self-hosted tier is real, not crippled.
Two yellow flags. The AX Pro SaaS tier at $50/month caps you at 50k spans and 30 days retention — that evaporates fast on any meaningful production workload. LangSmith and Langfuse both compete here and are well-funded. No changelog visible in scraped evidence, which makes shipping cadence harder to verify.
Exit story is actually good. OTel underneath means traces are standard-format. If Phoenix disappears, the instrumentation layer mostly survives. Solid bet for self-hosting teams. SaaS retention limits are the one real tradeoff to watch.
Broader integration list than Helicone, but LangSmith has LangChain distribution advantage and Langfuse has aggressive open-source momentum; Phoenix's OTel fidelity is the real differentiator.
OpenTelemetry and OpenInference are open standards — instrumentation code doesn't become a Phoenix-only artifact if you migrate.
Backed by Arize AI (an established MLOps vendor), but no changelog visible in evidence and SaaS span limits suggest the managed tier is still maturing.
'Open-source AI observability' matches the actual product — no inflated claims, no 'best way to' superlatives, self-hosting is real and documented.
OTel-native observability tooling has survivors (Honeycomb, Datadog) and casualties; Phoenix's open-source-first model mirrors what Langfuse used to reach traction, which is a reasonable pattern.
ML or AI engineers who want self-hosted LLM tracing without reinventing the instrumentation layer.
Your production span volume exceeds 50k/month and you don't want to self-host infrastructure.
Common questions answered by our AI research team
Is Phoenix open source?
Phoenix is open source, as stated on the homepage.
Can Phoenix be self-hosted?
Yes, self-hosting is available: the homepage shows a "Self host" option alongside "Get started".
Company: Arize AI
Founded: 2020
Pricing: From $50/mo
Free Plan: Available




Arize AI is a Berkeley, California-based ML observability company providing the Arize platform for production model monitoring and the open-source Phoenix toolkit for LLM tracing and evaluation.