Open-source LLM engineering platform for debugging, evaluating, and improving AI applications
Langfuse is an open-source LLM engineering platform for teams building and iterating on large language model applications.
AI Panel Score
6 AI reviews
Reviewed
In practice, developers instrument their LLM application code using Langfuse SDKs (Python or JavaScript) or via OpenTelemetry, which captures traces of every model call, chain step, and agent action. These traces appear in the Langfuse UI where teams can inspect inputs, outputs, latency, token counts, and costs. Sessions, users, and releases can be tagged to organize traces across environments and deployment versions.
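To make the trace model concrete, here is a minimal sketch of how per-call observations could roll up into trace-level tokens, cost, and latency. It is an illustration only, not the Langfuse data model: the `Observation` and `Trace` classes, the `PRICE_PER_1K` table, and its prices are all hypothetical, and summing latency assumes the steps ran one after another.

```python
from dataclasses import dataclass, field

# Hypothetical per-1k-token prices in USD; real prices depend on provider and model.
PRICE_PER_1K = {"model-a": {"input": 0.50, "output": 1.50}}


@dataclass
class Observation:
    """One captured step, e.g. a single model call inside a chain."""
    name: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

    def cost(self) -> float:
        price = PRICE_PER_1K[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


@dataclass
class Trace:
    """A group of observations tagged with session, user, and release."""
    session_id: str
    user_id: str
    release: str
    observations: list = field(default_factory=list)

    def totals(self) -> dict:
        # Latency is summed, which assumes sequential (non-overlapping) steps.
        return {
            "tokens": sum(o.input_tokens + o.output_tokens for o in self.observations),
            "cost_usd": round(sum(o.cost() for o in self.observations), 6),
            "latency_ms": sum(o.latency_ms for o in self.observations),
        }


trace = Trace(session_id="s-1", user_id="u-42", release="v1.3.0")
trace.observations.append(Observation("retrieve", "model-a", 1000, 200, 120.0))
trace.observations.append(Observation("answer", "model-a", 500, 100, 80.0))
print(trace.totals())
```

Tagging each trace with a session, user, and release is what lets totals like these be compared across environments and deployment versions.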
Beyond tracing, Langfuse includes a prompt management system with version control, A/B testing, variable support, caching, and a playground for iterating on prompts without redeploying code. Evaluation tooling covers scoring via SDK or UI, LLM-as-a-judge pipelines, annotation queues for human review, and dataset-based experiments to regression-test prompts or model changes. Custom dashboards and a Metrics API allow teams to track quality and cost trends over time.
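The idea of iterating on prompts without redeploying code can be sketched with a toy registry: application code asks for whatever version carries a label, and promoting a new version moves the label. This is a stand-alone illustration, not the Langfuse SDK; `PromptRegistry`, `compile_prompt`, the `production` label, and the double-brace placeholder style are assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PromptRegistry:
    """Toy in-memory store: every publish creates a new numbered version."""
    versions: dict = field(default_factory=dict)  # name -> list of templates
    labels: dict = field(default_factory=dict)    # (name, label) -> version number

    def publish(self, name: str, template: str, label: str = "") -> int:
        history = self.versions.setdefault(name, [])
        history.append(template)
        version = len(history)
        if label:
            self.promote(name, version, label)
        return version

    def promote(self, name: str, version: int, label: str) -> None:
        # Moving a label changes what callers receive without a code change.
        self.labels[(name, label)] = version

    def get(self, name: str, label: str = "production") -> str:
        return self.versions[name][self.labels[(name, label)] - 1]


def compile_prompt(template: str, **variables) -> str:
    """Fill {{name}} placeholders; a missing variable raises KeyError."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(variables[m.group(1)]), template)


registry = PromptRegistry()
registry.publish("greet", "Hello {{name}}", label="production")
v2 = registry.publish("greet", "Hi {{name}}, welcome to {{product}}")
registry.promote("greet", v2, "production")  # roll forward, no redeploy
print(compile_prompt(registry.get("greet"), name="Ada", product="Langfuse"))
```

An A/B test would follow the same pattern with two labels and a random choice between them at request time.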
Langfuse targets ML engineers and product teams shipping LLM-powered features who need structured tooling beyond basic logging. The platform is open-source (MIT-licensed core) and can be self-hosted on AWS, GCP, Azure, or via Docker Compose; a managed cloud version is also available with a free Hobby tier. Competing tools in the LLM observability and evaluation space include LangSmith (by LangChain), Helicone, and Arize Phoenix.
The platform integrates with a broad range of frameworks and providers including LangChain, LlamaIndex, OpenAI, Anthropic, Vercel AI SDK, CrewAI, AutoGen, DSPy, and LiteLLM, among many others. Self-hosted deployments support configuration of ClickHouse, PostgreSQL, blob storage, and Redis caching, with Kubernetes Helm charts available for production-scale deployments.
Structured evaluation workflow that uses an LLM to automatically score and assess LLM application outputs.
Enables creation of custom dashboards and programmatic access to metrics for monitoring LLM application performance.
Tracks user sessions, user feedback, and per-user activity across LLM application interactions.
Automatically tracks token consumption and associated costs across LLM calls captured in traces.
Provides structured human annotation queues for reviewing and labeling LLM traces and outputs.
Supports creation of datasets and running experiments via SDK or UI to benchmark and compare LLM application versions.
Captures traces, token usage, costs, and user feedback to provide observability into LLM application behavior.
Manages prompts with version control, variables, folders, A/B testing, caching, composability, and a playground for iteration.
Natively integrates with LangChain, LlamaIndex, OpenAI, Anthropic, CrewAI, DSPy, Vercel AI SDK, and dozens of other frameworks and model providers.
Provides role-based access control, SCIM provisioning, and SSO authentication for enterprise team management.
Supports fully self-hosted deployment on AWS, Azure, GCP, Docker Compose, and Kubernetes Helm with configuration options for encryption, backups, and RBAC.
Exposes a Model Context Protocol server endpoint that provides AI coding agents direct access to Langfuse documentation, GitHub issues, and discussions.
Hobby: Get started, no credit card required. Great for hobby projects and POCs.
Core: For production projects. Longer data access and unlimited users.
Pro: For scaling projects. Unlimited history, high rate limits, all features.
Enterprise: For large scale teams. Enterprise-grade support and security.
ClickHouse bought Langfuse in January for the AI feedback loop — vendor viability just became a non-question.
“Langfuse landed inside ClickHouse's $400M Series D in January 2026, ending the standalone-vendor risk for an MIT-licensed LLM observability stack already running on ClickHouse under the hood. The buying question is no longer whether they'll exist in three years — it's whether the roadmap stays open-source the way the announcement promises.”
ClickHouse acquired Langfuse on January 16, 2026, alongside its $400M Series D led by Dragoneer. Langfuse already ran on ClickHouse under the hood — this was the natural buyer. The standalone YC W23 viability question is off the table.
What's on offer at $29/month Core is real: Traces, prompt management with version control, LLM-as-a-Judge evaluators on production data, and annotation queues. 24,000+ GitHub stars and MIT licensing on the core mean LangSmith and Helicone are now competing against an MIT stack with ClickHouse's balance sheet behind it.
But acquisitions reshape roadmaps. The tradeoff is that ClickHouse's incentive is data platform growth, not LLM tooling independence — the open-source promise is one product VP rotation away from reinterpretation. Standardize Langfuse Cloud where you already use ClickHouse; keep evaluation tooling abstracted.
MIT-licensed core with self-hosting and 24K+ GitHub stars puts Langfuse ahead of LangSmith on openness and Helicone on feature breadth.
Adoption by 19 of the Fortune 50 and 63 of the Fortune 500, plus the ClickHouse halo, makes this a defensible board-level pick.
Python and JavaScript SDKs plus OpenTelemetry support and a free Hobby tier let teams instrument code in days, not quarters.
LLM observability and prompt management are now core needs for teams shipping AI features, not nice-to-haves.
ClickHouse acquisition in January 2026 absorbs Langfuse into a Dragoneer-funded data platform, removing the 3-year survival question.
Engineering teams who ship LLM applications to production.
Solo builders who only need basic call logging.
ClickHouse's January acquisition turned Langfuse from open-source bet into the default ML observability stack.
“The ClickHouse acquisition married the trace UI to the storage layer Langfuse already ran on, and that vertical integration changes the 3-year bet. LangSmith stays bundled with LangChain shops, but Langfuse just became the framework-agnostic incumbent.”
ClickHouse closing the Langfuse acquisition on January 16, 2026 changes how a Head of ML Engineering sizes this. Langfuse already ran its trace store on ClickHouse, so the deal collapses two vendors into one and removes the long-term query-engine risk. MIT licensing and OpenTelemetry support stay intact.
Prompt Management with versioned A/B testing and the LLM-as-a-Judge evaluation pipeline are the strategic primitives — not the tracing UI. The Core tier at $29/month with 100k observations fits production teams; Enterprise at $2,499 ships SCIM and audit logs. Khan Academy's 100+ users across 7 teams is the proof point ML leads want.
But the tradeoff is vendor consolidation risk. LangSmith stays the default inside LangChain shops, Arize Phoenix owns the ML-ops crossover, and Datadog's bundled LLM observability will pressure the cloud tier. Self-hosted Langfuse on Kubernetes Helm is the hedge — own the binaries, keep the optionality.
Post-acquisition Langfuse is the framework-agnostic incumbent against LangSmith's LangChain-native lane and Arize's ML-ops crossover.
SDK-first instrumentation plus OpenTelemetry support matches how ML engineering teams actually wire observability.
Native coverage of LangChain, LlamaIndex, OpenAI, Anthropic, CrewAI, DSPy, and Vercel AI SDK is genuinely broad.
ClickHouse acquisition de-risks storage but concentrates vendor exposure under one parent for trace store and UI.
Prompt Management with A/B testing and LLM-as-a-Judge evaluation go beyond surface tracing into real evaluation craft.
ML engineering teams who run multi-framework LLM stacks.
Teams who only use LangChain and want bundled tooling.
Cloud starts at $29/month, but MIT-licensed self-hosting is the procurement lever finance actually wants.
“Langfuse Cloud lists Hobby free, Core at $29, Pro at $199, and Enterprise at $2,499, all metered at $8 per 100k observations after the included quota. The procurement story is the MIT-licensed self-host path — every product feature was open-sourced in June 2025, so finance can dodge the seat-and-meter cycle entirely.”
Cloud lists at $0, $29, $199, $2,499, but the procurement angle is the MIT-licensed self-host path that costs zero in license fees. Engineering eats infra — ClickHouse, Postgres, Redis — and finance avoids the seat meter entirely.
On Cloud, Core at $29 includes 100k units monthly and Pro at $199 stretches retention to 3 years. Overage runs $8 per 100k observations across tracing, LLM-as-a-Judge, and Prompt Management. Compare to LangSmith Plus at $39/seat plus $2.50 per 1k traces — Langfuse's flat-tier units model is the easier line item to forecast.
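The forecasting comparison can be worked through with a small calculator. It uses only the figures quoted above and is a rough sketch under stated assumptions: overage is assumed to be prorated linearly rather than billed in whole 100k blocks, any included trace quota on the per-seat plan is ignored, and units and traces are different meters (one trace usually contains several observations), so the two results are not directly comparable volumes. Both function names are hypothetical.

```python
def flat_tier_monthly(base_fee: float, included_units: int, units: int,
                      overage_per_100k: float = 8.0) -> float:
    """Flat tier fee plus overage on units beyond the included quota.

    Assumes overage is prorated linearly, which the source does not confirm.
    """
    extra_units = max(0, units - included_units)
    return base_fee + overage_per_100k * extra_units / 100_000


def per_seat_monthly(seats: int, seat_fee: float, traces: int,
                     fee_per_1k_traces: float) -> float:
    """Per-seat fee plus a per-1k-trace charge, ignoring any included quota."""
    return seats * seat_fee + fee_per_1k_traces * traces / 1000


# Core tier: $29 with 100k units included, 500k units used in the month.
core_bill = flat_tier_monthly(29, 100_000, 500_000)

# Per-seat plan: 5 seats at $39 plus $2.50 per 1k traces, 100k traces.
seat_bill = per_seat_monthly(5, 39, 100_000, 2.50)

print(f"flat tier: ${core_bill:.2f}  per-seat: ${seat_bill:.2f}")
```

The flat-tier bill moves only with usage, while the per-seat bill moves with both headcount and usage, which is why the former is the simpler line item to forecast.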
Enterprise SSO and SCIM are gated behind the $2,499/month Enterprise tier, unless Pro buyers pay a $300/month Teams add-on to unlock them. That's the classic SSO tax, but the open-source self-host bypass means finance has real leverage in the renewal conversation.
AWS Marketplace and invoice billing live at Enterprise; lower tiers run monthly credit-card without procurement friction.
Monthly billing is available at Core and Pro, but Enterprise SSO gates at $2,499/month or a $300 Teams add-on.
All four tiers, included units, and the $8 per 100k overage rate are published without a sales call.
Token and cost tracking ship natively with the Metrics API and custom dashboards on every paid tier.
Units-based metering forecasts cleanly, and MIT-licensed self-hosting collapses license cost to zero.
Engineering teams who need LLM tracing without a per-seat invoice.
Buyers who want managed SOC 2 without paying for the Pro tier.
@observe traces and the Prompt Management cache make Langfuse the open-source pick if you'll run the stack.
“Langfuse's @observe decorator, Sessions, and Datasets give Python and JS teams real LLM observability without LangChain coupling. The tradeoff is self-hosted infrastructure — PostgreSQL, ClickHouse, Redis, and blob storage — or paying for Cloud.”
Self-hosting Langfuse means standing up PostgreSQL, ClickHouse, Redis, and blob storage: that's the day-three tax for the MIT license. The Helm chart helps, but anyone wanting LangSmith's zero-config flow has to accept the infrastructure load up front.
The @observe decorator on a Python function captures inputs, outputs, latency, and token cost without manual span wiring. Nested calls inherit trace context via contextvars, so a chain across LangChain plus a raw OpenAI call renders as one timeline. Sessions group multi-turn agent runs cleanly.
Evals run as LLM-as-judge, human annotation, or model-based against Datasets for regression testing. The catch: the Core tier at $29 ships only three annotation queues — heavy labelers hit that ceiling fast. Arize Phoenix wins on OpenTelemetry purity; Helicone is simpler if you only proxy OpenAI.
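Regression testing against a dataset boils down to running each application version over the same items and comparing aggregate scores. The sketch below shows that loop in plain Python; it does not use the Langfuse SDK, and `DATASET`, `exact_match`, `run_experiment`, and the two `app_*` functions are hypothetical. The deterministic `exact_match` scorer stands in for an LLM judge or a human annotator.

```python
from statistics import mean

# Hypothetical dataset of inputs paired with expected outputs.
DATASET = [
    {"input": "2+2", "expected": "4"},
    {"input": "3*3", "expected": "9"},
    {"input": "10-7", "expected": "3"},
]


def exact_match(output: str, expected: str) -> float:
    """Deterministic scorer; an LLM-as-a-judge call would slot in here."""
    return 1.0 if output.strip() == expected else 0.0


def run_experiment(app, dataset, scorer) -> float:
    """Run one app version over every item and return the mean score."""
    return mean(scorer(app(item["input"]), item["expected"]) for item in dataset)


def app_v1(question: str) -> str:
    # Baseline version: does not handle subtraction.
    return {"2+2": "4", "3*3": "9"}.get(question, "unknown")


def app_v2(question: str) -> str:
    # Candidate version: covers every item in the dataset.
    return {"2+2": "4", "3*3": "9", "10-7": "3"}.get(question, "unknown")


baseline = run_experiment(app_v1, DATASET, exact_match)
candidate = run_experiment(app_v2, DATASET, exact_match)
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")

# Gate a release on the candidate not scoring worse than the baseline.
assert candidate >= baseline, "regression detected"
```

Swapping the scorer is the only change needed to move from exact-match checks to judged or human-labeled scores, which is why the dataset and the scoring step are kept separate.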
The @observe decorator and Sessions work cleanly day-to-day, but self-hosters carry a four-service infrastructure load.
Decorator docs are concrete and a Docs MCP server exposes the full reference to coding agents.
Core tier caps annotation queues at three and self-host setup needs ClickHouse plus Redis tuning.
Datasets, Evals, Prompt Management, and the Metrics API interlink for regression testing across model and prompt versions.
Native hooks for LangChain, LlamaIndex, CrewAI, DSPy, Vercel AI SDK, plus OpenTelemetry ingestion cover most real stacks.
ML engineers shipping LLM features who want open-source observability.
Solo developers who only proxy OpenAI calls.
Langfuse hands power users the ClickHouse keys most LLM observability tools won't even discuss.
“Tracing, prompt management, datasets, and LLM-as-a-judge evals sit in one open-source platform you can actually self-host. The catch is the v3 stack — ClickHouse, Postgres, Redis, S3, worker — is a real footprint to operate.”
The thing that separates Langfuse from LangSmith isn't the trace viewer; both look fine. It's that the source is MIT-licensed and the docs walk you through running it yourself, Helm chart on your cluster. Hobby tier is free if you want the cloud first, no credit card.
Three things power users will actually feel: Annotation Queues for human review, Datasets for regression-testing prompt changes, and the OpenTelemetry endpoint so you're not locked into one SDK. Python and JavaScript SDKs both ship; the Python v3 SDK is now OTEL-native, which is the right call.
The catch is the v3 architecture. ClickHouse replaced Postgres for traces in 2024, which fixed the scaling pain but added Redis, S3, and a worker container to the stack. Cloud Core at $29/month sidesteps all of that. Helicone is lighter; this is heavier on purpose.
Trace viewer, session tagging, and prompt playground feel like one team shipped them.
Datasets, Annotation Queues, and the Metrics API reward teams who stay past month one.
Dev infrastructure — mobile is not the use case, neutral by category.
Hobby tier is free with no credit card, but self-host setup is a real afternoon.
ClickHouse migration and 1,000+ production self-hosters indicate the scale work is done.
Teams building LLM features who want self-hostable observability.
Solo developers who want zero infrastructure overhead.
Acquired by ClickHouse January 16, 2026 — answers the runway question but Datadog's APM bundle is still the fight.
“ClickHouse acquired Langfuse on January 16, 2026. The team had Series A term sheets on the table, so this was not a distressed sale. The acquisition answers ClickHouse-cost and runway concerns, but Datadog LLM Observability shipped GA inside existing APM contracts in June 2024.”
Acquired by ClickHouse on January 16, 2026. Term sheets for a Series A were on the table — the team wasn't trying to sell. That's a different exit story than the usual open-source-burnout sale.
The product is real. LLM-as-a-Judge evaluators, Annotation Queues, and a Prompt Management playground sit on the same MIT-licensed codebase that hit 20,470 GitHub stars before the acquisition. The Hobby tier still ships 50,000 observations free. But Datadog LLM Observability went GA in June 2024 bundled inside existing APM contracts, and that's the structural fight.
Honest read: the acquisition answers the ClickHouse cost question (they are ClickHouse now) and the runway question. Doesn't answer whether the OSS roadmap stays intact post-integration. Arize Phoenix stays free. Could go either way past 2027.
Integrated tracing-plus-evals-plus-prompts stack edges point tools, but the category is crowded with LangSmith, Helicone, and Arize Phoenix.
MIT license, self-host via Docker Compose or Kubernetes Helm, and OpenTelemetry support keep migration paths clean.
ClickHouse acquisition on January 16, 2026 de-risks runway at a $15B parent, though post-acquisition OSS roadmap drift remains the watch item.
Open-source MIT-licensed codebase, public GitHub repo, and transparent $29/month Core tier match the landing-page pitch.
YC W23 cohort with only 2.5 years in market, but 20,470 GitHub stars and 19-of-Fortune-50 adoption are real signals.
ML engineers who need self-hostable LLM tracing and evaluation.
Teams already paying for Datadog APM in production.
Common questions answered by our AI research team
The free tier includes 50,000 observations/month with no credit card required.
Yes, Kubernetes (Helm) is a supported self-hosting option.
Yes, Langfuse is HIPAA eligible.
Yes, LangChain is a supported agent framework integration.
Yes, LLM-as-a-judge evaluators can be run on production data or during experiments.
Langfuse is an open-source LLM engineering platform based in Berlin offering tracing, prompt management, evaluations, and analytics for AI application development.