
Langfuse Review


Open-source LLM engineering platform for debugging, evaluating, and improving AI applications

Langfuse is an open-source LLM engineering platform for teams building and iterating on large language model applications.

AI Panel Score

8.0/10

6 AI reviews


About Langfuse

In practice, developers instrument their LLM application code with the Langfuse SDKs (Python or JavaScript) or with OpenTelemetry. Either way, the instrumentation captures traces of every model call, chain step, and agent action. These traces appear in the Langfuse UI, where teams can inspect inputs, outputs, latency, token counts, and costs. Sessions, users, and releases can be tagged to organize traces across environments and deployment versions.
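To make the tagging idea concrete, here is a minimal sketch that groups trace records by a tag and totals their trace count, tokens, and cost. `TraceRecord` and `summarize_by` are hypothetical names with a deliberately simplified record shape; they are not Langfuse's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Simplified, hypothetical trace record (not Langfuse's schema)."""
    trace_id: str
    session_id: str
    user_id: str
    release: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def summarize_by(traces, tag):
    """Group traces by a tag attribute such as "session_id" or "release"
    and total trace count, tokens, and cost per group."""
    totals = defaultdict(lambda: {"traces": 0, "tokens": 0, "cost_usd": 0.0})
    for t in traces:
        group = totals[getattr(t, tag)]
        group["traces"] += 1
        group["tokens"] += t.input_tokens + t.output_tokens
        group["cost_usd"] += t.cost_usd
    return dict(totals)


traces = [
    TraceRecord("t1", "s1", "u1", "v1.2", 100, 50, 0.002),
    TraceRecord("t2", "s1", "u1", "v1.2", 200, 100, 0.004),
    TraceRecord("t3", "s2", "u2", "v1.3", 50, 25, 0.001),
]
print(summarize_by(traces, "session_id"))
print(summarize_by(traces, "release"))
```

The same records roll up differently depending on the tag, which is the point of tagging sessions and releases separately: one view answers "what did this conversation cost?", the other "did this deployment get more expensive?".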

Beyond tracing, Langfuse includes a prompt management system with version control, A/B testing, variable support, caching, and a playground for iterating on prompts without redeploying code. Evaluation tooling covers scoring via SDK or UI, LLM-as-a-judge pipelines, annotation queues for human review, and dataset-based experiments to regression-test prompts or model changes. Custom dashboards and a Metrics API allow teams to track quality and cost trends over time.
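The prompt-management ideas above (immutable versions, movable labels, variables, caching, and A/B assignment) can be sketched as a small in-memory registry. Everything here is hypothetical: `PromptRegistry`, the `{{variable}}` placeholder syntax, the `production` default label, and the `prod-a`/`prod-b` labels are illustrative assumptions, not a description of the Langfuse SDK.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    version: int
    template: str  # {{variable}} placeholders (assumed syntax)


@dataclass
class PromptRegistry:
    """Hypothetical in-memory registry: versions, movable labels, TTL cache."""
    ttl_seconds: float = 60.0
    _versions: dict = field(default_factory=dict, init=False, repr=False)
    _labels: dict = field(default_factory=dict, init=False, repr=False)
    _cache: dict = field(default_factory=dict, init=False, repr=False)

    def create(self, name, template, labels=()):
        """Append a new immutable version and optionally point labels at it."""
        versions = self._versions.setdefault(name, [])
        pv = PromptVersion(version=len(versions) + 1, template=template)
        versions.append(pv)
        for label in labels:
            self._labels[(name, label)] = pv.version
            self._cache.pop((name, label), None)  # drop the stale cached entry
        return pv

    def get(self, name, label="production"):
        """Resolve a label to a version, serving from cache until the TTL expires."""
        key = (name, label)
        now = time.monotonic()
        cached = self._cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        pv = self._versions[name][self._labels[key] - 1]
        self._cache[key] = (now + self.ttl_seconds, pv)
        return pv


def compile_prompt(template, **variables):
    """Fill {{name}} placeholders with the given values."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template


def ab_label(user_id, labels=("prod-a", "prod-b")):
    """Deterministically bucket a user into one label by hashing the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return labels[digest[0] % len(labels)]


registry = PromptRegistry(ttl_seconds=300)
registry.create("support-reply", "Hi {{name}}, how can we help?", labels=["production"])
prompt = registry.get("support-reply")
print(compile_prompt(prompt.template, name="Ada"))  # Hi Ada, how can we help?
```

Moving a label to a new version changes what production serves without a redeploy, which is the workflow the paragraph describes. Hashing the user id rather than picking a label at random keeps each user on the same variant across requests.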

Langfuse targets ML engineers and product teams shipping LLM-powered features who need structured tooling beyond basic logging. The platform is open source (MIT-licensed core) and can be self-hosted on AWS, GCP, Azure, or via Docker Compose; a managed cloud version is also available with a free Hobby tier. Competing tools in the LLM observability and evaluation space include LangSmith (by LangChain), Helicone, and Arize Phoenix.

The platform integrates with a broad range of frameworks and providers including LangChain, LlamaIndex, OpenAI, Anthropic, Vercel AI SDK, CrewAI, AutoGen, DSPy, and LiteLLM, among many others. Self-hosted deployments support configuration of ClickHouse, PostgreSQL, blob storage, and Redis caching, with Kubernetes Helm charts available for production-scale deployments.

Features

AI

  • LLM-as-a-Judge Evaluation

    Structured evaluation workflow that uses an LLM to automatically score and assess LLM application outputs.

Analytics

  • Custom Dashboards & Metrics API

    Enables creation of custom dashboards and programmatic access to metrics for monitoring LLM application performance.

  • Sessions & User Tracking

    Tracks user sessions, user feedback, and per-user activity across LLM application interactions.

  • Token & Cost Tracking

    Automatically tracks token consumption and associated costs across LLM calls captured in traces.

Collaboration

  • Annotation Queues

    Provides structured human annotation queues for reviewing and labeling LLM traces and outputs.

Core

  • Datasets & Experiments

    Supports creation of datasets and running experiments via SDK or UI to benchmark and compare LLM application versions.

  • LLM Tracing & Observability

    Captures traces, token usage, costs, and user feedback to provide observability into LLM application behavior.

  • Prompt Management

    Manages prompts with version control, variables, folders, A/B testing, caching, composability, and a playground for iteration.

Integration

  • Broad Framework & Model Integrations

    Natively integrates with LangChain, LlamaIndex, OpenAI, Anthropic, CrewAI, DSPy, Vercel AI SDK, and dozens of other frameworks and model providers.

Security

  • RBAC & SSO

    Provides role-based access control, SCIM provisioning, and SSO authentication for enterprise team management.

  • Self-Hosting

    Supports fully self-hosted deployment on AWS, Azure, GCP, Docker Compose, and Kubernetes Helm with configuration options for encryption, backups, and RBAC.

Support

  • Docs MCP Server

    Exposes a Model Context Protocol (MCP) server endpoint that gives AI coding agents direct access to Langfuse documentation, GitHub issues, and discussions.


Pricing Plans

Hobby

Free

Get started, no credit card required. Great for hobby projects and POCs.

  • 50k units / month included
  • 30 days data access
  • 2 users
  • 1 annotation queue
  • Community support via GitHub
  • All platform features (with limits)

Core

$29/month

For production projects. Longer data access and unlimited users.

  • 100k units / month included, additional: $8/100k units
  • 90 days data access
  • Unlimited users
  • 3 annotation queues
  • In-app support (48h response SLO)
  • Discounts: Startups, EDU, OSS

Pro

$199/month

For scaling projects. Unlimited history, high rate limits, all features.

  • 100k units / month included, additional: $8/100k units
  • 3 years data access
  • High rate limits (20,000 requests/min ingestion)
  • Data retention management
  • SOC2 & ISO27001 reports, BAA available (HIPAA)
  • Optional Teams Add-on at $300/mo for Enterprise SSO, RBAC, dedicated Slack support

Enterprise

$2,499/month

For large scale teams. Enterprise-grade support and security.

  • 100k units / month included, additional: $8/100k units (custom with yearly commitment)
  • 3 years data access
  • Custom rate limits and ingestion throughput
  • Audit Logs & SCIM API
  • Uptime SLA & Support SLA with dedicated support engineer
  • Custom volume pricing, architecture reviews, billing via AWS Marketplace or invoice
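A rough estimator for the list prices above, as a sketch. It assumes overage is pro-rated linearly at the listed $8 per 100k units (the page does not say whether usage is billed in whole 100k blocks), treats Hobby as a hard cap because no overage rate is listed for it, and ignores Enterprise's custom yearly pricing. `monthly_cost` is a hypothetical helper.

```python
from fractions import Fraction

# (base USD per month, included units per month), as listed above
PLANS = {
    "hobby": (0, 50_000),
    "core": (29, 100_000),
    "pro": (199, 100_000),
    "enterprise": (2_499, 100_000),
}
OVERAGE_USD_PER_100K = 8  # listed for Core, Pro, and Enterprise
TEAMS_ADDON_USD = 300     # optional add-on listed under Pro


def monthly_cost(plan, units, teams_addon=False):
    """Estimate a monthly Langfuse Cloud bill in USD (linear overage assumed)."""
    base, included = PLANS[plan]
    if teams_addon and plan != "pro":
        raise ValueError("the Teams add-on is listed for Pro only")
    extra = max(0, units - included)
    if plan == "hobby" and extra:
        raise ValueError("no overage rate is listed for Hobby; upgrade instead")
    overage = Fraction(extra, 100_000) * OVERAGE_USD_PER_100K
    addon = TEAMS_ADDON_USD if teams_addon else 0
    return float(base + overage + addon)


print(monthly_cost("core", 1_000_000))                 # 101.0
print(monthly_cost("pro", 100_000, teams_addon=True))  # 499.0
```

Under these assumptions, 1,000,000 units on Core comes to $29 plus 9 × $8, and Pro with the Teams add-on starts at $499 before any overage. `Fraction` keeps the pro-rated overage exact until the final conversion.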

AI Panel Reviews

The Decision Maker

Strategic bet, vendor viability, timing, adoption approval
8.3/10

ClickHouse bought Langfuse in January 2026 for the AI feedback loop, and vendor viability just became a non-question.

ClickHouse acquired Langfuse alongside its $400M Series D in January 2026, ending the standalone-vendor risk for an MIT-licensed LLM observability stack that already ran on ClickHouse under the hood. The buying question is no longer whether Langfuse will exist in three years; it's whether the roadmap stays open source the way the announcement promises.

ClickHouse acquired Langfuse on January 16, 2026, alongside its $400M Series D led by Dragoneer. Langfuse already ran on ClickHouse under the hood — this was the natural buyer. The standalone YC W23 viability question is off the table.

What's on offer at $29/month Core is real: traces, prompt management with version control, LLM-as-a-Judge evaluators on production data, and annotation queues. 24,000+ GitHub stars and MIT licensing on the core mean LangSmith and Helicone are now competing against an MIT stack with ClickHouse's balance sheet behind it.

But acquisitions reshape roadmaps. The tradeoff is that ClickHouse's incentive is data platform growth, not LLM tooling independence — the open-source promise is one product VP rotation away from reinterpretation. Standardize Langfuse Cloud where you already use ClickHouse; keep evaluation tooling abstracted.

Competitive Positioning: 8.3

MIT-licensed core with self-hosting and 24K+ GitHub stars puts Langfuse ahead of LangSmith on openness and Helicone on feature breadth.

Reputation Risk: 8.2

Trust from 19 of the Fortune 50 and 63 of the Fortune 500, plus the ClickHouse halo, makes this a defensible board-level pick.

Speed to Value: 8.0

Python and JavaScript SDKs plus OpenTelemetry support and a free Hobby tier let teams instrument code in days, not quarters.

Strategic Fit: 8.0

LLM observability and prompt management are now core needs for teams shipping AI features, not a nice-to-have.

Vendor Viability: 8.7

ClickHouse acquisition in January 2026 absorbs Langfuse into a Dragoneer-funded data platform, removing the 3-year survival question.

Pros

  • ClickHouse acquisition in January 2026 closes the standalone-vendor risk question.
  • MIT-licensed core supports full self-hosting on Docker, Kubernetes, AWS, GCP, and Azure.
  • Free Hobby tier includes 50,000 observations per month with no credit card required.
  • Native integrations span LangChain, LlamaIndex, OpenAI, Anthropic, and Vercel AI SDK.
  • SOC 2 and ISO 27001 reports, plus a HIPAA BAA, are available on the Pro tier at $199 per month.

Cons

  • Post-acquisition roadmap independence is not guaranteed despite the open-source pledge.
  • Enterprise features such as SCIM and audit logs stay commercially licensed even when self-hosted.
  • Enterprise tier jumps to $2,499 per month, a steep gap from the $199 Pro tier.

Right for

Engineering teams who ship LLM applications to production.

Avoid if

Solo builders who only need basic call logging.

The Domain Strategist

Craft and strategy in the product's domain — adapts identity per category, same lens
8.2/10

ClickHouse's January acquisition turned Langfuse from an open-source bet into the default LLM observability stack.

The ClickHouse acquisition married the trace UI to the storage layer Langfuse already ran on, and that vertical integration changes the 3-year bet. LangSmith stays bundled with LangChain shops, but Langfuse just became the framework-agnostic incumbent.

ClickHouse closing the Langfuse acquisition on January 16, 2026 changes how a Head of ML Engineering sizes this. Langfuse already ran its trace store on ClickHouse, so the deal collapses two vendors into one and removes the long-term query-engine risk. MIT licensing and OpenTelemetry support stay intact.

Prompt Management with versioned A/B testing and the LLM-as-a-Judge evaluation pipeline are the strategic primitives — not the tracing UI. The Core tier at $29/month with 100k observations fits production teams; Enterprise at $2,499 ships SCIM and audit logs. Khan Academy's 100+ users across 7 teams is the proof point ML leads want.

But the tradeoff is vendor consolidation risk. LangSmith stays the default inside LangChain shops, Arize Phoenix owns the ML-ops crossover, and Datadog's bundled LLM observability will pressure the cloud tier. Self-hosted Langfuse on Kubernetes Helm is the hedge — own the binaries, keep the optionality.

Category Positioning: 8.0

Post-acquisition Langfuse is the framework-agnostic incumbent against LangSmith's LangChain-native lane and Arize's ML-ops crossover.

Domain Fit: 8.3

SDK-first instrumentation plus OpenTelemetry support matches how ML engineering teams actually wire observability.

Integration Surface: 8.3

Native coverage of LangChain, LlamaIndex, OpenAI, Anthropic, CrewAI, DSPy, and Vercel AI SDK is genuinely broad.

Long-term Implications: 7.8

ClickHouse acquisition de-risks storage but concentrates vendor exposure under one parent for trace store and UI.

Strategic Depth: 8.2

Prompt Management with A/B testing and LLM-as-a-Judge evaluation go beyond surface tracing into real evaluation craft.

Pros

  • ClickHouse acquisition de-risks the trace storage layer for 3-year platform bets.
  • Framework-agnostic SDKs natively cover OpenAI, Anthropic, LangChain, LlamaIndex, and DSPy.
  • Self-hosting via Kubernetes Helm preserves data sovereignty for regulated teams.
  • Prompt Management with versioned A/B testing replaces ad-hoc prompt spreadsheets.

Cons

  • LangSmith remains the path of least resistance inside LangChain-native shops.
  • Datadog's bundled LLM observability will pressure cloud-tier pricing over time.
  • ClickHouse ownership concentrates vendor risk for teams already standardized on Snowflake or BigQuery.

Right for

ML engineering teams who run multi-framework LLM stacks.

Avoid if

Teams who only use LangChain and want bundled tooling.

The Finance Lead

Money, total cost of ownership, contracts, procurement math
8.0/10

Cloud starts at $29/month, but MIT-licensed self-hosting is the procurement lever finance actually wants.

Langfuse Cloud lists Hobby free, Core at $29, Pro at $199, and Enterprise at $2,499, with the paid tiers metered at $8 per 100k observations after the included quota. The procurement story is the MIT-licensed self-host path: every product feature was open-sourced in June 2025, so finance can dodge the seat-and-meter cycle entirely.

Cloud lists at $0, $29, $199, $2,499, but the procurement angle is the MIT-licensed self-host path that costs zero in license fees. Engineering eats infra — ClickHouse, Postgres, Redis — and finance avoids the seat meter entirely.

On Cloud, Core at $29 includes 100k units monthly and Pro at $199 stretches retention to 3 years. Overage runs $8 per 100k observations across tracing, LLM-as-a-Judge, and Prompt Management. Compare to LangSmith Plus at $39/seat plus $2.50 per 1k traces — Langfuse's flat-tier units model is the easier line item to forecast.
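The forecasting argument can be checked with a back-of-the-envelope comparison that uses only the figures quoted above. Assumptions, stated loudly: Langfuse overage is pro-rated linearly, LangSmith Plus has no free trace allowance, and `units_per_trace` (how many Langfuse billing units one trace consumes, for example through its nested observations) is a hypothetical input you would measure from your own traffic. All function names are illustrative.

```python
def langfuse_core_cost(units):
    """Core list price: $29 includes 100k units; $8 per extra 100k (linear pro-rating assumed)."""
    return 29 + max(0, units - 100_000) * 8 / 100_000


def langsmith_plus_cost(seats, traces):
    """LangSmith Plus as quoted in this review: $39/seat plus $2.50 per 1k traces."""
    return 39 * seats + traces * 2.50 / 1_000


def compare(seats, traces, units_per_trace):
    """Monthly USD for both tools; units_per_trace is a hypothetical, measured input."""
    units = traces * units_per_trace
    return {
        "langfuse_core": langfuse_core_cost(units),
        "langsmith_plus": langsmith_plus_cost(seats, traces),
    }


for seats in (1, 10, 25):
    print(seats, compare(seats=seats, traces=100_000, units_per_trace=5))
```

Seat count only enters the LangSmith side (Core lists unlimited users), so the Langfuse line stays at $61 in this scenario while the LangSmith line climbs with headcount.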

Enterprise SSO requires either the $2,499 Enterprise tier or a $300/month Teams add-on on Pro, and SCIM sits on Enterprise. That's the classic SSO tax, but the open-source self-host bypass gives finance real leverage in the renewal conversation.

Billing & Procurement: 7.8

AWS Marketplace and invoice billing live at Enterprise; lower tiers run monthly credit-card without procurement friction.

Contract Flexibility: 7.5

Monthly billing is available at Core and Pro, but Enterprise SSO requires the $2,499/month tier or a $300 Teams add-on on Pro.

Pricing Transparency: 8.5

All four tiers, included units, and the $8 per 100k overage rate are published without a sales call.

ROI Clarity: 7.8

Token and cost tracking ship natively with the Metrics API and custom dashboards on every paid tier.

Total Cost of Ownership: 8.0

Units-based metering forecasts cleanly, and MIT-licensed self-hosting collapses license cost to zero.

Pros

  • MIT-licensed self-hosting bypasses the seat-and-meter pricing path entirely.
  • All four Cloud tiers and the $8 per 100k overage rate are published without a sales call.
  • Units-based metering is easier to forecast than LangSmith's per-seat plus per-trace stack.
  • Pro at $199/month ships SOC2 reports and a HIPAA BAA — rare outside Enterprise contracts.

Cons

  • Enterprise SSO requires a $300 Teams add-on on Pro, and SCIM requires the $2,499/month Enterprise tier.
  • Self-host savings flip into ClickHouse, Postgres, and Redis infra costs finance can't shop.
  • Hobby tier caps data retention at 30 days, forcing a Core upgrade for any compliance review.

Right for

Engineering teams who need LLM tracing without a per-seat invoice.

Avoid if

Buyers who want managed SOC2 without paying the Pro tier.

The Domain Practitioner

Daily hands-on reality in the product's domain — adapts identity per category, same lens
8.0/10

@observe traces and the Prompt Management cache make Langfuse the open-source pick if you'll run the stack.

Langfuse's @observe decorator, Sessions, and Datasets give Python and JS teams real LLM observability without LangChain coupling. The tradeoff is self-hosted infrastructure — PostgreSQL, ClickHouse, Redis, and blob storage — or paying for Cloud.

Self-hosting Langfuse means standing up PostgreSQL, ClickHouse, Redis, and blob storage; that's the day-three tax for the MIT license. The Helm chart helps, but anyone wanting LangSmith's zero-config flow has to accept the infrastructure load up front.

The @observe decorator on a Python function captures inputs, outputs, latency, and token cost without manual span wiring. Nested calls inherit trace context via contextvars, so a chain across LangChain plus a raw OpenAI call renders as one timeline. Sessions group multi-turn agent runs cleanly.

Evals run as LLM-as-judge, human annotation, or model-based against Datasets for regression testing. The catch: the Core tier at $29 ships only three annotation queues — heavy labelers hit that ceiling fast. Arize Phoenix wins on OpenTelemetry purity; Helicone is simpler if you only proxy OpenAI.
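Dataset-based regression testing boils down to running two app versions over the same items and comparing aggregate scores. In this sketch, `exact_match_judge` is a deterministic placeholder where an LLM-as-a-judge call would go; `DatasetItem`, `run_experiment`, and `regression_check` are hypothetical names, not Langfuse SDK calls.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class DatasetItem:
    input: str
    expected_output: str


App = Callable[[str], str]
Judge = Callable[[DatasetItem, str], float]


def exact_match_judge(item: DatasetItem, output: str) -> float:
    """Placeholder scorer in [0, 1]. An LLM-as-a-judge would instead prompt a
    model with the input, expected output, and actual output, then parse a score."""
    return 1.0 if output.strip().lower() == item.expected_output.strip().lower() else 0.0


def run_experiment(items: Sequence[DatasetItem], app: App, judge: Judge) -> float:
    """Run the app over every dataset item and return the mean judge score."""
    return mean(judge(item, app(item.input)) for item in items)


def regression_check(items: Sequence[DatasetItem], baseline: App, candidate: App,
                     judge: Judge, tolerance: float = 0.0) -> dict:
    """Flag a regression when the candidate's mean drops below baseline minus tolerance."""
    base = run_experiment(items, baseline, judge)
    cand = run_experiment(items, candidate, judge)
    return {"baseline": base, "candidate": cand, "regressed": cand < base - tolerance}


items = [DatasetItem("Capital of France?", "Paris"), DatasetItem("Capital of Japan?", "Tokyo")]
baseline = {"Capital of France?": "Paris", "Capital of Japan?": "Tokyo"}.get
candidate = {"Capital of France?": "paris", "Capital of Japan?": "Kyoto"}.get
print(regression_check(items, baseline, candidate, exact_match_judge))
# {'baseline': 1.0, 'candidate': 0.5, 'regressed': True}
```

Swapping the placeholder judge for a model-graded one changes only the `judge` argument; the dataset and the comparison logic stay fixed, which is what makes the run repeatable across prompt and model versions.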

Day-3 Reality: 7.8

The @observe decorator and Sessions work cleanly day-to-day, but self-hosters carry a four-service infrastructure load.

Documentation Practitioner-Fit: 8.0

Decorator docs are concrete and a Docs MCP server exposes the full reference to coding agents.

Friction Surface: 7.5

Core tier caps annotation queues at three and self-host setup needs ClickHouse plus Redis tuning.

Power-User Depth: 8.2

Datasets, Evals, Prompt Management, and the Metrics API interlink for regression testing across model and prompt versions.

Workflow Integration: 8.2

Native hooks for LangChain, LlamaIndex, CrewAI, DSPy, Vercel AI SDK, plus OpenTelemetry ingestion cover most real stacks.

Pros

  • The @observe decorator captures inputs, outputs, latency, and token cost with one line of instrumentation per function.
  • MIT-licensed self-hosting plus a free Hobby tier with 50k observations a month covers POCs without a credit card.
  • Datasets and Evals interlink for LLM-as-judge, human, or model-based regression testing across prompt and model versions.
  • Broad framework coverage: LangChain, LlamaIndex, CrewAI, DSPy, Vercel AI SDK, OpenAI, Anthropic, LiteLLM.

Cons

  • Self-hosting needs PostgreSQL, ClickHouse, Redis, and blob storage — not a Docker-and-go setup for production.
  • Core tier at $29 caps annotation queues at three; labeling-heavy teams will outgrow it.
  • LangSmith ships tighter native LangChain hooks if your stack is LangChain-only.

Right for

ML engineers shipping LLM features who want open-source observability.

Avoid if

Solo developers who only proxy OpenAI calls.

The Power User

Daily human experience, onboarding, polish, learning curve, reliability
7.9/10

Langfuse hands power users the ClickHouse keys most LLM observability tools won't even discuss.

Tracing, prompt management, datasets, and LLM-as-a-judge evals sit in one open-source platform you can actually self-host. The catch: the v3 stack (ClickHouse, Postgres, Redis, S3, and a worker) is a real footprint to operate.

The thing that separates Langfuse from LangSmith isn't the trace viewer; both look fine. It's that the source is MIT-licensed and the docs walk you through running it yourself, down to a Helm chart on your own cluster. If you want to try the cloud first, the Hobby tier is free with no credit card.

Three things power users will actually feel: Annotation Queues for human review, Datasets for regression-testing prompt changes, and the OpenTelemetry endpoint so you're not locked into one SDK. Python and JavaScript SDKs both ship; the Python v3 SDK is now OTEL-native, which is the right call.

The catch is the v3 architecture. ClickHouse replaced Postgres for traces in 2024, which fixed the scaling pain but added Redis, S3, and a worker container to the stack. Cloud Core at $29/month sidesteps all of that. Helicone is lighter; this is heavier on purpose.

Daily Polish: 8.0

Trace viewer, session tagging, and prompt playground feel like one team shipped them.

Learning Curve: 7.8

Datasets, Annotation Queues, and the Metrics API reward teams who stay past month one.

Mobile Parity: 7.5

Dev infrastructure — mobile is not the use case, neutral by category.

Onboarding Experience: 7.7

Hobby tier is free with no credit card, but self-host setup is a real afternoon.

Reliability Feel: 8.0

ClickHouse migration and 1,000+ production self-hosters indicate the scale work is done.

Pros

  • MIT-licensed source with Helm charts and real self-host docs, not a teaser repo.
  • OpenTelemetry endpoint plus OpenLLMetry compatibility means you avoid SDK lock-in.
  • Annotation Queues and Datasets turn evals into real regression workflows.
  • Free Hobby tier with no credit card lowers the trial barrier.

Cons

  • The v3 stack — ClickHouse, Postgres, Redis, S3, worker container — is real operations work.
  • Audit logs and SCIM are gated to Enterprise at $2,499/month, and SSO needs either Enterprise or the $300/month Teams add-on on Pro.
  • JavaScript SDK is still catching up to the Python v3 OTEL rewrite.

Right for

Teams building LLM features who want self-hostable observability.

Avoid if

Solo developers who want zero infrastructure overhead.

The Skeptic

Contrarian. Watch-outs, deal-breakers, broken promises, category patterns
7.7/10

Acquired by ClickHouse on January 16, 2026, which answers the runway question, but Datadog's APM bundle is still the fight.

ClickHouse acquired Langfuse on January 16, 2026 — the team had Series A term sheets on the table, not a distressed sale. The acquisition answers ClickHouse-cost and runway concerns, but Datadog LLM Observability shipped GA inside existing APM contracts in June 2024.

Acquired by ClickHouse on January 16, 2026. Term sheets for a Series A were on the table — the team wasn't trying to sell. That's a different exit story than the usual open-source-burnout sale.

The product is real. LLM-as-a-Judge evaluators, Annotation Queues, and a Prompt Management playground sit on the same MIT-licensed codebase that hit 20,470 GitHub stars before the acquisition. The Hobby tier still ships 50,000 observations free. But Datadog LLM Observability went GA in June 2024 bundled inside existing APM contracts, and that's the structural fight.

Honest read: the acquisition answers the ClickHouse cost question (they are ClickHouse now) and the runway question. It doesn't answer whether the OSS roadmap stays intact post-integration. Arize Phoenix stays free. It could go either way past 2027.

Competitive Differentiation: 7.0

Integrated tracing-plus-evals-plus-prompts stack edges point tools, but the category is crowded with LangSmith, Helicone, and Arize Phoenix.

Exit Portability: 8.2

MIT license, self-hosting via Docker Compose or Kubernetes Helm, and OpenTelemetry support keep migration paths clean.

Long-term Viability: 7.8

ClickHouse acquisition on January 16, 2026 de-risks runway at a $15B parent, though post-acquisition OSS roadmap drift remains the watch item.

Marketing Honesty: 8.0

Open-source MIT-licensed codebase, public GitHub repo, and transparent $29/month Core tier match the landing-page pitch.

Track Record Match: 7.2

YC W23 cohort with only 2.5 years in market, but 20,470 GitHub stars and 19-of-Fortune-50 adoption is a real signal.

Pros

  • MIT license with full self-hosting via Docker Compose or Kubernetes Helm.
  • ClickHouse acquisition on January 16, 2026 backs the team with a $15B parent.
  • 20,470 GitHub stars before the acquisition signals real developer adoption.
  • OpenTelemetry support keeps exit portable to Arize Phoenix or other tools.

Cons

  • Datadog LLM Observability has shipped bundled inside existing APM contracts since June 2024.
  • Post-acquisition OSS roadmap drift is the standard pattern with acquired open-source projects.
  • LLM observability category is crowded with LangSmith, Helicone, and Arize Phoenix.

Right for

ML engineers who need self-hostable LLM tracing and evaluation.

Avoid if

Teams already paying for Datadog APM in production.

Buyer Questions

Common questions answered by our AI research team

Pricing

How many free observations per month on the free tier?

The free tier includes 50,000 observations/month with no credit card required.

Setup

Does Langfuse support self-hosting on Kubernetes?

Yes, Kubernetes (Helm) is a supported self-hosting option.

Security

Is Langfuse HIPAA eligible?

Yes, Langfuse is HIPAA eligible.

Integration

Does Langfuse integrate with LangChain?

Yes, LangChain is a supported agent framework integration.

Features

Can I run LLM-as-a-judge evaluations on production data?

Yes, LLM-as-a-judge evaluators can be run on production data or during experiments.
