Who Defines 'Resolved'? The Hidden Risk in Outcome-Based AI Pricing

Outcome-based AI pricing charges per resolved ticket, qualified lead, or deflected call. The vendor writes the definition of the outcome into its own dashboards. Buyers are paying for a metric whose definition they do not control.

An enterprise customer support team I spoke with last quarter signed a contract for an AI agent priced per "resolved ticket." Three months in, the vendor's dashboard showed they had been billed for 41,000 resolutions. The team's CSAT scores had not moved. Their human agent headcount had not dropped. When they exported the ticket-level data and applied their own definition of resolved (closed, not reopened within fourteen days, and with a customer satisfaction response of 4 or higher), they got 27,000. The vendor's definition counted closures the human team had later reopened. It counted closures the customer never rated as satisfactory. It counted closures that, by the buyer's internal standard, would not have qualified.

The contract did not require the vendor to use the buyer's definition. The contract referenced "resolution" without defining it and pointed to the vendor's documentation for the operational specification. That documentation defined a resolution as a ticket marked closed in the vendor's own system. The buyer was paying for a metric whose definition the vendor controlled.

What Does Outcome-Based AI Pricing Actually Mean Today?

Outcome-based pricing in AI software is a category that has formed in the last eighteen months around the marketing claim that you pay only for results. Vendors charge per resolved support ticket, per qualified lead, per deflected call, per appointment booked, per document processed to completion. The shift from per-seat licensing to per-outcome billing sounds like a buyer-friendly evolution because it ties cost to value. The problem is that "outcome" is a category of fact that requires a definition, and the definition is almost always written by the party that gets paid when the outcome triggers.

The legal shape this creates is unusual. In most enterprise software contracts, the unit of consumption is operationally neutral — a user-month, a gigabyte, an API call. Either party can audit it. Outcome-based pricing introduces a unit that the buyer cannot independently produce from raw inputs. The vendor's pipeline classifies an event as a resolution or not, and the buyer pays based on that classification. The audit surface is the vendor's own internal logic.

Who Currently Sells Outcome-Based AI Pricing?

The customer-support segment is where outcome-based pricing has moved fastest. Intercom's Fin agent prices per resolution. Salesforce's Agentforce prices per conversation. Multiple smaller vendors in the same segment have followed. The sales-development category has its own version, with vendors charging per qualified meeting booked. The collections category charges per dollar recovered. The contract-review category charges per document processed to a completion state.

Each of these categories has a specific class of definitional risk. Customer support's risk is the line between "resolved by the agent" and "resolved by a follow-up human." Sales development's risk is the line between "qualified" and "marked qualified to bill." Collections' risk is the difference between "recovered" and "recovered with a downstream chargeback that the AI is not responsible for." Contract review's risk is "completed processing" versus "completed processing accurately."

Why Does the Vendor's Definition Almost Always Drift?

Not because the vendor is malicious. The drift is structural. A vendor's product team is rewarded for showing the customer that the agent is producing outcomes. The classification engine that decides whether an event counts as an outcome is owned by the same team. Each quarter that the classification engine produces more outcomes, the vendor's revenue from that customer increases. There is no internal counter-incentive to err on the side of stricter classification, because stricter classification reduces the line item the vendor's account team is measured on.

Warning. When the party that benefits from a classification owns the classifier, the classifier drifts toward the benefit over time. This is not a forecast. It is a structural pattern documented in metering, content moderation, search relevance, and ad auction systems for two decades. Drift happens faster in AI products, because the underlying model can be retrained without notice, and retraining the model retrains the classifier.

The buyer cannot detect the drift by sampling. A 5-percent slip in classification accuracy over a year is invisible at the ticket level. It is visible only at the aggregate, against a stable counterfactual that the buyer almost never has. Most buyers do not maintain a parallel definition of resolution that they apply independently. The ones who do are usually doing it because they were burned by an earlier vendor.

What Should the Contract Say That Most Contracts Do Not?

The contract has to define the outcome unit in operational, machine-readable terms that both parties can compute from the raw event log. "Resolution" cannot be defined as "a ticket marked resolved in the vendor's system." It has to be defined as a state of the underlying event stream that the buyer can also produce. For a support ticket, that might be: closed within the agent's session, not reopened within thirty days, and accompanied by a customer feedback score of four or higher on a five-point scale. Each clause is independently verifiable from the ticket log.
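To make "machine-readable" concrete, here is a minimal Python sketch of the example definition above applied to one exported ticket record. The `Ticket` fields, the `is_resolution` function, and the choice to report a ticket as pending while the thirty-day window is still open are illustrative assumptions, not any vendor's export schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical shape of one ticket in the raw event export.
@dataclass
class Ticket:
    ticket_id: str
    closed_at: Optional[datetime]      # None if the ticket was never closed
    closed_in_agent_session: bool      # closed during the AI agent's session
    reopened_at: Optional[datetime]    # first reopen after closure, if any
    feedback_score: Optional[int]      # 1-5 customer rating, None if no response

REOPEN_WINDOW = timedelta(days=30)
MIN_SCORE = 4

def is_resolution(t: Ticket, as_of: datetime) -> Optional[bool]:
    """Apply the example contract definition to one ticket.

    Returns True or False, or None while the reopen window is still open.
    """
    # Clause 1: closed within the agent's session.
    if t.closed_at is None or not t.closed_in_agent_session:
        return False
    # Clause 2: not reopened within thirty days of closure.
    if t.reopened_at is not None and t.reopened_at - t.closed_at <= REOPEN_WINDOW:
        return False
    if as_of - t.closed_at < REOPEN_WINDOW:
        return None  # too early to know; not billable yet
    # Clause 3: feedback score of four or higher on a five-point scale.
    return t.feedback_score is not None and t.feedback_score >= MIN_SCORE

# Usage
closed = datetime(2024, 1, 1)
t = Ticket("T-1", closed, True, None, 5)
print(is_resolution(t, closed + timedelta(days=31)))  # True
```

Returning `None` for tickets still inside the reopen window is a design choice: under this sketch, a ticket becomes billable only once every clause can actually be checked, and both parties can reproduce that result from the same log.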

The contract also has to give the buyer the right to re-classify. That right cannot be a courtesy. It has to be a contractual entitlement providing that, if the buyer's classification disagrees with the vendor's by more than a stated tolerance, the buyer's classification becomes the billing source for that period. Without this clause, the buyer can detect the drift and still have no remedy short of switching vendors, which is expensive and gives the vendor the stronger hand in negotiation.
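The re-classification clause reduces to a small, deterministic rule. The Python sketch below assumes a hypothetical `billing_count` function and a 2 percent tolerance measured against the buyer's count. The real tolerance, and what it is measured against, are whatever the contract states.

```python
from typing import Tuple

def billing_count(vendor_count: int, buyer_count: int,
                  tolerance: float = 0.02) -> Tuple[int, str]:
    """Return (billable count, source) for one billing period.

    If the vendor's count differs from the buyer's by more than `tolerance`,
    measured as a fraction of the buyer's count, the buyer's count is billed.
    """
    if vendor_count < 0 or buyer_count < 0:
        raise ValueError("counts must be non-negative")
    if buyer_count == 0:
        # Any vendor-claimed outcome against zero buyer outcomes is out of tolerance.
        gap = 0.0 if vendor_count == 0 else float("inf")
    else:
        gap = abs(vendor_count - buyer_count) / buyer_count
    if gap > tolerance:
        return buyer_count, "buyer"
    return vendor_count, "vendor"
```

With the counts from the opening example, `billing_count(41_000, 27_000)` returns `(27000, "buyer")`, because the gap is roughly 52 percent of the buyer's count.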

What Audit Rights Are Worth Asking For?

Raw event export. The buyer can extract every event the vendor classified as an outcome, in a machine-readable format, with the inputs the classifier used to produce that decision.
Classifier versioning. Every change to the production classifier that affects outcome counts is logged with a date and a brief description of the change. The buyer is notified within a defined window.
Counterfactual replay. The buyer can rerun a stored sample of past events against the current classifier and a previous classifier version, and observe the drift directly.
Right to dispute. The buyer can flag a defined percentage of monthly outcomes for review, and disputed outcomes follow a documented escalation process.
Capped uplift. Annual outcome counts cannot grow more than a stated percentage above the prior period without triggering a contract review. This protects against quiet inflation as the classifier evolves.
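Counterfactual replay is easiest to picture as a comparison of two classifier versions over the same stored sample. In practice the vendor would run the replay, since the buyer does not hold the classifier. This Python sketch only models what the resulting report could contain. The `replay_drift` function, the `confidence` field, and the two threshold classifiers are hypothetical.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional

# A classifier version: takes one stored event, returns True if it counts as an outcome.
Classifier = Callable[[Mapping[str, float]], bool]

def replay_drift(events: Iterable[Mapping[str, float]],
                 previous: Classifier,
                 current: Classifier) -> Dict[str, Optional[float]]:
    """Run the same stored sample through two classifier versions and compare."""
    sample = list(events)
    prev_hits = [previous(e) for e in sample]
    curr_hits = [current(e) for e in sample]
    prev_total, curr_total = sum(prev_hits), sum(curr_hits)
    return {
        "sample_size": len(sample),
        "previous_outcomes": prev_total,
        "current_outcomes": curr_total,
        # Events the new version counts that the old one did not, and vice versa.
        "newly_counted": sum(c and not p for p, c in zip(prev_hits, curr_hits)),
        "newly_dropped": sum(p and not c for p, c in zip(prev_hits, curr_hits)),
        "relative_change": (curr_total - prev_total) / prev_total if prev_total else None,
    }

# Hypothetical versions: v2 lowers the confidence needed to count a resolution.
def v1(event): return event["confidence"] >= 0.80
def v2(event): return event["confidence"] >= 0.70

sample = [{"confidence": c} for c in (0.95, 0.85, 0.75, 0.72, 0.60)]
report = replay_drift(sample, v1, v2)
print(report["relative_change"])  # 1.0: twice as many outcomes on the same events
```

The split between `newly_counted` and `newly_dropped` matters: a version change that adds and removes outcomes in equal numbers leaves the total flat while still changing what the buyer pays for.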

None of these clauses is standard in outcome-based AI contracts as currently drafted. Vendors will object to all of them on the grounds that they make the operational model more expensive to support. That objection is true, but the added cost is small relative to the structural risk that the absence of these clauses creates for the buyer.

How Should Procurement Price the Risk?

The buyer should not sign an outcome-based contract at the headline price the vendor quotes. The price has to be adjusted for expected drift: over the contract period, each billed unit is likely to be worth less, by the buyer's own definition, than it was at signing. A reasonable adjustment is to discount the outcome unit by 10 to 20 percent against the vendor's quoted price. The theory is that the classifier will count more events as outcomes at month twenty-four than it does at month one, and the buyer wants its cost to stay flat against its own definition.
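The 10 to 20 percent range can be sanity-checked with simple arithmetic. If the vendor's count runs ahead of the buyer's count by a fraction i, paying (1 - d) per billed outcome keeps cost flat when (1 - d)(1 + i) = 1, so d = i / (1 + i). The Python sketch below applies that idea over a whole term, weighting each month by buyer-defined volume. The function name and the linear drift path in the example are assumptions for illustration, not measured data.

```python
from typing import List, Optional

def break_even_discount(inflation: List[float],
                        volumes: Optional[List[float]] = None) -> float:
    """Discount d on the vendor's unit price such that
    (1 - d) * vendor-counted outcomes == buyer-defined outcomes over the term.

    inflation[t]: how far the vendor's count exceeds the buyer's in period t
    (0.10 means 10 percent more billed outcomes than buyer-defined ones).
    volumes[t]: buyer-defined outcomes in period t (flat if omitted).
    """
    if not inflation:
        raise ValueError("need at least one period")
    if volumes is None:
        volumes = [1.0] * len(inflation)
    if len(volumes) != len(inflation):
        raise ValueError("inflation and volumes must have the same length")
    buyer_total = sum(volumes)
    billed_total = sum(v * (1 + i) for v, i in zip(volumes, inflation))
    return 1 - buyer_total / billed_total

# Hypothetical drift path: 0% in month 1 rising linearly to 40% in month 24.
path = [0.40 * t / 23 for t in range(24)]
print(round(break_even_discount(path), 3))  # 0.167
```

Under this assumed path the average inflation is 20 percent, and the break-even discount lands inside the range above. A single period with 25 percent inflation corresponds to a 20 percent discount.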

If the vendor will not discount, the alternative is to negotiate a hybrid model: a fixed monthly platform fee plus an outcome unit at the vendor's quoted price but capped at a stated annual ceiling. The cap is the buyer's protection against classifier drift, because if the vendor's classifier suddenly produces 30 percent more outcomes year-over-year, the cap forces the conversation about what changed and gives the buyer leverage to revisit the definition.
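A hybrid bill with a cap, plus the year-over-year review trigger described under capped uplift, might be computed as in the Python sketch below. Everything here is hypothetical: the `HybridTerms` fields, a cap expressed as an outcome count rather than a dollar amount, and the 30 percent trigger borrowed from the example above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class HybridTerms:
    platform_fee_monthly: float    # fixed monthly platform fee
    unit_price: float              # vendor's quoted price per outcome
    annual_outcome_cap: int        # most outcomes billable in one contract year
    review_trigger: float = 0.30   # year-over-year growth that forces a review

def annual_bill(terms: HybridTerms, outcomes: int,
                prior_year_outcomes: Optional[int] = None) -> Dict[str, object]:
    """Annual charge under the hybrid model, plus a flag for the review clause."""
    billable = min(outcomes, terms.annual_outcome_cap)
    total = 12 * terms.platform_fee_monthly + billable * terms.unit_price
    review_required = (
        prior_year_outcomes is not None
        and prior_year_outcomes > 0
        and (outcomes - prior_year_outcomes) / prior_year_outcomes >= terms.review_trigger
    )
    return {
        "billable_outcomes": billable,
        "unbilled_over_cap": outcomes - billable,
        "total": total,
        "review_required": review_required,
    }

# Hypothetical terms and volumes, for illustration only.
terms = HybridTerms(platform_fee_monthly=10_000, unit_price=1.00, annual_outcome_cap=150_000)
bill = annual_bill(terms, outcomes=160_000, prior_year_outcomes=120_000)
print(bill)
# {'billable_outcomes': 150000, 'unbilled_over_cap': 10000, 'total': 270000.0, 'review_required': True}
```

In the example, 10,000 outcomes above the cap go unbilled, and the 33 percent growth flags a review. That flag marks the point where the buyer can ask what changed in the classifier.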

What Pattern Does This Look Like in Five Years?

Outcome-based pricing is not going to retreat. The buyer demand for it is real, because per-seat pricing increasingly understates the work an AI agent does relative to a human seat. The structural issue is going to be resolved by one of three mechanisms. The first is independent third-party metering, where a neutral party measures the outcomes and bills the buyer. This is the closest analogue to how electricity metering works, and it is the cleanest answer. It will also be slow to take hold, because no third party currently has the operational footprint to measure across vendors.

The second mechanism is regulatory. Financial services, healthcare, and government procurement will write rules requiring vendor-side outcome classifications to be third-party audited at defined intervals. This will move slowly, but it will move, because the analogous rules for metered utilities and ad auctions have already been written and the templates exist.

The third mechanism is a buyer-side counter-classifier. Sophisticated buyers will build their own classification engine and reconcile it against the vendor's counts every month. This requires data engineering capability that most buyers do not have, but it is the fastest path for buyers who can afford it and who care enough to invest.

The mechanism that probably wins. Buyer-defined outcome contracts, in which the vendor agrees to the buyer's operational definition and the buyer's auditor verifies the counts. The buyer pays slightly more per outcome in exchange for definitional control, and the risk-adjusted total cost is lower than with vendor-defined outcomes even at the higher unit price.

What Should a Buyer Do Tomorrow?

Two specific actions, neither of them dramatic. The first is to find, for every outcome-based contract in your current vendor portfolio, the exact clause that defines the outcome unit. If the clause points to vendor documentation, write down what that documentation says today and check it again at quarter-end. The drift is detectable if you measure it. It is not detectable if you assume the definition is stable.
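Checking the documentation at quarter-end is easy to automate once the text is saved. This Python sketch fingerprints the definition with SHA-256 from the standard library, ignoring whitespace-only edits, and returns a unified diff when the wording changes. Retrieving the text from the vendor's site is left out, and the `definition_changed` helper is a hypothetical name.

```python
import difflib
import hashlib
from typing import List

def fingerprint(text: str) -> str:
    """SHA-256 of the definition with whitespace collapsed, so reflowed text still matches."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def definition_changed(saved_text: str, current_text: str) -> List[str]:
    """Empty list if the wording is unchanged; otherwise a unified diff."""
    if fingerprint(saved_text) == fingerprint(current_text):
        return []
    return list(difflib.unified_diff(
        saved_text.splitlines(), current_text.splitlines(),
        fromfile="at-signing", tofile="quarter-end", lineterm=""))

# Hypothetical definition text, saved at signing and re-read at quarter-end.
saved = "A resolution is a ticket closed by the agent.\nReopened tickets do not count."
current = "A resolution is a ticket closed by the agent."
for line in definition_changed(saved, current):
    print(line)
```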

The second is to require, in any new outcome-based contract you sign in the next twelve months, the raw-event-export and counterfactual-replay clauses described above. The vendor will resist. The resistance itself is informative. Vendors who are confident in their classifier's stability are willing to commit to its audit. Vendors who are not confident are exactly the vendors a careful buyer should be most cautious about, and the contract negotiation is where that signal surfaces.
