GitHub Copilot's Token Flip Exposes the Flat-Rate AI Coding Lie

Copilot's move to per-token premium-model pricing was not a tweak. It was the AI coding category admitting what every power user already knew: the $20 flat plan was subsidising the people who needed it most.

GitHub Copilot's move to charge for premium-model requests by token instead of folding them into the flat monthly fee was not a pricing tweak. It was the moment the AI coding category admitted what every engineer who actually used these tools at scale already knew. The "$20 a month, unlimited" pricing was a marketing-driven loss leader, subsidised by users who barely used the tool, and it stopped working the second the underlying models got expensive enough that even infrequent power users could blow through the unit economics in a week.

What Did Copilot Actually Change?

The Copilot Pro subscription still costs $10 a month. The Copilot Business plan still costs $19 per user per month. The plans now distinguish between standard requests, which remain effectively unlimited at the plan's quota, and premium-model requests, which draw down a token budget that you can top up but can no longer rely on as unbounded. Premium models are the reasoning-tier and frontier-tier models in the Copilot model picker, the ones an experienced engineer reaches for when they want a multi-file refactor or a hard architectural question. The plan you bought a year ago for unlimited use now meters the requests you most wanted to make.

The pattern is not specific to Copilot. Cursor introduced a tier distinction between fast and slow requests months earlier, with the "fast" requests at premium models rationed by the plan. Aider has been honest about this since launch because it is bring-your-own-key and the user feels the cost on every call. The hosted competitors are converging on the same shape because the underlying economics force it.

Why Does the Flat-Rate Plan Stop Working?

Three forces push the unit economics negative for vendors and they have all accelerated in the last twelve months.

Model cost per request rose. Frontier reasoning models are five to ten times more expensive per token than the completion models they replaced. The vendor is paying a multiple of last year's cost for the same active user, even at unchanged usage.
Power-user usage rose. Engineers who built the tool into their loop now make hundreds of calls per day, not dozens. The 80th-percentile user's cost to the vendor used to be three or four times the median. It is now ten or twenty times.
The cheap-tier model fell behind. The non-premium model is now visibly worse for hard tasks, so the user reaches for the premium model more often. The vendor's subsidy line is now the line the user keeps crossing.

The vendor has three options when these forces compound. Raise the flat price for everyone. Cap usage in some non-pricing way (rate limits, throttling, queue depth). Meter the expensive model directly. The third option is the most honest and the most operationally manageable, which is why every vendor in the category is converging on it.

How Do You Measure What You Are Actually Spending?

The platform dashboards will tell you the request count. They will not tell you the cost-per-engineer in a way that maps to your existing chargeback model. Build the measurement yourself for at least one billing cycle. Here is the shape of the query if you are running Copilot at the org level and have the usage export enabled:

# Pull the Copilot usage report for the org. The CSV has per-user
# completion counts and chat counts segmented by model class.
gh api /orgs/$ORG/copilot/usage --paginate > usage.json

# Extract premium-model request counts per user
jq -r '.[] | [.day, .user, .premium_requests] | @csv' usage.json > premium_per_user.csv

# Aggregate to monthly per-user and compare to plan quota.
awk -F, 'NR>1 {sum[$2]+=$3} END {for (u in sum) print u, sum[u]}' premium_per_user.csv \
  | sort -k2 -n -r \
  | head -20

Run this for a month. Bucket users into three groups: under quota, near quota (within 15 percent), and at or over quota. The near-quota and over-quota groups are the population you have to make a decision about, and the size of those groups will surprise most engineering managers. The internal expectation is usually "a handful of power users." The actual distribution is usually 25 to 40 percent of active engineers in the near-or-over group, because the tools are now good enough that being a power user is no longer a personality trait.

What Should the Engineering Manager Do Next?

The decision is one of three, depending on the over-quota group's size and what they are doing.

Top up the budget for over-quota users. Premium request packs cost roughly four cents per request at Copilot's current published rates. A power user who needs 500 extra requests a month adds $20 to that user's seat cost. For an engineer whose hourly cost to the company is between $80 and $200, the math is obvious. Do not negotiate this internally. Approve it.
Reassign the heaviest users to a different tool. If your over-quota population is small enough to identify by name, the engineers running multi-file refactors or large codebase analyses are usually better served by Cursor on the Business plan or by direct Anthropic Claude API access via Aider. The total cost is comparable, the latency is better, and the tool's shape fits the work.
Do nothing and let the users self-throttle. This is the default and it is the wrong answer. Self-throttling means engineers are choosing to ask the worse model harder questions, which produces worse answers, which they do not realise because they are not aware of which model they are talking to in any given turn. The cost of this is invisible on the spreadsheet and visible in the PR review comments.

What Should the Individual Engineer Do?

If your team has not made an explicit decision about premium-model budget, you have two options that produce predictable cost.

Buy your own top-up. The premium request packs are individually purchasable. If your work depends on the frontier model, paying $10 or $20 out of pocket per month is a reasonable bet against a productivity tax that is harder to measure. Submit it as an expense if your company will reimburse. If they will not, the math still works at most senior-engineer compensation levels.
Run a local model for the easy work. The 80 percent of completions that do not require frontier reasoning can be served by an open-weight model on your own machine. Ollama running a 14-billion parameter coding model handles autocomplete and small refactors competently, and frees the premium model budget for the questions that actually need it. The two-tier discipline is a habit, not a tool decision. Most engineers who try it for a week keep doing it.

What Do the Vendors Owe You as a Buyer?

The honest list is shorter than the marketing pages suggest, and longer than the current contracts deliver.

Per-user usage exports. Engineering managers need to see, by name, who is over and under quota. Copilot delivers this. Cursor delivers this. Some smaller vendors do not. The absence of this data is a reason to evaluate the vendor's seriousness about enterprise customers.
Predictable overage pricing. The price per additional premium request should be published, stable, and not require a sales call. A vendor whose overage pricing is "contact us" is keeping the option to raise it on you mid-contract.
Notification when usage crosses thresholds. The engineer hitting the quota wall should know about it before the wall, not after the next prompt fails silently. Most vendors are bad at this. The ones who are good at it have visibly more sophisticated billing infrastructure, which is a proxy for operational maturity.
Model identity in the response. The tool should tell the user which model produced a given completion. The "we route to the best available model for your prompt" abstraction is convenient until the routing degrades and the engineer cannot tell why their answers got worse. Demand model transparency on every response.

What Happens to the Tools That Cannot Adjust?

The tools that stay on flat-rate pricing without metering the expensive model will be subsidising power users so heavily that they will run out of capital, raise the flat rate sharply enough to lose the casual users they were optimising for, or quietly degrade the premium-model routing to keep the unit economics afloat. The third path is the most dangerous for buyers because it is the hardest to detect. The model number stays the same on the marketing page and the model under the hood gets quietly cheaper.

The category-level lesson from Copilot's flip is that flat-rate AI tool pricing is a transitional artefact. It worked while the underlying models were cheap relative to the user's willingness to pay. It does not work now, and the contracts buyers signed last year on the assumption that it would work are the contracts that are about to be renegotiated. The vendors will say the renegotiation is a feature improvement. It is a pricing restructure, and the restructure is downstream of physics, not business strategy.

What Is the One Concrete Action for This Quarter?

Measure your premium-model usage at the per-user level, compare it against your plan's quota, identify the over-quota group, and either fund their top-up or move them to a tool with a different shape. Do this before your renewal. Do not let the renewal conversation be the moment you discover that 30 percent of your engineers have been silently downgrading their model choice for three months because they ran out of budget and the dashboard did not surface it. The token flip is the floor, not the ceiling, and the buyers who measure their actual consumption are the buyers who will sign the cheaper contracts twelve months from now.

GitHub Copilot's Token Flip Exposes the Flat-Rate AI Coding Lie

What Did Copilot Actually Change?

Why Does the Flat-Rate Plan Stop Working?

How Do You Measure What You Are Actually Spending?

What Should the Engineering Manager Do Next?

What Should the Individual Engineer Do?

What Do the Vendors Owe You as a Buyer?

What Happens to the Tools That Cannot Adjust?

What Is the One Concrete Action for This Quarter?

Discussion

Author

Recent Posts

OpenAI's Model Deprecation Cadence Is Now a Business Continuity Risk

IBM vs. Microsoft vs. Google: Which Enterprise Multi-Agent Orchestration Platform Should You Trust With Your AI Governance Layer?

Restricted-Access AI Models Are a New Enterprise Pricing Tier — Not Just a Safety Posture

More from the Blog