Token-Based Billing is a SaaS COGS Problem

Aaron Leggett : May 27, 2026 1:55:03 PM

Part of our ongoing series on how we're using AI at Intuto — practically, honestly, and without the hype.

Every month, two bills arrive. One from Anthropic, one from Microsoft for Copilot. We pay them and move on.

We can see who's using what. But we couldn't tell you with any confidence what's driving those costs at a feature level. Which parts of the product. Which workflows. Where the spend is actually going. The money goes out, the tools keep working, and nobody's asked hard questions yet.

That's fine — until it isn't. And we're starting to think we should get ahead of it before the bill forces the conversation.

There's a lot of conversation about AI spend in the context of productivity — how many staff have Copilot licences, whether the tool is pulling its weight per seat. That's a real question, but it's a different one. For a SaaS team building AI directly into its product, untracked token spend isn't a productivity cost. It's cost of goods. It sits in your margin. And if you can't see it at a feature level, you can't price confidently, you can't forecast accurately, and you definitely can't explain it to anyone who asks.
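To make the margin point concrete, here is a minimal sketch of gross margin per customer with token spend counted as cost of goods. Every figure is made up for illustration, and `gross_margin` is a hypothetical helper rather than anything from our codebase.

```python
def gross_margin(revenue: float, hosting_cost: float, token_cost: float) -> float:
    """Gross margin as a fraction of revenue, with token spend counted as COGS."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - hosting_cost - token_cost) / revenue


# Hypothetical monthly figures for one customer on a flat plan.
revenue = 100.0
hosting = 15.0

light = gross_margin(revenue, hosting, token_cost=2.0)   # occasional AI use
heavy = gross_margin(revenue, hosting, token_cost=25.0)  # heavy AI use

print(f"light user margin: {light:.0%}")
print(f"heavy user margin: {heavy:.0%}")
```

Same plan, same price, and the margin differs by more than twenty points depending on how much the customer leans on the AI features. Without feature-level visibility, that gap is invisible.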

What You're Paying Today Isn't the Real Price

The frontier model providers are not profitable at current prices. OpenAI lost $5 billion in 2024 on $3.7 billion in revenue. In Q1 2026, revenue was $5.7 billion, against losses of $6.9 billion in the same period. Revenue is growing fast. So are the losses [Sherwood News, May 2026]. Anthropic and Google are running similar patterns. The pricing we've all normalised around is subsidised to capture market share while the land grab is still open.

That's not a criticism. It's a rational strategy. But it means the cost assumptions baked into your architecture today are built on a floor that will eventually move.

Prices have fallen dramatically in 2026, driven by GPU supply, architectural efficiencies, and competitive pressure. That trend is real. But the losses aren't coming from running inference. They're coming from the cost of training the next generation of models. That investment has to be recouped somewhere, eventually. The providers betting on volume to get there may get it right. Or pricing normalises upward. Either way, building cost assumptions without any visibility into what you're spending is the risky position to be in.

Building Visibility Into AI Spend

Most teams are still treating AI spend as a line item to approve, not a system to instrument. That's understandable — token billing is new, the mental model hasn't formed yet, and nobody has a strong intuition for what things should cost.

We're no different. Our infrastructure billing is straightforward — we reserve instances in Azure and that's about it. We've never needed to tag spend back to individual features, and until now that's been perfectly fine.

AI changes that. When a feature starts making API calls that cost real money per request, knowing which feature is spending what stops being a nice-to-have. It's probably going to be the first time we care about feature-level infrastructure costs at all — and we suspect we're not alone in that.

The tooling exists to do it — Azure AI Foundry, API Management, Application Insights — but stitching it together into something meaningful takes deliberate effort. It's on our list. The teams that build that visibility early will be ahead of the ones who do it in response to a bill they weren't expecting.
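As a rough illustration of what feature-level visibility could look like, the sketch below tags each model call with the feature that made it and rolls the cost up per feature. It is not tied to any of the tools above. The feature names, model names, and prices are all placeholders, and it assumes you can capture input and output token counts for each call, which most provider APIs report with the response.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder prices in USD per million tokens. These are assumptions for
# illustration, not real pricing; substitute your provider's rate card.
PRICE_PER_MILLION = {
    "small-model": {"input": 1.00, "output": 5.00},
    "large-model": {"input": 5.00, "output": 25.00},
}


@dataclass(frozen=True)
class UsageRecord:
    """One model call, tagged with the product feature that made it."""
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


def record_cost(record: UsageRecord) -> float:
    """Cost of a single call in USD, using the placeholder price table."""
    price = PRICE_PER_MILLION[record.model]
    total = (record.input_tokens * price["input"]
             + record.output_tokens * price["output"])
    return total / 1_000_000


def cost_by_feature(records: list[UsageRecord]) -> dict[str, float]:
    """Sum call costs per feature tag."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.feature] += record_cost(record)
    return dict(totals)


# Hypothetical usage log.
records = [
    UsageRecord("summarise", "large-model", 200_000, 40_000),
    UsageRecord("search", "small-model", 500_000, 100_000),
    UsageRecord("summarise", "large-model", 100_000, 20_000),
]

for feature, cost in sorted(cost_by_feature(records).items()):
    print(f"{feature}: ${cost:.2f}")
```

The key design choice is that the feature tag is attached at the call site. Once it is missing from the log, there is no reliable way to reconstruct it from the provider's invoice.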

Where Token Billing Might Go — Our Best Guess

This is speculation. We could be completely wrong. But the tension between unpredictable token billing and the need for budget certainty feels like it has to resolve somewhere — and our hunch is it resolves toward something like reserved compute pricing.

The logic: it mirrors how we already think about infrastructure. Reserve capacity at a baseline, pay less than on-demand rates. Providers get predictability on their side; engineering teams get a number they can actually put in a budget. Finance has been approving infrastructure commitments for years — a reserved AI compute line isn't a conceptual leap.

On-demand token billing makes sense for experimentation. For production workloads you understand well, committing to a baseline seems like the natural next step. Whether the providers move that direction, and when, is genuinely unknown. But it's the shape we'd expect the market to find.
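To show the arithmetic behind that hunch, here is a small sketch comparing on-demand billing with a hypothetical reserved commitment. No provider's actual pricing model is implied: the flat commitment fee, the committed volume, and the overage rate are all assumptions chosen to keep the numbers easy to follow.

```python
def on_demand_cost(tokens_millions: float, rate_per_million: float) -> float:
    """Pay-as-you-go cost for a month's usage."""
    return tokens_millions * rate_per_million


def reserved_cost(tokens_millions: float, committed_millions: float,
                  commit_fee: float, overage_rate_per_million: float) -> float:
    """Flat fee covers the committed volume; usage beyond it is billed as overage."""
    overage = max(0.0, tokens_millions - committed_millions)
    return commit_fee + overage * overage_rate_per_million


def break_even_millions(commit_fee: float, rate_per_million: float) -> float:
    """Monthly usage at which the flat fee equals the on-demand bill."""
    return commit_fee / rate_per_million


# Hypothetical terms: $10 per million tokens on demand, or $700 per month
# for 100 million committed tokens, with overage billed at the on-demand rate.
RATE = 10.0
COMMITTED = 100.0
FEE = 700.0

print(f"break-even at {break_even_millions(FEE, RATE):.0f}M tokens per month")
for usage in (50.0, 100.0, 120.0):
    demand = on_demand_cost(usage, RATE)
    reserved = reserved_cost(usage, COMMITTED, FEE, RATE)
    print(f"{usage:.0f}M tokens: on-demand ${demand:.0f}, reserved ${reserved:.0f}")
```

Under these made-up terms the commitment only pays off above the break-even volume. That is why it suits production workloads with a known baseline and not experiments whose usage is still a guess.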

Why AWS and Microsoft Are Quietly Winning Without Winning

The more interesting question isn't which AI model wins — it's who owns the layer that sits above the models. And that's a race AWS and Microsoft may have already won.

They have billing infrastructure. They have cost allocation tooling. They have IAM, dashboards, tagging, forecasting, and the trust relationship with enterprise procurement that took fifteen years to build. Model orchestration is already built into what Bedrock and Azure AI Foundry do today.

They don't need to win the model race. They just need to be the control plane that makes the model race irrelevant to the buyer. Once enterprise teams treat frontier models as interchangeable infrastructure — like compute or storage — differentiation shifts entirely to whoever makes governance and cost management easiest. That's a game they've been playing for a long time — and they're very good at it.

For teams like ours, the practical implication is real: the interface you use to govern AI spend may end up mattering more than which model sits underneath it.

What We're Actually Doing

We're not claiming to have this solved. We're paying more attention to our AI spend than we were six months ago — and we know we need to build the visibility before the bill makes it urgent.

Part of that is just knowing where the money is going. But the more interesting question is whether the spend is going toward the right things. Token billing creates a subtle pressure to optimise for cost rather than outcome — to reach for a cheaper model, or a shorter prompt, when the right answer was actually the more expensive one. That's a bad trade-off to make unconsciously.

The discipline we're working toward: being deliberate about model selection rather than defaulting to whatever's already wired in. Not every feature needs the most capable model. Some do. Knowing the difference — and building that into how we architect features rather than leaving it to habit — is where the real leverage is.
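One way to build that decision into the architecture, sketched under assumptions: keep an explicit map from feature to model tier, and refuse to run a feature that has no entry. The feature names and model identifiers below are hypothetical, and this illustrates the idea only; it does not describe how our platform is wired.

```python
from enum import Enum


class Tier(Enum):
    """Model tiers. The values are placeholder model identifiers."""
    SMALL = "small-model"
    LARGE = "large-model"


# Each entry is a deliberate, reviewable decision. Feature names are hypothetical.
FEATURE_TIERS = {
    "tagging": Tier.SMALL,            # simple classification, cheap model is enough
    "feedback-drafting": Tier.LARGE,  # quality matters more than cost here
}


def model_for(feature: str) -> str:
    """Return the model chosen for a feature, failing loudly if none was chosen."""
    try:
        return FEATURE_TIERS[feature].value
    except KeyError:
        raise LookupError(f"no model tier decided for feature {feature!r}") from None


print(model_for("tagging"))
print(model_for("feedback-drafting"))
```

There is deliberately no default tier. A new feature cannot inherit whatever model happens to be wired in, so someone has to make the choice and record it.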

AI billing won't stay invisible forever. The teams treating it as an afterthought today are the ones who'll be scrambling to explain it tomorrow.

If your team is further along on this than we are, we'd genuinely love to hear how you're approaching it. Tweet us on X with #intutobuild and tag us @intutohq.

Authored by Aaron Leggett, Principal Product Architect at Intuto. Photo by Tam Ming.
