The Human Element - Recalibrating the AI Speed Narrative
This is Part 3 of a 3-part series. If you missed Part 1 or 2, start with How We Encoded 10 Years of Tribal Knowledge Into AI Instructions, then...
Part of our ongoing series on how we're using AI at Intuto — practically, honestly, and without the hype.
Every month, two bills arrive. One from Anthropic, one from Microsoft for Copilot. We pay them and move on.
We can see who's using what. But we couldn't tell you with any confidence what's driving those costs at a feature level. Which parts of the product. Which workflows. Where the spend is actually going. The money goes out, the tools keep working, and nobody's asked hard questions yet.
That's fine — until it isn't. And we're starting to think we should get ahead of it before the bill forces the conversation.
There's a lot of conversation about AI spend in the context of productivity — how many staff have Copilot licences, whether the tool is pulling its weight per seat. That's a real question, but it's a different one. For a SaaS team building AI directly into its product, untracked token spend isn't a productivity cost. It's cost of goods. It sits in your margin. And if you can't see it at a feature level, you can't price confidently, you can't forecast accurately, and you definitely can't explain it to anyone who asks.
The frontier model providers are not profitable at current prices. OpenAI lost $5 billion in 2024 on $3.7 billion in revenue. By Q1 2026, they were generating $5.7 billion in revenue but losing $6.9 billion in the same period. Revenue is growing fast. So are the losses [Sherwood News, May 2026]. Anthropic and Google are running similar patterns. The pricing we've all normalised around is subsidised to capture market share while the land grab is still open.
That's not a criticism. It's a rational strategy. But it means the cost assumptions baked into your architecture today are built on a floor that will eventually move.
Prices have fallen dramatically in 2026, driven by GPU supply, architectural efficiencies, and competitive pressure. That trend is real. But the losses aren't coming from running inference. They're coming from the cost of training the next generation of models. That investment has to be recouped somewhere, eventually. The providers betting on volume to get there may get it right. Or pricing normalises upward. Either way, building cost assumptions without any visibility into what you're spending is the risky position to be in.
Most teams are still treating AI spend as a line item to approve, not a system to instrument. That's understandable — token billing is new, the mental model hasn't formed yet, and nobody has a strong intuition for what things should cost.
We're no different. Our infrastructure billing is straightforward — we reserve instances in Azure and that's about it. We've never needed to tag spend back to individual features, and until now that's been perfectly fine.
AI changes that. When a feature starts making API calls that cost real money per request, knowing which feature is spending what stops being a nice-to-have. It's probably going to be the first time we care about feature-level infrastructure costs at all — and we suspect we're not alone in that.
The tooling exists to do it — Azure AI Foundry, API Management, Application Insights — but stitching it together into something meaningful takes deliberate effort. It's on our list. The teams that build that visibility early will be ahead of the ones who do it in response to a bill they weren't expecting.
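To make "feature-level visibility" concrete, here is a minimal sketch of the idea in Python, independent of any particular Azure service. It assumes each API call is logged with a feature tag, a model name, and token counts. The feature names, model names, and per-token prices are all hypothetical placeholders, not our real figures or any provider's actual rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens.
# Real rates differ by provider and model and change over time.
PRICE_PER_MTOK = {
    "small-model": {"input": 0.50, "output": 2.00},
    "large-model": {"input": 5.00, "output": 20.00},
}


@dataclass(frozen=True)
class Usage:
    """One logged API call, tagged with the product feature that made it."""
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(u: Usage) -> float:
    """Cost of a single call under the assumed price table."""
    p = PRICE_PER_MTOK[u.model]
    return (u.input_tokens * p["input"] + u.output_tokens * p["output"]) / 1_000_000


def spend_by_feature(records: list[Usage]) -> dict[str, float]:
    """Sum the cost of all calls, grouped by feature tag."""
    totals: dict[str, float] = defaultdict(float)
    for u in records:
        totals[u.feature] += cost_usd(u)
    return dict(totals)


# Hypothetical usage log for two made-up features.
records = [
    Usage("course-summary", "small-model", 1_000_000, 500_000),
    Usage("quiz-generator", "large-model", 200_000, 100_000),
    Usage("course-summary", "small-model", 1_000_000, 500_000),
]

for feature, total in sorted(spend_by_feature(records).items()):
    print(f"{feature}: ${total:.2f}")
```

The only design decision that matters here is that the feature tag is attached at the point of the call. Everything downstream, whether a dashboard or a spreadsheet, is just a group-by.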
This is speculation. We could be completely wrong. But the tension between unpredictable token billing and the need for budget certainty feels like it has to resolve somewhere — and our hunch is it resolves toward something like reserved compute pricing.
The logic: it mirrors how we already think about infrastructure. Reserve capacity at a baseline, pay less than on-demand rates. Providers get predictability on their side; engineering teams get a number they can actually put in a budget. Finance has been approving infrastructure commitments for years — a reserved AI compute line isn't a conceptual leap.
On-demand token billing makes sense for experimentation. For production workloads you understand well, committing to a baseline seems like the natural next step. Whether the providers move that direction, and when, is genuinely unknown. But it's the shape we'd expect the market to find.
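The trade-off is easy to sketch with arithmetic. The snippet below assumes a purely hypothetical pricing shape: a reserved block of tokens billed at a discounted rate whether or not it is used, with any overflow billed at the on-demand rate. No provider is confirmed to price this way, and the rates are invented for illustration.

```python
def on_demand_cost(tokens_m: float, on_demand_rate: float) -> float:
    """Monthly cost when every token is billed at the on-demand rate.

    tokens_m is usage in millions of tokens; rates are USD per million.
    """
    return tokens_m * on_demand_rate


def reserved_cost(tokens_m: float, reserved_m: float,
                  reserved_rate: float, on_demand_rate: float) -> float:
    """Monthly cost with a reserved baseline (hypothetical pricing model).

    The reserved block is paid for in full even if unused;
    usage beyond it is billed at the on-demand rate.
    """
    overflow = max(0.0, tokens_m - reserved_m)
    return reserved_m * reserved_rate + overflow * on_demand_rate


def break_even_utilisation(reserved_rate: float, on_demand_rate: float) -> float:
    """Fraction of the reserved block you must use before reserving pays off."""
    return reserved_rate / on_demand_rate


# Invented numbers: 10 USD per million on demand, 7 USD per million reserved,
# with 100 million tokens reserved per month.
ON_DEMAND, RESERVED, BLOCK = 10.0, 7.0, 100.0

for usage in (60.0, 90.0, 120.0):
    od = on_demand_cost(usage, ON_DEMAND)
    rs = reserved_cost(usage, BLOCK, RESERVED, ON_DEMAND)
    print(f"{usage:.0f}M tokens: on-demand ${od:.0f}, reserved ${rs:.0f}")
```

Under these made-up rates, reserving only pays off once usage reliably exceeds 70% of the reserved block. That is why it suits well-understood production workloads and not experiments.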
The more interesting question isn't which AI model wins — it's who owns the layer that sits above the models. And that's a race AWS and Microsoft may have already won.
They have billing infrastructure. They have cost allocation tooling. They have IAM, dashboards, tagging, forecasting, and the trust relationship with enterprise procurement that took fifteen years to build. Model orchestration is already built into what Bedrock and Azure AI Foundry do today.
They don't need to win the model race. They just need to be the control plane that makes the model race irrelevant to the buyer. Once enterprise teams treat frontier models as interchangeable infrastructure — like compute or storage — differentiation shifts entirely to whoever makes governance and cost management easiest. That's a game they've been playing for a long time — and they're very good at it.
For teams like ours, the practical implication is real: the interface you use to govern AI spend may end up mattering more than which model sits underneath it.
We're not claiming to have this solved. We're paying more attention to our AI spend than we were six months ago — and we know we need to build the visibility before the bill makes it urgent.
Part of that is just knowing where the money is going. But the more interesting question is whether the spend is going toward the right things. Token billing creates a subtle pressure to optimise for cost rather than outcome — to reach for a cheaper model, or a shorter prompt, when the right answer was actually the more expensive one. That's a bad trade-off to make unconsciously.
The discipline we're working toward: being deliberate about model selection rather than defaulting to whatever's already wired in. Not every feature needs the most capable model. Some do. Knowing the difference — and building that into how we architect features rather than leaving it to habit — is where the real leverage is.
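One way to build that deliberateness into the architecture, sketched in Python: every feature has to declare its model tier explicitly, and there is no silent default. This is an illustration of the idea rather than how our codebase works today. The tier names, model names, and feature names are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical model tiers; the values stand in for real model identifiers."""
    SMALL = "small-model"
    LARGE = "large-model"


# Each feature declares its tier next to a one-line justification,
# so the choice is reviewed rather than inherited from whatever was wired in first.
FEATURE_TIERS: dict[str, Tier] = {
    "course-summary": Tier.SMALL,   # short, low-stakes text; cheaper model is enough
    "quiz-generator": Tier.LARGE,   # accuracy matters more than cost here
}


def model_for(feature: str) -> str:
    """Return the model declared for a feature.

    Deliberately has no fallback: an undeclared feature is an error,
    which forces a conscious decision when a new feature is added.
    """
    try:
        return FEATURE_TIERS[feature].value
    except KeyError:
        raise KeyError(f"No model tier declared for feature {feature!r}") from None


print(model_for("course-summary"))
print(model_for("quiz-generator"))
```

Failing loudly on an undeclared feature is the point. A default would quietly bring back the habit we are trying to avoid.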
AI billing won't stay invisible forever. The teams treating it as an afterthought today are the ones who'll be scrambling to explain it tomorrow.
If your team is further along on this than we are, we'd genuinely love to hear how you're approaching it. Tweet us on X with #intutobuild and tag us @intutohq.
Authored by Aaron Leggett, Principal Product Architect at Intuto. Photo by Tam Ming.