Rethinking Tokens in the Enterprise

By Jason Symons, SVP, Head of Engineering

I recently caught up with an engineering leader at a large enterprise. Their organization mandated AI-assisted development across the team and burned through its annual token budget in roughly thirty days.

The conclusion the leadership team reached, and the one most leadership teams reach when they see headlines like this, was to throttle usage.

I think they’re solving for the symptom instead of the system.

The token panic making the rounds in enterprise AI right now is almost entirely a story about coding. That part is real, but it’s a narrow story, and it’s distorting how leaders are thinking about everything else AI is doing across the enterprise. This is especially true in healthcare, where agents, automations, decisioning, and customer-facing applications behave nothing like a dev team pointing a model at a codebase.

When boards generalize from the coding headlines to all AI spend, they over-correct in places where the economics simply aren’t comparable.

The real question isn’t whether your token bill is growing. It’s whether the work is changing fast enough to justify it.

Here are three things I believe leaders are missing.

It’s about the shape of the work, not the volume of AI

In software development, token consumption scales with the shape of the environment more than the amount of AI you’re using.

Standing up a small, well-bounded microservice is relatively cheap. The context window stays tight and every change remains localized. Working in a legacy monolith with millions of lines of code, tangled dependencies, and tribal knowledge living in the heads of a handful of senior engineers? You’ve introduced a token tax. The model has to ingest enormous amounts of context just to become useful.
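To make the token tax concrete, here is a rough back-of-the-envelope sketch. Every figure in it is a hypothetical placeholder (the price per million tokens, the context sizes, the call counts), not a measurement from any real codebase or vendor price list. It only shows how input-side cost scales with the amount of context the model has to ingest on each request.

```python
# Illustrative only: all numbers below are hypothetical, not measurements.

def context_cost_usd(context_tokens: int, calls: int,
                     usd_per_million_input_tokens: float) -> float:
    """Input-side cost of sending `context_tokens` on each of `calls` requests."""
    return context_tokens * calls * usd_per_million_input_tokens / 1_000_000


PRICE = 3.00  # hypothetical USD per million input tokens

# Same number of model calls, very different context sizes per call.
microservice = context_cost_usd(context_tokens=8_000, calls=50,
                                usd_per_million_input_tokens=PRICE)
monolith = context_cost_usd(context_tokens=150_000, calls=50,
                            usd_per_million_input_tokens=PRICE)

# The "token tax" is the extra spend attributable to the environment's shape.
token_tax = monolith - microservice
print(f"microservice: ${microservice:.2f}  monolith: ${monolith:.2f}  tax: ${token_tax:.2f}")
```

The amount of AI being used is identical in both cases. Only the shape of the environment changed.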

This is not fundamentally a token problem. It’s an architectural one.

If your token bill exploded, the bill is telling you something about your codebase. The fix isn’t necessarily to throttle the model or, worse, developer productivity. It’s to understand where you’re paying the tax and decide, surface by surface, whether that tax is worth it. Sometimes it is. Sometimes the right move is to reshape the work before pointing AI at it.

Either way, the diagnostic is the architecture, not the agent.

Cost is the wrong frame. Leverage is the right one

A token bill in isolation tells you almost nothing. The number that matters is what the work actually replaced, automated, accelerated, or unlocked.

“How do we control this OpEx?” is the instinctive question, but also the wrong one to ask. The better question is: “Did we restructure the work to capture the value we’re paying for?”

Most token overruns I’ve seen aren’t AI problems. They’re change management problems. If you deploy AI alongside the old org structure, the old processes, and the old operating model, of course the AI looks expensive. You’re paying for the new system on top of the cost of the old one. Nothing has actually been optimized.

Cloud FinOps teams will naturally treat tokens as another operating expense to compress. But platform leaders need a different lens. Tokens are leverage, not overhead. Compressing them before you’ve captured the value behind them is how enterprises end up paying for AI without ever benefiting from AI.
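One way to put the leverage frame into practice is to track value captured per dollar of token spend for each workflow, rather than the raw bill. The sketch below is a minimal illustration under stated assumptions: the `Workflow` class, its field names, and the dollar figures are all hypothetical, and how "value captured" is measured is left to the organization.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    token_spend_usd: float     # token bill for this workflow over some period
    value_captured_usd: float  # value actually realized over the same period

    def leverage(self) -> float:
        """Value captured per dollar of token spend."""
        if self.token_spend_usd <= 0:
            raise ValueError("token_spend_usd must be positive")
        return self.value_captured_usd / self.token_spend_usd


# Hypothetical figures: AI bolted onto the old process vs. a redesigned workflow.
workflows = [
    Workflow("bolted_on", token_spend_usd=10_000, value_captured_usd=4_000),
    Workflow("redesigned", token_spend_usd=25_000, value_captured_usd=100_000),
]

# Rank by leverage, not by the size of the bill.
ranked = sorted(workflows, key=lambda w: w.leverage(), reverse=True)
for w in ranked:
    print(f"{w.name}: {w.leverage():.1f}x")
```

In this made-up example the workflow with the larger bill is the better investment, which is the point: compressing spend on it first would cut the wrong thing.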

Agents shouldn’t do every task

A surprising amount of token burn is self-inflicted by architecture. The most expensive way to be wrong is to ask a frontier model to perform work that a deterministic function, an MCP call, a tool invocation, or a smaller model could handle just as well.

Every workflow has to balance cost against the precision of the result. Frontier models excel at navigating ambiguity, but using them for every task is unnecessary. If the work demands absolute reliability, or if a structured tool call can deliver the same outcome, using the largest model is simply paying a tax.
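A simple way to picture this is a router that tries the cheapest adequate path first. The sketch below is an assumption-laden illustration, not a description of any particular product: the task names, the `route` function, and the stand-in model callables are all hypothetical. A deterministic handler runs first, a smaller model handles well-bounded tasks, and the frontier model is the fallback for everything else.

```python
import re
from typing import Callable, Dict, Optional, Tuple


def extract_iso_date(text: str) -> Optional[str]:
    """Deterministic path: pull a YYYY-MM-DD date out of text with a regex."""
    match = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return match.group(0) if match else None


# Hypothetical registry of tasks that plain code can handle.
DETERMINISTIC_HANDLERS: Dict[str, Callable[[str], Optional[str]]] = {
    "extract_date": extract_iso_date,
}
# Hypothetical set of well-bounded tasks a smaller model handles well enough.
SMALL_MODEL_TASKS = {"classify_intent"}


def route(task_kind: str, payload: str,
          small_model: Callable[[str], str],
          frontier_model: Callable[[str], str]) -> Tuple[str, str]:
    """Return (path_used, result), preferring the cheapest adequate path."""
    handler = DETERMINISTIC_HANDLERS.get(task_kind)
    if handler is not None:
        result = handler(payload)
        if result is not None:
            return "deterministic", result  # no tokens spent
    if task_kind in SMALL_MODEL_TASKS:
        return "small_model", small_model(payload)
    return "frontier_model", frontier_model(payload)  # ambiguity lives here


# Stand-ins for real model clients, which are out of scope for this sketch.
small = lambda text: "small-model answer"
frontier = lambda text: "frontier-model answer"

print(route("extract_date", "Follow-up visit on 2024-03-15", small, frontier))
print(route("summarize_case", "Long, ambiguous case notes", small, frontier))
```

If the deterministic handler cannot produce a result, the request falls through to a model rather than failing, so the cheap path never blocks the work.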

How we think about ROAI™

If token spend isn’t the thing to optimize, what should leaders optimize instead? At Optura, we’ve built our entire methodology around ROAI™ (Return on AI Investment), and we think about it through four levers:

Model selection by workload: This is a strategic matching exercise, not a simple question of size. Every task carries a distinct profile that balances depth of reasoning, latency, and hallucination tolerance against the cost per outcome. Frequently, a targeted reasoning model outperforms a larger generalist, while a fine-tuned specialty model can outpace a frontier heavyweight. Alignment matters more than size.
Prompt optimization: Loose prompts produce loose outputs: more tokens generated, more retries, and more downstream cleanup. A real prompt discipline involves precise instructions, output schemas that constrain the response shape, few-shot examples calibrated to the task, and cache-aware structures that let stable context be reused. Tightening prompts is measurable, eval-driven engineering, not a vibe check.
Context shaping: Relying on raw dumps of a document store or knowledge graph into a context window is an expensive engineering shortcut. Effective systems balance retrieval, prompt caching, and summarization so the model only processes what is essential for the outcome.
Tool offload: Agents orchestrate. Tools execute. The model reasons and plans; deterministic tools handle retrieval, calculation, validation, and action via well-defined interfaces like MCP. Every step routed to a tool is generation you don’t pay for, and the result comes back verifiable in a way that pure generation never is.
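As one illustration of the context shaping lever above, the sketch below packs only the most relevant retrieved chunks into a fixed token budget instead of dumping everything into the context window. It is a minimal sketch under stated assumptions: the function names are hypothetical, relevance scores are assumed to come from some upstream retrieval step, and the four-characters-per-token estimate is a crude stand-in for a real tokenizer.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def shape_context(chunks: List[Tuple[float, str]], token_budget: int) -> List[str]:
    """Greedily keep the highest-scoring chunks that fit within the token budget.

    `chunks` is a list of (relevance_score, text) pairs from a retrieval step.
    """
    selected: List[str] = []
    used = 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected


# Hypothetical retrieval results: two short relevant chunks, one long weak one.
retrieved = [
    (0.9, "a" * 40),   # about 10 tokens, highly relevant
    (0.2, "b" * 400),  # about 100 tokens, barely relevant
    (0.7, "c" * 40),   # about 10 tokens, relevant
]
context = shape_context(retrieved, token_budget=25)
print(len(context), "chunks kept")
```

A production system would likely combine this with prompt caching and summarization, but even this greedy cut keeps the long, weakly relevant chunk out of the prompt.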

One final lever that organizations often overlook, and the one Optura considers fundamental: pair every AI deployment with the workforce, workflow, and process redesign required to capture the value. Without that, you’re simply adding a new cost line to an old system and blaming the AI.

The question to ask

For leaders staring at a growing token bill, don’t optimize the token. Optimize the system around it.

The companies that get AI right over the next twenty-four months won’t be the ones with the smallest token bills. They’ll be the ones that paired their AI rollouts with the organizational and process changes needed to actually capture the value.

Because the real question was never whether the token bill went up. It’s whether the work changed enough to justify it.

Request a demo to see how Optura helps enterprise leaders measure ROAI™ and build the infrastructure for AI that actually pays off.
