2026-04-09

The hottest AI job that doesn't exist yet: the Token Manager

AI is creating a new job to be done. As inference becomes a real operating cost, someone needs to be responsible for managing it.

That job has a name: the token manager.

Why does this role need to exist?

Because inference is no longer a hidden technical detail. It is an operating cost. One that starts small in testing, then explodes in production. A workflow that costs $0.10 in development can easily cost $10 at scale once it is deployed across users, agents, retries, and background processes.

And even if models become more efficient, that does not mean total usage will fall. In many cases, the opposite will happen. As inference gets cheaper and easier, teams will use more of it across more workflows. That is Jevons paradox in practice: efficiency gains increase consumption rather than reduce it.

We are already seeing the symptoms. Power users complain about sessions timing out, runaway API bills, agents consuming tokens in loops, and spending that is hard to predict or control. The problem is not just that inference is expensive. The problem is that nobody is really responsible for managing it.

That is why the token manager needs to exist.


This is a new kind of operational role created by AI-native work. Historically, developers stopped when the job was done or when they themselves needed a break. Now work is delegated to models and agents that do not naturally optimize for efficiency. They optimize for completion, persistence, and sometimes perfection. They retry, loop, expand context windows, call tools, and silently burn through budget in the background.
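The simplest defense against that silent burn is a hard ceiling per task. A minimal sketch, with hypothetical names (`step` is whatever callable advances the agent and reports tokens used; nothing here is a real provider API):

```python
class TokenCeilingExceeded(RuntimeError):
    """Raised when an agent loop burns its token budget without finishing."""


def run_with_ceiling(step, max_tokens: int):
    """Run an agent loop, but stop once cumulative token usage crosses
    a hard ceiling instead of letting it retry forever.

    `step` is assumed to return (result, tokens_used), with result None
    while the agent is still working.
    """
    used = 0
    while True:
        result, tokens = step()
        used += tokens
        if result is not None:
            return result, used
        if used >= max_tokens:
            raise TokenCeilingExceeded(
                f"agent burned {used} tokens without finishing"
            )
```

The point is not the ceiling itself but who sets it: a fixed number per task is exactly the kind of ad hoc guardrail a token manager would replace with budget-aware policy.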

Most teams still assume AI will simply be available whenever they need it, for whatever task they want to run. But that assumption breaks down in production. Once AI becomes embedded into real workflows, token usage starts behaving less like software and more like infrastructure consumption. It must be monitored, allocated, prioritized, and controlled.

Today, most token cost management is handled ad hoc. Founders and developers route tasks to cheaper models, batch requests, cache outputs, and trim prompts. Those are useful first steps, but they are not enough. They are point solutions, not management systems.
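Two of those point solutions, routing and caching, can be sketched in a few lines. Everything below is a placeholder, not any provider's API: the model names and the `call_model` callable are assumptions for illustration.

```python
import hashlib

# Hypothetical per-1K-token prices for a premium and a budget model.
PRICES = {"premium-model": 0.03, "budget-model": 0.002}

_cache: dict[str, str] = {}


def route(task_complexity: str) -> str:
    """Ad hoc routing: send only genuinely hard tasks to the premium model."""
    return "premium-model" if task_complexity == "hard" else "budget-model"


def run_cached(prompt: str, model: str, call_model) -> str:
    """Cache outputs so a repeated prompt costs zero tokens the second time."""
    key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(model, prompt)
    return _cache[key]
```

Each helper saves money on its own, but nothing connects them to a budget or to business value, which is exactly the gap the rest of this piece describes.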

The real shift is this: compute costs need to be managed more like energy costs.

That means moving from occasional optimization to continuous operational control.


A real token manager would need to understand, in real time, how tokens are being consumed, where budgets are being allocated, which tasks truly require premium models, and which jobs can be downgraded, deferred, compressed, or stopped altogether. It would also need to understand business economics. A pre-revenue startup, an enterprise support team, and an AI coding agency should not all spend tokens the same way. The right level of spend depends on margin structure, workflow value, customer importance, and tolerance for waste.
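The decision logic described above can be sketched as a tiny policy object. The thresholds and tier names are invented for illustration; a real system would derive them from the margin structure and workflow value discussed here.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical control layer: tracks spend for one workflow and
    decides whether a job may run, must be downgraded, or should stop."""
    limit_usd: float
    spent_usd: float = 0.0

    def decide(self, estimated_cost: float, value_tier: str) -> str:
        remaining = self.limit_usd - self.spent_usd
        if estimated_cost > remaining:
            return "stop"        # budget exhausted: defer or kill the job
        if value_tier == "low" and estimated_cost > 0.2 * remaining:
            return "downgrade"   # low-value job taking a big bite: cheaper model
        return "run"

    def record(self, actual_cost: float) -> None:
        self.spent_usd += actual_cost
```

Even this toy version makes the core idea concrete: the same job gets a different answer depending on how much budget is left and how much the work is worth.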

In other words, the token manager is not just a cost-cutting function. It is a control layer between AI usage and business value.

That is what makes this job both difficult and necessary.


Today, energy managers exist because electricity is expensive, variable, operationally critical, and easy to waste without oversight. They monitor usage, reduce waste, shift demand, enforce budgets, and align energy consumption with business goals. AI is heading in the same direction. As inference becomes a larger share of operating cost, companies will need the equivalent function for tokens.

The teams that figure this out first will not just spend less. They will build more sustainable AI systems, ship with better margins, and avoid the trap of scaling usage faster than value.

The token manager is coming because AI has made compute abundant enough to use everywhere, but expensive enough that someone now has to manage it.
