The Next Agent Problem is the Bill

Once agents become part of daily engineering work, platform teams need to manage usage like shared production capacity.

Hi, welcome back.

Last time, we talked about least agency: giving agents the smallest useful amount of autonomy, instead of handing them every tool and hoping discipline appears later.

This week, I want to add the next constraint: cost.

GitHub gave us a useful signal here. Copilot moved to usage-based billing in June, with AI credits tied to token consumption. Soon after, GitHub reportedly had its best month ever, driven by demand for AI-assisted coding. That is not just a GitHub story. It is a preview of where engineering work is going.

Jensen Huang has also been talking about AI token budgets for engineers. Whether or not that becomes common compensation language, the point is hard to ignore: tokens are becoming working capacity.

For years, most developer tools behaved like office software. You bought seats, assigned licenses, and forgot about them until renewal season. Agents do not work that way. They consume tokens, context, model capacity, tool calls, retries, logs, CI minutes, and review time.

So the useful question for platform teams is not “is AI getting expensive?” Of course it is. The useful question is: are we operating agent usage like shared production capacity, or are we treating it like a bunch of harmless subscriptions?

Here are five things I’d fix before agent usage becomes another mystery bill.
1. Break the bill down by workflow

The worst version of AI spend is one large monthly number called “Copilot,” “Claude,” “OpenAI,” or “AI tools.” That number will start a finance conversation, but it will not help an engineering team make a decision.

You need to know which workflows are consuming the money: code review, incident summaries, release notes, test generation, deployment helpers, log analysis, ticket triage, documentation updates, runbook execution. Once you see that split, the conversation changes. A workflow that saves ten engineers an hour every week may be worth the cost. A workflow that writes long summaries nobody reads probably is not.

You already do this elsewhere. Shared infra gets tags. CI jobs get owners. Cloud spend gets split by service, team, or environment. Agents should not be exempt just because the invoice arrives under one vendor name.

Start simple. Track the workflow name, owner, model used, average cost per run, success rate, failed runs, retries, and whether human review was needed. That is enough to stop guessing.

If you cannot connect spend to a workflow, you cannot tell whether the agent is creating value or just making the bill more interesting.

2. Give agents budgets before they become popular

No one runs production services with unlimited CPU, unlimited memory, unlimited retries, and unlimited runtime. Agents should not be the exception. Every serious agent workflow needs a budget: not just a money budget, but a behavior budget. That means max tokens per run, max tool calls, max retries, max runtime, max files pulled into context, max log volume included, and the highest model tier allowed by default.

The small leaks are usually the ones that hurt. A code review agent reads too much context. A troubleshooting agent keeps retrying the same weak path. A release agent generates a long report, then generates three polished versions of the same report. A helper tool uses the most expensive model because nobody changed the default. None of this looks dramatic in one run.
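Here is a minimal Python sketch of the first two fixes working together: attributing spend to named workflows, and stopping a run at the first breach of its behavior budget. Everything in it is an assumption for illustration: the class names (`Budget`, `RunMeter`, `Ledger`), the limit values, and the flat per-1K-token price, which real vendor billing does not apply uniformly. It tracks only a subset of the fields listed above (workflow, runs, failed runs, retries, total and average cost) and is not any vendor's metering API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical per-run behavior budget for one workflow (values are illustrative)."""
    max_tokens: int
    max_tool_calls: int
    max_retries: int


class BudgetExceeded(Exception):
    """Raised as soon as a run crosses one of its limits."""


class RunMeter:
    """Counts what one agent run consumes and stops it at the first breach."""

    def __init__(self, workflow: str, budget: Budget) -> None:
        self.workflow = workflow
        self.budget = budget
        self.tokens = 0
        self.tool_calls = 0
        self.retries = 0

    def charge(self, tokens: int = 0, tool_calls: int = 0, retries: int = 0) -> None:
        # Usage is recorded before the check: the spend has already happened.
        self.tokens += tokens
        self.tool_calls += tool_calls
        self.retries += retries
        b = self.budget
        if self.tokens > b.max_tokens:
            raise BudgetExceeded(f"{self.workflow}: token limit {b.max_tokens}")
        if self.tool_calls > b.max_tool_calls:
            raise BudgetExceeded(f"{self.workflow}: tool-call limit {b.max_tool_calls}")
        if self.retries > b.max_retries:
            raise BudgetExceeded(f"{self.workflow}: retry limit {b.max_retries}")


class Ledger:
    """Attributes spend to named workflows instead of one vendor-level total."""

    def __init__(self, usd_per_1k_tokens: float) -> None:
        # Assumed flat price; real billing varies by model, vendor, and token type.
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.totals = defaultdict(lambda: {"runs": 0, "failed": 0, "tokens": 0, "retries": 0})

    def record(self, meter: RunMeter, succeeded: bool) -> None:
        t = self.totals[meter.workflow]
        t["runs"] += 1
        if not succeeded:
            t["failed"] += 1
        t["tokens"] += meter.tokens
        t["retries"] += meter.retries

    def report(self) -> dict:
        """Per-workflow runs, failures, retries, total cost, and average cost per run."""
        rows = {}
        for workflow, t in self.totals.items():
            cost = t["tokens"] / 1000 * self.usd_per_1k_tokens
            rows[workflow] = {
                "runs": t["runs"],
                "failed": t["failed"],
                "retries": t["retries"],
                "cost_usd": round(cost, 4),
                "avg_cost_per_run_usd": round(cost / t["runs"], 4),
            }
        return rows


if __name__ == "__main__":
    ledger = Ledger(usd_per_1k_tokens=0.01)
    review_budget = Budget(max_tokens=20_000, max_tool_calls=10, max_retries=2)

    good = RunMeter("code-review", review_budget)
    good.charge(tokens=12_000, tool_calls=4)
    ledger.record(good, succeeded=True)

    stuck = RunMeter("code-review", review_budget)
    try:
        stuck.charge(tokens=8_000, tool_calls=3)
        for _ in range(5):  # keeps retrying the same weak path
            stuck.charge(tokens=4_000, retries=1)
    except BudgetExceeded as exc:
        print("stopped:", exc)
    ledger.record(stuck, succeeded=False)

    print(ledger.report())
```

The stuck run is cut off on its third retry instead of running all five, and its spend still lands in the ledger under `code-review`, so failed runs stay visible in the per-workflow report. In a real setup the counts would come from your agent tooling's usage data rather than hand-written `charge` calls.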
At team scale, though, those small leaks become a capacity problem.

This is where platform engineering habits help. You do not need to ban usage. You need sane defaults. Most workflows should start with limits, then earn higher limits when the value is clear.

3. Route work to the model it deserves

A lot of agent cost comes from using the strongest model for the weakest job. A deployment summary does not need the same model as a multi-step incident investigation. A formatting task does not need the same model as risky code generation. A first-pass log explanation does not need the same model as cross-service root-cause analysis.

Create tiers. Use cheaper models for summarization, classification, formatting, and routine explanations. Reserve the expensive models for work where reasoning quality actually changes the outcome: incident analysis, architecture trade-offs, complex code changes, migration planning, and workflows that touch production state. This is not about being cheap. It is about not using a crane to move a laptop.

You already right-size infrastructure. You choose instance types, storage classes, queue sizes, and retention windows based on the workload. Agent workflows need the same treatment.

The question is not “which model is best?” The better question is “which model is enough for this step?”

4. Put retry loops on a leash

Retries are where agent workflows quietly become expensive and annoying. A failed request is one thing. An agent that keeps re-reading logs, re-planning, re-calling tools, expanding context, and trying again can burn tokens without moving the problem forward.

This is also where cost and safety meet. When an agent is stuck, you do not want it to spend more money becoming more confident about the wrong path. You want it to stop, summarize what it tried, and hand the problem back with evidence.

So, define the loop rules before the loop runs. How many retries are allowed? What counts as progress? Which failures stop the run?
When does the workflow move from “act” to “suggest”? When does a human need to step in?

If you use Ansible, this instinct is already familiar. A playbook with bad exit behavior is not resilient. It is noisy. An agent loop has the same problem, except the noise now comes with a token cost.

A good agent workflow needs a circuit breaker.

5. Add cost review to the rollout checklist

Before an agent workflow moves beyond a small group, ask the boring questions.

What should a successful run cost? What does a failed run cost? What happens if fifty engineers use it every day? What happens during an incident when everyone runs it at once? Who owns the budget? Who gets alerted when usage spikes? What gets turned off first?

Put these beside the safety questions. If the agent can change systems, review its permissions. If the agent can consume shared capacity, review its limits. Both belong in the rollout conversation.

This does not have to become a committee. It just needs an owner and a threshold. If usage doubles, someone should know. If a workflow starts burning budget through retries, someone should see it before the month ends. If a premium model is being used for low-value work, someone should be able to move it down a tier.

The worst time to discover the cost model is after everyone likes the workflow.

The bigger point is simple. Last week’s issue asked how much autonomy an agent should get. This week’s question is how much it should be allowed to consume. Those two questions belong together. An agent with no permission boundary can break things. An agent with no consumption boundary can quietly become expensive, slow, and hard to defend. If it is part of the platform, operate it like part of the platform.

Meter it. Cap it. Route it. Attribute it. Review it.

Before approving an agent workflow, ask two questions. Is it safe enough to run? And is it worth repeating at scale? Because once agents move into daily work, the bill is not a surprise.
It is telemetry.

One last thing before I go: we have a couple of upcoming events specifically on Claude, DevOps, GitOps, and platform engineering, and both show how to put your AI credits to work building systems at scale.

Agentic DevOps with Claude | July 23rd
Early Bird Live Now, 40% Off, last 48 hours before it’s sold out!

Claude Code is the engineer. You’re watching it work.

Four hours. A 33-component AI-native IDP built live on a real Kubernetes cluster, with ArgoCD, Backstage, kgateway, and an observability stack included. The cluster is provisioned for you. You leave with the repo and a working reference architecture to take back to your team.

Michael Rishi Forrester from Accenture (previously KodeKloud) is running this one. Limited seats.

📅 Thursday, July 23rd | 11:00 AM EDT
BOOK YOUR TICKETS

And if last week's platform engineering conversation left you wanting to go deeper, to not just understand where AI belongs in the stack but actually watch it build the stack, that is what July 23rd is for.

AI-Powered GitOps and Platform Engineering Workshop | July 30th
Early Bird Live Now, 40% Off, last 4 days before it’s gone!

Your AI agent doesn’t know your manifests are stale. That’s the whole problem.

Three hours. Real ArgoCD and Flux workflows, live demos comparing fresh vs. stale context, and hands-on labs turning repeated agent tasks into tooling your team actually keeps. You’ll walk through drift detection, change review, and validation: the parts of platform engineering that AI tools usually get wrong because nobody fed them current context.

Taylor Dolezal from Dosu, an AI-native knowledge infrastructure for agents and humans, is running this one, drawing on patterns from 100,000+ repos.
Limited seats.

📅 Thursday, July 30th | 11:00 AM EDT
BOOK YOUR TICKETS

If you’re a regular CloudPro reader, I’d like to hear what you want covered next: agent cost models, MCP security, platform AI governance, agentic DevOps workflows, or the messy parts of actually getting these systems into production.

Hit reply and tell me what would be most useful for your team.

Cheers,
Apramit Bhattacharya
Editor-in-Chief