If you have been shipping AI agents into production and crossing your fingers, you are not alone. Most teams are. But according to Neel Somani, a researcher and systems builder who has worked on agentic tooling across several projects, that approach has a short shelf life.
Somani is not an AI safety researcher in the academic sense. He is closer to the operator end of the spectrum: someone who has actually built the tooling, run the workflows, and dealt with the aftermath when something goes sideways. His perspective on agentic deployment is less about the philosophy and more about what breaks first.
The short version: your prompt is not your permission model. And confusing the two is how organizations end up with agents doing things they never authorized.
The Prompt Is an Instruction, Not a Boundary
Here is the failure mode Neel Somani sees most often. A team writes a detailed system prompt. It specifies the agent’s role, its tone, its task. It might even include a few lines about what the agent should not do. Then the agent gets deployed, encounters a situation the prompt did not anticipate, and makes a judgment call.
That judgment call is not random. The model is making a reasonable inference based on its training and the context it has been given. But reasonable inferences are not the same as authorized actions. And in a business context, the difference matters enormously.
A prompt that says “help users complete their onboarding” does not indicate whether the agent can send a welcome email, create a billing record, or escalate to a support queue. The agent will figure out an answer to each of those questions on its own, every time it encounters them, based on whatever context it has available.
Somani recommends treating the permission model as a separate artifact from the prompt. The prompt handles task framing. The permission model handles action authorization. They are different problems and they need different owners: the prompt is usually written by whoever understands the use case, but the permission model needs sign-off from whoever owns the systems the agent can touch.
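As a rough sketch of that separation (all names here are illustrative, not drawn from Somani's tooling), the prompt and the permission model can live in separate artifacts with separate reviewers, and the dispatch layer consults only the latter:

```python
# Illustrative sketch: the permission model is a separate, reviewable
# artifact -- the prompt never grants or revokes the ability to act.
SYSTEM_PROMPT = "Help users complete their onboarding."  # owned by the use-case author

# Owned and signed off by whoever owns the systems the agent can touch.
ALLOWED_ACTIONS = {"read_user_record", "draft_welcome_email"}

def dispatch(action: str, handler, *args):
    """Refuse any action not explicitly authorized, regardless of what
    the prompt seems to imply."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"Action '{action}' is not authorized")
    return handler(*args)
```

The point of the split is organizational as much as technical: changing the prompt should never be able to widen what the agent is allowed to do.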
Start With the Action Taxonomy
The first thing Neel Somani does when designing an agentic workflow is build what he calls an action taxonomy: a written inventory of every category of action the agent might plausibly take, sorted by reversibility and consequence.
The taxonomy has three tiers. Tier one is safe and reversible: reading data, drafting content that a human reviews before it goes anywhere, running searches. These can be fully automated with low risk. Tier two is consequential but recoverable: creating records, sending internal notifications, updating fields in a database. These should be automated but logged, with a clear rollback procedure documented before deployment. Tier three is irreversible or high-stakes: sending external communications, making purchases, modifying user-facing settings. These should require human confirmation until the agent has demonstrated reliable behavior on a significant volume of tier-two tasks.
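A minimal encoding of the three tiers might look like the following (the action names and hooks are hypothetical; the confirmation step would be whatever review channel the team already uses):

```python
from enum import IntEnum

class Tier(IntEnum):
    SAFE = 1          # reversible: reads, drafts a human reviews, searches
    RECOVERABLE = 2   # consequential but reversible with a documented rollback
    IRREVERSIBLE = 3  # external effects: require human confirmation

# Hypothetical taxonomy entries, sorted by reversibility and consequence.
TAXONOMY = {
    "search_docs": Tier.SAFE,
    "create_crm_record": Tier.RECOVERABLE,
    "send_customer_email": Tier.IRREVERSIBLE,
}

def execute(action, run, confirm, log):
    """Gate an action by its tier before running it."""
    tier = TAXONOMY[action]
    if tier is Tier.IRREVERSIBLE and not confirm(action):
        return None  # blocked pending human sign-off
    result = run(action)
    if tier >= Tier.RECOVERABLE:
        log(action, result)  # tier two and above must leave an audit trail
    return result
```

Nothing about this is sophisticated, which is the point: the taxonomy is a written artifact first and code second.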
That taxonomy might eventually be formalized as a domain-specific language, enabling a compiler to give detailed feedback when the agent attempts an invalid action. It also makes explicit when an agent has no valid action within its DSL, which is a signal that the action taxonomy itself needs to be extended.
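The DSL idea can be approximated without building a real compiler: validate the agent's proposed action against a schema and return compiler-style diagnostics, including the "no valid action" case. The schema below is invented for illustration:

```python
# Hypothetical action schemas: action name -> required argument names.
SCHEMA = {
    "search_docs": {"query"},
    "create_crm_record": {"name", "email"},
}

def validate(action: str, args: dict) -> list:
    """Return compiler-style diagnostics; an empty list means the action is valid."""
    if action not in SCHEMA:
        return [f"unknown action '{action}'; valid actions: {sorted(SCHEMA)}"]
    missing = SCHEMA[action] - args.keys()
    extra = args.keys() - SCHEMA[action]
    errors = [f"missing argument '{m}' for '{action}'" for m in sorted(missing)]
    errors += [f"unexpected argument '{e}' for '{action}'" for e in sorted(extra)]
    return errors
```

Feeding these diagnostics back to the model turns an invalid action into a recoverable retry rather than a silent failure, and a stream of "unknown action" errors is exactly the signal that the taxonomy is missing a category.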
Most teams skip this step because it feels like overhead. It is not. It is the difference between an agent that fails gracefully and one that fails loudly in front of a customer.
Logging Is Not Optional, It Is the Product
One thing Neel Somani is consistent about is that observability for agents is not a nice-to-have. It is the core infrastructure that makes everything else trustworthy.
When an agent takes an action, that action should emit a structured log entry: the task it was executing, the action it took, the inputs it used, the output, and the downstream state that changed as a result. This log is not just for debugging. It is the audit trail that lets you answer the governance question when something goes wrong: who authorized this, what did the agent do, and can we reverse it.
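One way to sketch that structured entry (the field names are illustrative; the source only specifies what the entry should capture):

```python
import json
import time
from dataclasses import dataclass, asdict, field

@dataclass
class ActionLogEntry:
    task: str            # the task the agent was executing
    action: str          # the action it took
    inputs: dict         # the inputs it used
    output: str          # what the action produced
    state_changed: list  # downstream state modified as a result
    timestamp: float = field(default_factory=time.time)

    def emit(self) -> str:
        # Structured JSON, so the audit trail is queryable after an incident
        # rather than something to be reconstructed from free-text logs.
        return json.dumps(asdict(self))
```

The schema matters less than the discipline: every action emits one entry, and the entry is enough to answer "who authorized this, what did the agent do, and can we reverse it" without re-running anything.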
Teams that skip logging because they are moving fast will eventually spend more time reconstructing what happened after an incident than they would have spent building the logging in the first place. Somani has seen this pattern repeat across multiple deployments. The logging infrastructure should be built before the agent goes to production, not added retroactively after the first failure. Fortunately, much of it comes for free with observability platforms like LangSmith.
Ownership Is a Person, Not a Team
The last thing Neel Somani is clear about is that every agent deployment needs a named human owner. Not a team, not a Slack channel, not a Jira board. A person.
That person is responsible for what the agent does. They review outputs regularly. They have the authority to pause the agent without waiting for approval from anyone else. And they are accountable when something goes wrong.
This sounds obvious. In practice, most organizations skip it because it feels overly formal for what they consider a software tool. But an agent that is authorized to take actions in production systems is not just a software tool. It is a decision-making entity operating in your name. Treating it like one from day one is what separates teams that successfully scale agentic operations from teams that accumulate quiet technical debt until something breaks.
Neel Somani’s framing is simple: autonomy is the feature. The governance is what makes the feature safe to ship.
For teams seriously considering agentic deployment, Neel Somani’s work on frameworks like web2mcp offers a practical starting point for understanding how model-controlled workflows interact with real production systems.
