When building AI agents, should safety be probabilistic (prompt-based) or deterministic (code-enforced)?

Something I keep running into when building agentic workflows, and I don’t see it discussed much here:

Most of us control what an AI agent can or can’t do through the system prompt — “don’t call external APIs”, “don’t delete files”, “always ask before sending emails”. It works… most of the time. But there’s a fundamental problem: system prompts are probabilistic. The LLM follows them on average, not always. Under long context, distraction, prompt injection, or just model drift across steps, they silently fail.

The alternative is deterministic enforcement — intercepting the tool call before it executes at the code layer, regardless of what the model decided. If the policy says “no external URLs”, the call never fires. Not because the model remembered, but because the runtime blocked it.

I’ve been exploring this second approach while building a small open-source layer called SupraWall that sits between agent decisions and tool execution. It’s made me think harder about where the real safety boundary should live.

The question I’d love this community’s perspective on:

  1. Is prompt-based constraint sufficient for production agents, or is it fundamentally the wrong layer for enforcement?

  2. Does deterministic policy enforcement make agents less flexible / harder to build, or is that a worthwhile trade-off?

  3. As we move toward multi-agent systems (like what Andrew covers in the AI Agents courses), does the probabilistic approach compound the risk?

Curious especially from people who’ve gone through the agentic workflow or MCP courses — how are you thinking about this in your own projects?

Hi Alejandro_Paris,

This may not answer your question, but your post made me think of this course:

It seems to me that the approach taken in that course allows for a combination of prompt-based constraint, with some room for creativity, and customized output restrictions based on deterministic policy (which could intercept a tool call) that leaves some room for creativity as well.