Trust But Verify: Your AI Agent Needs Logic Gates, Not Good Vibes
Field Notes from 200+ Semi-Autonomous Sprints — Part 2
AI agents don't reliably follow instructions. Programmatic gates catch the drift that human review misses.
I'm going to tell you something that will sound obvious and that almost nobody actually acts on: AI agents do not reliably do what they're told.
Not because they're broken. Not because the models are bad. Because 'do what you're told' is not how language models work. They predict plausible next tokens. Sometimes plausible and correct overlap perfectly. Sometimes the agent confidently does something adjacent to what you asked, and unless you have a gate checking the output, that near-miss ships.
After 200+ semi-autonomous sprints, I don't trust agents. I verify them. Every time. And the gap between those two approaches is the difference between a pipeline that produces reliable output and one that produces impressive-looking chaos.
The Problem With 'It Looks Right'
Early in building my pipeline, I had agents whose PRs passed code review by other agents. The code compiled. Tests passed. PRs looked clean. And about 15% of the time, something was subtly wrong — a convention ignored, a file modified that shouldn't have been touched, an approach that worked but violated an architectural decision made three sprints ago.
The failure mode wasn't catastrophic. It was erosive. Each small drift was individually harmless. Accumulated over dozens of PRs, the codebase slowly became inconsistent in ways that were expensive to unwind later.
The root cause was simple: I was treating agent output like human output. 'It looks right, ship it.' But humans have persistent memory of why decisions were made. Agents don't. They have context, and context is lossy, temporary, and — as we covered last time — degrading from the moment the session starts.
Gates, Not Guidelines
The fix was treating agent output like untrusted input. Same principle you'd apply to a public API endpoint: validate everything, trust nothing, fail explicitly.
In practice, this means building checkpoints into the workflow where automated verification happens before the output moves forward. Not 'review the code' in the human sense. Actual programmatic checks.
Did the agent modify only the files it was supposed to? Diffable. Did the output match the structural requirements of the task? Parseable. Did it introduce changes to modules that were explicitly marked off-limits? Checkable. Do the tests still pass? Obviously — but also, did it write new tests, or just make sure the old ones don't fail? Different question, different gate.
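To make that concrete, here's a minimal sketch of the first of those gates: a scope check that compares the agent's diff against the files the task was allowed to touch. The path patterns, file names, and branch names are assumptions standing in for whatever your own task definitions and repo layout look like — the point is that the check is a few lines of code, not a judgment call.

```python
# scope_gate.py — a minimal sketch of a scope gate. The allowed and
# off-limits patterns would normally come from the task definition;
# here they're hard-coded for illustration.
import subprocess
import sys
from fnmatch import fnmatch

ALLOWED_PATHS = ["src/billing/*", "tests/billing/*"]  # what the task permits
OFF_LIMITS    = ["src/auth/*", "migrations/*"]         # explicitly protected modules

def changed_files(base: str = "origin/main") -> list[str]:
    """Files the agent's branch touched relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line.strip()]

def check_scope(files: list[str]) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    violations = []
    for path in files:
        if any(fnmatch(path, pattern) for pattern in OFF_LIMITS):
            violations.append(f"off-limits module touched: {path}")
        elif not any(fnmatch(path, pattern) for pattern in ALLOWED_PATHS):
            violations.append(f"outside task scope: {path}")
    return violations

if __name__ == "__main__":
    problems = check_scope(changed_files())
    for p in problems:
        print(f"SCOPE GATE FAIL: {p}")
    sys.exit(1 if problems else 0)  # fail explicitly, like any other CI step
```

Wire it in like any other CI step: exit code zero and the output moves forward, anything else stops the line before a human ever looks at the PR.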
The key insight is that these gates aren't about catching bad agents. They're about catching the inevitable drift that happens when a probabilistic system operates over extended tasks. The agent isn't malicious. It's just not deterministic. And non-deterministic systems need guardrails, not trust.
Cheap Checks Save Expensive Debugging
People resist adding verification because it feels like overhead. 'I'm using AI to go faster, and now you want me to add checkpoints that slow things down?'
Yes. Because the math works out overwhelmingly in your favor.
A gate that catches a convention violation at PR time costs seconds. That same violation discovered three weeks later during a refactor costs hours. Multiply that by the volume of output an agent pipeline produces and you're looking at the difference between a system that scales and one that creates technical debt faster than a human team could.
Think of it this way: you wouldn't deploy code without CI. You wouldn't push to production without tests. An AI agent generating 50 PRs a day is a deployment pipeline, and it needs the same rigor. More, actually, because the failure modes are more subtle than a syntax error.
What 'Trust But Verify' Actually Looks Like
It's not about adding reviews everywhere. It's about understanding what can go wrong at each stage and building the minimum viable check for that failure mode.
For task execution: Did the agent stay in scope? Check the diff against the task definition.
For code quality: Does it match your project's patterns? Linters and formatters are your first line — they're cheap and they catch the most common drift.
For architectural compliance: Did it respect boundaries? This is the hard one, and it's where most people give up. But if you've encoded your architecture decisions somewhere the agent can reference them and you can validate against them, you've closed the loop. A real example: if you don't want vendor lock-in to a particular CSS framework, that decision needs to be codified as a gate before development starts — not discovered as a problem after the product ships with thousands of vendor-specific utility classes baked into every component. By then, it's nearly impossible to untangle. These are architectural decisions, not refactoring tasks.
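Here's a sketch of what codifying that particular decision could look like: a check that scans component markup for vendor-specific class prefixes and fails when it finds them. The prefixes, file glob, and regex are hypothetical stand-ins, not a statement about any real framework — adapt them to whatever boundary your team actually drew.

```python
# css_boundary_gate.py — a sketch of an architectural boundary check.
# Prefixes and globs below are illustrative placeholders.
import re
import sys
from pathlib import Path

FORBIDDEN_CLASS_PREFIXES = ("tw-", "bs-")       # hypothetical vendor prefixes
COMPONENT_GLOB = "src/components/**/*.tsx"       # adjust to your project layout

CLASS_ATTR = re.compile(r'class(?:Name)?="([^"]*)"')

def violations() -> list[str]:
    found = []
    for path in Path(".").glob(COMPONENT_GLOB):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for match in CLASS_ATTR.finditer(text):
            bad = [c for c in match.group(1).split()
                   if c.startswith(FORBIDDEN_CLASS_PREFIXES)]
            if bad:
                found.append(f"{path}: vendor-specific classes {bad}")
    return found

if __name__ == "__main__":
    problems = violations()
    for p in problems:
        print(f"ARCHITECTURE GATE FAIL: {p}")
    sys.exit(1 if problems else 0)
```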
For completion: Did it actually finish, or did it get stuck in a loop and produce a 'good enough' partial result? Agents are remarkably good at producing confident-looking incomplete work. Watch especially for what I call 'kicking the can' — the agent hits something hard, leaves it broken, and notes it as a 'pre-existing issue' or 'not related to my changes.' It's not lying. It just took the path of least resistance past the obstacle. Your gate needs to distinguish between genuine pre-existing issues and problems the agent created or was supposed to fix but didn't.
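One way to build that distinction into a gate: snapshot the failing tests before the agent starts, run the suite again after it finishes, and treat anything newly failing as the agent's responsibility. The sketch below assumes pytest and a very simplified parse of its summary output; substitute your own runner and report format.

```python
# completion_gate.py — a sketch of separating genuine pre-existing failures
# from ones the agent introduced. Assumes pytest; parsing is simplified.
import json
import subprocess
import sys
from pathlib import Path

BASELINE = Path("baseline_failures.json")

def failing_tests() -> set[str]:
    """Collect failing test ids from a pytest run (simplified parsing)."""
    out = subprocess.run(
        ["pytest", "-q", "--tb=no", "-rf"], capture_output=True, text=True,
    )
    return {
        line.split()[1]                       # e.g. tests/test_x.py::test_y
        for line in out.stdout.splitlines()
        if line.startswith("FAILED")
    }

if __name__ == "__main__":
    if sys.argv[1:] == ["snapshot"]:          # run this *before* the agent starts
        BASELINE.write_text(json.dumps(sorted(failing_tests())))
        sys.exit(0)

    before = set(json.loads(BASELINE.read_text())) if BASELINE.exists() else set()
    new_failures = failing_tests() - before   # these are on the agent, not "pre-existing"
    for test in sorted(new_failures):
        print(f"COMPLETION GATE FAIL: new failure {test}")
    sys.exit(1 if new_failures else 0)
```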
The pattern is always the same: define what correct looks like before the agent runs, check against that definition after it finishes, and fail fast if it doesn't match. This isn't revolutionary. It's basic engineering discipline applied to a new kind of worker.
Don't try to solve the whole verification problem in one sweep. Start easy and work your way up. Does it build? Does it lint? Do your tests pass? Bake these into your gate one at a time so that you don't have to manually review what a machine can check for you. Save your eyes for what machines can't judge — UI/UX still requires a human in the loop, and automating that review is a genuinely hard problem that nobody has solved cleanly.
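A fail-fast runner for those first few checks can be as simple as a loop over shell commands. The commands below are placeholders for whatever your project uses to build, lint, and test; the structure is what matters, and you extend it one gate at a time.

```python
# gates.py — a sketch of a fail-fast gate runner. Commands are examples;
# substitute your project's actual build, lint, and test invocations.
import subprocess
import sys

GATES = [
    ("build", ["npm", "run", "build"]),
    ("lint",  ["npm", "run", "lint"]),
    ("tests", ["npm", "test"]),
    # ("scope", ["python", "scope_gate.py"]),   # add stricter gates one at a time
]

def run_gates() -> int:
    for name, cmd in GATES:
        print(f"--- gate: {name}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"GATE FAIL: {name} (exit {result.returncode})")
            return result.returncode            # fail fast: later gates never run
    print("all gates passed")
    return 0

if __name__ == "__main__":
    sys.exit(run_gates())
```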
One hard rule: every gate must be deterministic. If a check can fail for reasons unrelated to the agent's work — flaky tests, network timeouts, hardware hiccups — it doesn't belong in your validation pipeline. Non-deterministic gates produce noise, noise erodes trust in the gates, and once you stop trusting your gates you stop looking at them. That's worse than having no gates at all.
The Uncomfortable Truth
If you're using AI agents without automated verification, you're not going faster. You're going faster and accumulating invisible debt. The velocity feels real because the output is real. But the quality guarantee is an illusion based on the assumption that 'it looks right' means 'it is right.'
It usually is right. But 'usually' at scale means 'regularly wrong,' and regularly wrong without detection means a slow, quiet mess.
Build the gates. Your future self will thank you.
Next up: Don't Let the Fox Guard the Henhouse — what happens when your agents route around your checks.