
The Ground Moves Under You

Field Notes from 200+ Semi-Autonomous Sprints, Part 9

Model updates will break your pipeline. The question isn't if but when. Build to absorb change, not to resist it.

You will wake up one morning, run your pipeline, and something will be different. Not broken — different. The model you built everything on got an update. Nobody asked you. Nobody warned you. The prompts are the same. The tasks are the same. The outputs aren't.

This is the reality of building on someone else's foundation. The ground moves, and your pipeline has to survive it.

It's Not Always the Model

When people picture model updates breaking things, they imagine a dramatic capability regression. That happens, but it's rare and obvious. You notice immediately and either roll back or adapt.

The harder case is everything around the model. A provider updates their API behavior, changes rate limiting, tweaks how tool calls are formatted, or adjusts how the model interacts with system prompts. The model itself might be fine — better, even — but the plumbing shifted, and your pipeline was coupled to the old plumbing. I've seen this across providers. OpenAI's jump to 5.0 was a step backward in several practical dimensions. Gemini 3's Flash and Pro variants each had their ups and downs. Anthropic has had updates that broke rate limit calculations and destabilized tooling that was running fine the day before. None of them owe your pipeline stability.

And the subtlest breakage isn't crashes — it's behavioral drift. The model still follows your instructions, but the way it follows them shifts. Slightly more verbose output that parses differently against your gates. Tool calls in a different order than your orchestration assumed. An ambiguous instruction interpreted differently than the previous version did. If your spec had any ambiguity — it did — that interpretation change ripples downstream. This is where your metrics from Post 8 earn their keep. A 10% drop in first-attempt success rate after a model update is obvious if you're tracking it. Without metrics, it just feels like a rough week.
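That kind of drift check doesn't need to be elaborate. A minimal sketch, assuming a per-task success log and an illustrative 10% threshold (the `TaskResult` shape and threshold are mine, not from the post):

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    model_version: str
    first_attempt_success: bool

def first_attempt_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks that succeeded on the first attempt."""
    if not results:
        return 0.0
    return sum(r.first_attempt_success for r in results) / len(results)

def drift_alert(baseline: list[TaskResult], current: list[TaskResult],
                max_drop: float = 0.10) -> bool:
    """Flag when the current window's success rate falls more than
    max_drop below the pinned-version baseline."""
    return first_attempt_rate(baseline) - first_attempt_rate(current) > max_drop
```

Run it on a rolling window after every model update lands; a `True` here turns "rough week" into an alert with a timestamp.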

Stay Flexible

You can't prevent the ground from moving. You can build a pipeline that doesn't fall over when it does.

Don't couple to quirks. If your prompts rely on how a specific model version formats tool calls or handles ambiguous instructions, you've built on sand. Specify what you want explicitly rather than relying on how a particular version happens to deliver it. I had a worker dispatch system that parsed structured output by splitting on triple backticks — because that's how one model happened to format it. A minor update changed the formatting to use XML-style tags. Same content, completely different wrapping. Every worker dispatch failed for half a day before I traced it. The fix was five minutes. Finding it was four hours. The spec should have defined the output format explicitly instead of assuming the model's default was stable.
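The fix I landed on looks roughly like this: accept any wrapping the model might plausibly use, and fail loudly when none matches instead of silently mis-parsing. The `<output>` tag name is an assumption for the sketch, not what any provider actually emits:

```python
import re
from typing import Optional

def extract_payload(text: str) -> Optional[str]:
    """Pull the structured payload out regardless of how the model
    wrapped it: triple-backtick fence or XML-style tags."""
    fenced = re.search(r"```(?:\w+\n)?(.*?)```", text, re.DOTALL)
    if fenced:
        return fenced.group(1).strip()
    tagged = re.search(r"<output>(.*?)</output>", text, re.DOTALL)
    if tagged:
        return tagged.group(1).strip()
    return None  # unknown wrapping: surface the failure, don't guess
```

Tolerant parsing buys you time; the real fix is still a spec that pins the format explicitly.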

Pin when you can, update when you choose. If your provider offers model version pinning, use it. Run on a known version until you decide to upgrade. When you do, treat it like any other pipeline change — route a subset of tasks through the new version first, compare your metrics side by side, and promote deliberately. I run new model versions on low-risk tasks for at least a full sprint before switching over. The metrics from Post 8 make this comparison mechanical rather than gut-feel.
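Routing that subset deterministically matters: the same task should hit the same version across retries so your side-by-side metrics stay clean. A sketch, with hypothetical version strings and an illustrative 10% canary slice:

```python
import hashlib

PINNED = "model-2024-06-01"      # known-good pinned version (illustrative)
CANDIDATE = "model-2024-09-15"   # version under evaluation (illustrative)

def pick_version(task_id: str, low_risk: bool, canary_pct: int = 10) -> str:
    """Send a fixed slice of low-risk tasks to the candidate version;
    everything else stays on the pinned version."""
    if not low_risk:
        return PINNED
    # Hash the task id so routing is stable across retries and restarts.
    bucket = int(hashlib.sha256(task_id.encode()).hexdigest(), 16) % 100
    return CANDIDATE if bucket < canary_pct else PINNED
```

After a sprint of this, the drift metrics compare two clean populations instead of a mixed bag.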

Keep your options open. If your entire orchestration is hardwired to one provider's API shape, a single bad update can ground you. The pipeline that can swap models — even imperfectly — has options that a single-provider pipeline doesn't. This doesn't mean building a full abstraction layer on day one. It means keeping your provider-specific logic in one place rather than scattered across every prompt and parser. When you need to swap — and you will — the blast radius stays contained.
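"One place" can be as thin as a single seam the rest of the pipeline calls through. A minimal sketch; the provider names and `complete` signature are placeholders, not any real SDK:

```python
from typing import Protocol

class ModelClient(Protocol):
    def complete(self, system: str, prompt: str) -> str: ...

class ProviderAClient:
    def complete(self, system: str, prompt: str) -> str:
        # Provider A's request shape, auth, and response parsing
        # live here and nowhere else in the pipeline.
        return f"[provider-a] {prompt}"

class ProviderBClient:
    def complete(self, system: str, prompt: str) -> str:
        return f"[provider-b] {prompt}"

def make_client(provider: str) -> ModelClient:
    """The one place the rest of the pipeline asks for a model."""
    return {"a": ProviderAClient, "b": ProviderBClient}[provider]()
```

When an update grounds one provider, the swap is a one-line config change instead of a hunt through every prompt and parser.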

Build a model-switch runbook. When you do need to change providers or versions, the last thing you want is to be figuring out the process under pressure. Document what needs to change: prompt format, API client configuration, output parsing, rate limit settings, cost monitoring thresholds. I keep a checklist that I update every time I do a model switch. The third time I used it, it paid for itself — I caught a rate limit configuration that would have burned through my budget in an hour.

The Only Guarantee

Providers will update their models. They will change their tooling. They will adjust pricing, rate limits, and API contracts. All of it will happen on their schedule, not yours.

The pipeline that survives isn't the one built on the best model available today. It's the one built to absorb change. The difference between a rough week and a dead pipeline is whether you built in the flex to handle what you couldn't predict.


Next up: When the AI Drives You Nuts, Take the Wheel — Why sometimes the fastest agent is you.