Stop Waiting for Auto-Compaction. Your Agent Already Forgot What It Was Doing.
Field Notes from 200+ Semi-Autonomous Sprints — Part 1
Context degradation is gradual, not sudden. By the time auto-compaction fires, your agent has been impaired for a while.
Here's something nobody talks about in the 'vibe coding' discourse: by the time your AI agent hits auto-compaction, it's already been operating at a significant quality deficit for a while. You just didn't notice because the outputs still looked reasonable.
This isn't just vibes. Chroma's 'Context Rot' study and Adobe Research's NoLiMa benchmark both show that LLM performance degrades substantially as context fills — most models drop below 50% of their short-context baseline by 32K tokens. Your agent doesn't need to hit the wall to be impaired.
I've run over 200 semi-autonomous sprints through a multi-agent coding pipeline. Not one-shot prompts. Not 'fix this function.' Full sprint execution — story planning, parallel worker dispatch, code review, merge. Thousands of iterations. And the single most expensive lesson was learning what context degradation actually looks like in practice.
The Silent Cliff
Context windows are marketed like fuel gauges. You start full, you use tokens, eventually you run low and the model summarizes to free up space. Simple.
Except it's not a gauge. It's more like oxygen at altitude: you don't notice you're impaired until you're making bad decisions. The model doesn't suddenly break at compaction; it gradually loses precision as the window fills. Earlier instructions lose influence as newer content accumulates. Conventions it was following perfectly at the start of the session drift as the context fills with tool outputs, error logs, and intermediate work. The code still compiles. The tests still pass. But the architectural consistency quietly evaporates.
Then compaction fires, and it's a coin flip on what survives the summarization. Your carefully structured system prompt? Probably fine. That critical constraint you mentioned in message 14 about not touching the auth module? Gone.
Don't Wait. Preempt.
The fix isn't better compaction. The fix is never getting there.
The token economics make this concrete. A recent analysis by ClaudeTUI traced every token across real Claude Code sessions and found that in a long session with three compactions, only 76% of total tokens went to useful work — the rest was compaction overhead and unusable headroom. Each compaction compressed ~167K of rich context into a lossy 11-19K summary and blew away the prompt cache, forcing expensive rebuilds. A fresh session has its own startup costs — system prompt, bootstrapping, handoff loading — but it avoids the compounding tax of repeated lossy compression.
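The compaction tax is easy to put in back-of-envelope terms. The figures below come straight from the cited analysis; the helper function itself is purely illustrative, not part of any real tool:

```python
# Back-of-envelope sketch of the compaction tax described above.
# The 167K and 11-19K figures are from the cited ClaudeTUI analysis.

def compaction_retention(context_tokens: int, summary_tokens: int) -> float:
    """Fraction of the window that survives one lossy compaction."""
    return summary_tokens / context_tokens

# Each compaction squeezed ~167K tokens of rich context into an 11-19K summary.
low = compaction_retention(167_000, 11_000)
high = compaction_retention(167_000, 19_000)
print(f"retained per compaction: {low:.0%} to {high:.0%}")  # 7% to 11%

# Material from the session's opening phase passes through every summary,
# so after three compactions almost none of it survives verbatim.
print(f"after three compactions: {low**3:.2%} to {high**3:.2%}")
```

Under 10% retention per pass, compounded three times, is why "just let it compact" quietly destroys the constraints you set early in the session.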
Treat context like a budget, not a tank. Every message, every tool call, every file read — that's spending. And like any budget, the returns diminish as you spend more. The tail end of a long session is almost always noise.
Here's what actually works in practice:
Kill sessions early. Use your gut. Natural checkpoints — task completion, phase transitions, the moment you realize you're in a debugging rabbit hole — are all good signals to start a fresh context. You don't need sophisticated persistence tooling to make this work. Ask your agent for a summary of where things stand, copy it into a scratchpad, clear the session, and paste it back in to pick up where you left off. It's low-tech and it works.
Front-load what matters. The beginning of context is prime real estate. Your most important constraints, conventions, and architectural decisions should live there — not buried in message 47 after three rounds of debugging a test fixture.
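One way to enforce front-loading is to assemble the opening message mechanically, so hard constraints always lead. A minimal sketch; the section names and helper are my own illustration, not a standard format:

```python
# Sketch of front-loading: hard constraints come first in the context,
# before conventions, before the task itself. Section headers are arbitrary.

def build_opening_message(constraints: list[str], conventions: list[str],
                          task: str) -> str:
    """Assemble a session opener with prime real estate used deliberately."""
    parts = ["## Non-negotiable constraints"]
    parts += [f"- {c}" for c in constraints]
    parts += ["", "## Project conventions"]
    parts += [f"- {c}" for c in conventions]
    parts += ["", "## Task", task]
    return "\n".join(parts)

msg = build_opening_message(
    constraints=["Do not touch the auth module"],
    conventions=["Repository pattern for all DB access"],
    task="Refactor the parser into its own package.",
)
print(msg.splitlines()[0])  # constraints lead the context, not message 47
```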
Track consumption, not just limits. 'How full is the context?' is the wrong question. 'What's the signal-to-noise ratio of what's in it?' is the right one. If 60% of your context is stack traces from a debugging loop, your effective context for the actual task is already compromised regardless of how many tokens remain.
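If you can tag context entries by kind, that ratio is trivial to compute. A rough sketch; the categories and the 50% threshold are assumptions for illustration, not taken from any real agent framework:

```python
# Rough signal-to-noise tracker. Which kinds count as "noise" is a
# judgment call; these categories are illustrative assumptions.

NOISE_KINDS = {"stack_trace", "tool_output", "retry", "error_log"}

def effective_context(entries: list[tuple[str, int]]) -> float:
    """entries = [(kind, token_count), ...]; returns the signal fraction."""
    total = sum(tokens for _, tokens in entries)
    signal = sum(t for kind, t in entries if kind not in NOISE_KINDS)
    return signal / total if total else 1.0

session = [
    ("system_prompt", 2_000),
    ("task_spec", 3_000),
    ("stack_trace", 12_000),  # debugging-loop residue
    ("tool_output", 9_000),
    ("code_edit", 4_000),
]
ratio = effective_context(session)
print(f"signal fraction: {ratio:.0%}")  # 30%
if ratio < 0.5:
    print("context is mostly noise; consider a fresh session")
```

A session can be only half full by token count and still be mostly noise by this measure, which is exactly the state the fuel-gauge mental model hides.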
Make handoffs a first-class operation. When you end a session early (and you should), the next session needs to pick up without re-discovering everything. That means structured handoff artifacts, not 'continue where we left off,' which just asks the new session to guess.
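What counts as "structured" is up to you. Here is one possible shape, sketched as a dataclass; the field names are my assumptions, not a standard handoff format:

```python
# One possible shape for a structured handoff artifact. Field names are
# illustrative assumptions, not a standard.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Handoff:
    completed: list[str]
    in_progress: list[str]
    constraints: list[str]  # e.g. "the auth module is off-limits"
    next_steps: list[str]
    open_questions: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Serialize for pasting at the TOP of the next session."""
        return json.dumps(asdict(self), indent=2)

h = Handoff(
    completed=["story 12: parser refactor merged"],
    in_progress=["story 13: CLI wiring, tests failing on fixtures"],
    constraints=["auth module is frozen this sprint"],
    next_steps=["fix fixture paths, then re-run the integration suite"],
)
print(h.to_prompt())
```

The point isn't the schema; it's that explicit fields force you to write down the constraints and open questions that a freeform "continue where we left off" would silently drop.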
A caveat: this changes at scale. If you're running fully autonomous workers, auto-compaction is expected — you can't have autopilot and manual session management at the same time. But even then, the principle holds in a different form: checkpoint before handoffs, and design your workers to produce salvageable partial output when compaction costs them context.
Why This Matters More Than You Think
Most people using AI coding assistants are operating in short sessions where this never surfaces. Write a function, get a response, done. Context degradation is invisible at that scale.
But the moment you try to do anything ambitious — a multi-file refactor, a feature that touches six modules, a sprint with interconnected stories — you're playing a context management game whether you know it or not. The people getting the best results from AI coding aren't writing better prompts. They're managing context like a scarce resource, because it is one.
The agents aren't getting dumber as the session progresses. You're just feeding them noise and expecting signal.
Next up: Trust But Verify — Why your AI agent needs logic gates, not good vibes.