
Context is Gold: Every Token You Waste Is One Your Agent Can't Use to Think

Field Notes from 200+ Semi-Autonomous Sprints, Part 6

Every token in the context window is either signal or noise. Noise doesn't just waste space — it degrades everything around it.

If the first few posts in this series were about trust and verification, this one is about economics. Every agent session has a finite context window, and every token in that window is either helping the agent do its job or getting in the way. There are no neutral tokens: each one is either signal or noise, and noise doesn't just waste space — it actively degrades the quality of everything around it.

I use Claude for my pipeline, so the examples here are Claude-specific. But the principles are universal. If you're running any LLM agent at scale, you're playing the same resource management game whether you've named it or not.

The 50 First Dates Problem

Every new agent session starts from zero. It doesn't remember the last session. It doesn't know your codebase, your conventions, or your architecture. Every single time, it wakes up in a new world and needs to be told who it is and what it's doing.

This is annoying. It is also unavoidable. Build for it instead.

Front-load the context. Just the facts: here's the project, here's the architecture, here's the task, here's what not to touch. Treat it like a one-pager, not a novel. If you don't steer the agent upfront, it'll spend its own tokens wandering — scanning files, reading directories, drawing wrong conclusions about how things fit together. That exploration is expensive and noisy. A tight bootstrap is cheaper every time.
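To make that concrete, here's a minimal sketch of what a bootstrap builder might look like in Python. Every fact in it (the project name, the paths, the task) is invented for illustration; the point is the shape: one line per fact, nothing the agent doesn't need.

```python
# Minimal sketch of a front-loaded bootstrap. The project, paths, and
# constraints below are hypothetical -- swap in your own facts.
def build_bootstrap(task: str) -> str:
    """Assemble a one-pager of just the facts for a fresh session."""
    facts = {
        "Project": "orders-service, a REST API for order intake",
        "Architecture": "FastAPI + Postgres; routes in app/routes/, models in app/models/",
        "Task": task,
        "Do not touch": "app/billing/ (another team owns it); alembic migrations",
    }
    # One line per fact: a one-pager, not a novel.
    return "\n".join(f"{key}: {value}" for key, value in facts.items())

print(build_bootstrap("Add shipping_address validation to POST /api/orders"))
```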

Sub-Agents: Ephemeral, Scoped, Right-Sized

When you dispatch a sub-agent, treat it as ephemeral and disposable. It gets exactly the files, snippets, and context it needs — nothing more. A hard cap on turns. A model matched to the cognitive demand: not every task needs your most powerful one.

The biggest token burner is wandering. A sub-agent without tight scope will explore — reading files that seem relevant, investigating tangential code paths, asking clarifying questions it could have been given the answer to upfront. Multiply that across parallel workers and you're hemorrhaging tokens on exploration instead of execution.

The fix: pre-compute the context. Before the sub-agent starts, assemble exactly the files, snippets, and instructions it needs. Hand it a package, not a codebase.
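Here's one way that dispatch could look in code. This is a sketch, not a real agent API: `run_agent`, the model-tier names, and the file paths are all assumptions standing in for whatever your runtime actually exposes.

```python
# Sketch of a scoped sub-agent dispatch with a pre-computed context package.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SubAgentJob:
    task: str
    files: list[str]        # the exact files the sub-agent may see
    max_turns: int = 8      # hard cap: no open-ended wandering
    model: str = "small"    # match model tier to cognitive demand

    def context_package(self) -> str:
        """Pre-compute the context: a package, not a codebase."""
        parts = [f"TASK: {self.task}"]
        for path in self.files:
            parts.append(f"--- {path} ---\n{Path(path).read_text()}")
        return "\n\n".join(parts)

job = SubAgentJob(
    task="Rename get_user() to fetch_user() and update its call sites",
    files=["app/users.py", "app/routes/users.py"],
    max_turns=6,
    model="small",  # a mechanical rename doesn't need your most powerful model
)
# run_agent(job.context_package(), max_turns=job.max_turns, model=job.model)
```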

MCP Servers: Stop Collecting Pokémon

Model Context Protocol servers are powerful. They give your agent access to external tools — databases, APIs, project management, file systems. Each one extends what the agent can do.

Each one also costs tokens just by existing in the context. And each one is a new point of failure — a server that times out, returns bad data, or changes its schema will derail your agent mid-task.

Every connected MCP server adds its tool definitions to the agent's context window. That's schema descriptions, parameter lists, usage instructions — all consuming space before the agent has done a single thing. Connect five servers and you've spent a meaningful chunk of your context budget on tooling definitions the agent might never use in this session.
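To put rough numbers on that, here's a back-of-the-envelope sketch. The per-server token counts and the budget figure are invented placeholders; measure your own by dumping the actual schemas your client sends.

```python
# Rough sketch: how much of the context budget goes to tool definitions
# before turn one. All figures below are hypothetical placeholders.
CONTEXT_BUDGET = 200_000  # tokens; varies by model

# Assumed per-server schema sizes (tool names, parameter lists, usage notes).
schema_tokens = {
    "database": 1_800,
    "tickets": 2_400,
    "filesystem": 900,
    "browser": 3_100,
    "calendar": 1_200,
}

overhead = sum(schema_tokens.values())
print(f"{overhead} tokens ({overhead / CONTEXT_BUDGET:.1%} of budget) gone "
      f"before the agent takes a single action")
```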

The temptation is to connect everything. More tools means more capability, right? In theory. In practice, it means more noise in the context, more options for the agent to consider on every action, and slower, less focused execution. And just because your provider suggests you connect a server doesn't mean you should. Those prompts are marketing, not engineering advice. You need to make the judgment call about what actually serves the task in front of you.

Audit your connected servers regularly. If a server hasn't been used in the last several sessions, disconnect it. You can always reconnect it when you actually need it. How many servers, and which ones, varies by task — you as the engineer have to determine the right fit. Just be careful what you ask for.
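Here's a sketch of what that audit could look like, assuming you keep a simple per-session usage log. The log format and the staleness threshold are made up; adapt them to whatever telemetry you actually have.

```python
# Recency audit over a hypothetical usage log: one JSON line per session
# recording which servers were actually called.
import json
from pathlib import Path

STALE_AFTER = 5  # sessions without use before a server gets flagged

def unused_servers(log_path: str, connected: set[str]) -> set[str]:
    """Return connected servers that no recent session actually used."""
    sessions = [json.loads(line) for line in Path(log_path).read_text().splitlines()]
    recently_used = {srv for s in sessions[-STALE_AFTER:] for srv in s["servers_used"]}
    return connected - recently_used

# Example: flag candidates for disconnection.
# for server in unused_servers("sessions.jsonl", {"database", "tickets", "browser"}):
#     print(f"{server}: unused in the last {STALE_AFTER} sessions; disconnect it")
```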

Checkpointing: Lightweight or It Won't Happen

Every session starts from zero, so you need a way to carry state forward. A checkpoint file that captures what was accomplished, what's in progress, and what the next agent needs to know. Sticky note, not novel.

Don't rely on the agent to checkpoint voluntarily — it won't, or it will do so inconsistently. Automate it with pre- and post-session hooks so it happens whether the agent remembers or not.
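A minimal sketch of such a hook, assuming a JSON checkpoint file. The location, the fields, and the wiring are placeholders for whatever hook mechanism your runtime actually exposes.

```python
# Post-session hook that writes the checkpoint whether or not the agent
# remembered to. File path and field names are assumptions.
import json
import time
from pathlib import Path

CHECKPOINT = Path(".agent/checkpoint.json")

def post_session_hook(done: list[str], in_progress: list[str], next_steps: list[str]) -> None:
    """Persist a sticky note, not a novel: three short lists plus a timestamp."""
    CHECKPOINT.parent.mkdir(parents=True, exist_ok=True)
    CHECKPOINT.write_text(json.dumps({
        "written_at": time.time(),
        "done": done,
        "in_progress": in_progress,
        "next": next_steps,
    }, indent=2))

post_session_hook(
    done=["Added shipping_address validation to POST /api/orders"],
    in_progress=["Updating the order-intake tests"],
    next_steps=["Run the full test suite", "Open the PR"],
)
```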

Stale checkpoints are worse than no checkpoints. A checkpoint from three days ago is active misinformation that will steer the next agent in the wrong direction. Prune aggressively. A checkpoint should reflect right now, not last week.
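And the companion guard on the way back in, with an arbitrary 24-hour shelf life; tune the threshold to your own cadence.

```python
# Refuse to load a checkpoint past its shelf life.
import json
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # arbitrary; match it to your sprint rhythm

def load_checkpoint(path: str = ".agent/checkpoint.json") -> dict | None:
    """Return the checkpoint only if it still reflects 'right now'."""
    file = Path(path)
    if not file.exists():
        return None
    checkpoint = json.loads(file.read_text())
    if time.time() - checkpoint["written_at"] > MAX_AGE_SECONDS:
        file.unlink()  # stale state is active misinformation: delete it, don't feed it
        return None
    return checkpoint
```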

Debugging Rabbit Holes: Garbage In, Garbage Out

When something breaks, the quality of your bug report directly determines how much context the fix will cost.

'The button doesn't work' sends the agent on an exploration mission — reading files, tracing state, burning context on investigation. 'The POST to /api/orders returns 422 because the payload is missing shipping_address validation' lets the agent go straight to the fix. You can hit the gas as hard as you want, as long as you don't veer off the road.
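If your dispatcher files bugs programmatically, the same precision can be structural. A sketch of that report as data, with illustrative field names rather than any standard:

```python
# The good bug report above, in a shape a dispatcher can hand straight
# to an agent. Field names and paths are hypothetical.
bug_report = {
    "endpoint": "POST /api/orders",
    "observed": "HTTP 422",
    "cause": "payload is missing shipping_address validation",
    "reproduce": "curl -X POST /api/orders -d '{\"items\": [{\"sku\": \"A1\"}]}'",
    "scope": ["app/routes/orders.py", "app/schemas/order.py"],  # where to look, so the agent doesn't wander
}
```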

Environment matters too. Most AI coding tools are built for Linux and macOS. On Windows, shell commands and path handling generate more errors — each one polluting your context with noise. If you can run your pipeline in a Unix environment, do it.

UI/UX bugs are the hardest category. A model cannot see that the spacing feels wrong or that the animation is jarring. If you're reporting a UI/UX issue, be surgical: screenshots, annotated mockups, exact pixel values. The agent has no visual intuition. Guessing is expensive.

The Compound Effect

Context management compounds. Every token saved on noise is a token available for signal. The difference between disciplined and sloppy context management is the difference between an agent still producing quality output at turn 40 and one that lost the plot at turn 20.

Context is gold. Spend it like it matters.


Next up: Build to Scale, Not Scale to Build — Why mastering one agent matters more than running five.