Your Marathon Session Is Running on Borrowed Time
Field Notes from 200+ Semi-Autonomous Sprints — Bonus Post
Subscription tiers are a business decision, not a technical guarantee. When the math changes, pipelines built on token discipline adapt. Marathon pipelines break.
The 24-hour screenshot is the new "look at my AWS bill" tweet: impressive right up until someone asks what happens next.
Someone posts a screenshot of a 24-hour coding session with an AI agent. Nobody asks the obvious question: how long do you think the provider is going to let you do that on a subscription?
I run an autonomous coding pipeline. I push these tools hard — harder than most. And the single most important thing I've learned about subscription tiers is something the marketing will never tell you: your subscription is a business decision by the provider, not a technical guarantee to you.
You've Seen This Movie Before
Every "unlimited" tier in the history of technology has eventually been limited.
Unlimited mobile data plans became throttled data plans. Unlimited cloud storage became "up to 15GB free, then pay." Unlimited CI minutes became fair-use policies and queue deprioritization. Unlimited PTO — great, so I won't come into work and I still get paid? No, that's not what unlimited means. It never means what it says. It means "we trust you to use a reasonable amount, and we'll redefine reasonable whenever we want."
Every time, the pitch was the same: use as much as you want! And every time, when the tail of heavy users got fat enough that it threatened the margin, the correction came.
The correction is never dramatic. It's never "we lied, pay more." It's a quiet policy update. A new fair-use clause. Rate limits that didn't exist before. A tier restructuring where your current plan now has a cap it didn't have last month. The feature you relied on moves to the next tier up. The pricing page changes and your workflow doesn't fit the cheapest option anymore.
AI subscriptions are not exempt from this pattern. The compute costs behind these services are enormous and not yet fully stabilized. Providers are subsidizing usage to build market share. That subsidy has a shelf life, and nobody outside the finance team knows when it expires.
The Marathon Is the Tail
The person posting a 24-hour session screenshot is in the heavy-usage tail. But just because they're there doesn't mean you should be. Running an agentic pipeline doesn't require burning through context windows all day — and if yours does, the pipeline has a problem, not the provider.
Providers price subscriptions assuming a usage distribution. I don't know the exact shape of that distribution — nobody outside the providers does. But the business model only works if most subscribers use significantly less than the cost of serving them, so that the margin covers the users who use significantly more. Marathon agent sessions — continuous tool calls, millions of tokens per day, rate limits as a lifestyle — are not what the pricing was designed to sustain. They're what the pricing tolerates while the market is growing.
Every provider wants to be the one that serious builders choose, so the heavy users get tolerated. But "tolerated while the market is growing" is not the same as "sustainable forever."
This isn't speculation. We've already seen the early signals. Rate limit adjustments. Usage-based throttling on "unlimited" plans. Model routing changes where the cheapest tier gets a different model than advertised in certain conditions. These aren't bugs. They're the correction in slow motion.
The Coupling Problem
Here's the engineering mistake: building a pipeline that assumes 24/7 throughput on a fixed-price subscription.
That's not coupling to a technical capability. It's coupling to a pricing model. And pricing models change faster than APIs.
When your pipeline's throughput depends on being able to run agents continuously at a fixed cost, any change to the subscription terms becomes a pipeline failure. Not a pricing inconvenience — a functional failure. Your sprint velocity drops. Your cycle time increases. Your costs spike or your output shrinks. All because you built around the assumption that the current deal would last.
Eliyahu Goldratt called this pattern decades ago. His Theory of Constraints — originally about factory floors, not AI pipelines — says that every system's throughput is limited by a single constraint, and if you couple your process to that constraint without knowing it, you've handed control of your entire operation to whatever controls the bottleneck. When you build a pipeline around unlimited subscription access, the subscription is the constraint. Goldratt's fix was to identify the constraint, subordinate everything else to it, and build slack so the system absorbs changes instead of breaking. Same principle applies here.
This is the same mistake I warned about in The Ground Moves Under You: coupling to something the provider controls means the provider controls your pipeline. When they change the model, your prompts break. When they change the pricing, your economics break. Different lever, same dependency.
What Token Discipline Buys You
Everything I've written in this series about context management, session efficiency, right-sizing models, and killing sessions early — it's not just about quality. It's about resilience.
A pipeline built on token discipline uses fewer resources per unit of output. That means when rate limits tighten, you still fit within them. When a fair-use cap appears, you're under it.
The person running 24-hour marathon sessions and burning tokens freely is optimizing for today's pricing. The person running tight, checkpointed sessions with clean context management is optimizing for whatever pricing comes next. One of those positions adapts. The other breaks.
Think of it like fuel efficiency. When gas is cheap, nobody cares about miles per gallon. When the price spikes, the person driving a tank is stuck and the person driving something efficient barely notices. You don't build for efficiency because the current price demands it. You build for efficiency because you don't know what the future price will be, and efficiency makes you robust to changes you can't predict.
I'm not speaking theoretically. Here's what the efficiency curve looked like in my own pipeline over a single month:
| Week | Sprints | Avg Duration | Shipped Stories per Hour | Wasted Hours |
|---|---|---|---|---|
| Week 1 | 9 | 8.0h | 3.3/hr | 71% |
| Week 2 | 12 | 9.8h | 2.6/hr | 37% |
| Week 3 | 16 | 3.6h | 4.7/hr | 64% |
| Week 4 | 24 | 2.8h | 6.7/hr | 6% |
"Wasted hours" means time spent below a minimum efficiency threshold — sessions where the agent was wandering, debugging rabbit holes, or running long enough that context degradation was dragging output quality down.
Week 1: marathon sessions averaging 8 hours each. Seven out of every ten hours produced below-threshold output. Week 4: focused sprints averaging under 3 hours, delivering twice the output per hour, with essentially zero waste outside intentional R&D.
The difference isn't the model — the model didn't change. It's session discipline. Killing sessions early. Front-loading context. Checkpointing before degradation sets in. Every practice from this series, applied consistently, turned a loose pipeline into a tight one in four weeks.
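The metrics in the table are cheap to compute if you log your sessions. Here's a minimal sketch; the `Session` record and its field names are hypothetical, standing in for whatever your own pipeline logs:

```python
from dataclasses import dataclass

# Hypothetical session record; the fields are illustrative, not from any real tool.
@dataclass
class Session:
    duration_hours: float   # wall-clock length of the agent session
    stories_shipped: int    # completed stories attributable to this session
    wasted_hours: float     # time spent below the minimum efficiency threshold

def weekly_metrics(sessions: list[Session]) -> dict:
    """Aggregate the per-week numbers used in the table above."""
    total_hours = sum(s.duration_hours for s in sessions)
    shipped = sum(s.stories_shipped for s in sessions)
    wasted = sum(s.wasted_hours for s in sessions)
    return {
        "sprints": len(sessions),
        "avg_duration_h": total_hours / len(sessions),
        "stories_per_hour": shipped / total_hours,
        "wasted_pct": 100 * wasted / total_hours,
    }

# Example: three short, disciplined sessions
week = [Session(2.5, 16, 0.1), Session(3.0, 20, 0.2), Session(2.9, 19, 0.2)]
m = weekly_metrics(week)
print(m)
```

The point isn't the code, it's the habit: if these four numbers come out of your logs automatically, the drift from Week 1 to Week 4 is visible while it's happening instead of in a retrospective.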
If that month had been on API pricing, the estimated cost would have been 4-8x the subscription price. The subscription let me eat the learning tax — the 71% waste, the 9-hour sessions going nowhere — without it costing extra. The subscription is where you build the habits. The API is where bad habits get expensive. If you wouldn't do it on the API, you already know it's wasteful. You're just not feeling it yet.
You can't account for every possible pricing change. But a bursty pipeline can stretch and contract like a rubber band. If pricing gets tight running four parallel agents, drop it to three. If rate limits tighten, your sessions are already short enough to fit.
The Honest Math
Let's be precise about who's actually running marathon sessions and what it costs.
A 24-hour marathon isn't happening on the $20/month Pro tier. Pro's allocation burns through in a couple of hours of agentic work. The people posting marathon screenshots are on Max plans at $100 or $200 per month, which provide 5x or 20x the Pro allocation. Even then, the limits are real — Max 5x users report hitting rate limits after 90 minutes of intensive agentic work, and providers have been tightening session limits during peak hours.
What does that usage actually cost the provider? Kyle Redelinghuys published a detailed cost analysis of his Claude Code usage: 10 billion tokens across 8 months of heavy development work. The API equivalent would have been over $15,000. He paid roughly $800 on the Max plan — a subsidy of roughly 95%. And Redelinghuys was disciplined enough to track it. The marathon runners who aren't tracking are likely costing the provider even more per dollar of subscription revenue.
Providers are eating this difference to win market share. But the adjustments have already started — weekly rate limits, peak-hour session throttling, and messaging that quietly shifted from "unlimited access" to "fair distribution of resources." The correction isn't coming. It's already here. And when the competitive pressure eases, the adjustments get less gentle.
What This Means Practically
I'm not telling you to stop using subscriptions. I use one. I'm telling you not to depend on the current terms. Specifically:
Track your consumption even on a subscription. You're not paying per token, but you should know your burn rate — surprise is the enemy when limits change.
Design for session efficiency, not session duration. A pipeline that ships a story in 40 minutes survives a rate limit cut. A 4-hour meandering session doesn't.
Keep your API cost model current. Know what your usage would cost on the API — that's the subsidy you're relying on.
Don't build around rate limit headroom. If you're routinely at 0.8X of your plan's limit, you have no margin. Build for 0.5X and use the rest for burst capacity.
Your pipeline should be bursty — able to stretch and contract with whatever limits come next.
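The first three practices can be wired into the pipeline itself rather than left as intentions. A minimal sketch of a burn-rate guard — every number and name here is an illustrative assumption, not any provider's real quota API:

```python
# Hypothetical burn-rate guard. The plan allocation, target utilization, and
# blended API price are all assumptions you would replace with your own figures.
PLAN_TOKENS_PER_DAY = 5_000_000   # assumed daily allocation for your tier
TARGET_UTILIZATION = 0.5          # build for 0.5X; keep the rest as burst headroom
API_PRICE_PER_MTOK = 15.00        # assumed blended API price, $/million tokens

def check_burn(tokens_used_today: int) -> dict:
    """Report utilization against the self-imposed 0.5X budget, plus the
    API-equivalent cost of today's usage (the subsidy you're relying on)."""
    budget = PLAN_TOKENS_PER_DAY * TARGET_UTILIZATION
    return {
        "utilization": tokens_used_today / PLAN_TOKENS_PER_DAY,
        "over_budget": tokens_used_today > budget,
        "api_equivalent_usd": tokens_used_today / 1_000_000 * API_PRICE_PER_MTOK,
    }

status = check_burn(3_200_000)
if status["over_budget"]:
    # Contract the pipeline: fewer parallel agents, shorter sessions.
    print(f"Over the 0.5X budget at {status['utilization']:.0%} utilization; "
          f"API equivalent today: ${status['api_equivalent_usd']:.2f}")
```

The design choice is that the guard compares against your own 0.5X budget, not the provider's limit. By the time you're arguing with the provider's limit, you've already lost the headroom.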
The Real Flex
The 24-hour marathon screenshot gets engagement. The "I shipped the same output in 3 hours with clean sessions" screenshot doesn't. Nobody is impressed by efficiency. Everyone is impressed by endurance.
But endurance is a vanity metric when you're running on borrowed margin. The impressive pipeline isn't the one that can run for 24 hours. It's the one that produces the same results whether the limit is 24 hours or 4 hours. That's engineering.
Build like it might end, because it will.
This is a companion post to the Field Notes series. For the cost side of the same coin, see: The Million-Token Baby. Start from the beginning: The Agentic Maturity Model — Where Are You, and Where Are You Going?