Your AI Coding Session Is a Finite Resource. Manage It Like One.

You are forty minutes into a refactor. The model has read the relevant files, understands the architecture, and is tracking changes across five modules. Then something shifts. It starts repeating suggestions you already rejected. It re-reads files it loaded twenty minutes ago. The responses get vaguer. The session is rotting, and you have no useful signal about it.

I would restart at this point, paste in a quick summary of where I am, and lose twenty to thirty minutes rebuilding context. It might happen three or four times a day, and I would accept it as the cost of using these tools. It does not have to work this way.

A note on scope: when I first outlined this, I was thinking about the broader problem of dependency on a single session, single tool, or provider. That is a real problem but too large for one post. This one focuses on AI session management specifically. The deeper idea is picking up exactly where you left off in a completely different tool (and provider), not just a fresh session of the same one. That comes later. The protocol I describe here is designed with that portability in mind.

The Mental Model You Actually Need

Every AI coding session operates across three resource layers, and none of them refill mid-session.

The context window is working memory: what the model can see right now. Every file you attach, every response the model generates, every tool call and its output burns from this budget. When it fills, quality degrades before it hits a hard wall. Models do not announce this politely. They just get worse.

Tokens are fuel. They are consumed on both sides of every exchange: your input, the model's output, tool invocations, file loads. Most tools give you some visibility into this, though the granularity varies considerably across products.

Subscription capacity is your weekly tank. Hit the cap and you are out until it resets, regardless of which tool you are using or how important the deadline is.

None of this is hidden information. The tools expose it. The problem is that most developers never look until something breaks.

What Visibility Actually Looks Like Across Tools

The tools vary considerably on how much they surface these constraints during a session.

Claude Code has a /context command that breaks down current token usage by category: system prompt, tool definitions, memory files, conversation history. It also auto-compacts when the context window approaches its limit, summarising conversation history to reclaim space. Persistent instructions in a CLAUDE.md file survive across sessions, so you are not re-explaining project conventions every time you start fresh.

Cline shows token counts and cost per request in real time, and tracks cumulative spend for the entire task. It also creates checkpoints after tool calls, so you can restore the session to an earlier state if the model heads in the wrong direction. That combination of cost visibility and rollback capability makes it the most transparent option for sustained, expensive workloads.

Kiro and Windsurf both surface context usage in their UIs, with varying levels of detail depending on which mode you are using. Windsurf's Cascade mode is particularly good at loading only what is relevant rather than forcing you to manage file attachment manually.

The honest summary: all of them have improved significantly, but none give you a unified view across sessions or providers. If you are switching tools mid-project, which is increasingly common as teams standardise on different IDEs, you are doing that accounting yourself.

When to Act: The Thresholds Worth Tracking

You need three trigger points, and acting on any one of them is enough.

The first is context fill. When you are at 50 to 60 per cent capacity, act. Below that threshold it is premature. Above it, most tools will auto-summarise in ways you did not choose, and you lose fidelity on what gets kept. Check the context command or UI indicator; do not wait for degradation signals in the responses.

The second is token volume. Around 15,000 to 25,000 tokens consumed in a session is when quality starts to drift in most models. This is independent of context fill percentage because it correlates with how much reasoning the model has had to do, not just how full the window is. Tool-heavy agentic sessions, like the kind that interact directly with your CI/CD pipeline, burn through this budget faster than you expect.

The third is time. Thirty to forty-five minutes of active, tool-heavy use is a reasonable default if you are not watching the other two. Set a timer for the first session or two until checking context becomes habitual.

The practical action is simple: check context, then run a canned prompt. Something like: "Summarise the current state of this work in 400 tokens or fewer: what has been done, what files were changed, and what comes next." Save that output. You will need it.

The Handoff Protocol

Whether you are approaching a limit, switching providers, or ending the day, the handoff process is the same and takes under two minutes.

Ask the model to generate a handoff.md: current architecture decisions, open questions, exact files to reopen, and a brief summary of the conversation so far. Git commit everything, including the handoff file. In the next session (whether that is a fresh Claude Code session, Cline, Kiro, or Windsurf), attach handoff.md and the three to five most relevant files. Skip the full conversation history.

This works across every tool I listed. The reason it works is that it does not depend on any tool's internal state. It is just a structured text file that any model, in any harness, can read and orient from. That is the point: the protocol is tool-agnostic by design, which means it also covers the case where you are moving to a completely different product mid-project.

The sessions where I rebuild from a handoff.md consistently feel indistinguishable from the ones where the context never lapsed. The sessions where I do not have one feel like starting cold, even when they are not.

The Shift That Actually Matters

The tools will keep improving. Context windows are getting larger. Visibility is getting better. Some tools now handle compaction well enough that you will not notice the boundary at all.

But the mental model is the durable thing. Treat every session as temporary from the first message. Build checkpoints before you need them. Keep a handoff.md in your project root and update it when triggers fire, not when you are desperate.

Three things worth doing this week:

  • Add a handoff.md to your current project root. Generate a first version now, before a session dies and forces you to.
  • Check your primary tool's context indicator on your next session. Note what 50 per cent capacity feels like before quality drops.
  • Write your state summary prompt once and save it somewhere you can paste it quickly. You will use it more than you expect.

The developers who do this consistently waste almost no time on resets. The ones who do not rebuild context from memory every time they hit a wall. That gap compounds across a working week faster than most people realise until they measure it.

If you are running AI-assisted development at any volume, session management is not optional. It is just infrastructure you have not set up yet.

ImpactEng helps Australian engineering teams build the practices and tooling that make AI-assisted development reliable at scale. If you want to know where your team's gaps are, start with an AI Readiness Audit. Or get in touch directly.