Every heist movie follows the same arc. Somebody is losing something valuable. A crew assembles. Each member has a specialty. They work the problem from every angle until what seemed impossible — getting your money’s worth from a $200/month AI coding plan — suddenly looks like a clean getaway.
This is that movie, except the vault you are breaking into is your own Claude Code session, and the thing you are stealing back is your own tokens.
The past few weeks, X has been on fire with developers hitting their Claude Code limits absurdly fast. One prompt that used to cost 1% of the allocation now burns 10%. Even $200/month Max plan users find themselves rate-limited before lunch. Anthropic acknowledged it, tweaked peak-hour throttling, and… people kept hitting the wall.
The problem is not the plan. The problem is context hygiene. And these 18 hacks — organized from ground-level basics to architectural moves — are your crew for the heist.
Before the heist: know the vault
Every good crew studies the target before touching the lock. Tokens are the smallest unit of text Claude reads and charges for. One token is roughly four characters, or about three-quarters of an English word. That part most people know. Here is the part they miss:
Every message you send makes Claude re-read the entire conversation from the beginning. Message one, its reply, message two, its reply, all the way to your latest prompt. Every. Single. Time.
This means conversation cost is not additive — it is compounding. Message one might cost 500 tokens. Message thirty costs 15,000, because it re-reads everything before it. One developer tracked a 100+ message chat and found that 98.5% of all tokens were spent re-reading old history. Not generating code. Not reasoning. Just re-reading.
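To make the compounding concrete, here is a back-of-the-envelope sketch, assuming each exchange adds roughly 500 tokens of history (the per-message figure above is an assumption, not a measured constant):

```shell
# Rough sketch: cumulative tokens re-read across a 30-message chat,
# assuming each exchange adds ~500 tokens of history.
total=0
history=0
for i in $(seq 1 30); do
  history=$((history + 500))   # the history grows every exchange
  total=$((total + history))   # each new turn re-reads all of it
done
echo "$total"   # cumulative tokens spent on re-reads
```

That loop lands at 232,500 tokens of pure re-reading for a 30-message session, which is exactly why a fresh conversation is so much cheaper than message thirty-one of an old one.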
On top of your messages, Claude also reloads your CLAUDE.md, MCP server definitions, system prompts, skills, and referenced files on every turn. Invisible overhead, constantly dripping into your context window.
And bloated context does not just cost more — it produces worse output. There is a phenomenon called “lost in the middle”: models pay the most attention to the beginning and end of context, while everything in the middle gets progressively ignored. You pay more and get less. The worst trade in software engineering since Oracle licensing.
Now you understand the vault. Time to meet the crew.
Tier 1 — The Street Crew
Nine operatives. Low barrier, high impact. Every Claude Code user should deploy all nine today.
Hack 1: The Clean Slate
Specialty: start fresh conversations between unrelated tasks.
This is Danny Ocean walking into the casino for the first time — no baggage, no history, just a clean read of the room. Use /clear between unrelated tasks. Do not carry context about feature A into a conversation about bug B.
Every message in a long chat costs dramatically more than the same message in a fresh one, because it drags the entire history behind it. This single habit extends your session life more than any other hack on this list. After 30 messages, you might be sitting at a quarter-million cumulative tokens. A fresh chat starts you back at the baseline.
Hack 2: The MCP Disconnector
Specialty: kill invisible token drains from MCP servers.
Every connected MCP server loads its full tool definitions into your context on every message. One server alone can eat 18,000 tokens per turn. You might have five connected and not even realize it.
Run /mcp at the start of each session and disconnect what you do not need. Better yet, if a CLI exists for that service (Google Workspace CLI instead of Google Calendar MCP, for example), use the CLI. It is faster, cheaper, and does not pollute your context window. The future of agent tooling is CLIs over MCPs — leaner, direct, no invisible payload.
Hack 3: The Batcher
Specialty: combine multiple instructions into one message.
Three separate messages cost roughly three times what one combined message costs, because each message triggers a full context re-read. Instead of:
“Summarize this.” → “Now extract the issues.” → “Suggest a fix.”
Send it all in one prompt:
“Summarize this code, extract the issues, and suggest fixes for each.”
If Claude gets something slightly wrong, edit your original message and regenerate instead of sending a follow-up correction. Follow-ups stack onto history permanently. Edits replace the bad exchange entirely. One swap saves you thousands of tokens.
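A rough cost model shows why batching wins. The numbers here are assumptions for illustration: a 5,000-token existing context and about 200 tokens per instruction.

```shell
# Sketch: three follow-ups each trigger a full re-read of the
# growing history; one batched prompt reads the context once.
# (The 5,000-token context and 200-token instructions are assumptions.)
ctx=5000
instr=200
batched=$((ctx + 3 * instr))
separate=$(( (ctx + instr) + (ctx + 2 * instr) + (ctx + 3 * instr) ))
echo "batched=$batched separate=$separate"
```

Roughly a 3x difference, and the gap only widens as the conversation history grows.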
Hack 4: The Architect
Specialty: use Plan mode before any real task.
Plan mode lets Claude map out the approach and ask the right questions before writing a single line of code. It prevents the biggest source of token waste: Claude going down the wrong path, writing code, realizing the approach was wrong, scrapping everything, and starting over.
Add this to your CLAUDE.md:
Do not make any changes until you have 95% confidence in what
you need to build. Ask me follow-up questions until you reach
that confidence level.
One paragraph of configuration. Saves thousands of tokens per session.
Hack 5: The Recon Pair — /context and /cost
Specialty: make the invisible visible.
/context shows what is eating your tokens right now — conversation history, MCP overhead, loaded files. /cost shows actual token usage and estimated spend for the session.
Most people have no idea where their tokens go. These two commands are the thermal imaging goggles of the heist: they reveal the laser grid you did not know was there. Run /context in a fresh session sometime. Before you type a single word, you are already down 50,000+ tokens from system prompts, tools, skills, and memory files. If MCPs are loaded too, that number explodes.
Hack 6: The Lookout — Status Line
Specialty: constant real-time visibility in your terminal.
The status line shows your model, a visual progress bar of context usage, and your token count. Always visible, always updating. You catch waste the moment it happens instead of after your budget is blown.
Set it up by running /statusline in your terminal session. Include the model name, context percentage, and token count. It is the dashboard on the heist van — you need it running before anyone moves.
Hack 7: The Dashboard Watcher
Specialty: keep your Claude usage dashboard open.
Pull up your Anthropic account usage page. Pin the tab. Check every 20–40 minutes. Know when your reset is coming. If you are near the limit with time left, step away — do not burn the last 5% on something small and get stuck mid-task. If the reset is close and you have budget left, go heavy and get your money’s worth.
You could even set up an automation (cron, Slack webhook) to ping you every 30 minutes with your remaining allocation.
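A minimal crontab sketch for that reminder, assuming you have a Slack incoming-webhook URL. There is no public Anthropic API for remaining allocation, so this just nudges you to look at the dashboard on schedule:

```shell
# Hypothetical crontab entry: every 30 minutes, post a reminder.
# SLACK_WEBHOOK is a placeholder — paste your real webhook URL,
# since cron does not inherit your shell environment.
*/30 * * * * curl -s -X POST -H 'Content-Type: application/json' -d '{"text":"Check your Claude usage dashboard"}' "$SLACK_WEBHOOK"
```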
Hack 8: The Surgeon — Smart Pasting
Specialty: paste only what Claude needs to see.
Before dropping a file or document into the chat, ask yourself: does Claude need to read the whole thing? If the bug is in one function, paste just that function. If context from one paragraph is enough, paste that paragraph.
You need to be as precise with what you feed Claude as Claude needs to be with what it reads. A 2,000-line file pasted when you needed 40 lines is 1,960 lines of pure token waste — and it actively degrades output quality by diluting the signal.
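One way to do the slicing from the terminal, sketched here against a simulated file (swap in your real path and line range):

```shell
# Simulate a 2,000-line file, then extract only a 40-line window
# instead of pasting the whole thing.
seq 1 2000 | sed 's/^/line /' > /tmp/bigfile.txt
sed -n '45,84p' /tmp/bigfile.txt | wc -l   # 40 lines, not 2,000
```

On macOS you can pipe the slice through pbcopy (xclip on Linux) to put it straight on the clipboard, ready to paste.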
Hack 9: The Babysitter
Specialty: watch Claude work in real time.
Do not fire off a prompt and switch tabs. Watch what Claude is doing, especially on longer tasks. If it starts going down the wrong path — re-reading the same files in loops, exploring dead ends, hallucinating dependencies — hit stop immediately.
In a bad loop, 80% of tokens produce zero value. Catching it early is like pulling the plug on a slot machine before you feed it your last twenty. One minute of attention can save you ten minutes of burned budget.
Tier 2 — The Specialists
Five operatives. Slightly more setup, significantly more leverage.
Hack 10: The Index — Lean CLAUDE.md
Specialty: keep your config file under 200 lines and treat it as a routing table.
Claude auto-loads CLAUDE.md into context, and it is re-sent with every message — not read once per session. If your file is 1,000 lines, every single prompt (even “hi”) costs you those 1,000 lines of tokens.
Include only the essentials: tech stack, coding conventions, build commands, the 95% confidence rule. For everything else, point to where it lives. CLAUDE.md should be an index, not an encyclopedia. It tells Claude where to look, so it does not waste tokens searching.
This is a mindset shift. Apply it to skills, reference guides, documentation lookups. Lean pointers > fat payloads.
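A sketch of what an index-style CLAUDE.md can look like. The stack, paths, and commands below are placeholders, not recommendations:

```markdown
# CLAUDE.md: an index, not an encyclopedia

## Stack
TypeScript, Next.js, Postgres via Prisma

## Commands
- Build: npm run build
- Test: npm test
- Lint: npm run lint

## Conventions
See docs/conventions.md. Do not restate them here.

## Rules
- Do not make any changes until you have 95% confidence in what you
  need to build. Ask me follow-up questions until you reach it.
```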
Hack 11: The Sniper — Surgical File References
Specialty: point Claude at exact locations instead of letting it explore.
Never say “here’s my whole repo, go find the bug.” Say:
“Check the verifyUser function inside auth.js, lines 45–80.”
Use @filename to point at specific files. Let Claude read exactly what it needs and nothing more. Free exploration is expensive exploration — every file Claude opens to “look around” enters your context window and stays there.
Hack 12: The Compactor
Specialty: compact manually at 60% context capacity.
Auto-compact triggers at roughly 95%, by which point your context is already degraded and output quality has been declining for a while. Run /context or check your status line, and at around 60% capacity, run /compact with specific instructions on what to preserve.
After three or four compacts in a row, quality starts to degrade regardless. At that point: get a session summary, /clear, paste the summary, and continue fresh. Think of it as switching getaway cars — same crew, clean vehicle.
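/compact accepts free-form instructions, so tell it exactly what to keep. Something like:

```
/compact Keep the current task plan, the files already modified, and any
unresolved errors. Drop exploration, dead ends, and raw tool output.
```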
Hack 13: The Timer — Cache Awareness
Specialty: know the 5-minute prompt cache timeout.
Claude Code uses prompt caching to avoid reprocessing unchanged context. But the cache expires after five minutes of inactivity. Step away for a coffee break, come back, and your next message reprocesses everything from scratch at full cost.
This is why some people feel usage randomly spikes after returning to a session. Before stepping away, consider running /compact or /clear. When you come back, you start clean instead of paying a surprise re-read tax.
Hack 14: The Censor — Command Output Bloat
Specialty: control what shell output enters your context.
When Claude runs shell commands, the full output enters your context window. A git log returning 200 commits, an npm install dumping dependency trees, a find command listing thousands of files — all of it becomes tokens you are paying for.
Be intentional about what you let Claude run. If certain commands are known to produce verbose output and are not needed for a project, deny those permissions in your project settings. Pipe output through head or tail when you only need a portion. This is another case of invisible bleed: the command shows one line in the UI, but the full output is silently eating your budget.
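In Claude Code, denying those commands can be expressed as permission rules in your project's .claude/settings.json. The fragment below is a sketch — verify the exact rule syntax against your version's settings documentation:

```json
{
  "permissions": {
    "deny": [
      "Bash(find:*)",
      "Bash(npm install:*)"
    ]
  }
}
```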
Tier 3 — The Masterminds
Four operatives. These require strategic thinking but deliver the highest returns.
Hack 15: The Casting Director — Pick the Right Model
Specialty: match model power to task complexity.
Not every task needs the flagship model. A deliberate model strategy looks like this:
| Task | Model | Why |
|---|---|---|
| Standard coding, refactoring | Sonnet | Fast, smart enough for 80% of work |
| Sub-agents, formatting, simple tasks | Haiku | Cheap, fast, handles routine well |
| Deep architecture, complex planning | Opus | Full reasoning power, use sparingly |
Keep Opus under 20% of your usage. If you need a codebase review, consider bringing in a separate tool (Codex, for example) rather than burning expensive model tokens on reading comprehension.
Hack 16: The Budget Accountant — Sub-Agent Economics
Specialty: understand and control the cost of agent workflows.
Agent workflows use roughly 7–10x more tokens than a single-agent session. Each sub-agent wakes up with its own full context — system prompt, tools, files, everything reloaded from scratch. It is basically a separate session boot.
Use sub-agents deliberately:
- Delegate one-off research or exploration tasks to Haiku sub-agents
- Use them when you need summarized insights from multi-file analysis
- Avoid multi-agent orchestration unless the complexity genuinely demands it
80% of your tokens on a cheaper model beats 80% on the most expensive one. Agent teams are powerful but expensive — treat them like bringing in a specialist contractor, not your daily crew.
Hack 17: The Clock — Master Peak Hours
Specialty: schedule heavy work for off-peak times.
Anthropic adjusts how fast your 5-hour session window drains based on demand. Peak hours: 8 AM – 2 PM Eastern, weekdays. Off-peak (afternoons, evenings, weekends), your allocation lasts longer — possibly significantly longer.
Schedule your big refactors, multi-agent sessions, and architectural planning for off-peak. Use peak hours for lighter tasks, code reviews, planning.
And play the reset strategically: if you are near a reset with budget left, go heavy. Let the agents loose. Get your money’s worth before the window flips. If you are near the limit with time left, step away and come back with a full allocation instead of getting stuck mid-task at 0%.
Hack 18: The Constitution — Living CLAUDE.md
Specialty: make your config file a self-evolving decision record.
This builds on Hack 10, but goes further. CLAUDE.md should not just route — it should contain stable decisions, architecture rules, and progress summaries. Every architectural decision you store there is a paragraph you never have to type again. Every convention you document is a correction you never have to make.
Add rules that enforce token discipline:
## Token discipline
- Use sub-agents for any exploration or research
- If a task needs 3+ files or multi-file analysis, spawn
a sub-agent and return summarized insights only
- Spawn research sub-agents with Haiku
## Applied learning
When something fails repeatedly, when the user has to re-explain,
or when a workaround is found for a platform/tool/limitation,
add a one-line bullet here. Keep each bullet under 15 words.
No explanations. Only add things that save time in future sessions.
Check on this section regularly. Self-evolving configurations can bloat if left unattended. Prune ruthlessly. The point is a living, learning file that gets sharper over time, not fatter.
The getaway plan — what to do right now
You have met the crew. Here is the job, in order:
- Run /context — see what your baseline token cost looks like before you type anything
- Run /cost — check your actual spend for the current session
- Set up the status line — model, context percentage, token count, always visible
- Open your usage dashboard — know your allocation and reset time
- Disconnect unused MCP servers — every one you cut saves thousands of tokens per message
- Start complex tasks in Plan mode — prevent the biggest source of waste
- Use /clear between unrelated tasks — the single highest-impact habit
- Compact manually at 60% — do not wait for the automatic 95% threshold
- Batch multi-step instructions — one message instead of three
- Schedule heavy sessions for off-peak hours — get more work per token
There is one more thing worth saying. Hitting your limit is not a failure. If you are applying these hacks and still hitting the ceiling, it means you are a power user extracting real value from the tool. The people who never hit their limits are the ones leaving leverage on the table.
The goal is not to avoid spending tokens. It is to stop spending them on invisible overhead, redundant re-reads, and wrong-path explorations — and redirect every token toward actual productive output. That is the heist. And now you have your 18.
Related
If you are building systems that help AI agents retain knowledge across sessions — solving another major source of wasted effort — check out how to build a project memory system for AI coding agents.