I Reduced My OpenClaw Agent's Token Usage by 81% After It Started Eating Itself Alive
My OpenClaw agent was burning 24,000 tokens before reading a single word I sent. I traced the problem, fixed it with an architecture borrowed from data engineering, and cut token usage by 81%. Here is exactly what happened and how I did it.
The Night My AI Went Into a Death Spiral
It started with a simple observation. My OpenClaw agent was getting slower. Sessions were compressing more often. By the end of a long conversation it was basically amnesiac. I would ask about something discussed an hour ago and get a blank stare.
I pulled the logs expecting a bug. What I found was something worse: a system eating itself alive, one token at a time.
The Numbers That Made Me Stop
24,000
tokens burned per message just to boot the agent before doing anything useful
Out of a 200,000 token context window, every conversation started with 12% of capacity already spent. Then it got worse.
When context fills up, the agent triggers compaction. That compaction call itself costs tokens. Which pushed the context over the limit. Which triggered another compaction. Which cost more tokens. A loop with no exit.
OpenClaw was not getting dumber. The memory architecture was killing it.
The Diagnosis: What 81% of That Weight Was
| File | Before | After | Reduction |
|---|---|---|---|
| SESSION-STATE.md | 38,400 chars | 6,200 chars | 84% |
| MEMORY.md | 12,800 chars | 4,841 chars | 62% |
| AGENTS.md + SOUL.md + USER.md | 18,600 chars | 8,400 chars | 55% |
| Other context files | 25,200 chars | 1,200 chars | 95% |
| Total bootstrap | 95,000 chars | 17,642 chars | 81% |
The biggest offender was SESSION-STATE.md. It had been accumulating detailed logs of every OpenClaw session going back weeks. Meeting notes. Debugging timelines. Decisions already executed. I was asking my agent to memorize its entire diary before responding to a single message.
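The first step was simply measuring. A minimal sketch of the kind of audit I ran (the file list reflects my setup; swap in whatever loads at your agent's startup, and note the ~4 chars/token ratio is a rough rule of thumb for English prose, not an exact tokenizer count):

```python
from pathlib import Path

# Files loaded on every session start (names from my setup).
BOOTSTRAP_FILES = [
    "SESSION-STATE.md",
    "MEMORY.md",
    "AGENTS.md",
    "SOUL.md",
    "USER.md",
]

def audit_bootstrap(root: str = ".") -> int:
    """Print the size of each bootstrap file and return the total characters."""
    total = 0
    for name in BOOTSTRAP_FILES:
        path = Path(root) / name
        size = len(path.read_text(encoding="utf-8")) if path.exists() else 0
        total += size
        print(f"{name:30s} {size:>8,} chars")
    # Rough heuristic: ~4 characters per token for English prose.
    print(f"{'TOTAL':30s} {total:>8,} chars (~{total // 4:,} tokens)")
    return total
```

Run it once and you know immediately whether your bootstrap is a diary or a filing card.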
The Fix: A Medallion Architecture for Agent Memory
Here is the insight that changed everything.
AI agent memory has the same problem that data engineering solved years ago. Raw data, cleaned data, and business-ready data all need different storage strategies. You do not run analytics off your transaction logs. You do not archive your S3 bucket in RAM.
I applied the same principle to my OpenClaw agent's memory. Three tiers:
- Gold — hot context: the small set of files loaded at bootstrap, because they always matter.
- Silver — a searchable index over session history and memories, queried on demand.
- Bronze — cold storage: the full, untrimmed archive of everything, kept for the record.
The shift in philosophy: if it is not in Gold, OpenClaw searches for it on demand. No preloading.
The old system: if something is not in the bootstrap, it is gone. The new system: OpenClaw queries the Silver index in under a second. That is a completely different relationship with memory.
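To make the on-demand idea concrete, here is a toy sketch of a Silver-layer lookup. My real index uses embeddings; plain keyword overlap over a JSONL file (a hypothetical format for this example) keeps the sketch self-contained:

```python
import json
from pathlib import Path

def search_silver(index_path: str, query: str, limit: int = 3) -> list[dict]:
    """Naive keyword search over a JSONL index of archived memory snippets.

    Each line is a JSON object like {"date": ..., "text": ...}.
    A production setup would score with embeddings; word overlap is
    enough to show the retrieve-on-demand shape.
    """
    terms = set(query.lower().split())
    scored = []
    for line in Path(index_path).read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        overlap = len(terms & set(entry["text"].lower().split()))
        if overlap:
            scored.append((overlap, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]
```

The point is not the ranking algorithm. The point is that nothing here is loaded until the conversation actually needs it.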
The Five Scripts That Keep It That Way
Knowing the architecture is one thing. Keeping it clean requires automation. I built five maintenance scripts that run nightly at 3:30am via OpenClaw cron with zero manual intervention:
- rotate_session_state.py — Scans SESSION-STATE.md for entries older than 3 days. Archives them before deleting. Keeps the hot file under 8,000 characters.
- prune_sessions.py — Removes stale session fragments and compacted summaries that are no longer relevant.
- consolidate_memories.py — Semantic deduplication. Any two memories with cosine similarity above 0.92 get merged. Prevents memory bloat over time.
- bootstrap_monitor.py — Measures current bootstrap footprint in characters and tokens. Returns a health status. Currently: 17,642 / 40,000 chars (44%).
- smart_bootstrap.py — Lazy loader that scores each context file by relevance to the current session and excludes low-relevance files from the load.
The Compaction Reserve Trick
One configuration change most people skip: raise your compaction reserve floor.
By default, most agent frameworks trigger compaction at 90% context usage. That leaves almost no room. I moved mine to trigger at 75%, giving 50,000 tokens of safety buffer. The compaction call itself costs tokens, so triggering it with more room means it costs less and disrupts less.
A 10% compaction reserve on a 200K window is 20K tokens. Barely enough to breathe. A 25% reserve is 50K tokens of real working space. Your agent can complete complex multi-step tasks without hitting the wall mid-execution.
Change this before anything else. It costs nothing and immediately makes your agent more stable.
The Results
4,410
tokens per message now vs 24,000 before. 81% reduction. Every message. Forever.
In a day of heavy use, say 100 messages, that is roughly 2 million tokens no longer burned on memory management overhead. Sessions run longer, retain more, compact less.
More importantly: OpenClaw actually remembers things now. Ask about a conversation from two weeks ago and it searches the Silver layer, finds the relevant snippet in under a second, and responds with specifics. Not because it was loaded into RAM. Because it was stored correctly and retrieved on demand.
Why This Matters Beyond Our Stack
This started as an OpenClaw problem, but the pattern is universal.
Any stateful AI agent hits this wall. OpenClaw, Cursor, AutoGPT, custom GPT wrappers with retrieval-augmented context: all face the same constraint math. The platform does not matter. The architecture does.
The pattern is always the same: agent needs memory, memory accumulates, nobody trims it, context fills up, performance degrades, everyone blames the model or the API.
The model is not the problem. The memory architecture is the problem.
And the fix is not complicated. It is data engineering. Tiered storage, rotation policies, deduplication, and a hard cap on what goes into hot context. The same principles that make databases fast at scale apply directly to AI agents.
The TL;DR: Fix This Today
- Audit your bootstrap files. Count every character that loads at startup. Anything over 40,000 total is a problem.
- Archive, do not delete. Old session state is valuable in cold storage. It is poison in hot context.
- Raise your compaction reserve. Give your agent 50,000 tokens of breathing room before the alarm trips.
- Build tiered memory. Hot context for what always matters. Searchable index for everything else. Cold storage for full history.
- Automate the maintenance. A nightly rotation script prevents the gradual accumulation that causes these spirals.
OpenClaw is not getting dumber. You just forgot to build it a filing system.
Is Your AI Stack Burning Tokens?
I audit OpenClaw and AI agent architectures and fix the memory inefficiencies killing your performance. If your agent is forgetting things, compacting constantly, or burning tokens on overhead, let us talk.
Talk to BASAWE