Engineering

I Reduced My OpenClaw Agent's Token Usage by 81% After It Started Eating Itself Alive

February 20, 2026

My OpenClaw agent was burning 24,000 tokens before reading a single word I sent. I traced the problem, fixed it with an architecture borrowed from data engineering, and cut token usage by 81%. Here is exactly what happened and how I did it.

The Night My AI Went Into a Death Spiral

It started with a simple observation. My OpenClaw agent was getting slower. Sessions were compressing more often. By the end of a long conversation it was basically amnesiac. I would ask about something discussed an hour ago and get a blank stare.

I pulled the logs expecting a bug. What I found was something worse: a system eating itself alive, one token at a time.

What was happening: every single message triggered a full bootstrap load. Every context file got stuffed into the context window before my OpenClaw agent wrote a single word of its response. Over weeks of active use, one file had ballooned to 62,000 characters. That is roughly 15,500 tokens, loaded on every single message.

The Numbers That Made Me Stop

24,000

tokens burned per message just to boot the agent before doing anything useful

Out of a 200,000 token context window, I was starting every conversation at 12% capacity before anything useful happened. Then it got worse.

When context fills up, the agent triggers compaction. That compaction call itself costs tokens. Which pushes the context back over the limit. Which triggers another compaction. Which costs more tokens. A loop with no exit.

OpenClaw was not getting dumber. The memory architecture was killing it.

The Diagnosis: What 81% of That Weight Was

File                            Before         After          Reduction
SESSION-STATE.md                38,400 chars   6,200 chars    84%
MEMORY.md                       12,800 chars   4,841 chars    62%
AGENTS.md + SOUL.md + USER.md   18,600 chars   8,400 chars    55%
Other context files             25,200 chars   1,200 chars    95%
Total bootstrap                 95,000 chars   17,642 chars   81%

The biggest offender was SESSION-STATE.md. It had been accumulating detailed logs of every OpenClaw session going back weeks. Meeting notes. Debugging timelines. Decisions already executed. I was asking my agent to memorize its entire diary before responding to a single message.

The Fix: A Medallion Architecture for Agent Memory

Here is the insight that changed everything.

AI agent memory has the same problem that data engineering solved years ago. Raw data, cleaned data, and business-ready data all need different storage strategies. You do not run analytics off your transaction logs. You do not hold your S3 archive in RAM.

I applied the same principle to my OpenClaw agent's memory. Three tiers:

BRONZE
Cold storage. Every daily note, session archive, raw log. Everything preserved, nothing loaded. Indexed for search.
SILVER
Searchable index. Local SQLite with semantic embeddings. 124 memories, zero token cost unless queried.
GOLD
Hot context. Only what the agent genuinely needs on every turn. Hard cap at 40,000 characters total.

The shift in philosophy: if it is not in Gold, OpenClaw searches for it on demand. No preloading.

The old system: if something is not in the bootstrap, it is gone. The new system: OpenClaw queries the Silver index in under a second. A completely different relationship with memory.
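The Silver layer is small enough to sketch. Here is a minimal illustration using SQLite, with a hashed bag-of-words vector standing in for a real semantic embedding model; the schema and function names are assumptions for this sketch, not OpenClaw's actual internals:

```python
import json
import math
import sqlite3

DIM = 1024  # toy embedding dimension


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag of words,
    L2-normalized so that a dot product equals cosine similarity."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[hash(word) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are pre-normalized


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memories "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)"
    )
    return db


def add_memory(db: sqlite3.Connection, text: str) -> None:
    db.execute(
        "INSERT INTO memories (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embed(text))),
    )


def search_memories(db: sqlite3.Connection, query: str, top_k: int = 3) -> list[str]:
    """Rank every stored memory against the query; zero token cost."""
    q = embed(query)
    rows = db.execute("SELECT text, embedding FROM memories").fetchall()
    scored = [(cosine(q, json.loads(e)), t) for t, e in rows]
    scored.sort(reverse=True)
    return [t for _, t in scored[:top_k]]
```

In production you would swap `embed` for an actual embedding model; everything else, including the "store everything, load nothing" shape of the table, stays the same.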

The Five Scripts That Keep It That Way

Knowing the architecture is one thing. Keeping it clean requires automation. I built five maintenance scripts that run nightly at 3:30am via OpenClaw cron with zero manual intervention:

  • rotate_session_state.py — Scans SESSION-STATE.md for entries older than 3 days. Archives them before deleting. Keeps the hot file under 8,000 characters.
  • prune_sessions.py — Removes stale session fragments and compacted summaries that are no longer relevant.
  • consolidate_memories.py — Semantic deduplication. Any two memories with cosine similarity above 0.92 get merged. Prevents memory bloat over time.
  • bootstrap_monitor.py — Measures current bootstrap footprint in characters and tokens. Returns a health status. Currently: 17,642 / 40,000 chars (44%).
  • smart_bootstrap.py — Lazy loader that scores each context file by relevance to the current session and excludes low-relevance files from the load.

The Compaction Reserve Trick

One configuration change most people skip: raise your compaction reserve floor.

By default, most agent frameworks trigger compaction at 90% context usage. That leaves almost no room. I moved mine to trigger at 75%, giving 50,000 tokens of safety buffer. The compaction call itself costs tokens, so triggering it with more room means it costs less and disrupts less.

A 10% compaction reserve on a 200K window is 20K tokens. Barely enough to breathe. A 25% reserve is 50K tokens of real working space. Your agent can complete complex multi-step tasks without hitting the wall mid-execution.

Change this before anything else. It costs nothing and immediately makes your agent more stable.
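The arithmetic behind that advice, made concrete. The function and key names here are illustrative; every framework spells this configuration differently:

```python
def compaction_budget(window: int, trigger_pct: float) -> dict:
    """How much working room a given compaction trigger leaves."""
    trigger_at = int(window * trigger_pct)
    reserve = window - trigger_at
    return {"trigger_at": trigger_at, "reserve": reserve}


# Typical default: compact at 90% of a 200K window.
default = compaction_budget(200_000, 0.90)  # 20,000 tokens of slack

# Raised floor: compact at 75% instead.
raised = compaction_budget(200_000, 0.75)   # 50,000 tokens of real working space
```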

The Results

4,410

tokens per message now vs 24,000 before. 81% reduction. Every message. Forever.

In a day of heavy use, say 100 messages, that is roughly 2 million tokens no longer burned on memory management overhead. Sessions run longer, retain more, compact less.

More importantly: OpenClaw actually remembers things now. Ask about a conversation from two weeks ago and it searches the Silver layer, finds the relevant snippet in under a second, and responds with specifics. Not because it was loaded into RAM. Because it was stored correctly and retrieved on demand.

Why This Matters Beyond Our Stack

This started as an OpenClaw problem, but the pattern is universal.

Any stateful AI agent hits this wall. OpenClaw, Cursor, AutoGPT, custom GPT wrappers with retrieval-augmented context: all face the same constraint math. The platform does not matter. The architecture does.

The pattern is always the same: agent needs memory, memory accumulates, nobody trims it, context fills up, performance degrades, everyone blames the model or the API.

The model is not the problem. The memory architecture is the problem.

And the fix is not complicated. It is data engineering. Tiered storage, rotation policies, deduplication, and a hard cap on what goes into hot context. The same principles that make databases fast at scale apply directly to AI agents.

The TL;DR: Fix This Today

  1. Audit your bootstrap files. Count every character that loads at startup. Anything over 40,000 total is a problem.
  2. Archive, do not delete. Old session state is valuable in cold storage. It is poison in hot context.
  3. Raise your compaction reserve. Give your agent 50,000 tokens of breathing room before the alarm trips.
  4. Build tiered memory. Hot context for what always matters. Searchable index for everything else. Cold storage for full history.
  5. Automate the maintenance. A nightly rotation script prevents the gradual accumulation that causes these spirals.
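Step 1 takes a dozen lines. A minimal audit sketch, assuming your bootstrap files are markdown files in one directory and using the rough four-characters-per-token heuristic (real tokenizers vary; the 40,000-character cap is this post's number, not a universal constant):

```python
from pathlib import Path

CHAR_CAP = 40_000
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary


def audit_bootstrap(context_dir: str) -> dict:
    """Count every character that loads at startup and flag overruns."""
    sizes = {
        p.name: len(p.read_text(errors="replace"))
        for p in sorted(Path(context_dir).glob("*.md"))
    }
    total = sum(sizes.values())
    return {
        "files": sizes,                        # per-file breakdown
        "total_chars": total,
        "est_tokens": total // CHARS_PER_TOKEN,
        "over_cap": total > CHAR_CAP,          # anything over the cap is a problem
    }
```

Run it before and after a long week of sessions; the delta tells you how fast your bootstrap is accreting.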

OpenClaw is not getting dumber. You just forgot to build it a filing system.

Is Your AI Stack Burning Tokens?

I audit OpenClaw and AI agent architectures and fix the memory inefficiencies killing your performance. If your agent is forgetting things, compacting constantly, or burning tokens on overhead, let us talk.
