Token efficiency is a context problem
Most AI agents waste tokens for one reason: they reload the same context on every step instead of referencing it. The fix is not a shorter prompt; it is moving your files, data, and shared state into a single layer that the agent reads from by reference. This guide explains where the tokens actually go and why a shared data layer removes the waste.
Where do the tokens actually go?
Most token spend in agents does not go to the model's "thinking". It goes to redundant context that is loaded again and again. The four biggest sources are:
| Source of waste | Why it happens | Typical cost |
|---|---|---|
| Re-reading the same file | The agent does not remember that it already read the file | 2,000+ tokens per repeated read |
| Full filesystem or tool scans | The agent lists everything to find one thing | 10,000 to 20,000 tokens per command |
| Lost context between sessions | A new session re-reads the whole project | 30,000 to 50,000 tokens before real work begins |
| Raw tool output and screenshots | Dumped verbatim into the prompt | Up to 60 to 70 percent of spend in some agents |
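To put the first row in perspective, here is a back-of-envelope calculation in Python. The inputs are illustrative assumptions, not measurements: a 2,000-token file that the agent needs on 10 steps, and a 20-token reference (such as a path or ID) that stands in for the file after the first read. It counts only the tokens spent on that one file.

```python
# Back-of-envelope comparison: re-reading a file on every step vs. reading it
# once and referring to it afterwards. All numbers are hypothetical inputs.


def tokens_rereading(file_tokens: int, steps: int) -> int:
    """The full file is loaded into the prompt on every step."""
    return file_tokens * steps


def tokens_by_reference(file_tokens: int, steps: int, ref_tokens: int) -> int:
    """The file is read once; every later step carries only a short reference."""
    if steps == 0:
        return 0
    return file_tokens + ref_tokens * (steps - 1)


FILE_TOKENS = 2_000  # assumed size of the file, in tokens
STEPS = 10           # assumed number of steps that need the file
REF_TOKENS = 20      # assumed size of a reference such as a path or ID

naive = tokens_rereading(FILE_TOKENS, STEPS)
by_ref = tokens_by_reference(FILE_TOKENS, STEPS, REF_TOKENS)

print(f"re-reading:   {naive:,} tokens")   # re-reading:   20,000 tokens
print(f"by reference: {by_ref:,} tokens")  # by reference: 2,180 tokens
print(f"saved:        {1 - by_ref / naive:.0%}")  # saved:        89%
```

The gap, `(file_tokens - ref_tokens) * (steps - 1)`, grows linearly with the number of steps, which is why the waste compounds in long-running agents.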
Why does a bigger context window not fix it?
A bigger window does not remove the waste; it only makes room for more of it. You still pay for every token you load, and models tend to use information in the middle of a long context less reliably than information near its start or end, so the agent both costs more and reasons worse. Loading more is not the same as loading the right thing.
How is token waste really a context problem?
The root cause is where your context lives. If it lives only inside the conversation, every step has to reload it. If it lives in a shared layer with stable addresses, the agent reads it once and afterwards refers back to it by address instead of re-ingesting it. Token efficiency is therefore an architecture question, not a prompting trick.
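A minimal sketch of the idea in Python, assuming a hypothetical in-memory `ContextStore` and a path-like address format (`docs/spec.md`). Neither is an adlass or MCP API; the point is only that the prompt carries a short, stable address while the content stays in the store until a step actually needs it.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical shared store: content lives here under stable addresses,
    outside any single conversation."""

    items: dict[str, str] = field(default_factory=dict)

    def put(self, address: str, content: str) -> str:
        """Store content and return the address the agent will cite."""
        self.items[address] = content
        return address

    def get(self, address: str) -> str:
        """Resolve an address to its content only when a step needs it."""
        return self.items[address]


def build_prompt(task: str, addresses: list[str]) -> str:
    """Name the context the task depends on instead of inlining it."""
    refs = "\n".join(f"- {a}" for a in addresses)
    return f"{task}\n\nRelevant context (fetch by address when needed):\n{refs}"


store = ContextStore()
spec_text = "Requirement text for the billing service. " * 500  # stand-in document
spec = store.put("docs/spec.md", spec_text)
prompt = build_prompt("List the open questions in the spec.", [spec])

print(len(spec_text), "characters stored")
print(len(prompt), "characters in the prompt")
```

Because the prompt holds only the address, its size stays the same no matter how large the document grows; the cost of reading the content is paid only when a step calls `get`.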
How does a shared data layer remove the waste?
adlass is a shared data layer where you, your team, and their agents work over the same files, datasets, and state, connected to agents over MCP (the Model Context Protocol). The agent reads a document or dataset from the layer once and then points to it by reference, instead of stuffing its full contents into every prompt. Repeated reads become cheap lookups, scans become targeted queries, and a new session resumes from shared state instead of re-reading the project.
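To illustrate how a scan becomes a targeted query, here is a generic Python sketch built on a simple inverted word index. This is an assumption for illustration only: the guide does not say how adlass implements queries, and `build_index`, `full_scan_output`, and `targeted_query` are hypothetical names. A full scan pastes every path and its content into the prompt, while a query returns only the paths that match.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of paths that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in docs.items():
        for word in re.findall(r"[a-z0-9_]+", text.lower()):
            index[word].add(path)
    return dict(index)


def full_scan_output(docs: dict[str, str]) -> str:
    """What a naive agent pastes into its prompt: every path and its content."""
    return "\n".join(f"{path}:\n{text}" for path, text in docs.items())


def targeted_query(index: dict[str, set[str]], term: str) -> list[str]:
    """What a targeted query returns: only the paths that match the term."""
    return sorted(index.get(term.lower(), set()))


docs = {
    "src/auth.py": "def login(user): check the password hash",
    "src/db.py": "def connect(): open a pooled connection",
    "README.md": "Run login to start a session.",
}
index = build_index(docs)

print(targeted_query(index, "login"))  # ['README.md', 'src/auth.py']
print(len(full_scan_output(docs)), "characters for the full scan")
```

The index is built once and reused, so each lookup costs a few tokens of paths instead of a full listing; with a real project the difference is far larger than in this three-file example.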
In this guide
- Why does my agent re-read the same file and waste tokens?
Agents re-read files because each step has no memory of prior reads. A shared, addressable layer lets the agent reference files instead of re-ingesting them.
- Why does my agent lose context between sessions?
Agents lose context because the conversation is the only memory. A shared data layer keeps files, decisions, and state so a new session resumes instead of re-reading.
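Resuming from shared state instead of re-reading can be sketched in Python with a hypothetical `SessionState` saved to a JSON file. The file name, fields, and `briefing` format are assumptions for illustration; a shared data layer would hold this state centrally rather than in a local file.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class SessionState:
    """Hypothetical shared state that outlives a single conversation."""

    decisions: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)
    next_step: str = ""

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "SessionState":
        if not path.exists():
            return cls()  # first session: nothing to resume from
        return cls(**json.loads(path.read_text()))

    def briefing(self) -> str:
        """A short summary that seeds the next session instead of a full re-read."""
        return "\n".join([
            f"Decisions so far: {'; '.join(self.decisions)}",
            f"Files touched: {', '.join(self.files_touched)}",
            f"Next step: {self.next_step}",
        ])


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "session_state.json"

    # Session 1: record what was decided, then the conversation ends.
    s1 = SessionState.load(state_file)
    s1.decisions.append("use Postgres for the event store")
    s1.files_touched.append("src/db.py")
    s1.next_step = "write the migration"
    s1.save(state_file)

    # Session 2: a fresh conversation resumes from the saved state.
    s2 = SessionState.load(state_file)
    print(s2.briefing())
```

The new session starts from a briefing of a few dozen tokens plus the specific files it needs, rather than from a full re-read of the project.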
Frequently asked questions
- Does a shorter prompt reduce token usage?
A little, but it treats the symptom. Most waste comes from reloaded files, broad scans, and lost session context, not from your instructions. Moving that context into a shared layer the agent references removes far more than trimming the prompt does.
- Is a shared data layer the same as RAG?
No. RAG retrieves passages from a pre-indexed corpus and inserts them into the prompt. A shared data layer holds live files, datasets, and shared state that both agents and people read and write by reference. You can still do retrieval inside the layer.
- Will prompt caching fix repeated file reads?
Caching helps when context is stable, but it does not remove the cost. Cached tokens are billed at a discount rather than for free, some providers charge extra to write to the cache, the cached file still occupies the context window, and the cache stops matching as soon as the file changes. A reference-based layer avoids re-ingesting the file in the first place.
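Caching breaks when files change; a reference-based approach can handle that case directly by comparing content hashes. In this Python sketch, `ReadTracker` is a hypothetical helper, not an adlass or MCP API. It returns the full text the first time a file is read and whenever the file has changed, and otherwise returns a short reference. It assumes the model still has the earlier content available, for example in the conversation or in the shared layer: the file is still read from disk, but its text is not sent again.

```python
import hashlib
import tempfile
from pathlib import Path


class ReadTracker:
    """Hypothetical helper: re-ingest a file only when its content changed."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # path -> SHA-256 of the last content sent

    def read(self, path: Path) -> str:
        text = path.read_text()
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = str(path)
        if self._seen.get(key) == digest:
            # Same content the agent already has: send a short reference instead.
            return f"[unchanged since last read: {key} sha256:{digest[:12]}]"
        self._seen[key] = digest
        return text


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.md"
    notes.write_text("v1 of the notes")
    tracker = ReadTracker()

    first = tracker.read(notes)   # full content
    second = tracker.read(notes)  # short reference, content unchanged
    notes.write_text("v2 of the notes")
    third = tracker.read(notes)   # file changed, so full content again
```

Unlike a prompt cache keyed on an exact prefix, an edit here costs only the changed file, and every unchanged read stays at the size of the reference string.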
Work with your agents on the same data
adlass is the shared data layer where you, your team, and their agents work over the same documents and datasets.