Token efficiency is a context problem

Most AI agents waste tokens for one reason: they reload the same context on every step instead of referencing it. The fix is not a shorter prompt; it is moving your files, data, and shared state into one layer that the agent reads by reference. This guide explains where the tokens actually go and why a shared data layer removes the waste.

Where do the tokens actually go?

Token spend in agents is rarely the model's "thinking". It is redundant context, loaded again and again. The four biggest sources:

| Source of waste | Why it happens | Typical cost |
| --- | --- | --- |
| Re-reading the same file | The agent has no memory that it already read it | 2,000+ tokens per repeated read |
| Full filesystem or tool scans | The agent lists everything to find one thing | 10,000 to 20,000 tokens per command |
| Lost context between sessions | A new session re-reads the whole project | 30,000 to 50,000 tokens before real work |
| Raw tool output and screenshots | Dumped verbatim into the prompt | Up to 60 to 70 percent of spend in some agents |
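
To see how these figures add up, here is a rough back-of-the-envelope estimate for a single session. The per-event token costs are the low end of the ranges in the table above; the event counts (how often each thing happens) are illustrative assumptions, not measurements. The fourth source is given as a share of spend rather than a token count, so it is left out.

```python
# Rough per-session estimate of redundant context.
# Token costs: low end of the ranges quoted in the table.
# Event counts: assumed for illustration only.
WASTE_SOURCES = {
    # name: (tokens per event, assumed events per session)
    "repeated file read": (2_000, 5),
    "filesystem or tool scan": (10_000, 3),
    "session re-read": (30_000, 1),
}


def wasted_tokens(sources: dict[str, tuple[int, int]]) -> int:
    """Sum tokens-per-event times events for every waste source."""
    return sum(tokens * events for tokens, events in sources.values())


total = wasted_tokens(WASTE_SOURCES)
print(f"Estimated redundant tokens this session: {total:,}")
```

Swap in counts from your own agent logs to get a number that reflects your workload.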

Why does a bigger context window not fix it?

A bigger window does not remove the waste; it only makes room for more of it. You still pay for every token you load, and models tend to use material in the middle of a long context less reliably, so the agent both costs more and reasons worse. Loading more is not the same as loading the right thing.

How is token waste really a context problem?

The root cause is where your context lives. If it lives only inside the conversation, every step has to reload it. If it lives in a shared layer with stable addresses, the agent references it once and points back to it instead of re-ingesting it. Token efficiency is therefore an architecture question, not a prompting trick.
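
A minimal sketch of the "reference once, then point back" idea follows. It is an in-memory toy, not the adlass API: `SharedLayer`, `AgentContext`, and the address `docs/spec.md` are hypothetical names introduced for illustration. The first load of an address returns the full content; later loads return a short pointer as long as the content has not changed.

```python
import hashlib


class SharedLayer:
    """Hypothetical in-memory stand-in for a layer with stable addresses."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}  # address -> content

    def put(self, address: str, text: str) -> None:
        self._docs[address] = text

    def get(self, address: str) -> str:
        return self._docs[address]

    def version(self, address: str) -> str:
        """Short content hash, so a change to the document is detectable."""
        digest = hashlib.sha256(self._docs[address].encode("utf-8"))
        return digest.hexdigest()[:12]


class AgentContext:
    """Tracks which addresses (and versions) the agent has already loaded."""

    def __init__(self, layer: SharedLayer) -> None:
        self.layer = layer
        self.seen: dict[str, str] = {}  # address -> version already in context

    def load(self, address: str) -> str:
        version = self.layer.version(address)
        if self.seen.get(address) == version:
            # Unchanged and already in context: emit a pointer, not the content.
            return f"[see {address}@{version}, loaded earlier]"
        self.seen[address] = version
        return self.layer.get(address)


layer = SharedLayer()
layer.put("docs/spec.md", "long specification text " * 200)
ctx = AgentContext(layer)

first = ctx.load("docs/spec.md")   # full content
second = ctx.load("docs/spec.md")  # short pointer
print(len(first), len(second))
```

Keying the pointer on a content hash means an edited file is reloaded in full on the next read, while an unchanged file costs only the length of the pointer.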

How does a shared data layer remove the waste?

adlass is a shared data layer where you, your team, and their agents work over the same files, datasets, and state, connected over MCP. The agent reads a document or a dataset by reference, once, from the layer, instead of stuffing its full contents into every prompt. Repeated reads become cheap lookups, scans become targeted queries, and a new session resumes from shared state instead of re-reading the project.
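
The "resume from shared state" part can be illustrated with a small sketch. It does not use adlass or MCP; a JSON file in a temporary directory stands in for the shared layer, and the state fields (`summary`, `open_tasks`, `references`) and file names are assumptions chosen for the example. The point is that a second session reads a compact state record instead of re-reading the project.

```python
import json
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Persist the session state so a later session can pick it up."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def resume_state(path: Path) -> dict:
    """Load saved state if present, otherwise start with an empty record."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"summary": "", "open_tasks": [], "references": []}


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "session_state.json"

    # Session 1: do some work, then record what matters for next time.
    state = resume_state(state_file)
    state["summary"] = "Parser refactor in progress."
    state["open_tasks"].append("refactor module c")
    state["references"].append("src/parser.py")  # an address, not the file body
    save_state(state_file, state)

    # Session 2: start from the saved record instead of re-reading everything.
    resumed = resume_state(state_file)

print(resumed["open_tasks"])
```

The record stores addresses rather than file contents, so the new session pays for a few dozen tokens of state and only loads a referenced file when it actually needs it.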

Deeper guides

  • Why does my agent re-read the same file and waste tokens?
  • Why does my agent lose context between sessions?

Frequently asked questions

Does a shorter prompt reduce token usage?
A little, but it treats the symptom. Most waste comes from reloading files, scans, and lost session context, not from your instructions. Moving that context into a shared layer the agent references removes far more waste than trimming the prompt does.
Is a shared data layer the same as RAG?
No. RAG retrieves passages from a static corpus into the prompt. A shared data layer holds live files, datasets, and shared state that agents and people both read and write by reference. You can still do retrieval inside the layer.
Will prompt caching fix repeated file reads?
Caching helps when context is stable, but cached tokens are still billed (typically at a reduced rate rather than for free), and the cache is invalidated when files change. A reference-based layer avoids re-ingesting the file in the first place.

Work with your agents on the same data

adlass is the shared data layer where you, your team, and their agents work over the same documents and datasets.