Context window
A context window is the maximum number of tokens a model can consider at once, counting both the prompt and its own output. It is a hard ceiling on how much an agent can 'see' in a single step, and a major driver of latency and cost.
- Updated 2026
The context window is the fixed budget of tokens a large language model can attend to in one pass. Text is first split into tokens, pieces that are roughly word-sized or smaller, and everything the model needs for a turn must fit inside that budget: the system instructions, the user's question, any retrieved documents, the running conversation, tool definitions, tool results, and the answer the model is about to write. When the total crosses the limit, something has to give.
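The budget check described above can be sketched in a few lines. This is a minimal illustration, not a real tokenizer: `estimate_tokens` assumes about four characters per token, a common rule of thumb for English text, and `fits_in_window`, `window`, and `reserved_output` are hypothetical names chosen for this example.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ceil(len(text) / 4).

    Assumption: about 4 characters per token. A real system would
    call the model's own tokenizer instead.
    """
    return (len(text) + 3) // 4


def fits_in_window(parts: list[str], window: int, reserved_output: int) -> bool:
    """Return True if all input parts plus the reserved output fit.

    parts           -- system prompt, question, documents, history, ...
    window          -- total token budget of the model (hypothetical value)
    reserved_output -- tokens set aside for the answer still to be written
    """
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserved_output <= window


if __name__ == "__main__":
    prompt_parts = ["You are a helpful assistant.", "Summarize the report."]
    print(fits_in_window(prompt_parts, window=8000, reserved_output=1000))
```

Reserving room for the output up front reflects the point that the answer shares the same budget as the input.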
It matters because the window is shared and re-paid on every turn. During inference, each step of an agent re-sends the accumulated transcript, so a long task fills the window quickly, slows responses, and increases cost — you pay for input tokens as well as output. The window is also a quality constraint: facts the model needs but can't fit are simply invisible to it, and details lost in the middle of a very long prompt can be overlooked even when they technically fit.
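To see why re-sending the transcript adds up, here is a small arithmetic sketch. It assumes every step sends the full transcript as input and then appends a fixed number of new tokens, and it ignores any provider-side caching or discounts; the function name and the numbers are illustrative only.

```python
def cumulative_input_tokens(base: int, per_step: int, steps: int) -> int:
    """Total input tokens billed across an agent run.

    base     -- tokens in the initial prompt (system + question)
    per_step -- tokens appended to the transcript after each step
    steps    -- number of model calls in the task

    Assumption: no caching, and the whole transcript is re-sent each step.
    """
    total = 0
    transcript = base
    for _ in range(steps):
        total += transcript      # the full transcript is paid for as input
        transcript += per_step   # output and tool results are appended
    return total


if __name__ == "__main__":
    # 1,000-token prompt, 500 tokens added per step, 10 steps
    print(cumulative_input_tokens(1000, 500, 10))  # 32500
```

Under these assumptions the final transcript is only 6,000 tokens long, yet 32,500 input tokens were sent in total, because the total grows roughly with the square of the number of steps.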
Consider a research agent reading a 300-page report. It cannot hold the whole document in context at once, so it chunks the report, stores the pieces in agent memory, and retrieves only the few passages relevant to each question. That is the core trade-off of working with context windows: rather than cramming everything in, well-built agents practice context engineering — summarizing, trimming, and retrieving — so the limited window always holds the right tokens for the current step.
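The chunk-and-retrieve pattern in this example can be sketched as follows. This is a deliberately simple stand-in: it scores chunks by keyword overlap with the question, whereas production systems typically use embedding similarity. The names `chunk`, `retrieve`, and the sample text are hypothetical.

```python
import re


def chunk(text: str, size: int) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def _terms(text: str) -> set[str]:
    """Lowercased alphanumeric terms, used for naive overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[str], question: str, k: int) -> list[str]:
    """Return the k chunks sharing the most terms with the question.

    Assumption: keyword overlap is a stand-in for a real relevance model.
    Ties keep their original document order because the sort is stable.
    """
    query = _terms(question)
    ranked = sorted(chunks, key=lambda c: len(query & _terms(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    report = "revenue grew ten percent costs fell in the south headcount stayed flat"
    pieces = chunk(report, 4)
    print(retrieve(pieces, "How much did revenue grow?", k=1))
```

Only the retrieved passages are placed in the prompt, so the window holds a few relevant chunks rather than the whole report.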
Context window FAQ
The context window is the maximum number of tokens a language model can take in and reason over in a single pass — and it has to cover both the input (system prompt, instructions, retrieved documents, conversation history, tool results) and the output the model generates. Once a request exceeds that budget, the oldest or least relevant content must be dropped, summarized, or moved into external storage.
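Dropping the oldest content first is the simplest of the options above. The sketch below keeps the system prompt and as many of the most recent messages as fit. It assumes one token per whitespace-separated word purely for illustration, and `count_tokens` and `trim_history` are hypothetical helpers rather than part of any library.

```python
def count_tokens(text: str) -> int:
    """Stand-in counter: one token per word (an assumption for this sketch)."""
    return len(text.split())


def trim_history(system: str, history: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit alongside the system prompt.

    Walks the history from newest to oldest and stops at the first
    message that no longer fits, so the kept messages stay contiguous.
    """
    remaining = budget - count_tokens(system)
    kept: list[str] = []
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["one two three", "four five", "six"]
    print(trim_history("be brief", history, budget=5))
```

A fuller version might summarize the dropped messages or move them into external storage instead of discarding them.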