Nathaniel's blog

Context management is the real challenge in LLM engineering

Nathaniel Lin · January 20, 2025 · 7 min read

Everyone talks about prompts. The more I build on top of LLMs, the more I think context management is actually the harder and more important problem.

What I mean by context management

A language model doesn't have persistent memory. Every inference call is stateless. The "context" — that sequence of messages you send with each request — is the entirety of what the model knows about your conversation and task.

Managing that context well means:

  • Keeping relevant information in window

  • Removing irrelevant information to save tokens

  • Structuring information in a way the model can effectively use

  • Making the right tradeoffs between detail and brevity

Get it wrong and you get degraded responses, excessive costs, or both.
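The bullet points above can be sketched as a simple pruning pass. This is a minimal illustration, not a production tokenizer: the function names are my own, and the 4-characters-per-token estimate is a rough heuristic for English text.

```python
# Sketch of a token-budget pruner: always keep the system prompt, then
# keep the most recent turns that fit the budget. The ~4 chars/token
# estimate is a heuristic assumption, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English."""
    return max(1, len(text) // 4)

def prune_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for msg in reversed(turns):  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))  # restore chronological order

history = [
    {"role": "system", "content": "You are a task assistant."},
    {"role": "user", "content": "Remind me to deploy the API."},
    {"role": "assistant", "content": "Noted: deploy the API."},
    {"role": "user", "content": "What's on my list?"},
]
pruned = prune_to_budget(history, budget=20)
```

With a budget of 20 estimated tokens, the oldest turn is dropped while the system prompt and the newest turns survive. A real implementation would use the model's actual tokenizer rather than a character heuristic.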

The context window is not just a limit, it's an interface

I think about the context window like an API surface: the prompt you send is the input contract. If you stuff it with noise, the model has to wade through it to find the signal, and it often doesn't — it latches onto irrelevant details or misses the key instruction entirely.

When I was building multi-surface bots (Discord, Telegram, web widget), each surface had different conversation patterns. Discord conversations tended to be longer and more casual, Telegram users were more task-focused, and the web widget handled the most complex multi-turn tasks.

We ended up with per-surface context strategies rather than a one-size-fits-all approach, and model performance improved noticeably.
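One way to make per-surface strategies concrete is a small policy table. The surfaces below match the post; the specific numbers are illustrative placeholders, not our production values.

```python
# Hypothetical per-surface context policies. Keys mirror the surfaces
# from the post; the numeric limits are illustrative assumptions.

SURFACE_POLICY = {
    # Long, casual threads: keep many turns, summarize older ones early.
    "discord":  {"max_turns": 40, "summarize_after": 20},
    # Task-focused chats: a shorter window is usually enough.
    "telegram": {"max_turns": 15, "summarize_after": 10},
    # Complex multi-turn tasks: keep the most history, compress late.
    "web":      {"max_turns": 60, "summarize_after": 40},
}

def policy_for(surface: str) -> dict:
    """Fall back to the most generous policy for unknown surfaces."""
    return SURFACE_POLICY.get(surface, SURFACE_POLICY["web"])
```

The point is less the numbers than the structure: once policy is data rather than scattered conditionals, tuning one surface can't silently break another.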

Structured context beats raw chat history

Raw chat history is fine for simple assistants. For more complex workflows, structured context is significantly better.

Instead of:


User: What were the tasks I said I needed to do?
Assistant: You mentioned three tasks: deploy the API, review the PR, and update the docs.
User: Mark deploy as done

We stored task state explicitly and injected it as structured context:


System: Current task list:
- [DONE] Deploy the API
- [PENDING] Review the PR
- [PENDING] Update the docs

The model reasons over explicit structured state far more reliably than it reconstructs state from scattered conversation history. This became more obvious as task complexity increased.
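Rendering that structured block is trivial once task state is stored explicitly. A minimal sketch, with a `Task` shape I've invented for illustration:

```python
# Render explicit task state into the structured context block shown
# above. The Task dataclass is an illustrative assumption.

from dataclasses import dataclass

@dataclass
class Task:
    title: str
    done: bool = False

def render_task_context(tasks: list[Task]) -> str:
    """Produce the system-prompt block from stored task state."""
    lines = ["Current task list:"]
    for t in tasks:
        status = "DONE" if t.done else "PENDING"
        lines.append(f"- [{status}] {t.title}")
    return "\n".join(lines)

tasks = [
    Task("Deploy the API", done=True),
    Task("Review the PR"),
    Task("Update the docs"),
]
block = render_task_context(tasks)
```

"Mark deploy as done" then becomes a state mutation (`tasks[0].done = True`) followed by a re-render, instead of hoping the model tracks the change across turns.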

Don't trust the model to remember

If something matters, put it in the context explicitly. Don't assume the model "remembers" something from earlier in the conversation — especially in long sessions.

We had a subtle bug where a user's preferences (timezone, formatting choices) were mentioned once early in the conversation, then the context window rolled over and dropped those turns. The bot started ignoring the preferences mid-session.

Fix: extract durable preferences from the conversation and re-inject them as part of the system prompt on each turn. They're small and always relevant, so they should never get pruned.
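The fix can be sketched in a few lines: durable preferences live outside the chat history and get re-injected into the system prompt on every turn, so window rollover can never drop them. The field names here are illustrative, not our actual schema.

```python
# Sketch of the rollover fix: preferences are stored separately and
# rebuilt into the system prompt each turn. Field names are examples.

BASE_SYSTEM = "You are a helpful assistant."

def build_system_prompt(preferences: dict[str, str]) -> str:
    """Re-inject durable preferences on every turn."""
    if not preferences:
        return BASE_SYSTEM
    prefs = "\n".join(f"- {k}: {v}" for k, v in preferences.items())
    return f"{BASE_SYSTEM}\n\nUser preferences (always apply):\n{prefs}"

prefs = {"timezone": "America/New_York", "date_format": "YYYY-MM-DD"}
prompt = build_system_prompt(prefs)
```

Because the preference block is a handful of tokens and always relevant, it sits in the never-pruned part of the context by construction.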

Compression is real but not free

Context compression (summarizing older turns) is a real and useful technique. But it has costs:

  1. You lose fine-grained detail that might be needed later

  2. The summary generation itself is an LLM call (cost + latency)

  3. If the summary is wrong, errors can compound

I've found it works best for informational turns (user describing background, preferences, constraints) rather than action turns (the model taking actions, calling tools, returning results). Action history is usually better preserved literally.
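The informational-vs-action split can be sketched as a filter over the history. Here `summarize` is a stand-in for the extra LLM call the post mentions (cost + latency); it just truncates, which is enough to show the shape. The `kind` field is an assumption about how turns are tagged.

```python
# Sketch of selective compression: informational turns are summarized,
# action turns (tool calls/results) are preserved verbatim. The `kind`
# tag and the truncating `summarize` stand-in are assumptions.

def summarize(text: str, limit: int = 60) -> str:
    """Placeholder for the real LLM summarization call."""
    return text if len(text) <= limit else text[:limit].rstrip() + "..."

def compress_history(messages: list[dict]) -> list[dict]:
    out: list[dict] = []
    for m in messages:
        if m.get("kind") == "action":
            out.append(m)  # keep tool calls/results literal
        else:
            out.append({**m, "content": summarize(m["content"])})
    return out
```

The asymmetry encodes the lesson from the numbered list above: a lossy summary of background is usually recoverable, while a lossy summary of an action trace compounds errors.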

The space is young

Context management is still largely artisanal. There aren't many battle-tested libraries for it. Most teams I talk to are rolling their own solutions.

That's both frustrating and exciting. The tooling will get better. For now, being deliberate and empirical about it — measuring context size, tracking cost per turn, watching for quality degradation in long sessions — is the best approach.
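Being "deliberate and empirical" mostly means logging. A minimal per-turn metrics log, with a placeholder price rather than any real rate:

```python
# Minimal per-turn metrics log: record context size and cost on every
# call so long-session degradation shows up in data, not vibes.
# PRICE_PER_1K_TOKENS is a placeholder, not a real rate.

PRICE_PER_1K_TOKENS = 0.003

turn_log: list[dict] = []

def record_turn(prompt_tokens: int, completion_tokens: int) -> None:
    total = prompt_tokens + completion_tokens
    turn_log.append({
        "turn": len(turn_log) + 1,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost": total / 1000 * PRICE_PER_1K_TOKENS,
    })

record_turn(1200, 150)
record_turn(2400, 180)
session_cost = sum(t["cost"] for t in turn_log)
```

Even this much surfaces the pattern to watch for: prompt tokens creeping up turn over turn is the signature of a context strategy that never prunes.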
