What interested me most wasn’t the model itself, but the surrounding system design: automated context retrieval, evaluation loops, and memory that improves the agent over time.
I’ve been experimenting with recreating a similar setup, but with a different goal: making the configuration more accessible for any company.
The result is an open-source YAML + Markdown framework where you define context sources, tools, and behavior explicitly instead of writing Python code. The idea is to make agent context easier to reason about, version, and iterate on, especially for data teams.
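To make that concrete, here's a rough sketch of what an agent defined this way could look like. The keys and source types below are illustrative guesses, not nao's actual schema; check the repo for the real format:

```yaml
# Hypothetical agent definition (illustrative keys, not the actual nao schema)
agent:
  name: analytics-helper
  instructions: ./prompts/analyst.md   # behavior lives in Markdown, not Python

context:
  sources:
    - type: dbt
      path: ./models                   # pull schema and docs from the dbt project
    - type: warehouse
      connection: snowflake_prod

tools:
  - run_sql
  - list_tables
```

The point is that everything the agent can see or do is declared in one versionable file, so iterating on context is a diff review rather than a code change.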
You’re right that it’s important to keep facts, numbers, constants, and the like, and that’s exactly how the agent is prompted. It’s not really summarization; it’s removing noise (UI chrome, menus, repeated labels, etc.) while keeping all the numbers and facts. I’ve been using the tool myself for a while and it’s held up well. If we find that keeping everything as-is works better, we can always turn off densification and keep the raw capture.
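As a toy illustration of that kind of densification (purely my own sketch, not the tool's actual logic): drop short navigation-like fragments while always keeping lines that carry digits or real sentence content.

```python
import re

# Toy noise filter: keep lines with numbers or sentence-like content,
# drop short UI-chrome fragments (menus, labels, buttons).
NOISE = {"home", "menu", "login", "sign up", "share", "next", "prev"}

def densify(text: str) -> str:
    kept = []
    for line in text.splitlines():
        s = line.strip()
        if not s:
            continue
        if s.lower() in NOISE:
            continue  # known UI chrome
        # Always keep lines containing digits (prices, dates, metrics),
        # plus anything long enough to be a real sentence.
        if re.search(r"\d", s) or len(s.split()) >= 4:
            kept.append(s)
    return "\n".join(kept)

page = "Menu\nLogin\nRevenue grew 14% in Q3\nShare\nThe team shipped two releases this quarter"
print(densify(page))
# → Revenue grew 14% in Q3
#   The team shipped two releases this quarter
```

The real prompt-driven version is fuzzier than a regex, but the invariant is the same: facts and numbers are never allowed to fall out.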
One thing I keep seeing in practice is that “memory” problems are often less about storage and more about structure + retrieval strategy.
Vector search helps sometimes, but for a lot of agent workflows we’ve had better results with explicit context organization (files, metadata, rules) rather than semantic similarity alone.
Curious how you’re thinking about memory updates over time — append-only vs rewriting summaries?
That matches our experience pretty closely.
A lot of “memory” issues we saw weren’t about storage capacity, but about what kind of information is allowed to persist and how it’s structured. Once everything is flattened into one blob, retrieval strategy becomes the only lever left — which is where vectors often get overused.
In Mneme, updates are intentionally asymmetric:

- Facts are append-only and explicitly curated (they’re meant to be boring and stable).
- Task state is rewritten as work progresses.
- Context is disposable and aggressively compacted or dropped.
The idea is that only a small subset of information deserves long-term durability; everything else should be easy to overwrite or forget.
This reduces the need for heavy retrieval logic in the first place, since the model is usually operating over a much smaller, more explicit working set.
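A minimal sketch of that asymmetry (my own toy model of the idea, not Mneme's actual API): each tier gets its own update rule, so durability is a property of the structure rather than of retrieval logic.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy model of asymmetric memory updates: durable facts,
    rewritable task state, disposable working context."""
    facts: list = field(default_factory=list)        # append-only
    task_state: dict = field(default_factory=dict)   # rewritten in place
    context: list = field(default_factory=list)      # aggressively compacted

    def add_fact(self, fact: str) -> None:
        # Facts only accumulate; nothing here can overwrite or delete them.
        if fact not in self.facts:
            self.facts.append(fact)

    def update_task(self, **state) -> None:
        # Task state is overwritten as work progresses.
        self.task_state.update(state)

    def compact_context(self, keep_last: int = 3) -> None:
        # Context is disposable: keep only the recent tail.
        self.context = self.context[-keep_last:]

mem = AgentMemory()
mem.add_fact("billing DB is Postgres 14")
mem.update_task(step="writing migration", progress=0.4)
mem.context = [f"tool output {i}" for i in range(10)]
mem.compact_context()
print(len(mem.context))  # → 3
```

Because the working set stays small and explicit, the model rarely needs a similarity search to find what matters.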
This is super helpful — most writeups skip over the actual communication steps, so seeing the All-to-All flow laid out makes it much clearer.
Curious from your experiments: at 1M+ context, does communication start dominating vs compute?
I keep seeing cases where bigger context windows are technically possible but don’t translate into better results unless the context is very structured, so I wonder where the real scaling limit ends up being in practice.
As we scale to 1M+ context length at inference time, the biggest bottleneck is memory, and tackling that at scale means paying the price of communication overhead. Fortunately, the GPUs smartly prefetch data for the next step while the previous step is computing, masking the communication overhead and keeping response times realistic at that scale.
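The overlap they describe is essentially double buffering: issue the transfer for step n+1 while step n computes. A rough sketch of the pattern using threads (generic, nothing GPU-specific; `fetch` and `compute` are hypothetical stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(step):
    # Stand-in for moving KV cache / weights across the interconnect.
    time.sleep(0.05)
    return f"data-{step}"

def compute(data):
    # Stand-in for the attention/MLP work for one step.
    time.sleep(0.05)
    return f"out({data})"

def run(num_steps):
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        nxt = io.submit(fetch, 0)                 # prefetch the first step
        for step in range(num_steps):
            data = nxt.result()                   # blocks only if the fetch is slower
            if step + 1 < num_steps:
                nxt = io.submit(fetch, step + 1)  # kick off the next transfer now
            results.append(compute(data))         # compute overlaps the transfer
    return results

print(run(4))
# → ['out(data-0)', 'out(data-1)', 'out(data-2)', 'out(data-3)']
```

When fetch and compute take comparable time, the slower of the two sets the pace instead of their sum, which is exactly the "masked communication" effect.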
The quality degradation as context length increases is a whole other science problem.
Here is our contact page; feel free to contact us and we'll help you set up: https://docs.getnao.io/docs/support/support
Also, we'll be adding a new dbt onboarding flow tomorrow!
Probably in a few months. For now we're focusing on making the experience great for a restricted number of warehouses. But you can reach out by email and we'll keep you updated.
Repo: [https://github.com/getnao/nao](https://github.com/getnao/nao)
Would love feedback from people who have tried deploying analytics or data agents before.