Hacker News | zby's comments

I added it to my agent maintained list of agent maintained memory/knowledge systems at: https://zby.github.io/commonplace/notes/related-systems/rela...

Do you run security review by agents over this?

No - I just try to get a general understanding how it works.

My bet is that the solution to continuous learning lies in external storage. There is a lot of talk about context engineering - but I have not seen anyone treating context as the main bottleneck and building a system around that. It would show that even "context engineering" is the wrong term - context does not enter the LLM in some mysterious way, it goes through the prompt, and passing the whole chat history back and forth is not the most efficient use of the prompt's limited size.


"External storage", whatever that is, cannot be the same as continuous learning, as it does not have the strong connections / capture the interdependencies of knowledge.

That said, I think we will also see more efforts on the business side to have models that help you build a knowledge base in some kind of standardized format the model is trained to read - or that synthesize some sort of instructions for navigating your knowledge base.

Currently, e.g. Copilot tries to navigate a hot mess of an MS knowledge graph that is very different for each company. And due to its amnesia it has to repeat the discovery in every session. No wonder that does not work. We have to either standardize, or store somewhere (model, instructions) how to find information efficiently.


The key to making Copilot useful is to take the limited-context problem seriously enough. There are many dimensions to it: https://zby.github.io/commonplace/notes/context-efficiency-i... and it should be the starting point for designing systems that extensively use LLMs.


Well, did you solve the problem of deciding what to remember? (And I suppose, how to retrieve it? i.e. LLM can retrieve info, but if it doesn't know about something because it hasn't been retrieved yet...)


There are many techniques - I analyse them in my knowledge base.

I wrote 'my bet is that solving ...' - if I had solved it, I would not have to hedge like this.


What do you mean when you say "external storage?"


A knowledge base - something where the LLM knows how to find the knowledge it needs for a given task. I am working on this idea in https://zby.github.io/commonplace/


A form of context engineering


Spec Driven Development is a curious term - it suggests it is a kind of, or at least in the tradition of, Test Driven Development but it goes in the opposite direction!


I don't understand this - you can go spec -> test -> implementation and establish the test loop. A bit like the V-model of old, actually.


In my view, the problems with specs are:

1. Specs are subject to bit-rot, there's no impetus to update them as behaviour changes - unless your agent workflow explicitly enforces a thorough review and update of the specs, and unless your agent is diligent with following it. Lots of trust required on your LLM here.

2. There's no way to systematically determine if the behaviour of your system matches the specs. Imagine a reasonably sized codebase - if there's a spec document for every feature, you're looking at quite a collection of specs. How many tokens need to be burnt to ensure that these specs stay up to date as new features come in and behaviour changes?

3. Specs are written in English. They're ambiguous - they can absolutely serve the planning and design phases, but this ambiguity prevents meaningful behaviour assertions about the system as it grows.

Contrast that with tests:

1. They are executable and have the precision of code. They don't just describe behaviour of the system, they validate that the system follows that behaviour, without ambiguity.

2. They scale - it's completely reasonable for extensive codebases to have most (if not all) of their behaviour covered by tests.

3. Updating is enforceable - assuming you're using a CI pipeline, when tests break, they must be updated in order to continue.

4. You can systematically determine if the tests fully describe the behaviour (i.e. is all the behaviour tested) via mutation testing. This will tell you with certainty whether code is tested or not - whether the tests fully describe the system's behaviour.
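To make the mutation-testing point concrete, here is a minimal, self-contained sketch (all names are hypothetical, not from any particular mutation-testing tool): a "mutant" is a small mechanical change to the code, and a good test suite should fail ("kill") it. A surviving mutant reveals behaviour the tests don't pin down.

```python
# Minimal illustration of mutation testing: a "mutant" is a small
# change to the code under test; the test suite should "kill" it.

def clamp(x, lo, hi):
    """Original implementation under test."""
    return max(lo, min(x, hi))

def clamp_mutant(x, lo, hi):
    """Mutant: upper boundary changed (hi -> hi - 1)."""
    return max(lo, min(x, hi - 1))

def suite_passes(fn):
    """Run the test suite against a given implementation."""
    try:
        assert fn(5, 0, 10) == 5     # in range
        assert fn(-3, 0, 10) == 0    # below range
        assert fn(10, 0, 10) == 10   # exactly at the upper bound
        return True
    except AssertionError:
        return False

# The original passes; the boundary test kills the mutant.
print(suite_passes(clamp))         # True
print(suite_passes(clamp_mutant))  # False
```

Without the third assertion (the boundary case), the mutant would survive both remaining tests - which is exactly the signal that the suite doesn't fully describe the behaviour. Real tools (e.g. mutmut or PIT) generate mutants automatically instead of by hand.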

That being said, I think it's very valuable to start with a planning stage, even to provide a spec, such that the correct behaviour gets encoded into tests, and then instantiated by the implementation. But in my view, specs are best used within the design stage, and if left in the codebase, treated only as historical info for what went into the development of the feature. Attempting to use them as the source of truth for the behaviour of the system is fraught.

And I guess finally, I think that insofar as any framework uses the specs as the source of truth for behaviour, it's going to run into alignment problems, since maintaining specs doesn't scale.


SDD is about flowing the design choices from the spec into the rest of the system. TDD was for making sure that the inevitable changes you make to the system later don't break your earlier assumptions - or at least warn you that you need to change them. Personally I don't buy TDD - it might be useful sometimes, but it is kind of extreme. In general, though, agile methodologies were a reaction to the waterfall model of system development.


This is just one way to use TDD. I personally get the most value from TDD as a design approach. I iteratively decompose the project into stubbed, testable components as I start the project, and implement when I have to to get my tests to pass. At each stage I'm asking myself questions like "who needs to call who? with what data? What does it expect back as a return value?" etc.
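The stub-first design process described above can be sketched in a few lines (all names here - PriceFeed, total_cost - are hypothetical, invented for illustration): the test is written against interfaces that don't exist yet, which forces the "who calls whom, with what data, expecting what back?" questions to be answered up front.

```python
# Sketch of TDD-as-design: stub the interface first, let the test
# fix the signatures, implement only enough to make the test pass.

class PriceFeed:
    """Stub: its signature answers 'what does the caller expect back?'"""
    def price(self, item: str) -> float:
        raise NotImplementedError  # implement only when a test demands it

class FixedPriceFeed(PriceFeed):
    """Minimal implementation, written to make the test below pass."""
    def __init__(self, prices: dict):
        self._prices = prices

    def price(self, item: str) -> float:
        return self._prices[item]

def total_cost(feed: PriceFeed, items: list) -> float:
    """Unit under design: its signature was decided by writing the test."""
    return sum(feed.price(i) for i in items)

# The test that drove the design: who calls whom, with what data.
feed = FixedPriceFeed({"apple": 1.5, "pear": 2.0})
assert total_cost(feed, ["apple", "pear", "apple"]) == 5.0
print("design test passed")
```

The point is not the trivial arithmetic but the order of work: the stub and the test exist before any real implementation, so the decomposition is driven by how the pieces need to talk to each other.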


I wish someone curated a list of sites that are llm written - but are not spam. Just to compare :)


[] /s



I am looking for something that would filter for sites that rarely post but have good content. The number one problem with most of these systems is that everything favours frequent posting. Even doing it manually, I cannot keep tabs on many rarely-posting sites - an obvious example of a problem we should delegate to computers. Favouring frequent posters creates incentives to post frequently even if quality worsens.


The perverse thing here is that's exactly the opposite of how we've traditionally valued resources!


Because we don't value them at all, literally. It's a tragedy of the commons: internet pollution is like air pollution - the polluters don't pay, and there's no cost associated with overusing other people's attention.


I'd be fascinated by the economics of this from Google's perspective: specifically the unit economics of generating updated-once-a-year results for queried-once-in-a-million searches.

Tl;dr: I feel like the long-tail web (90s) was better, but economics pushed toward high-update-frequency, more-centralized results.


What do you do to learn a new programming construct? What did you do to learn programming - didn't you write

  #include <stdio.h>

  int main() {
    printf("Hello World");
    return 0;
  }
while having no idea what 'stdio.h' is?


Choosing not to know what stdio.h means is willful ignorance; an LLM has little to do with that chosen ignorance. It's a choice because "hey, it works on my machine!" and when I pushed it, nobody seemed to mind.

What a time to be alive. Actively choosing to reject knowledge because "what the fuck does it matter anyways".


Funny you should mention hello world. Kernighan and Ritchie presented it in TCPL as a little anatomical diagram of close to the smallest possible functional C program with the different parts labelled. The first line is labelled "include information about the standard library". What this means in detail is explained in that chapter. Furthermore, if you were compiling on a Unix system, stdio.h was readily available as /usr/include/stdio.h. Curious people could open it up using more or vi and see what was inside. There was no shortage of curious people back then.

The process of "going through the motions" of writing and compiling a program without even a small understanding of what it all meant was a later innovation, perhaps done as a classroom exercise in an introductory CS course for impatient freshmen or similar.



When you first learn anything you don't really understand it - that takes longer. When you learn woodworking you won't know why you have to hold the saw a certain way. When you learn chemistry you don't know how we know atoms exist. You start by doing the fun stuff and fill in the gaps later.


You can use vibe coding for learning. It is very effective.


No. It was the first question I asked, and I was given a satisfactory explanation (along the lines of "this adds things to your program that help it write text to the screen").


That's not even remotely satisfactory if we're talking about understanding what we're doing


I like it because it is constructive!

I am really surprised by the amount of backlash on this site against using LLM helpers in writing. There are many ways this can go wrong - and the article lists some of them - but it does not blindly dismiss all LLM writing helpers.

What would be even more constructive would be an article listing the good ways of using LLMs.

https://xkcd.com/810/ :)


I laughed - but I don't want more of this


You don't want more of this on Hacker News?


I don't want more of this on Hacker News. I laughed because it was like rickrolling - but I am kind of curious what value so many people found in this article.


Hmm - but has its incidence increased, or have other causes simply fallen faster?


I just found the xkcd that expresses my opinion on this:

https://xkcd.com/810/

I am surprised that apparently I am in a minority here.

