What would be super cool is if this dumb zone could be quantified and surfaced to the user. I've noticed that Copilot now has a little circle graph that indicates context-use percentage and changes color as it fills up. I'll bet that's a very naive metric of tokens used vs. context available. I wonder if metadata could be streamed or sent along with the tokens to show that you've entered the dumb zone.
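Just to make concrete what I suspect is going on under that circle graph, here's a minimal sketch of such a naive metric: tokens consumed divided by the context window, bucketed into traffic-light colors. This is purely my guess; none of these names or thresholds come from any actual Copilot or model API.

```python
# Hypothetical sketch of a naive context-usage metric with color buckets.
# All names and thresholds are made up for illustration.

def context_usage_color(tokens_used: int, context_window: int) -> tuple[float, str]:
    """Return (usage fraction, traffic-light color) for a conversation."""
    usage = tokens_used / context_window
    if usage < 0.5:
        color = "green"
    elif usage < 0.8:
        color = "yellow"
    else:
        color = "red"  # probably deep in the "dumb zone" by now
    return usage, color

print(context_usage_color(150_000, 200_000))  # (0.75, 'yellow')
```

A real "dumb zone" signal would presumably need more than this (attention degradation, retrieval misses, etc.), which is why metadata from the model side would be more interesting than a simple token count.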
Sadly, we have n=1 for intelligence, and that's humans. The "second best" example of intelligence is already LLMs. And it's hard to expect imitation learning on data that wasn't produced by anything intelligent to yield intelligence - although there are some curious findings.
Even for human behavior, we don't have that much data. The current datasets don't capture all of human behavior - only the facets of it that can be glimpsed from text or video. And video is notoriously hard to use well in LLM training pipelines.
That LLMs can learn so much from so little is quite impressive in itself. Text being this powerful was, at the time, an extremely counterintuitive finding.
That said, some of the power of modern LLMs already comes from nonhuman sources: RLVR and RLAIF are major parts of frontier labs' training recipes.
The datasets going into LLMs have to have an element of human-ness to them.
For example, I can't just feed a model weather data from the past decade and expect it to understand weather. It needs input-output pairs, with the output being human language. So you can feed it weather data, but it has to be paired with a human description of that data: if we give it data from a rainstorm, there has to be an English description paired with it saying it's a rainstorm.
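A toy illustration of the pairing I mean, with made-up field names and values: the raw readings alone teach the model nothing about language; each record needs an English description attached.

```python
# Hypothetical (input, output) pairs: raw weather readings paired with
# human-language descriptions. All field names and values are invented.

weather_pairs = [
    {
        "input": {"precip_mm_per_hr": 12.4, "wind_kph": 45, "pressure_hpa": 992},
        "output": "A heavy rainstorm with strong gusts and falling pressure.",
    },
    {
        "input": {"precip_mm_per_hr": 0.0, "wind_kph": 8, "pressure_hpa": 1021},
        "output": "A calm, dry day under high pressure.",
    },
]

# It's the pairing, not the sensor feed by itself, that gives the model
# something to imitate in human language.
for pair in weather_pairs:
    print(pair["input"], "->", pair["output"])
```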
At this point, I'm not so concerned about the interface (Claude Code vs. GitHub Copilot, etc.). Sometimes I need to use one over the other because of... reasons. But I do seem to be coming back to the Anthropic models in particular. My rule of thumb is turning out to be:
1) How long is this taking?
2) Was it the right solution?
The first is pretty easy to get a feel for. The second is also a feeling I'm developing over time, but I am starting to trust the Anthropic models for all my coding.
The article says "Consider the implications if ChatGPT started saying “I don’t know” to even 30% of queries – a conservative estimate based on the paper’s analysis of factual uncertainty in training data. Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly."
Maybe. But not me. I would trust it more, and rely on it even more. I can work with someone who says "I don't know" but is super smart. And I'll bet more people will do the same. Over time, the system may enjoy the rewards of communal trust over and above what it currently enjoys.
However, over the long term, this may lead to a more dystopian version of what might happen currently. We may all give it blind trust because we all trust it. Give that a decade or half a decade, and then the system goes wrong... Yikes.
We have to grapple with the ongoing advice that "ChatGPT can make mistakes. Check important info." And we do. Because we have to, or at least some of us do. And that is a good thing.