dTal's comments | Hacker News

I'm afraid you are misremembering. The movie is explicitly eugenicist. The people of the future are explicitly biologically stupid. The opening transcript is unambiguous:

[Man Narrating] As the 21st century began… human evolution was at a turning point.

Natural selection, the process by which the strongest, the smartest… the fastest reproduced in greater numbers than the rest… a process which had once favored the noblest traits of man… now began to favor different traits.

[Reporter] The Joey Buttafuoco case-

Most science fiction of the day predicted a future that was more civilized… and more intelligent.

But as time went on, things seemed to be heading in the opposite direction.

A dumbing down.

How did this happen?

Evolution does not necessarily reward intelligence.

With no natural predators to thin the herd… it began to simply reward those who reproduced the most… and left the intelligent to become an endangered species.


What is "explicitly eugenicist" in observing that the unprecedented way mankind has dominated its environment has changed the selection pressures we are subject to?

My quest to survive to adulthood and pass on my genes looked nothing like the gauntlet a Homo erectus specimen would have run.


Hmm... this sounds a lot like the old RISC vs CISC argument all over again. RISC won because simplicity scales better and you can always define complex instructions in terms of simple ones. So while I would relish experiencing the timeline in which our computerized chums bootstrap into sentience through the judicious application of carefully selected and highly nuanced words, it's playing out the other way: LLMs doing a lot of 'thinking' using a small curated set of simple and orthogonal concepts.

RISC good. CISC bad. But CISC tribe sneaky — hide RISC inside. Look CISC outside, think RISC inside. Trick work long time.

Then ARM come. ARM very RISC. ARM go in phone. ARM go in tablet. ARM go everywhere. Apple make ARM chip, beat x86 with big club. Many impressed. Now ARM take server too. x86 tribe scared.

RISC-V new baby RISC. Free for all. Many tribe use. Watch this one.

RISC win brain fight. x86 survive by lying. ARM win world.


RISC tribe also sneaky. Hide CISC inside.

The LLM has no accessible state beyond its own output tokens; each pass generates a single token and does not otherwise communicate with subsequent passes. Therefore all information calculated in a pass must be encoded into the entropy of the output token. If the only output of a thinking pass is a dumb filler word with hardly any entropy, then all the thinking for that filler word is forgotten and cannot be reconstructed.

Yeah but not all tokens are created equal. Some tokens are hard to predict and thus encode useful information; some are highly predictable and therefore don't. Spending an entire forward pass through the token-generation machine just to generate a very low-entropy token like "is" is wasteful. The LLM doesn't get to "remember" that thinking, it just gets to see a trivial grammar-filling token that a very dumb LLM could just as easily have made. They aren't stenographically hiding useful computation state in words like "the" and "and".

>They aren't stenographically hiding useful computation state in words like "the" and "and".

When producing a token, the model doesn't just emit the final token; it also computes the entire stack of hidden states in its attention blocks. These hidden states are mixed into the attention blocks of future tokens (so even though LLMs are autoregressive, with each token attending to previous tokens, in terms of the computational graph this means the hidden states of previous tokens are passed forward and used to compute the hidden states of future tokens).

So no, it's not wasteful: those low-perplexity tokens are precisely the spots that can instead be used to plan ahead and do useful computation.
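To make that concrete, here is a minimal sketch of a single causal self-attention step in plain PyTorch (toy dimensions, random weights - an illustration of the computational graph, not any particular model):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    d = 8                                  # toy hidden size
    h = torch.randn(5, d)                  # hidden states at 5 token positions
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = (q @ k.T) / d ** 0.5
    mask = torch.triu(torch.ones(5, 5), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))   # causal mask
    out = F.softmax(scores, dim=-1) @ v

    # out[4] is a weighted mix of v[0..4]: whatever the network computed
    # into h[2] reaches position 4 no matter which token was actually
    # sampled at position 2.
    print(out[4])

Note that the sampled token never appears in this data path; it only determines which embedding gets appended at the next position.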

Also I would not be sure that even the output tokens are purely "filler". If you look at raw CoT, they often have patterns like "but wait!" that are emitted by the model at crucial pivot points. Who's to say that the "you're absolutely right" doesn't serve some similar purpose of forcing the model into one direction of adjusting its priors?


Huh okay, there was a major gap in my mental model. Thanks for helping to clear it up.

Well, to be fair, the fact that they "can" doesn't mean models necessarily do it. You'd need some interp research to see whether they actually do meaningfully "do other computations" when processing low-perplexity tokens. But the fact that, by the computational graph, the architecture should be capable of it means that _not_ doing this is leaving loss on the table, so hopefully the optimizer would force it to learn to do so.

> They aren't stenographically hiding useful computation state in words like "the" and "and".

Do you know that is true? These aren’t just tokens, they’re tokens with specific position encodings preceded by specific context. The position as a whole is a lot richer than you make it out to be. I think this is probably an unanswered empirical question, unless you’ve read otherwise.


I am quite certain.

The output is "just tokens"; the "position encodings" and "context" are inputs to the LLM function, not outputs. The information that a token can carry is bounded by the entropy of that token. A highly predictable token (given the context) simply can't communicate anything.

Again: if a tiny language model or even a basic markov model would also predict the same token, it's a safe bet it doesn't encode any useful thinking when the big model spits it out.


I just don’t share your certainty. You may or may not be right, but if there isn’t a result showing this, then I’m not going to assume it.

> stenographically hiding

steganographically*

can you prove this?

train an LLM to leave out the filler words, and see it get the same performance at a lower cost? or do it at token selection time?


Low entropy is low entropy. You can prove it by viewing the logits of the output stream. The LLM itself will tell you how much information is encoded in each token.
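For instance, here's a rough sketch of that measurement (assuming HuggingFace transformers; gpt2 is purely an illustrative stand-in for whatever model you care about), scoring each token of a string by its surprisal in bits under the model's own logits - filler words land near zero:

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # illustrative; any causal LM works
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    ids = tok("The cat sat on the mat because it was tired.",
              return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits              # (1, seq, vocab)

    # surprisal of each token given everything before it, in bits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    nats = -logp[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    for t, b in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()),
                    (nats / math.log(2)).tolist()):
        print(f"{t!r:>12} {b:6.2f} bits")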

Or if you prefer, here's a Galilean thought experiment: gin up a script to get a large language model and a tiny language model to predict the next token in parallel; when they disagree, append the token generated by the large model. Clearly the large model will not care that the "easy" tokens were generated by a different model - how could it even know? Same token, same result. And you will find that the tokens that they agree on are, naturally, the filler words.
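A rough sketch of such a script, again assuming HuggingFace transformers, with gpt2 and gpt2-medium as illustrative stand-ins for the tiny and large models (they share a tokenizer) and greedy decoding for simplicity:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")          # shared vocabulary
    small = AutoModelForCausalLM.from_pretrained("gpt2")
    large = AutoModelForCausalLM.from_pretrained("gpt2-medium")

    ids = tok("The quick brown fox", return_tensors="pt").input_ids
    agreed = 0
    for _ in range(30):
        with torch.no_grad():
            s = small(ids).logits[0, -1].argmax()
            t = large(ids).logits[0, -1].argmax()
        agreed += int(s == t)
        # same token on agreement; the large model wins on disagreement,
        # so the transcript is identical to pure large-model decoding
        ids = torch.cat([ids, t.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))
    print(f"{agreed}/30 tokens agreed")

If the argument above is right, the positions where the two models agree should be overwhelmingly the low-entropy filler.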

To be clear, this observation merely debunks the idea that filler words encode useful information, that they give the LLM "room to think". It doesn't directly imply that an LLM that omits filler words can be just as smart, or that such a thing is trivial to make. It could be that highly predictable words are still important to thought in some way. It could be that they're only important because it's difficult to copy the substance of human thought without also capturing the style. But we can be very sure that what they aren't doing is "storing useful intermediate results".


You don't need to compile it yourself though? Unless you want CUDA support on Linux, I guess - dunno why you'd need such a silly thing:

https://github.com/ggml-org/llama.cpp/releases


>Consider that if ending a relationship causes noticeable problems to external observers, it’s almost by definition because you were in it “too long”. That is you developed a strong attachment, shared assets, or had kids with what was in hindsight obviously the wrong person.

Reducing it to "right person / wrong person" is a very narrow viewpoint. People can change in unpredictable ways, including yourself. Relationships end - or continue - for so many reasons, both emotional and pragmatic. It's simply too reductive to say that if a relationship causes pain when it ends, there was necessarily some sort of mistake. It could even be that the pain is a price to pay for a life experience that you'd be worse off for not having...


That's quite the false dichotomy. You wouldn't hook people in with such a simple script, the problem with LLMs is that they appear to be rather good at getting inside people's heads. I rather think it would be a security concern if your simple no-AI website somehow managed to dispatch each user submission to a dedicated expert psychotherapist case worker, with instructions only to keep them talking as long as possible...

Because it's an outmoded cliche that never held much philosophical weight to begin with and doesn't advance the discussion usefully. "It's a stochastic parrot" is not a useful predictor of actual LLM capabilities and never was. Last year someone posted on HN a log of GPT-5 reverse engineering some tricky assembly code, a challenge set by another commentator as an example of "something LLMs could never do". But here we are a year later still wading through people who cannot accept that LLMs can, in a meaningful sense, "compute".

It’s an entirely useful discussion, because as soon as you forget that it’s not really having a conversation with you, it’s a deep dive into the delusion that you’re talking to a smart robot, ignoring the fact that these smart robots were trained on a pile of mostly garbage. When I have a conversation with another human, I’m not expecting them to brute force an answer to the topic. As soon as you forget that LLMs are just brute forcing token by token, people start living in fantasy land. The whole “it’s not a stochastic parrot” is just “you’re holding it wrong”.

It's not that LLMs are stochastic parrots and humans are not. It's that many humans often sail through conversations stochastic parroting because they're mentally tired and "phoning it in" - so there are times when talking to the LLM, which has a higher level of knowledge, feels more fruitful than talking to a human who doesn't have the bandwidth to give you their full attention and also lacks the depth and breadth of knowledge. I can go deep on many topics with LLMs that most humans can't or won't keep up on. In the end, I'm really only talking to myself most of the time in either case, but the LLM is a more capable echo, and it doesn't tire of talking about any topic - it can dive deep into complex details, and catching its hallucinations is an exercise in itself.

No, it's quite a useful thing to understand. So, what, you would have us believe it is a sentient, thinking kind of digital organism, and not believe that it is exactly what it is? Being wrong and being unimaginative about what can be achieved with such a "parrot" is not the same as being wrong about it being a word predictor. If you don't think so, you can probably ask an LLM and it will even "admit" this fact. I do agree that it has become considered outmoded to question anything about the current AI orthodoxy.

Aka the Red Hat business model. It's all you have when access to the product itself is free. Gotta keep yourself in the loop somehow.

‘Oh you want to access help documents indexed by google? Please show us your enterprise licence to continue.’

You just reinvented the IM-ME, a device famous for being used aftermarket to hack garage doors (in the days before Flipper Zero).


Lol. That's cool, I've never heard of that!

