Hacker News | past | comments | ask | show | jobs | submit | fenomas's comments

Curmudgeonly comment from someone trying to sound like a wise elder about how actually all this was the norm even in the days of Usenet.

What you're asking for is exactly what's in the link you replied about. It collects analysis of each solution (or attempt), and info about whether the AI's solution could be found anywhere in the literature.


Where?


The high-order bit for each case is the category it's in and the "Outcome" column - that summarizes whether the solution was full/partial/wrong, whether the AI had assistance, etc. Further discussion of each case is linked from the number.

Then the "Literature result" columns have a citations for where similar published results were found. The ones with no "Literature" column, like in the first section, are cases where no similar published results have been found (implying that the solution would not have been trained on). Note that in some cases a published solution was found but it wasn't similar to the AI's.

(this is all explained with more detail and caveats at the top of the page)


Sorry, I suppose I'm asking for a lot of handholding here, which isn't really fair. I'm actually just sick right now and have crazy brain fog. Thanks for the assistance! I'll read through.

FWIW I've wavered on this topic quite a bit. Not too long ago I leaned more heavily towards "complex cognitive capabilities can be expressed using statistical token generation", I've started leaning the other way, but I'm not committed so it's great to circle back on the state of things.


Not at all - didn't mean to sound snarky, I just wanted to add that I was omitting details and caveats.

FWIW, personally I think it muddies things to frame the question as if "..using statistical token generation" was a limitation. NNs are Turing-complete, so what LLMs do can just be considered "computation" - the fact that they compute via statistical token generation is an implementation detail.

And if you're like most people, "can cognition happen via computation?" is a less controversial question, which then puts LLMs/cognition topics easily into the "in principle, obviously, but we can debate whether it's achievable or how to measure it" category.


The post you replied to was:

> We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

All that says is that the speaker thinks models will improve past where they are today. Not that it's a logical certainty (the first thing you jumped on them for), and certainly not anything about "limitless potential for growth" (which nobody even mentioned). With replies like this, invoking fallacies and attacking claims nobody made, you're adding a lot of heat and very little light here (and in a few other threads on the page).


> All that says is that the speaker thinks models will improve past where they are today. Not that it's a logical certainty

Exceedingly generous interpretation in my opinion. I tend to interpret rhetorical questions of that form as “it’s so obvious that I shouldn’t even have to ask it”.


> generous interpretation

The term of art for that is steelmanning, and HN tries to foster a culture of it. Please check the guidelines link in the footer and ctrl+f "strongest".


Better put than I could have.


It's not a side effect of tokenization per se, but of the tokenizers people use in actual practice. If somebody really wanted an LLM that could flawlessly count letters in words, they could train one with a naive tokenizer (like plain ASCII characters). But the resulting model would be very bad (for its size) at language and reasoning tasks.

Basically it's an engineering tradeoff. There is more demand for LLMs that can solve open math problems, but can't count the Rs in strawberry, than there is for models that can count letters but are bad at everything else.
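To illustrate the tradeoff, here's a toy sketch (the subword vocabulary below is made up for illustration; real tokenizers like BPE learn their merges from data). A subword tokenizer hides individual letters inside multi-character tokens, while a character-level tokenizer exposes them at the cost of much longer sequences:

```python
# Toy comparison of subword vs. character-level tokenization.

def subword_tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest piece first, falling back to single characters.
        for size in range(len(text) - i, 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

def char_tokenize(text):
    """Character-level tokenization: every letter is its own token."""
    return list(text)

vocab = {"straw", "berry"}  # hypothetical learned merges

word = "strawberry"
print(subword_tokenize(word, vocab))  # ['straw', 'berry']
print(char_tokenize(word))            # ['s','t','r','a','w','b','e','r','r','y']

# Counting 'r' is trivial over character tokens, but a model that only
# ever sees the two subword tokens never directly observes the letters.
print(char_tokenize(word).count("r"))  # 3
```

The character-level model sees the three Rs directly, but it pays for that with sequences several times longer, which is roughly why practical tokenizers go the subword route.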


LLMs are bad at arithmetic and counting by design. It's an intentional tradeoff that makes them better at language and reasoning tasks.

If anybody really wanted a model that could multiply and count letters in words, they could train one with a tokenizer and training data suited to those tasks. The resulting model would be able to count letters, but it would be bad at things like translation and programming - the stuff people actually use LLMs for. So people train with tokenizers and training data suited to language tasks, and hence LLMs are good at language and bad at arithmetic.


This is like saying chess engines don't actually "play" chess, even though they trounce grandmasters. It's a meaningless distinction, about words (think, reason, ..) that have no firm definitions.


This exactly. The proof is in the pudding. If AI pudding is as good as (or better than) human pudding, and you continue to complain about it anyway... You're just being biased and unreasonable.

And by the way, I don't think it's surprising that so many people are being unreasonable on this issue; there is a lot at stake and the implications are transformative.


Chess engines are not a comparable thing. Chess is a solved game. There is always a mathematically perfect move.


> Chess is a solved game. There is always a mathematically perfect move.

This is a good example of being confidently misinformed.

The best move is always a result of calculation. And the calculation can always go deeper or run on a stronger engine.


We know that chess can be solved, in theory. It absolutely isn't and probably will never be in practice. The necessary time and storage space doesn't exist.


Chess is absolutely not a solved game, outside of very limited situations like endgames. Just because a best move exists does not mean we (or even an engine) know what it is.


Those are emoticons ;)

Emoji originally came from Docomo phones in Japan around 1999. (Or I think those were the first ones actually called "emoji"; some other earlier devices had similar character sets.)


They would have if they could have - answered here:

https://news.ycombinator.com/item?id=47256093


I remember using that tool internally! Personally I think I only used it to get stats of which features/APIs were popular. But I think other teams used it for QA/conformance, like finding content that occurred in the wild but wasn't covered by test cases.


Hahaha. Always cool to find users of the tools/products you build, including the obscure ones, and on HN no less :))


If it had been possible they would have loved to - certainly by 2012 or so, and more likely by 2008-9. The reason I heard they couldn't is that by that time Flash Player was a massive 10+ year old codebase with lots of parts that were licensed or external, and nobody had ever tracked which parts would need to be relicensed or rewritten.

Source: I worked there at the time and knew the relevant PMs.


That doesn't surprise me, honestly.

I wish they had been able to figure out how to do it, but licensing and patents and whatnot have held back lots of innovation.

