And by "nowadays" you mean since ChatGPT was released, which was less than 2 years ago (i.e. a consumer preview of a frontier research project). Interesting.
Okay, but how big a sample do we actually need for word frequencies? Like, what’s the goal here? It looks like the initial project isn’t even stratified per year/decade.
Yeah, obviously not, but the smaller problems that made up this bigger project were things you could see anywhere. I made heavy use of string manipulation that could be applied to basically anything.
> HN on LLMs "meh not real reasoning, in fact it's exactly the same thing as ELIZA"
> HN when someone makes a trivial LLM-wrapper (but cynic!! I hate corporate ahah): 9999 comments 9999 points "this is the best thing since sliced bread"
> The young ones don't understand why we oldies roll our eyes at hearing the 13th iteration of "this is ground breaking new idea"
Why not contribute constructively by explaining what would have to be different for the idea to work this time? Or if you hate innovation (i.e. retrying age-old ideas slightly differently, hoping this time it will work), why don't you pick another industry?
Snark aside, replication is a cornerstone of science. If someone doesn't want to be involved in science because they think it's soul-crushing, perhaps academia isn't the right place for them.
The introductory example is quite illuminating. No theory behind it, just a random hypothesis tested (by comparing the preferences of 10 fish from populations separated by as little as 50 meters; what effect size did the authors expect?), with claims of generalisation (climate change ok bad?? Or climate change bad bad??).