For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | dontreact's commentsregister

Hot take:

There should be an anti leaderboard that highlight people under a threshold. Not trying to learn how to use ai while working at a company like Amazon is almost certainly a bad thing, and cause for looking into why.


Lung cancer screening should be used more broadly and improved over time in a data driven fashion!

We can catch things early, it shouldn’t be limited to only for smokers.


My take: multi turn evals are hard because to do it really correctly you have to simulate a user. This is not yet modeled well enough for multi turn to work as well as it could.


It tests people chatting to ChatGPT! That's a pretty big and important use case.


The flip side of this is that for some tasks (especially in ml/ai), doing it manually at least a few times gives you a sense of what is correct and a better sense of detail.

For example, spending the time to label a few examples yourself instead of just blindly sending it out to labeling.

(Not always the case, but another thing to keep in mind besides total time saved and value of learning)


I think the methods here are highly questionable, and appear to be based on self report from a small amount of employees in Denmark 1 year ago.

The overall rate of participation in the labor work force is falling. I expect this trend to continue as AI makes the economy more and more dynamic and sets a higher and higher bar for participation.

Overall GDP is rising while labor participation rate is falling. This clearly points to more productivity with fewer people participating. At this point one of the main factors is clearly technological advancement, and within that I believe if you were to make a survey of CEOS and ask what technological change has allowed them to get more done with fewer people, the resounding consensus would definitely be AI


I think that “I’m not technical” is often an excuse for throwing work at other people and frankly can be a form of learned helplessness. Nowadays, there is less and less reason to ask other people to write one off scripts/queries, you can ask AI for help and learn how to do that.

Since this is HN some disclaimers -no that’s not always what’s happening, when “not technical” is thrown around -no it’s not always appropriate to use AI instead of asking an expert


It may be a good thing to throw scripts off to someone else. Division of labor is a good thing. You cannot possibly learn everything to a good (not even high) standard. Even if you could, no lawyer would have themselves as a client - when a lawyer needs legal advice they go to a different lawyer because they want that different perspective: this is often a good perspective for other subjects as well.

The question is what you will/should learn for your limited time alive. Society needs well educated (I include things "street smarts" and apprenticeship in educated here) people in many different subjects. Some subjects are important enough everyone needs to learn them (reading, writing, arithmetic). Some subjects are nearly useless but fun (tinplate film photography) and so worth knowing.

Things like basic computer skills are raising to the level where the majority of people today need them. However I'm not sure that scripting is itself quite at that level. (though it is important enough that a significant minority should have them)


Looks like I needed another disclaimer:

I’m talking about a general trend I see in use of this term, not that it’s always a bad thing to say “I’m not technical so someone else should write the script”

I agree with everything you said!

Both things are happening in the world: people using this terminology to throw work at others needlessly, and people doing good division of labor.


Is there any evidence R1 is better than O1?

It seems like if they in fact distilled then what we have found is that you can create a worse copy of the model for ~5m dollars in compute by training on its outputs.


Cosine similarity is equal to the dot product of each vector normalized


“In my humble opinion, these companies would not allocate a second of compute to lightweight models if they thought there was a straightforward way to achieve the next leap in reasoning capabilities.”

The rumour/reasoning I’ve heard is that most advances are being made on synthetic data experiments happening after post-training. It’s a lot easier and faster to iterate on these with smaller models.

Eventually a lot of these learnings/setups/synthetic data generation pipelines will be applied to larger models but it’s very unwieldy to experiment with the best approach using the largest model you could possibly train. You just get way fewer experiments per day done.

The models bigger labs are playing with seem to be converging to about what is small enough for a researcher to run an experiment overnight.


> You just get way fewer experiments per day done.

Smaller/simpler/weird/different models can be an incredible advantage due to iteration speed. I think this is the biggest meta problem in AI development. If you can try a large range of hyper parameters, fitness function implementations, etc. in a few hours, you will eventually wipe the floor with the parties forced to wait days, weeks and months for their results each time.

The bitter lesson certainly applies and favors those with a lot of compute and data, but if your algorithms fundamentally suck or are approaching a dead end, none of that compute or information will matter.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You