Hacker News | sdpmas's comments

yes, good point. right now it's somewhat hard to overfit because the meta-optimization only extracts tiny bits of information from the validation set. but over time we will switch the validation set to a different random subset of FineWeb, or even to entirely OOD datasets!
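to make that concrete, here's a minimal sketch (purely illustrative, not the repo's actual code) of what rotating the validation set could look like: resample a fresh held-out subset of FineWeb with a new seed each evaluation round, so the meta-optimization never overfits to one fixed split. the dataset config and field names below are assumptions.

    # hypothetical sketch: resample the held-out validation subset with a fresh seed
    # so the meta-optimization can't slowly overfit to a single fixed split.
    from datasets import load_dataset

    def sample_validation_set(seed: int, n_docs: int = 1000):
        # "HuggingFaceFW/fineweb" with the "sample-10BT" config is an assumption;
        # the repo may use a different dump or a local shard.
        ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                          split="train", streaming=True)
        ds = ds.shuffle(seed=seed, buffer_size=10_000)
        return [ex["text"] for _, ex in zip(range(n_docs), ds)]

    # e.g. rotate the split between evaluation rounds
    val_docs = sample_validation_set(seed=1234)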


hey, it's Samip (behind the Slowrun repo). yeah, that's a fair point; we will mention them in the blog. but there are a couple of major differences:

1. our emphasis is on using more compute to get better data efficiency. this matters because there are lots of hacky changes that will get a lower loss, but they don't hold up against general methods that leverage a lot of compute. and you can already see how this emphasis on compute leads to different methods than BabyLM's!

2. the reasoning behind the repo has nothing to do with how much data a child sees, and our dataset is not tailored toward that either. it's simply pretraining on a random subset of the internet. we know there are better training algorithms that get lower loss on that data, and we are finding them.


I feel like you really need to mention BabyLM. For example you have:

> Directions we think are wide open ... Curriculum learning

BabyLM and its offshoots have published a pretty convincing body of work on exactly that (which suggests curriculum learning isn't particularly relevant to LM training).

As I read your page, I really felt like the brevity-thoroughness tradeoff went the wrong way.


also, BabyLM is more of a conference track / workshop than an open-repo competition, which creates a different vibe.

