Hacker News | NitpickLawyer's comments

> This is all CG.

Reminds me of the classic - It is true that Spielberg filmed the moon landings, but he was such a perfectionist that he wanted to shoot on location.


And here I thought it was all shot on a soundstage on Mars.

ahem, Kubrik

Kubrick, even.

Clickbait/ragebait title. They are insisting on testing these things without a harness.

> In practise this philosophy translates into using extremely generic and minimal LLM testing prompts, no client-side "harnesses", no hand-crafted tools, and no tailored model configuration.

Yeah, that's not gonna work, chief. The agentic stuff works today because it's in a loop and because it has some feedback from that loop.

As it stands now, the SotA models score 0.2% (Gemini 3.1 and Opus 4.6) and 0.3% (GPT5.4-High). By only hitting their APIs, with no client-side orchestration, they're only testing internal tools (if any). This is not the way. (but it grabs headlines)

FWIW, the Kaggle competition is already at 0.5% with 2x T4 GPUs max and a few hours for ~110 problems, so... yeah. (and it's only been live for under a week)


> they are leading actually-open AI.

How are they leading? If I parse this correctly, "actually" open would mean fully open training data and weights? Then, by this definition, I'm only aware of Olmo (AllenAI - Seattle), Apertus (Swiss) and to some degree (unclear what data was actually published) Nemotron (Nvidia, US). What are some examples of similar Chinese models? (I'm not aware of any.)


> At least they are not travelling near the speed of light. That's a whole different can of worms.

Oh, they're building software and hardware for this anyway. The rate differences between Earth clocks and Moon clocks (commercial off-the-shelf, COTS) would lead to large errors if you were to calculate distances with them. There was an article about this a while ago. Fascinating stuff.
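Back-of-the-envelope, the relativistic drift alone is enough to wreck ranging. A rough first-order sketch (my own assumed constants; the Sun's potential and orbital eccentricity are ignored, so this is illustrative only):

```python
# Rough sketch of why Earth-synced clocks drift on the Moon.
# Clocks deeper in a gravity well, or moving faster, tick slower.
C = 299_792_458.0        # speed of light, m/s
GM_EARTH = 3.986004e14   # Earth gravitational parameter, m^3/s^2
GM_MOON = 4.9028e12      # Moon gravitational parameter, m^3/s^2
R_EARTH = 6.371e6        # Earth radius, m
R_MOON = 1.7374e6        # Moon radius, m
D_MOON = 3.844e8         # Earth-Moon distance, m
V_MOON = 1.022e3         # Moon's orbital speed, m/s
V_EARTH_ROT = 465.0      # Earth equatorial rotation speed, m/s

# Fractional rate difference (Moon clock minus Earth clock), first order in 1/c^2
earth_terms = GM_EARTH / R_EARTH / C**2 + V_EARTH_ROT**2 / (2 * C**2)
moon_terms = (GM_MOON / R_MOON + GM_EARTH / D_MOON) / C**2 + V_MOON**2 / (2 * C**2)
rate_diff = earth_terms - moon_terms  # positive => Moon clock runs fast

drift_per_day = rate_diff * 86_400    # seconds gained per day
ranging_error = drift_per_day * C     # light-travel distance of that drift, m

print(f"Moon clock gains ~{drift_per_day * 1e6:.0f} us/day")
print(f"Uncorrected, that drift corresponds to ~{ranging_error / 1e3:.0f} km of light-travel distance per day")
```

This lands in the tens-of-microseconds-per-day range reported for lunar timekeeping studies; since ranging multiplies time by c, even microseconds of uncorrected offset translate to kilometers.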


> I think everyone forgot about early SpaceX product quality.

This was 8 years ago and is some of the greatest stuff I've seen in space launches. The footage is so epic that it even got replicated in sci-fi series! ... https://youtu.be/wbSwFU6tY1c?t=1313

This was 9 years ago, first droneship landing - https://youtu.be/7pUAydjne5M?t=1642

And this is 18 years ago, their first Falcon1 launch - https://www.youtube.com/watch?v=bET0mRnqxQM

More live video from the ascent than we got on Artemis 2, for sure...


So this illustrates my point quite well.

The Falcon 1 launch video doesn't follow the craft and freezes as it lifts off. No telemetry, no fancy overlays, no practiced presenters, etc.

The first droneship landing and first FH flight are both long into SpaceX's evolution of how they do these videos. Today's are even slicker.

Practice has made these improve dramatically, and today's SpaceX demos blow each of your examples out of the water. That's what doing a live cast every few weeks gets you.


Jeff Dean apparently didn't get the message that you weren't releasing the 124B MoE :D

Was it too good or not good enough? (blink twice if you can't answer lol)


Best thing is that this is Apache 2.0 (edit: and they have base models available. Gemma3 was good for finetuning)

The sizes are E2B and E4B (following gemma3n arch, with focus on mobile) and 26BA4 MoE and 31B dense. The mobile ones have audio in (so I can see some local privacy focused translation apps) and the 31B seems to be strong in agentic stuff. 26BA4 stands somewhere in between, similar VRAM footprint, but much faster inference.


> in terms of security

I wouldn't go that far. As soon as you went online all bets were off.

In the 90s we had Java applets, then Flash; browsers would open local html files and read/write from C:; people were used to exchanging .exe files all the time and would open them without scrutiny (or warnings), and so on. It was not a good time for security.

Then dial-up was so finicky that you could literally disconnect someone by sending them a ping packet. Then came Windows XP, and Blaster and its variants, and all hell broke loose. Pre-SP2 you could install a fresh copy of XP and have it pwned within 10 minutes if it was connected to a network.

Servers weren't any better; SSH exploits were all over the place (even The Matrix featured a real SSH exploit), and so on...

The only difference was that "the scene" was more about the thrill, the boasting, and the learning, and less about making a buck out of it. You'd see "x was here" or "owned by xxx" in page defacements, instead of everything getting encrypted with a ransom demand.


It used to be worse. Something happened in the last year and I'm seeing way, way fewer random captchas for regular use from a residential IP. In '22-'24 it was extremely common; now it's an event when it happens. I also went from Mint to plain Ubuntu, so that might have something to do with it?

It's a good thing too, because when I see the Cloudflare captcha I try it once and if that doesn't work then I just close the tab and add it to the list of non-functioning websites.

Cloudflare captcha = infinite loop of captchas (if it doesn't work on the first try). You can give up the moment that happens, because you will never get to the website itself.


Bit of a fluff piece with a weird title. Yes, GMs use "suboptimal moves" in their games, but the main reason is to take their opponents out of prep, and more importantly those lines are also heavily analysed by engines. They are specifically looking for imprecise moves that are only imprecise if the opponent finds the correct line, which could be 10-15 moves deep (so it might not be feasible to do over the board).

And this isn't something new. Magnus has been doing this for a few years now, after getting bored of facing the same over-prepped opponents. He has mastered this technique, and shown that he's still the GOAT in mid- to late-game positions once the opponent is out of prep. But again, he's not doing this "randomly"; he's studying when and where he can accept a temporary disadvantage that will sort itself out later in the game. And engines are still heavily used.


A valuable lesson AI taught me is how bad articles in Bloomberg and Forbes are. They have probably always been this bad, but I was unaware of it until they started writing about AI (because, admittedly, I subconsciously thought well-known = good).

There’s something called the Gell-Mann amnesia effect, where people notice glaring errors in coverage of a field they know first-hand, but then go back to assuming the other stories are all reliable.

I used to love Private Eye, and they have done great, highly acclaimed journalism, but the one story they wrote that I really knew about (it was literally about the office I was in) was outrageously wrong, and would have been so easy to verify (ask literally anyone in the BBC building we were in to go to that floor, take a tour, or write an email). Can't read it any more.


Here's Wikipedia's entry on the Gell-Mann Amnesia Effect, because I've found it a very useful concept to know. Despite my media experiences, I still keep falling for it. And I love that we're still referring to it as Gell-Mann Amnesia here:

https://en.wikipedia.org/wiki/Michael_Crichton#Gell-Mann_amn...

In a speech in 2002, Crichton coined the term "Gell-Mann amnesia effect" to describe the phenomenon of experts reading articles within their fields of expertise and finding them to be error-ridden and full of misunderstanding, but seemingly forgetting those experiences when reading articles in the same publications written on topics outside of their fields of expertise, which they believe to be credible. He explained that he had chosen the name ironically, because he had once discussed the effect with physicist Murray Gell-Mann, "and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have".


> "and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have".

Ahh, yes, the SyneRyder effect.


Everything I've known anything about first hand has been utterly garbled - or was completely made up - when written up in Private Eye.

Odd take, as this was actually a pretty good article. The GP appears to be mostly bemoaning the fact that it's targeted at a lay audience.

The article says exactly that:

> As much as chess players can prepare, they can’t memorize everything. When they’re sitting at the board, their computers slumbering at home, they will inevitably be defined by the limits of their knowledge and ability. As a result, the elite grandmasters have realized the most valuable move is often the one that forces their opponents to start thinking with their brains rather than their engines, even if it might not be the “best” possible move.

I agree it's not exactly breaking new ground, but it's an okay article for a generalist audience.

