notracks's comments

notracks · 2026-03-28T16:36:07 1774715767

I recently found out that Claude's latest model, Sonnet 4.6, scores the highest in Bullsh*tBench[0] (funny name, I know). It's a recent benchmark that measures whether an LLM refuses nonsense or pushes back on bad choices, so Claude has definitely gotten better.

[0] - https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

astrange · 2026-03-28T17:20:40 1774718440

I haven't tried talking to Sonnet much, but Opus 4.6 is very sycophantic. Not in the sense of explicitly always agreeing with you, but its answers strictly conform to the worldview in your questions and don't go outside it or disagree with it.

It _does_ love to explicitly agree with anything it finds in web search though.

(Anthropic tries to fight this by adding a hidden prompt that makes it disagree with you and tell you to go to bed, which doesn't help.)

sidrag22 · 2026-03-28T23:31:20 1774740680

the go to bed thing gets annoying. you can't even hint that you are almost done or wrapping up, or it gets hyper triggered and never stops.

I do like when opus is incredibly short in its responses to prompts that probably shouldn't have been made though. keeps me grounded a bit.

layer8 · 2026-03-28T17:00:05 1774717205

You don’t have to star out things like that on HN.

thin_carapace · 2026-03-28T23:07:14 1774739234

it would be interesting to me if you could explain the motivation behind posting your comment. from my perspective, if somebody with 5 years of forum tenure had the intelligence to comment about advanced benchmarks, they probably noticed that censorship was a voluntary decision here, and had made a personal decision on that front.

mkl · 2026-03-28T23:52:21 1774741941

I'm not layer8, but I had a similar thought. In this case the needless censoring is problematic because it hides the name of the benchmark from future searches (the uncensored URL spells it differently).

layer8 · 2026-03-29T12:50:13 1774788613

Such self-censoring is often done out of habit or a mistakenly assumed obligation to do so. I consider it inappropriate here, as it obscures an actual name, doesn’t constitute an expletive, and the HN readership is generally mature enough to recognize that. The counterquestion is, what justified reason could there possibly be to censor it here? I don’t think there is any, in the sense that people wouldn’t take any offense at the uncensored version, and the intent of my comment was to inform about that.

notracks · 2026-03-29T13:07:33 1774789653

I censored it out of habit from commenting on other platforms, and I actually had no idea whether you should censor such words here. Will keep that in mind when commenting here next time.

thin_carapace · 2026-03-31T06:35:25 1774938925

some people think cursing is bad when done senselessly

uniq7 · 2026-03-28T18:25:42 1774722342

Good call on censoring yourself preemptively, otherwise HN could demonetize your comment

akurilin · 2026-03-28T17:21:00 1774718460

Great link, thanks for sharing. Confirmed what I saw empirically by comparing the different models during daily use.

notracks · 2026-02-24T14:25:08 1771943108

OVH also increased their pricing significantly.

notracks · 2025-11-05T12:04:33 1762344273

In Sri Lanka, almost every bus has some kind of melody built into it. They might be shorter, but some have various melodies.

https://www.youtube.com/results?search_query=sri+lankan+bus+...

notracks · on Jan 23, 2025

I heard that Daisyui v5 is ditching Tailwind so now it's just pure CSS.

amai · on Jan 23, 2025

The Daisy UI 5 beta release is still based on Tailwind 4

"First Install Tailwind CSS 4 beta"

https://v5.daisyui.com/docs/v5-beta/

notracks · on Nov 6, 2024

That's an interesting website!

notracks · on Oct 29, 2024

I also noticed that after submitting this. I think HN should warn the user somehow if they're trying to submit an existing submission.

dang · on Oct 29, 2024

It does. But after a while, reposts are allowed through. That's on purpose, because we want good articles to have multiple chances at getting attention.

The reason for posting lists of previous discussions is not to boo reposts—I hope that's clear! It's to point curious readers in the direction of additional interesting comments.

notracks · on May 15, 2022

useful

kira272921 · on May 15, 2022

thanks :D
