mckennameyer's comments

Claude's great at reading what people say, but surprisingly bad at recognizing when a politician's stance is just the first signal in a negotiation.

You're right, just updated.

The original title took one framing from the back half of the post (3 update cycles that can loosely be called the "ChatGPT era, then xAI/Meta/Gemini era, then Anthropic era"), but that's definitely not the point here. Thanks for flagging.


Nice!


So basically the attacker and the dev who caught it were probably using the same tools if the malware was AI-generated (hence the fork bomb bug), and the investigation was AI-assisted (hence the speed). Less "tip of the iceberg" and more just that both sides got faster.


It seems like a marketing play to seize on the protein movement. What will they do when fiber becomes the next craze?


Can definitely relate. I think forcing myself to conduct 1 session at a time feels so difficult not only from an efficiency POV but from an attention standpoint. Waiting for a session to finish, being alone with my thoughts... we're faced every day with things that are convincing us that multitasking is efficient when it's really not at all


Yeah - it's definitely a new way of working that takes some getting used to!


For anyone following the Chalamet drama... next you'll have to look into how many times a best actor frontrunner has lost thanks to their ego in the last week of the race!


Do you think reasoning and behavioral effort should be separate knobs, or is bundling them the right call?


I see the value in making it simple for the user, but here I feel it's a bit too much. Would probably prefer two.


Aren't vibe PRs way more likely to get abandoned? Sure they reduce reviewer load, but then everyone feels less urgency to do a human review after. Do you think the skill is making that better or worse?


Yeah, I guess figuring out how AI and your team can optimally work together is not that straightforward... probably every engineering team is trying to figure that out atm :D But if we already let AI write reviews, they should at least be as good as they can be.


We tested GPT-5 and Gemini Flash 3 at low, medium, and high effort on 169 instances with human-verified answers, scored against a frozen offline web corpus using Deep Research Bench. High effort consistently scored worse than lower thinking levels for both models. Methodology and raw data: https://everyrow.io/docs/notebooks/deep-research-bench-paret...


Interesting approach with the cascade. How do you decide when to escalate from fuzzy matching to LLM?


Fuzzy matching only makes sense if you expect the two columns to hold more or less the same data; otherwise you can skip that step.

Then you have to pick a threshold: if the string similarity is above that threshold, it's a match; otherwise it's not. The threshold should be high to prevent false positives. The LLM will take care of the non-matches.
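
To make the cascade concrete, here's a rough sketch of the idea, not the actual implementation. It assumes Python's standard-library `difflib` as the similarity measure and a 0.9 threshold, both picked purely for illustration; `cascade_match` and `llm_fallback` are hypothetical names, and the fallback is whatever function you'd use to ask an LLM.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and trimming whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def cascade_match(
    name: str,
    candidates: Sequence[str],
    threshold: float = 0.9,  # assumed value; kept high to avoid false positives
    llm_fallback: Optional[Callable[[str, Sequence[str]], Optional[str]]] = None,
) -> Tuple[Optional[str], str]:
    """Try fuzzy matching first; escalate to the LLM only for non-matches."""
    best = max(candidates, key=lambda c: similarity(name, c), default=None)
    if best is not None and similarity(name, best) >= threshold:
        return best, "fuzzy"
    if llm_fallback is not None:
        # Hypothetical hook: the caller supplies the LLM call.
        return llm_fallback(name, candidates), "llm"
    return None, "unmatched"
```

Near-identical strings like "Acme Corp." and "ACME Corp" clear the threshold and never reach the LLM, while something like "IBM" vs "International Business Machines" falls below it and gets escalated.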

