The original title took one framing from the back half of the post (3 update cycles that can loosely be called the "ChatGPT era, then xAI/Meta/Gemini era, then Anthropic era"), but that's definitely not the point here. Thanks for flagging!
So basically the attacker and the dev who caught it were probably using the same tools, if the malware was AI-generated (hence the fork bomb bug) and the investigation was AI-assisted (hence the speed). Less "tip of the iceberg" and more just that both sides got faster.
Can definitely relate. I think forcing myself to conduct 1 session at a time feels so difficult not only from an efficiency POV but from an attention standpoint. Waiting for a session to finish, being alone with my thoughts... we're faced every day with things that are convincing us that multitasking is efficient when it's really not at all
For anyone following the Chalamet drama... next you'll have to look into how many times a best actor frontrunner has lost thanks to their ego in the last week of the race!
Aren't vibe PRs way more likely to get abandoned? Sure they reduce reviewer load, but then everyone feels less urgency to do a human review after. Do you think the skill is making that better or worse?
yeah I guess figuring out how AI and your team can optimally work together is not that straightforward.. probably every engineering team is trying to figure that out atm :D but if we already let AI write reviews, they should at least be as good as they can
We tested GPT-5 and Gemini Flash 3 at low, medium, and high effort on 169 instances with human-verified answers, scored against a frozen offline web corpus using Deep Research Bench. High effort consistently scored worse than lower thinking levels for both models. Methodology and raw data: https://everyrow.io/docs/notebooks/deep-research-bench-paret...
So fuzzy matching only makes sense if you expect the two columns to contain more or less the same data; otherwise you can skip that step.
And then you have to pick a threshold: if the similarity of two strings is above that threshold, it's a match; otherwise it's not. The threshold should be high to prevent false positives. The LLM will take care of the non-matches.
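To make the threshold step concrete, here's a rough sketch using Python's standard-library `difflib`. The function names, the lowercase/strip normalization, and the 0.9 default are my own assumptions for illustration, not what any particular tool does; the LLM step itself is left out, the sketch only collects the values you would send to it.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after light normalization (an assumption)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_match(left, right, threshold=0.9):
    """Pair each left value with its best right candidate if it clears the threshold.

    Returns (matches, unmatched). Values in `unmatched` are the ones you would
    hand to the LLM afterwards.
    """
    matches, unmatched = {}, []
    for value in left:
        # Best-scoring candidate from the other column (None if it is empty).
        best = max(right, key=lambda cand: similarity(value, cand), default=None)
        if best is not None and similarity(value, best) >= threshold:
            matches[value] = best
        else:
            unmatched.append(value)
    return matches, unmatched


left = ["Acme Corp.", "Globex", "International Business Machines"]
right = ["ACME Corp", "Globex Inc", "IBM"]
matches, unmatched = fuzzy_match(left, right, threshold=0.9)
```

With a high threshold only the near-identical pair ("Acme Corp." / "ACME Corp") is matched here; "Globex" and "International Business Machines" fall through to the LLM. Lowering the threshold catches more pairs cheaply but raises the risk of false positives, which is the trade-off above.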