Hacker News | tomjakubowski's comments

I'm quite fond of vimscript legend Tim Pope's guidance on writing commit messages.

https://tbaggery.com/2008/04/19/a-note-about-git-commit-mess...


You don't need a theory of mind to effectively manage or collaborate with a chatbot. You do for other humans.

generally an editor writes the headline, not the reporter

CVS Pharmacy has started rolling out an "AI assistant" phone tree with no apparent way to get to a human.

Their old school phone tree didn't either. You had to pretend to be irate.

Pretend? Oh, there's no pretending involved.

Maybe if you use a lot of profanities and threaten to cancel your subscription?

From now on I will use the gigacalorie for this kind of thing.

There are a number of brick and mortar retailers I frequent who swing the other way and don't accept cash, only credit or debit. Presumably, they prefer paying the cost of credit card fees to the costs of handling cash. What's driving that difference?

> Sure we aren't evolutionary predisposed to work, but Europeans, North Africans, Asians are genetically

What?


The leader of the study, Julian Nyarko, is Associate Director and Senior Fellow at HAI. I can't say whether that means the study was conducted by HAI, but there is at least a connection to it. https://hai.stanford.edu/people/julian-nyarko

This is often called tacit knowledge. https://en.wikipedia.org/wiki/Tacit_knowledge

My favorite example of this is knowing how to untangle a big pile of cables. There are robots now which can untie a single knotted cable, but I don't think any can do a pile of cables yet. https://www.youtube.com/watch?v=vp-94rsherE


The images you can't see in the chats are the question sheet from here, which was the first fourth grade math homework assignment I tried. https://www.k5learning.com/worksheets/math/data-graphing/gra...

Fourth graders typically don't have access to Python for their homework assignments. To be fair to the kids, I tried it first without Python: Opus 4.6 (Feb 2026) with default Medium effort. https://claude.ai/share/1533a3e4-6757-4614-b95d-0743350a6598

pastebin of the reasoning section (no Python): https://pastebin.com/zZeG5ZnJ

It got questions 2 (Shop D) and 5 (280) wrong. It got question 3 right but the work it showed has the numbers for each shop wrong. My fourth grade teacher would have taken off points for that (shout out Mrs. Van Bladel).

Here it is again with a prompted nudge to use Python: https://claude.ai/share/e1265efb-0988-40ac-90ac-c76225b67e98

pastebin of the reasoning section (with Python): https://pastebin.com/KsP0xxZL

This time it used Python to "check its work", and answered the same questions incorrectly (2 and 5). To the model's credit, it did show the correct work on answer 3 this time.


That's more of a test of vision LLM ability to correctly identify and count things in an image than it is of mathematical reasoning.

If you look at the working of your non-Python example it gets most of the counts wrong - identifying shop A as two full notebooks plus one half notebook when it's actually three full notebooks, for example. The numeric answers it then gives would be correct if it hadn't made those vision mistakes.
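To make that concrete, here's a rough sketch of the pictograph arithmetic for the shop A case. The per-icon value of 10 is a made-up placeholder (the worksheet's real key isn't quoted in this thread), and `pictograph_value` is just an illustrative helper name.

```python
from fractions import Fraction


def pictograph_value(full_icons: int, half_icons: int, per_icon: int) -> Fraction:
    """Value of one pictograph row: a full icon is worth per_icon,
    a half icon is worth half of that."""
    return (full_icons + Fraction(half_icons, 2)) * per_icon


# Hypothetical key value; the worksheet's actual key is not given here.
PER_ICON = 10

actual = pictograph_value(3, 0, PER_ICON)   # three full notebooks (what the image shows)
misread = pictograph_value(2, 1, PER_ICON)  # two full plus one half (what the model read)

print(actual, misread, actual - misread)
```

The arithmetic is the same in both calls; only the counts fed into it differ, so any later sum or comparison across shops inherits the miscount.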

I've been testing vision LLMs on counting the number of pelicans in a photo for a while, and they're very unreliable at that.

The best I've seen is Google Gemini 2.5 if you have it output image segmentation masks (a feature they have not included in the Gemini 3 series yet): https://simonwillison.net/2025/Apr/18/gemini-image-segmentat... - but that requires additional harness engineering, you need to explicitly cause it to use its image segmentation mechanism.


Fourth grade math's† students are learning geometry and how to draw simple plots. Vision ability (or tactile ability, for visually impaired students) is pretty important to understanding and solving those homework problems.

†: think "bo's'n"

