There are a number of brick-and-mortar retailers I frequent who swing the other way and don't accept cash, only credit or debit. Presumably, they prefer paying credit card fees to bearing the costs of handling cash. What's driving that difference?
The leader of the study, Julian Nyarko, is Associate Director and Senior Fellow at HAI. I can't say whether that means the study was conducted by HAI, but there is at least a connection to it. https://hai.stanford.edu/people/julian-nyarko
My favorite example of this is knowing how to untangle a big pile of cables. There are robots now which can untie a single knotted cable, but I don't think any can do a pile of cables yet. https://www.youtube.com/watch?v=vp-94rsherE
Fourth graders typically don't have access to Python for their homework assignments. To be fair to the kids, I tried it first without Python: Opus 4.6 (Feb 2026) with default Medium effort. https://claude.ai/share/1533a3e4-6757-4614-b95d-0743350a6598
It got questions 2 (Shop D) and 5 (280) wrong. It got question 3 right but the work it showed has the numbers for each shop wrong. My fourth grade teacher would have taken off points for that (shout out Mrs. Van Bladel).
This time it used Python to "check its work", and answered the same questions incorrectly (2 and 5). To the model's credit, it did show the correct work on answer 3 this time.
That's more a test of a vision LLM's ability to correctly identify and count things in an image than of its mathematical reasoning.
If you look at the working in your non-Python example, it gets most of the counts wrong. For example, it identifies shop A as two full notebooks plus one half notebook when it's actually three full notebooks. The numeric answers it then gives would be correct if it hadn't made those vision mistakes.
I've been testing vision LLMs on counting the number of pelicans in a photo for a while; they're very unreliable at that.
The best I've seen is Google Gemini 2.5 if you have it output image segmentation masks (a feature they have not included in the Gemini 3 series yet): https://simonwillison.net/2025/Apr/18/gemini-image-segmentat... - but that requires additional harness engineering, you need to explicitly cause it to use its image segmentation mechanism.
In fourth grade math†, students are learning geometry and how to draw simple plots. Vision ability (or tactile ability, for visually impaired students) is pretty important to understanding and solving those homework problems.
https://tbaggery.com/2008/04/19/a-note-about-git-commit-mess...