This is partly why this talk about AI "solving science" should be taken with a grain of salt. Here the authors intentionally poisoned the publication record, but there are millions of papers out there that are also garbage, and it would be very hard for either a human or an LLM to distinguish them from actual work.
I agree with the general insight here. Python is great for humans, but once they're out of the loop it's no longer as useful. Having a compiler is indeed more useful for LLMs.
However, we are moving one step closer to humans being completely unable to understand the code, as there are likely 100x more developers with experience in Python than in Rust. If humans are indeed going to be the bottleneck, then perhaps this is inevitable, and languages designed specifically for LLMs will dominate.
I actually believe we need to rethink Git for modern needs. Saving prompts and sessions alongside commits could become the norm for example, or I could imagine having different flags for whether a contribution was created by a human or not.
This doesn't seem to be the direction these guys are going though, it looks like they think Git should be more social or something.
Actually, it is. We're currently leading a conversation among several players in this space to agree on a metadata standard that helps make attaching, collaborating on and transmitting information like this simple, extensible and scalable.
Keep an eye on our blog to see how we're doing this, and doing it in a way that hopefully brings the entire community along so we're not all reinventing the same wheels.
What do people expect to do with these saved prompts/contexts? Nobody is going to read through them, right? I suppose the thinking is LLMs will, but any decently active codebase will soon contain far too much context for any current LLM. Is this the same thinking behind cryonics, i.e. we may be able to use this stuff one day so let's start saving it now? Hoarding has ruined many people, and it will ruin us all if we're not careful...
For me the reason would be to preserve traces of intentionality (i.e. what was the user trying to achieve with this commit?). These days a 10k LOC commit might be triggered by a 100-word user prompt; there is a lot more signal in reading the prompt itself than the code changes.
I mean, it's just text, so it shouldn't be too taxing to store it. I agree it's hoarder mentality though :)
>Saving prompts and sessions alongside commits could become the norm for example, or I could imagine having different flags for whether a contribution was created by a human or not.
and then the tooling could attach any metadata to it that is desired.
OH WAIT YOU CAN DO THAT ALREADY SINCE 2009
Seriously, 90% of the complaints about git not being able to do something are either RTFM or "well, it can, but could use some better porcelain to present it to the user".
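For reference, here's a minimal sketch of attaching a prompt to a commit with git notes (the 2009-era feature alluded to above). The `prompts` notes ref name and the prompt text are made up for illustration; any ref name works, and notes can be pushed/fetched like any other ref.

```shell
set -e
# Throwaway repo for the demo
dir=$(mktemp -d)
cd "$dir"
git init -q
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "refactor auth module"

# Attach the originating prompt as metadata, without touching the commit itself
git notes --ref=prompts add -m "user prompt: migrate auth to OAuth2, keep the public API stable"

# Read it back (shareable via: git push origin refs/notes/prompts)
git notes --ref=prompts show HEAD
```

Tooling could put any structured metadata (session ID, human-vs-AI flag, model name) in the note body the same way.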
It's crazy to me that this is not considered fraud. You sign up for a yearly plan under a given assumption of functionality, then they just change the terms to give you less than what they agreed to without compensating you in any way. That's textbook fraud.
I wonder whether this was your first attempt to solve this issue with LLMs, and this was the time you finally felt they were good enough for the job. Did you try doing this switch earlier on, for example last year when Claude Code was released?
Honestly, I was very averse to agentic coding up until Opus came out. The hallucinations and the false confidence it had in objectively wrong answers just broke more things than they fixed.
However, after it came out it suddenly behaved close to what they marketed it as being. So this was my first real end-to-end project with AI in the front seat. Though design-wise it is nowhere near perfect, and I was holding its hand the entire way through.
Fascinating stuff. Any chance of using a sparse autoencoder or some other method to try to grasp what the model is actually doing in those middle layers? It would be quite cool to get a better sense of what type of input it is getting the first time it goes through the reasoning circuit compared to the second or third time.