The hardest part about using agents to code for me has always been working in teams. When you can cut through huge parts of the code with a chainsaw, how do you review multi-thousand line PRs?
It's really hard to make surgical changes with an AI agent, and it's even harder to review those changes. Even when I'm reviewing both the specs and the code, the cognitive load of reviews feels like it's ballooned: what used to take a few hours now takes me days.
The problem with models like this is that they're built on very little training data we can trace back to verifiable protein structures. The Protein Data Bank, and other sources of training data for this kind of thing, contain a lot of broken structures and "creative liberties" taken to infer a structure from instrument data. It's a very complex process that leaves a lot open to interpretation.
On top of that, we don't have a clear understanding of how certain positions (conformations) of a structure affect underlying biological mechanisms.
Yes, these models can predict surprisingly accurate structures and sequences. Do we know if these outputs are biologically useful? Not quite.
This technology is amazing, don't get me wrong, but the average person might see this and wonder why we can't go full futurism and solve every pathology with models like these.
We've come a long way, but there's still a very very long way to go.
This is awesome! The only limiter here is the resolution. I think this is fantastic for cellular-level organelles, but it doesn't quite get down to the resolution of something like X-ray diffraction.
There's a huge trade-off between resolution and scale that makes it hard to determine things like complex molecular dynamics and how those dynamics influence the broader functions of the cell.
That said, excited for more images like this! More data at that scale is always a good thing for researchers.
Indeed, the amazing images that we've all seen of the coronavirus are from the same technique, cryo-EM tomography, but the overall size of the specimen is also much smaller. There's a limit to how much data can be processed, resulting in a scale-resolution tradeoff.
Now my info might be outdated since it was a few years ago, but I was once told that when you use one of those microscopes, you bring with you a terabyte hard drive for each specimen.
Honestly, this is a good thing. Running such a heavy model for a concept like OpenClaw was rather silly. If you want something like OpenClaw to work, you really need to figure out how to do it with an economical model.
I'm not convinced that people doing real work on production applications with any sizable user base are writing code through agents alone. There's no way to get acceptable code from these models without really knowing your codebase well and basically doing all the systems thinking for the model.
Your workflow is probably closer to what most SWEs are actually doing.
You really need to keep them on a tight leash, stop and correct them when they start screwing up, and then the remaining 90% of the work starts after they say they're done, where you need to review/refactor/replace a lot of what they produced.
The only way you're going to let an agent go off on its own to one-shot a patch is if your quality bar is merely "the code works."
Not true. As long as you don't blindly accept their garbage, keep things behind sensible interfaces so you can reimplement if necessary, and have good tests, you're fine.
This, at least for me, has changed in the past six months. Which is the same thing people were saying in the months prior to that, so I will accept some eye rolls. But at least for our pretty large monorepo, Opus plus a lot of engineering work on context got us to the point where a large portion of our engineers are doing most of their work agents-first, with a lot of back and forth and smaller hand edits.