As an AI-aware software engineer currently building systems that integrate with LLM provider APIs for my company, but who has no idea what an eval is or how a data scientist thinks about RAG, I honestly don't see what value a data scientist would bring to my team. Maybe someone would care to enlighten me?
Your view of what is happening in the neural net of an LLM is too simplistic. In the regard you're describing, they likely aren't subject to any constraints that humans aren't also subject to. What I do know to be true is that they have internalised mechanisms for non-verbalised reasoning; I see evidence of this every day when I use the frontier models at work.
There must be a mechanism to rate the person submitting the PR. Anyone who wants to submit code to a well-known repo would first need to build a demonstrable history of making high-quality contributions to lesser known projects. I'm not very familiar with the open-source scene, but I'd find it very surprising if such a mechanism weren't already in place. It seems like an obvious solution to the problem of vibe coders submitting slop.
> build a demonstrable history of making high-quality contributions to lesser known projects.
> Seems like an obvious solution
I'm not sure how you would rank the quality of submissions for grading contributors like this. Just because a project accepted your PR doesn't make it high quality; the best we can hope for is that it was better than not accepting it.
I use the restore checkpoint/fork conversation feature in GitHub Copilot heavily because of this. Most of the time it's better to just rewind than to salvage something that's gone off track.
Any given system will still need people around to steer the AI and ensure the thing gets built and maintained responsibly. I'm working on a small team of in-house devs at a financial company, and not worried about my future at all. As an IC I'm providing more value than ever, and the backlog of potential projects is still basically endless- why would anyone want to fire me?
Why would it need people to steer the AI? I can easily see a future where companies that don't rely on the physical world (unlike, say, manufacturing) are completely autonomous: just machines making money for their owners.
It's easy to imagine but there's still a vast amount of innovation and development that has to happen before something like that becomes realistic. At that point the whole system of capitalism would need to be reconsidered. Not going to happen in the foreseeable future.
The difference between having a non-technical person and someone who is capable of understanding the code being generated and the systems running it is immense, and will continue to be so over the foreseeable future.
One issue is that developers have been trained for the past few decades to look for solutions to problems online by just dumping a few relevant keywords into Google. But to get the most out of AI you should really be prompting as if you were writing a formal letter to the British throne explaining the background of your request. Basic English writing skills, and the ability to formulate your thoughts in a clear manner, have become essential skills for engineering (and something many developers simply lack).
> the ability to formulate your thoughts in a clear manner, have become essential skills for engineering
<Insert astronauts meme “Always has been”>
The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible.
Dijkstra (1970) "Notes On Structured Programming" (EWD249), Section 3 ("On The Reliability of Mechanisms"), p. 7.
And
Some people found error messages they couldn't ignore more annoying than wrong results, and, when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.
Dijkstra (1976-79) On the foolishness of "natural language programming" (EWD 667)
by and large the programming community displays a very ambivalent attitude towards the problem of program correctness. ... I claim that a programmer has only done a decent job when his program is flawless and not when his program is functioning properly only most of the time. But I have had plenty of opportunity to observe that this suggestion is repulsive to many professional programmers: they object to it violently! Apparently, many programmers derive the major part of their intellectual satisfaction and professional excitement from not quite understanding what they are doing. In this streamlined age, one of our most under-nourished psychological needs is the craving for Black Magic, and apparently the automatic computer can satisfy this need for the professional software engineers, who are secretly enthralled by the gigantic risks they take in their daring irresponsibility.
Concern for Correctness as a Guiding Principle for Program Composition. (EWD 288)
Things don't seem to have changed, except maybe that we've embraced that black box more than ever. We've only doubled down on "it works, therefore it's correct" or "it works, that's all that matters". Yet I'll argue that it only works if it's correct. Correct in the way Dijkstra means, not in the sense that it merely functions (passes tests).
50 years later and we're having the same discussions
> But to get the most out of AI you should really be prompting as if you were writing a formal letter to the British throne explaining the background of your request. Basic English writing skills, and the ability to formulate your thoughts in a clear manner, have become essential skills for engineering (and something many developers simply lack).
That's probably why spec driven development has taken off.
The developers who can't write prompts now get AI to help with their English, and with clarifying their thoughts, so that other AI can help write their code.
You are correct. You absolutely must fill the token space with unambiguous requirements, or Claude will just get "creative". You don't want the AI to do creative things, any more than you'd want an intern to.
That said, I have found that I can get a lot of economy from speaking in terms of jargon, computer science formalisms, well-documented patterns, and providing code snippets to guide the LLM. It's trained on all of that, and it greatly streamlines code generation and refactoring.
Amusingly, all of this turns the task of coding into (mostly) writing a robust requirements doc. And really, don't we all deserve one of those?
Where are you encountering all this slop code? At my work we use LLMs heavily and I don't see this issue. Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.
> Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.
That's why. I was using Claude the other day to greenfield a side project and it wanted to do some important logic on the frontend that would have allowed unauthenticated users to write into my database.
It was easy to spot for me, because I've been writing software for years, and it only took a single prompt to fix. But a vibe coder wouldn't have caught it and hackers would've pwned their webapp.
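To make the class of bug concrete, here's a minimal sketch (hypothetical names, not the actual app): an endpoint that trusts a client-supplied flag to gate writes, versus one that checks identity against the server's own session store.

```python
# Stand-ins for a real datastore and a server-side session store.
database = {}
SESSIONS = {"tok123": "alice"}

def handle_write_naive(request):
    # Vulnerable: trusts a client-supplied flag. Any caller can send
    # "is_authenticated": True and write straight into the database.
    if request.get("is_authenticated"):
        database[request["key"]] = request["value"]
        return "ok"
    return "denied"

def handle_write_safe(request):
    # Safe: the server derives identity from its own session store,
    # never from claims made by the client.
    user = SESSIONS.get(request.get("token"))
    if user is None:
        return "denied"
    database[request["key"]] = request["value"]
    return "ok"
```

The naive version is exactly the kind of thing that looks fine in a demo and is wide open in production.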
You can also ask Claude to review all the code for security issues and code smells; you'd be surprised what it finds. We all write insecure code on a first pass if we're too focused on getting the proof of concept working. Security isn't always the first thing coded; maybe it's the very next thing, maybe it comes ten changes later.
Yes we do. You don't just start a brand-new web project and spit out CORS rules, authentication schemes, roles, etc. in one sitting, do you? Are you an AI?
So let me get this straight: you get instructed to build an Instagram clone, and you sit down and one-shot every single feature of the project? My point is about doing EVERYTHING in one sitting, all at once, without pausing, without standing up, without breaks. I don't know about you, but people who rush code out make just as many mistakes as AI does, if not worse.
I've worked with many competent engineers and have built things people couldn't even google help for before AI existed, things that surpassed my and my team's expectations both solo and in a team setting. None of them were done in one sitting, which is what you're suggesting. Everything is planned out and done piecemeal.
For the record, I can one-shot an AI model to do all of those things, with all the detail they need, and get output similar to what I'd get if I gave a human all those tasks. I know because I've built the exact tooling to loop AI through the same processes competent developers use, and it can still do all of it in record time.
So if you're going to build a massive application, say YouTube, Facebook, or Instagram, are you going to sit down and write out every template, DB model, controller, view model, etc. in one single sitting for the entire application? No bathroom breaks, no lunch, no "I'll finish that part tomorrow"; you do it ALL in one sitting? Because you will miss something, and that's my point: nobody gets their first crack at a greenfield project 100% right in one sitting. You build it up to what it is. The AI is used the same way.
I actually do build all of those things before standing something up in prod. Not doing that is insane. Literally every web framework has reasonable defaults baked in.
Any competent tech company will have canned ways to do all of those things that have already been reviewed and vetted
Why are you building and deploying a site critical enough to need CSP and user security & so on in one sitting lol
Anyways, yes, if I know I'm gonna need it? Because every framework has reasonable defaults or libraries for all of those things, and if you're in a corporate environment, you have vetted ways of doing them:
1. import middleware.whatever
2. configure it
3. done
Like, you don't write these things unless you need custom behavior.
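The import-configure-done pattern really is about that small. As a rough illustration (names are made up; real frameworks ship vetted equivalents), here's a tiny WSGI middleware that bolts standard security headers, CSP included, onto every response:

```python
# Headers a framework's security middleware would typically add by default.
SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("X-Content-Type-Options", "nosniff"),
    ("X-Frame-Options", "DENY"),
]

def security_headers_middleware(app):
    # Wraps any WSGI app and appends the headers on every response.
    def wrapped(environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            return start_response(status, headers + SECURITY_HEADERS, exc_info)
        return app(environ, start_with_headers)
    return wrapped

# "Step 1: import, step 2: configure, step 3: done."
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

app = security_headers_middleware(hello_app)
```

You only crack open the middleware itself when you need custom behavior, which is exactly the point.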
The issue isn't when the programmers start using it. It's when the project managers start using it and think that they're producing something similar to the programmers
> People aren't prompting LLMs to write good, maintainable code though.
Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I recently used Gemini to build my first Android app, and I have zero experience with Kotlin or most of the libraries (but I have done many years of enterprise Java in my career). When I started I first had a long discussion with the AI about how we should set up dependency injection, Material3 UI components, model-view architecture, Firebase, logging, etc and made a big Markdown file with a detailed architecture description. Then I let the agent mode implement the plan over several steps and with a lot of tweaking along the way. I've been quite happy with the result, the app works like a charm and the code is neatly structured and easy to jump into whenever I need to make changes. Finishing a project like this in a couple of dozen hours (especially being a complete newbie to the stack) simply would not have been possible 2-3 years ago.
> Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I'd argue that when the code is part of a press release or corporate blog post (is there even a difference?) by the company the LLM in question comes from, e.g. Claude's C compiler, then one cannot reasonably assert they were "not using the tools correctly": even if there's some better way to use them, if even the LLM's own team doesn't know how to do that, the assumption should be that it is unreasonable to expect anyone else to know how either.
I find it interesting and useful to know that the boundary of the possible is a ~100kloc project, and that even then this scale of output comes with plenty of flaws.
Know what the AI can't do, rather than what it can. Even beyond LLMs, people don't generally (there's exceptions) get paid for manually performing tasks that have already been fully automated, people get paid for what automation can't do.
Moving target, of course. This time last year, my attempt to get an AI to write a compiler for a joke language didn't even result in the source code for the compiler itself compiling; now it not only compiles, it runs. But my new language is a joke language, no sane person would ever use it for a serious project.
LLMs do not learn. So every new session for them will be rebuilding the world from scratch. Bloated Markdown files quickly exhaust context windows, and agents routinely ignore large parts of them.
And then you unleash them on one code base that's more than a couple of days old, and they happily duplicate code, ignore existing code paths, ignore existing conventions etc.
That's why I'm very careful about how the context is constructed. I make sure all the relevant files are loaded with the prompt, including the project file so it can see the directory structure. Also keep a brief summary of the app functionality and architecture in the AGENTS.md file. For larger tasks, always request a plan and look through it before asking it to start writing code.
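A sketch of what that context construction can look like in practice (file names and the character budget here are illustrative, not a prescribed layout): gather the directory structure plus a hand-picked set of relevant files, capped so the prompt doesn't blow past the context window.

```python
from pathlib import Path

def build_context(root, relevant, budget_chars=40_000):
    """Assemble a prompt context: directory listing first, then
    selected files, stopping before a rough size budget is exceeded."""
    root = Path(root)
    # Directory structure first, so the model can see the project shape.
    tree = "\n".join(
        str(p.relative_to(root)) for p in sorted(root.rglob("*")) if p.is_file()
    )
    parts = ["## Project files\n" + tree]
    used = len(parts[0])
    for name in relevant:
        path = root / name
        if not path.is_file():
            continue  # skip files that don't exist rather than failing
        text = path.read_text(errors="replace")
        if used + len(text) > budget_chars:
            break  # stop before exhausting the budget
        parts.append(f"## {name}\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

Something like `build_context(".", ["AGENTS.md", "app/MainActivity.kt"])` then becomes the fixed preamble for each session, so the agent never starts truly from scratch.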
Not trying to be rude, but in a technology you're not familiar with you might not be able to know what good code is, and even less so if it's maintainable.
Finding and fixing that subtle, hard to reproduce bug that could kill your business after 3 years.
That's a fair point, my code is likely to have some warts that an experienced Android/Kotlin dev would wince at. All I know is that the app has a structure that makes an overall sense to me, with my 15+ years of experience as a professional developer and working with many large codebases.
I think we are going to have to find out what maintenance even looks like when LLMs are involved. "Maintainable" might no longer mean quite the same thing as it used to.
But it's not going to be as easy as "just regenerate everything". There are dependencies external to a particular codebase such as long lived data and external APIs.
I also suspect that the stability of the codebase will still matter, maybe even more so than before. But the way in which we define maintainability will certainly change.
The framing is key here. Is three years a long time? Both answers are right. Just getting a business off the ground is an achievement in the first place. Lasting three years? These days, I have clothes that don't even last that long. And then again, three years isn't very long at all. Bridges last decades. Countries are counted in centuries. Humanity is millennia old. If AI can make me a company that's solvent for three years? Well, you decide.
That mirrors my experience so far. The AI is fantastic for prototyping, even in languages/frameworks you might be totally unfamiliar with. You can make all sorts of cool little toy projects in a few hours, with just some minimal prompting.
The danger is, it doesn't quite scale up. The more complex the project, the more likely the AI is to get confused and start writing spaghetti code. It may even work for a while, but eventually the spaghetti piles up to the point that not even more spaghetti will fix it
I'll bet that's going to get better over the next few years, with better tooling and better ways to get the AI to figure out/remember relevant parts of the code base, but that's just my guess.
MS Copilot for Android has some annoying UI bugs which they seemingly refuse to fix. The biggest is that the chat area sometimes gets randomly resized so that you can hardly read anything. There is also no way to search past chats. For all the billions they are spending on AI, their chat interface seems inexplicably half-assed.