Glib is called for. The amount of information asymmetry that's still on the table as vibe coders and vibe engineers and vibe doctors emerge is staggering. Professional experience is still incredibly valuable. Most software developers might spend more than 6% of their time coding but no senior developers are banging their heads for hours over typos.
LLMs have evaporated 90% of the "moments of despair" when you hit an error and googling it isn't helping, or googling it makes you realize you have to read 30 minutes of documentation.
Coding is a joy now. LLMs shaved off all the rough edges.
A year ago I would've told my boss that the work I'm doing today "can't be done". I'd have told him to get me the right person to talk to (our partner, not an alien) who could give me some insight into what the hell I'm supposed to be doing to consume their API, or at least explain why this can't be done.
Instead, I spent a couple of weeks reverse engineering their terrible ideas. Yeah, it worked. But it's a complete waste of my time, tokens, energy, chips and RAM. And worst of all, it will lead to a terrible design.
That will work, but it will eventually collapse under its own weight, as we use our increased power to get sloppier and push it a little further. Because we can manage it. For now.
LLMs moved the moments of despair to PR reviews for me. It used to be that you could check on a junior dev occasionally throughout the day to make sure they're on the right track. Now you step away for 2 hours and they're raising a PR full of code-smell spaghetti and moving on to repeat their AI slopfest on the next task.
It's getting hard to keep up with trying to teach new devs what bad code looks like. And I swear sometimes they just copy my PR comments into their AI tool to fix the mistakes without any of the learning.
At some point there needs to be an uncomfortable conversation: if all they’re doing is copy-pasting everything they get from you into ChatGPT, you can do it yourself for much, much cheaper.
How? Management at most tech companies is incentivizing them to do just that, so if you bring it up, they'll happily trot over to your manager to complain, and then the uncomfortable conversation is between you and management about why you're getting in the way of the team's AI uptake.
Don’t allow juniors to use AI. It’s like university exams: no programmable calculators allowed. Review assistants are fine, though, and seniors who know what’s going on should use it; it does help when used correctly.
I've tried this without much luck. In my experience they get too bogged down on surface things and don't have the necessary business requirements/context to understand and find actual bugs.
How have you set yours up that works well for you?
So create a context document that explains the business context, and add that to the agent.
Take the bad result that you're getting, and pretend it's coming from an enthusiastic junior. What would you tell them to make them do this task better? Add that explanation to the agent (or explain it to the LLM and get it to add that to the agent; I've found this works as well).
When you create a task for the LLM, get it to create a requirements document that lists all the requirements. Feed that into the review agent so it understands what the code agent was trying to do.
The LLM will do what you tell it to do. It doesn't magically understand what you want it to do. You have to tell it what to do.
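The steps above (a business-context document, a requirements document produced alongside the task, and the resulting diff all handed to a review agent) could be wired together roughly like this. A minimal sketch only: `build_review_prompt` and its section headings are hypothetical, not part of any particular agent tool, and the actual call to the model is left out.

```python
from typing import List


def build_review_prompt(business_context: str, requirements: List[str], diff: str) -> str:
    """Assemble the review agent's input (hypothetical helper, no real tool's API).

    The review agent sees the business context and the requirements the coding
    agent was working from, not just the diff, so it has a chance to catch real
    bugs instead of getting bogged down in surface issues.
    """
    req_lines = "\n".join(f"- {r}" for r in requirements)
    return (
        "You are reviewing a change made by another agent.\n\n"
        "## Business context\n"
        f"{business_context.strip()}\n\n"
        "## Requirements the change was meant to satisfy\n"
        f"{req_lines}\n\n"
        "## Diff\n"
        f"{diff.strip()}\n\n"
        "Flag any requirement the diff does not meet, and any bug that matters "
        "given the business context. Ignore purely cosmetic issues."
    )
```

The explicit "ignore cosmetic issues" line is one guess at addressing the "bogged down on surface things" complaint; in practice you'd keep refining that instruction the way the comment above describes.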
You can't possibly believe this, or you and I (and many others) are doing something different. LLMs have created an entire new, huge set of bang-your-head moments, as they go off half-cocked in a million simultaneous directions, chasing their tail, or just making shit up. And since the vast majority of work is on existing, often ancient, codebases, let's find out if you feel the same way in 18 months.
Maybe I'm weird, but my usage has been very conservative. As in, I treat the LLM like a junior dev that I have to micromanage and handhold.
I am terrified of allowing these things to complete tasks end-to-end with nothing intervening. Maybe that's why I don't run into many of these issues. I mostly delegate grunt work and manual tedium to the LLM, not reasoning or design choices. I may consult the LLM and ask for criticism, but there is no way I'm going to allow it to quietly make design decisions that I don't know about.
1. Copy-pasting code into the web chat UI and asking for something (bugfix, add a feature, refactor, explain, review it, etc.), including entire source files. A $20/mo Gemini subscription goes a long way (I've never been rate-limited). I only use the most capable model. I often just paste the entire source file between triple backticks.
2. Cursor Tab. I do have hotkeys to enable and disable it; it's disabled most of the time, otherwise it gets annoying.
3. Single-file changes directly from Cursor's AI sidebar. I only do this for simple, predictable stuff because even their auto-routing "Premium" setting is not as good as pasting stuff into Gemini 3.1 Pro.
That means I have only two $20/mo subscriptions: Gemini and Cursor.
I don't use Claude Code; it's really for people who don't know how to code. I don't use Plan Mode; I make and track the plan myself (if at all). I only give the LLM granular tasks to execute. I don't use `claude.md` or `agents.md` or anything like that. If I don't like a particular output, I reset everything, modify my prompt and try again.
I believe this is the only way to fully leverage LLMs without losing any product quality. If you're trading off quality for "speed" (in quotes because over the long term, a low quality codebase is a massive drag on productivity) then there's no point.
I _think_ what you’ve said is “go shallow, not deep”. That is, don’t let the walk you make inside the latent space be a long one. Twenty-five short, peppered steps, each started de novo, are better than one long, protracted stew.
Well, if it works on step one, then why not step two? Where would different folks draw the line? My grandparents might continue on a while, whereas I would not. But if it also “works” on step two for me, should I take a third?
What counts as “works” is the important bit, I think.
Yes, if you're using them to write large chunks of code or entire features. If you just use them to clear up some trivial problem in an unfamiliar technology that you used to spend 30 minutes googling with 50 tabs open, or for something like writing a method to filter, map and reduce an array based on specific criteria, they're a godsend.
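For a sense of the "filter, map and reduce an array" kind of task meant here, a minimal sketch in Python. The order records and the "paid" criterion are made up purely for illustration.

```python
from functools import reduce
from typing import Dict, List


def total_paid(orders: List[Dict]) -> int:
    """Sum the amounts of orders whose status is 'paid' (illustrative criteria)."""
    paid = filter(lambda o: o["status"] == "paid", orders)       # filter: keep matching orders
    amounts = map(lambda o: o["amount"], paid)                    # map: pull out the field we need
    return reduce(lambda acc, amount: acc + amount, amounts, 0)   # reduce: fold into one total
```

The initial value `0` passed to `reduce` makes an empty or all-unpaid list return 0 instead of raising.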
You are in charge of what the LLM does. If it's running off half-cocked in a million simultaneous directions, that's on you. Write better skills. Tell it not to do that. Break into its loop and ask it wtf it thinks it's doing. If it's making shit up, force it to test more.
The LLM will do what you tell it to do. Manage it.
Languages have been reporting compile and runtime errors for decades. Additionally, very few senior developers don't already have their minds wired to spot typos the way copy editors spot bad punctuation. Typos were only really a problem for students.
Equal? No, no no no. Upper management is building PoCs that promise to solve problems that took years of hard-won learning about tradeoffs and balancing solutions, and setting goals based on that. We are heading toward a cliff, and everyone is going to learn what happens when you replace already-vulnerable foundation pillars with pig iron.
100%. Googling when you don't even know enough to ask the right questions, with 50 tabs open and trying to read down to the 3rd or 4th Stack Overflow answer (which is usually the best for some inexplicable reason), was my least favorite part of development.
I don't miss wasting an hour on a problem in a technology I'm not familiar with, where it's not a big conceptual thing but something I could clear up in 5 seconds if I just had an expert in the room.
Maybe you aren't familiar with how AI works. It writes the code for you. Nobody is "letting it misspell things". You run the code it wrote, and it fails. You look through the code the AI wrote and find the typo it put in there, or give the AI the error for it to fix, but it still created the typo, and that is the main point here. AI often ignores the rest of the document and does whatever-the-fuck it wants to make you stop prompting it, without any real concern for correctness.
It writes the code for you. Then it runs the tests. Then it runs the linter. Then it runs the static analysis tool. If any of those fail, then it rewrites the code and runs them all again.
You only look at the code once it has done all of that.
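The write/test/lint/rewrite cycle described above is essentially a retry loop with feedback. A minimal sketch in Python, assuming `generate` is whatever calls the model (hypothetical here) and each check wraps one of the real tools (tests, linter, static analyzer). Only a trivial "does it parse" check is included, as a stand-in for those.

```python
from typing import Callable, List, Optional, Tuple

# A check inspects candidate code and returns (passed, message).
Check = Callable[[str], Tuple[bool, str]]


def syntax_check(code: str) -> Tuple[bool, str]:
    """Cheapest possible stand-in check: does the candidate even parse as Python?"""
    try:
        compile(code, "<candidate>", "exec")
        return True, "ok"
    except SyntaxError as err:
        return False, f"syntax error: {err}"


def refine_until_green(
    generate: Callable[[List[str]], str],
    checks: List[Check],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate code until every check passes or the round budget runs out.

    `generate` (hypothetical model call) receives the failure messages from the
    previous round so it can try to fix them. Returns (code, rounds_used);
    code is None if it never went green.
    """
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        code = generate(feedback)
        # Run every check and keep only the failure messages.
        failures = [msg for ok, msg in (check(code) for check in checks) if not ok]
        if not failures:
            return code, round_no  # only now is it worth a human's time
        feedback = failures
    return None, max_rounds
```

The round budget is a design choice: without it, a model that keeps reintroducing the same error would loop forever instead of handing the problem back to a human.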
If AI is ignoring the rest of the document and doing whatever it wants then you need to improve your document-writing skills. You can ask it why it did something, that helps discover how to improve. It's a process of refinement and discovery, just like learning how to use any new tool.
Maybe you aren't familiar with all the ways AI can be used.
I use AI in two ways. In the way you describe, and also in the text editor as AI autocomplete. It works great until it doesn't. It inserts typos all the fucking time.
This is temporary. What is the SKILL.md equivalent going to be in five years? In ten? Don't you already see a pattern emerging around solutions that encode that "professional experience" into the tools themselves?
These LLMs can already incorporate our entire cultural corpus yet your "professional experience" is the threshold they won't cross?
The word “incorporate” is doing some very heavy lifting in your assertion. These LLMs already have access to the whole corpus of architectural knowledge and software best practices, and yet they’re unable to reliably implement those best practices. Why not? Why do they often make completely unintuitive decisions, even when repeatedly prompted to ask clarifying questions?
To be clear, by that and "cultural corpus" I meant their skill with natural languages. It is well known, for instance, that early LLMs were curiously better at composing sentences in English than at doing basic math.
Regarding such formal reasoning we have already seen marked improvement in the last year or two alone. The question is how this weighs on your prediction re their capabilities in the next two, five, ten, etc years.
What are the properties of LLMs that have convinced you that there remains emergent complexity (e.g. the “ability” to formally reason) that we have not yet seen?
There may be gains to be had in such emergence but that is not where I see the gains in the next five years. Those gains will be made by connecting LLMs more robustly with formal reasoning, which computers are already very good at. Continued iteration on connecting these right/left brain faculties could then lead to further emergence down the line.
The present notions of harnesses, structured output, or looping the LLM into some external state or sandbox, be it debugger output or embedding into a runtime, already show promising early results along these lines. I see no reason to believe these gains will not continue over the next five years.
If you have theories to the contrary in that regard, I am all ears.
Extraordinary claims require extraordinary evidence, not the opposite. There’s no current evidence to suggest limitless progress, or even superlinear progress with regard to compute and energy. My guess would be sublinear or even logarithmic progress vs. linear growth in compute and energy, as that’s how most physical systems behave.
No one said unlimited progress. Let's not resort to straw-man claims.
If you think the potential of LLMs is overblown feel free to short the market. I don't pretend to know the future. But if I may, I don't think you are framing the debate in the correct terms. Evidence is an important facet of human affairs. So is risk. Best of luck with your predictions.
Markets can remain irrational longer than anyone can stay solvent (especially when wealth is as concentrated as it is currently: one doofus can keep an entire industry afloat).
“Unlimited progress” is not a statement on the rate of progress, it’s a statement on the limits of progress. It’s a much weaker claim than you’re framing it as. Your claim very much is that we have not yet reached the limits of LLMs' potential. My claim, conversely, is that we’re already reaching diminishing returns, which are being masked by a massive influx of compute and energy. My short: LLMs are not the path to AGI.
I really don't like this framing - it's hard to short a market at the best of times, let alone when governments have a vested interest in tech being too big to fail to compete in the global economic arms race - see Intel's stock in the past few months.
I agree with you both. Undoubtedly there are still massive gains to be made with the frontier models we have today through tooling and iteration, yet I do not believe there's sufficient evidence to claim we are rolling towards AGI/ASI on an exponential curve without some additional breakthroughs, given the jagged edges and the fundamentally linear data used to train models.
Just remember you don't need AGI to see massive societal change. Certainly not for mass layoffs. AGI is not the bar. By the time we all agree AGI has come, the world will have already changed.
You only need AI to be good enough to win the tradeoff against a human employee. Just take your average office. Then ask yourself if the bar is really that high. AGI strikes me as an extremely nebulous concept. Better to just list everyone at your office and bucket them with a guess of how soon you think AI will replace them, or weaken their market power. This is what every corporate boss in America is already doing. I'm merely suggesting that rather than hoping a graph curves in our individual favor, we try to act more collectively as a species. Of course, I'm not holding my breath.
I also don't find myself compelled by the notion that the danger to humanity is "AGI". The true danger is as it always has been - each other.
> Just take your average office. Then ask yourself if the bar is really that high.
How many years away do you think we are from a “concierge” AI that can do the menial tasks handled by most personal assistants / program managers? Booking flights and hotels and coordinating employee availability?
> Why do they often make completely unintuitive decisions
Most likely because you haven't constrained their behavior in your prompt. You're making the assumption that they "understand" that using best practices is what you want. You have to tell them that, and tell them which practices they should use.
They already consistently fail to follow very simple and concrete instructions like “Please do not ever mock this object, always properly construct it in your tests”, so I’m not sure how they’re going to adhere to more vague and conceptual architectural paradigms. This is a problem with generative AI in general; image generation has similar limitations.
The capacity of the person prompting it to understand is the threshold they won't cross. They can squeeze the gap as much as possible by dumbing down answers or slowly ramping up information complexity but there is a limit to comprehension.
This is an interesting answer to questions about human agency and accountability/personhood, but I don't see how it leads to increased confidence in the role of the human as SWE.
If LLMs get good enough, one might be tempted to ask so what if most humans can't understand the output? Human civilization has by and large been a constant exercise in us collectively accomplishing more and more while individually comprehending less and less.
Our ancestors likely understood more about hunting live game or murdering each other than we do. Most of us do not consider that a great loss. Most of us living in the modern world depend on things we don't fully comprehend. I'm just not sure how this is supposed to be reassuring about the human as SWE.
Do you really want to live in a world where nobody understands the software that manages a nuclear power plant? Or medical devices? Or financial software? Or radio transceiver firmware? Even something as boring as a database that nobody understands could have disastrous effects if it were the government database for managing people's IDs. And even if it worked fine for years, what would happen if a bad actor influenced models to generate code with security issues? If nobody can comprehend the output, how would anybody be able to think about the danger? This is even grimmer than this:
https://www.citriniresearch.com/p/2028gic
We live in a world with nuclear weapons. Somehow we all cope and get up every morning. I think you are missing the point: the world is already grim. It always has been. What about human affairs, say, in the last century alone makes you think human oversight is some panacea? The impetus for civilization was not some innate desire for financial systems or medicine. It was not having other humans murder you. The Leviathan is already here.
The article you shared has little to do with this. How to divide up the gains a technology creates is a separate question from that of the technology itself. Tbh I found what you shared so boring I could barely finish it. I already made an exhortation in this thread to support politicians who commit to erasing inequality. The idea that LLMs can only exist with inequality is nonsensical. The only thing grim about what you shared is the lack of political imagination. It's boring.
We don't need as many hunters because we've domesticated sources of meat. We still need ranchers, butchers... an entire supply chain to get meat to consumers. We didn't remove humans from the loop, we just created specializations.
Software specialization might look very different in 10 years but I doubt that technically specialized humans will be completely removed from their professions. We might not be carrying bows and arrows anymore but we will be carrying the equivalent of a rope and a Stetson.
Ranchers, butchers... and factory farms. Most of the meat Americans consume has had very little interaction with a person until it's being devoured on the plate.
I appreciate your points. I agree with you that not all "technically specialized humans will be completely removed", but let's not pretend the comparison is going from a caveman with a spear to a cowboy with a lasso. If you concede it is likely to be very different, at some point calling it SWE is no longer useful.
I think SWEs would be better off realizing they have enjoyed a relatively extreme level of privilege, and rather than trying to hold onto it, use what time they still have to advocate for a more egalitarian society, even if that means giving up some of their gains. Otherwise, speaking of farming, the mass layoffs to come, after software has spent decades disrupting blue-collar jobs, will really be a chickens-coming-home-to-roost moment.
Now you're arguing against your own analogy? Hunter was a ubiquitous position in human society prior to the domestication of animals: 50% of the workforce in hunter-gatherer societies. Today, 12 millennia after the domestication of wildlife, that number is down to 9-14% of the global workforce dedicated to the production, distribution, processing and sales of meat (not including cooked food), according to Opus.
Considering that only 1% of the US workforce were software engineers, I expect similar workforce optimization to occur in software engineering specializations over the next 12,000 years. /s But seriously, it's never going to zero.
No one said it's going to zero. It doesn't have to go to zero for lives to change. Would you rather be a cowboy or a factory farmer? The latter are some of the least desirable jobs in the entire world. The fact that millions of people still do them isn't the point in your column you think it is.
I gave it a try a few weeks ago, tbh; I'll give it another shot, though. I mainly use their web chats since they're easier to use, and previously Qwen, DeepSeek and Kimi were all unable to output proper docx files or use skills.
Good homogenous experience is the hallmark of good design. There are no surprises with good design. It just works the way you expect it to work. Good design should not generally challenge your expectations.
Tests for correctness, self similarity, duplication of concerns, contradictory statutes, edge case detection, cruft or outdated laws that muddy the waters...
If the full complement of software development practices were applied to legislation and ordinances, we would be living in a very different world.
The term for this is ethical consumerism or conscious consumerism, defined as purchasing products that align with moral, social, or environmental values, acting as a form of "voting" with one's money.
Virtue signaling takes place wherever changes in group behavior are required by changes in conditions, but calling it just virtue signaling is reductive. People are moving off of US services because of the behavior of the US government and US citizens.
Flash should have transitioned into an authoring tool for SVG + CSS + JS, but it just took a knee because so many people hated Flash for all of its warts by the time SVG and Canvas moved vector graphics rendering to the browser. Flash was a real pain in the ass for most web users, and Web 2.0 technologies did kill it.
“Apache Flex, formerly Adobe Flex, is a software development kit (SDK) for the development and deployment of cross-platform rich web applications based on the Adobe Flash platform. […] Adobe donated Flex to the Apache Software Foundation
[…]
In 2014, the Apache Software Foundation started a new project called FlexJS to cross-compile ActionScript 3 to JavaScript to enable it to run on browsers that do not support Adobe Flash Player and on devices that do not support the Adobe AIR runtime. In 2017, FlexJS was renamed to Apache Royale. The Apache Software Foundation describes the current iteration of Apache Royale as an open-source frontend technology that allows a developer to code in ActionScript 3 and MXML and target web, mobile devices and desktop devices on Apache Cordova all at once”
[1] I may be wrong though. It’s not easy figuring out what Flash code ended up in which of Adobe’s Flash-like products over time.
I think the problem might actually be with reinforcing the red lines. The events of the last few weeks and this new deal only make sense if Anthropic was trying to find out how Palantir and the Pentagon had circumvented their restrictions, in order to reinforce those restrictions like a company actually concerned about the misuse of its product. OpenAI most likely came in with assurances that they wouldn't attempt to reinforce their restrictions.
https://www.youtube.com/shorts/xBilK3gT5e0