And because people writing anything worth reading are using the process of writing to form a proper argument and develop their ideas. It’s just not possible to do that by delegating even a small chunk of the work to AI.
Taking time to figure out if you’re the right fit for the company and the company is the right fit for you is a very good thing. For both parties! Rushed hiring processes increase the chances of you being fired for not being the right fit. Short hiring processes are a massive red flag for me.
True, but it becomes a problem when the entire thing is automated, because then it's entirely one-way. You spend endless time and money; they spend nothing. So you can't even figure out for yourself whether you're a good fit. It's entirely in their court.
"I think the Culture’s values are a winning strategy because they’re the sum of a million small decisions that have clear moral force and that tend to pull everyone together onto the same side."
Dario Amodei [1]
Text-to-code is clearly valuable, but the code-to-text capability of LLMs is seriously underrated IMO. I would argue orgs should prioritise giving PMs Claude Code licenses over devs: it's a huge efficiency unlock, without the worry about whether vibe code can be shipped to prod.
Don’t know about anyone else, but I find OpenAI irrelevant now. I bought an Anthropic Pro account to get Claude Code and now just use Anthropic for everything. I can’t see anything drawing me back to the OpenAI ecosystem. What am I missing?
Very good points, but I think this blog is pretty focussed on the developer use case for LLMs. For non-technical users connecting to non-dev tools or services, chat-style interfaces make a lot more sense, if anything just from a UX perspective.
Thank you, I was going to say something like this. I've been reading all the comments here and thinking, "Do ChatGPT/LeChat/etc. even allow running CLIs from their web or mobile interfaces?"
Exactly. And even if so, how are you going to safeguard tool access?
Imagine your favorite email provider has a CLI for reading and sending email - you're cool with the agent reading, but not sending. What are you going to do? Make 2 API keys? Make N API keys for each possible tool configuration you care about?
MCPs make this problem simple and easy to solve. CLIs don't.
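To make the contrast concrete, here's a minimal sketch of the MCP approach using the official Python SDK (the provider call is a hypothetical stub, not a real email API): the server registers only a read tool, so sending simply isn't in the agent's vocabulary, and no extra API keys are involved.

    # Minimal sketch: a read-only email MCP server (pip install mcp).
    # fetch_messages is a hypothetical stand-in for a provider's real API.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("email-readonly")

    def fetch_messages(limit: int) -> list[str]:
        # Dummy data in this sketch; a real server would call the provider here.
        return [f"Message {i}" for i in range(limit)]

    @mcp.tool()
    def read_inbox(limit: int = 10) -> list[str]:
        """Return the most recent message subjects (read-only)."""
        return fetch_messages(limit)

    # Deliberately no send_email tool: the agent can't call what isn't registered.

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default

The permission boundary lives in which tools the server registers, not in how many API keys you mint; granting "send" later means registering one more tool, not re-keying everything.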
I don't think OpenClaw will last that long without security being solved well - and MCPs seem to be the obvious solution, but they're actively rejected by that community.
Supposedly, you make a Skill for it, but even that is out of scope for chat agents. I didn't scroll far, but I wouldn't be surprised if more people in this thread have made the mistake of giving that answer.
Yeah, I think there’s an issue with being off the platform for a long time. Almost exactly the same thing happened to me after not logging in for about 10 years. The algorithm just doesn’t know what to do with you. But then I almost immediately got banned for breaching community guidelines after doing nothing but scrolling. So from my experience I can confirm it’s a total bin fire.
I still don’t really get this argument/excuse for why it’s acceptable that LLMs hallucinate. These tools are meant to support us, but we end up with two parties who are, as you say, prone to “hallucination” and it becomes a situation of the blind leading the blind. Ideally in these scenarios there’s at least one party with a definitive or deterministic view so the other party (i.e. us) at least has some trust in the information they’re receiving and any decisions they make off the back of it.
For these types of problems (i.e. most problems in the real world), the "definitive or deterministic" party isn't really possible. An unreliable party that you can throw at the problem from a hundred thousand directions simultaneously, and for cheap, is still useful.
"The airplane wing broke and fell off during flight"
"Well humans break their leg too!"
It is just a mindlessly stupid response and a giant category error. An airplane wing and a human limb are not remotely the same category of thing.
There is even another layer to this: comparing LLMs to the brain might itself be wrong, because the mereological fallacy is attributing "thinking" to the brain when it is the person/system as a whole that thinks.
You are right that the wing/leg comparison is often lazy rhetoric: we hold engineered systems to different failure standards for good reason.
But you are misusing the mereological fallacy. It does not dismiss LLM/brain comparisons: it actually strengthens them. If the brain does not "think" (the person does), then LLMs do not "think" either. Both are subsystems in larger systems. That is not a category error; it is a structural similarity.
This does not excuse LLM limitations - rimeice's concern about two unreliable parties is valid. But dismissing comparisons as "category errors" without examining which properties are being compared is just as lazy as the wing/leg response.
People, when tasked with a job, often get it right. I've been blessed to work with many great people who really do an amazing job of getting things right -- or at least, right enough.
But in any line of work: Sometimes people fuck it up. Sometimes, they forget important steps. Sometimes, they're sure they did it one way when they actually did it some other way, and have to fix it themselves. Sometimes, they even say they did the job and did it as-prescribed and actually believe themselves, when they've done neither -- and they're perplexed when they're shown this. They "hallucinate" and do dumb things for reasons that aren't real.
And sometimes, they just make shit up and lie. They know they're lying and they lie anyway, doubling-down over and over again.
Sometimes they even go off the rails and deliberately throw monkey wrenches into the works, just because they feel something that makes them think this kind of willfully-destructive action benefits them.
All employees suck some of the time. They each have their own issues. And all employees are expensive to hire, and expensive to fire, and expensive to keep going. But some of their outputs are useful, so we employ people anyway. (And we're human; even the very best of us are going to make mistakes.)
LLMs are not so different in this way, as a general construct. They can get things right. They can also make shit up. They can skip steps. They can lie, and double down on those lies. They hallucinate.
LLMs suck. All of them. They all fucking suck. They aren't even good at sucking, and they persist at doing it anyway.
(But some of their outputs are useful, and LLMs generally cost a lot less to make use of than people do, so here we are.)
I don’t get the comparison. It would be like saying it’s okay if an Excel formula gives me different outcomes every time with the same arguments, sometimes right but mostly wrong.
As far as I can tell (as someone who worked on the early foundations of this tech at Google for 10 years), making up “shit” and then using your force of will to make it true is a huge part of how intelligence constructs reality.
Will to reality through forecasting possible worlds is one of our two primary functions.
There’s a big community of people who motorbike around the world non-stop. It’s definitely possible to prepare beforehand, and getting a vehicle through borders is actually more admin than the visas.
Biking is faster; you can arrange all your visas 6 months in advance, but not years ahead. Even for 6 months, getting them all approved with no gaps requires a lot of luck, a very strong passport, or both.
Yeah, it used to be that you could get a visa from the local embassy of whatever country you were currently in. These days, not so much. There are a lot more obstacles to long-duration travel now; there are not enough long-duration travelers for the system to be set up for it.