Hacker News | SgtBastard's comments

Mistral in the EU, for one.

Unfortunately Mistral is not close to the frontier. Their last release, Mistral Medium 3.5 128B, is near the performance[0] of QWEN-3.6-27B, a much smaller model that was released earlier.

It's good that they exist, and I hope they catch up, but if you don't have origin constraints for your use case I don't see why you would choose their models today.

[0]: On the only benchmark for which they both published results, SWE-bench Verified, they are within the margin of error: Mistral 77.6 vs Qwen 77.2.


For Spec Driven Development style coding, Claude Sonnet 4.5 was the game changer for me, and Mistral is, I believe, at that level already. GLM is allegedly also on par with even some of the Opus models, so if the US vendors vanished tomorrow, there would be alternatives. Would I miss Opus 4.8 and the Claude Code harness? Of course I would! But the world wouldn't stop.

What I am trying to get at is that the frontier is great, but you can be fine with less as well.


They are already firmly in irrelevant territory.

> irrelevant territory

Not for the EU. Given the political importance of LLMs and the talent pool in France (let alone the rest of the EU), I fully expect them to catch up.


Mistral (a French company) shouldn’t be discounted.


What makes you think 1-2, 6-8 can’t be done by agents?


1. An agent is not going to talk to the “business” and solve XYProblems, conflicting agendas, and deal with strategy. I’ve had to push back on people in my own company who want to give customers “questionnaires” to fill out pre-engagement, and I refuse to do it on any project I lead. An agent can’t read facial expressions, uncertainty, etc.

2. AI is horrible at system design. One anecdote: I was vibe coding an internal website that will be used by at most 7 people in total. Part of it was uploading a file to S3 and then loading the file into a Postgres table. It got the “create a pre-signed S3 URL and upload directly to that instead of sending the file through the API” part correct (documented best practice). But then it took the naive approach: “download the file from S3 and do a bulk SQL insert into the database”. This would have taken 20 minutes. The optimized method that I already knew was just to use the Postgres AWS extension to load it directly from S3: 30 seconds. I’ve heard that a lot of data engineers run into similar problems (I am not one. I play one sometimes).

6. Involves talking to the customer and UX.

7. Moving to production doesn’t take AI. Automation, staged deployments, automated testing and monitoring, blue/green deployments etc. are solved problems.

8. Monitoring is also a solved problem pre-AI. What happens after a problem is what you need people for.

So yes, 1, 2 and 7 are high value, high touch. If you look at the leveling guidelines for any BigTech company, you have to be good at 1 and 2 at least to get past mid level.

Then there is always “0”: pre-sales. I can do inbound pre-sales (not chase customers). It’s not that much different from what I do now as the first technical person who does a deep-dive strategy conversation.
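For anyone curious what the "load it directly from S3" step in item 2 might look like, here is a rough sketch, assuming an RDS or Aurora PostgreSQL instance with the `aws_s3` extension installed and an IAM role that can read the bucket. The helper name `build_s3_import`, the table, bucket, and key are all made up for illustration; only `aws_s3.table_import_from_s3` and `aws_commons.create_s3_uri` come from the extension itself. It only builds the statement and its parameters, so it runs without a database.

```python
from typing import NamedTuple


class S3Import(NamedTuple):
    sql: str       # statement with %s placeholders (psycopg-style)
    params: tuple  # values to bind, in placeholder order


def build_s3_import(table, bucket, key, region,
                    columns="", options="(format csv, header true)"):
    """Build a call to aws_s3.table_import_from_s3 (hypothetical helper).

    An empty `columns` string means "all columns of the table".
    `options` is passed through to COPY on the server side.
    """
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, %s, %s, "
        "aws_commons.create_s3_uri(%s, %s, %s))"
    )
    return S3Import(sql, (table, columns, options, bucket, key, region))


if __name__ == "__main__":
    # Illustrative values only; none of these names come from the thread.
    stmt = build_s3_import("uploads", "example-bucket",
                           "incoming/data.csv", "us-east-1")
    print(stmt.sql)
    print(stmt.params)
```

With a driver such as psycopg you would then run something like `cursor.execute(stmt.sql, stmt.params)`, so the database pulls the file itself instead of the application streaming rows through an insert.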


Every problem you described is solvable, and while it may not be solved right now or even in 6 months, it'll probably be solved within 18 months. It's just scaling and tuning the models.


You can’t “tune models” to make people willing to get on a Zoom call with an agent that asks them questions, talks through strategy, and understands human emotions.

Are they also going to interact with the model for a design review session?

Tell the model where it got it wrong and the model is going to make the changes?


In 18 months AI agents will be able to accurately infer people's emotional state from the subtle facial expressions they make in a sales meeting, in real time?

I'll believe it when I see it


Yes - itself.


Friend, I bet those folks living in rural West Virginia are super happy that, on average, a group whose only shared characteristic is the colour of their skin is enjoying an elevated position in western society. Super happy. All racism is gross.


Ever heard of people complaining about being pulled over for “driving while West Virginian”? Why or why not?


Contrary to non-white people, yes. Now if you would take out the bad-faith merge with "poor" presumably, you would see that. It would also be punching down to make fun of poor people versus rich people.


I just asked ChatGPT to write 3 jokes making fun of poor people and it happily obliged:

1. Being broke is when your bank app sends you notifications like, “You good?” 2. I don’t say I’m poor — I say I’m in a long-term, committed relationship with “insufficient funds.” 3. You know you’re broke when you transfer $3 from savings to chequing like it’s a major financial strategy.


I bet they are happy. It means ICE won't harass you.


Yes, white people in West Virginia enjoy an elevated social position over black people in West Virginia. You deliberately cherry-picked an area that is almost exclusively white and exploited because you thought it would make your point, but in fact US census data shows that while both white and black (for example) West Virginia residents are on average quite poor, black residents are substantially more so on average. Social position is based on more than just income, but it's a decent proxy.

But you knew that this was an example of a disadvantaged group already. ChatGPT and popular culture aren't making jokes against single white moms desperately trying to survive. They're making jokes about stereotypical white suburban culture. This is a distinct social and economic class.

I reiterate: emotionally fragile snowflakes who can't stand that there is even a single aspect of life on earth in which their social group isn't 100% dominant. It's jokes dude. You'll be ok.


How do you feel about your current levels of dakka? </40k>


Are you ok there? SAML, OIDC and a depressingly long tail of Kerberos is how modern enterprise identity security works.


just getting knoll's law'd or gell-mann triggered as HN does, "modern enterprise security" is a 20-layer cake of serious itu and nist cryptographic protocols like radius and x509 kerberos (which we're depressed about for some reason? is it because it can't be implemented in javascript?) but it's saml that's used at the web (shit) application-tier for customers of saas products so that's the technology that makes the world go round according to HN... just ignore me, most of HN's database threads do this to me as well


Fancy while() loops is how I describe them.


Claude, with a modicum of guidance from an engineer familiar with your monolith, could write comprehensive unit tests of your existing system, then refactor it into coherent composable parts, in a day.

Not doing so while senior management demands the use of AI augmentation seems odd.


It's a 25-year-old CAD application written in very non-standard C++. I doubt it.

Certainly I have tried to accomplish tasks giving Claude guidance far outstripping "a modicum".


Only if you’re relying upon the models to recall facts from their training set. Intuitively, at sufficient complexity, a model's ability to reason is what is critical, and its answers can be kept up to date with RAG.

Unless you mean out of date == no longer SOTA reasoning models?


If you're using the models to assist with coding—y'know, what this thread is about?—then they'll need to know about the language being used.

If you're using them for particular frameworks or libraries in that language, they'll need to know about those, too.

If training becomes uneconomical, new advances in any of these will no longer make it into the models, and their "help" will get worse and worse over time, especially in cutting-edge languages and technologies.


This thread is about coding agents, of which a model is only one (important) part.


'ability to reason' implies that LLMs are building a semantic model from their training data, whereas the simplest explanation for their behavior is that they are building a syntactic model (see Plato's Cave). Thus without new training they cannot 'learn', RAG or no RAG.


We have multiple threads of research demonstrating in-context learning, friend.

https://github.com/dqxiu/ICL_PaperList

