More

SgtBastard · 2026-06-13T06:26:17 1781331977

Mistral in the EU, for one.

Iolaum · 2026-06-13T06:45:32 1781333132

Unfortunately Mistral is not close to the frontier. Their last release Mistral Medium 3.5 128B is near the performance[0] of QWEN-3.6-27B, a much smaller model that was released earlier.

It's good that they exist, and I hope they catch up, but if you don't have origin constraints for your use case I don't see why you would chose their models today.

[0]: On the only benchmark they both published performance results - SWE-bench Verified -they are within a margin of error Mistral 77.6 vs Qwen 77.2.

istvan0 · 2026-06-13T07:02:31 1781334151

Within Spec Driven Development style coding for me Claude Sonnet 4.5 was the game changer and Mistral is I believe at that level already. GLM is allegedly also on par with even some of the Opus models, so if the US vendors would vanish tomorrow, there would be alternatives. Would I miss Opus 4.8 and the Claude Code harness? of course I would! But the world wouldn't stop.

What I am trying to get at is that the frontier is great, but you can be fine with less as well.

SilverSlash · 2026-06-13T07:48:36 1781336916

they already firmly in irrelevant territory

ignoramous · 2026-06-13T07:55:42 1781337342

> irrelevant territory

Not for the EU. Given the political importance of LLMs and the talent pool in France (let alone rest of the EU), I fully expect them to catch up.

SgtBastard · 2026-04-24T08:45:24 1777020324

Mistral (a French company) shouldn’t be discounted.

SgtBastard · 2026-03-24T20:36:26 1774384586

What makes you think 1-2, 6-8 can’t be done by agents?

raw_anon_1111 · 2026-03-24T21:08:55 1774386535

1. An agent is not going to talk to the “business” and solve XYProblems, conflicting agendas, and deal with strategy. I’ve had to push back on people in my own company that want to give customers “questionnaires” to fill out pre engagement and I refuse to do it on any project I lead. An agent can tell facial expressions, uncertainty etc.

2. AI is horrible at system design. One anecdote. I was vibe coding an internal website that will at most be used by 7 people in total. Part of it was uploading a file to S3 and then loading the file into an Postgres table. It got the “create pre-signed S3 url and upload it directly to that instead of sending it to the API” correct (documented best practice). But then it did the naive “upload the file from S3 and do a bulk sql insert into the database”. This would have taken 20 minutes. The optimized method that I already knew was just to use the Postgres AWS extension to load it directly from S3 - 30 seconds. I’ve heard from a lot of data engineers run into similar problems (I am not one. I play one sometime).

6. Involves talking to the customer and UX.

7. Moving to production doesn’t take AI. Automation, stage deployments, automated testing and monitoring, blue /green deployments etc is a solved problem.

8. Monitoring is also a solve problem pre AI. It’s what happens after a problem is what you need people for.

So yes 1,2 and 7 are high value, high touch. If you look at the leveling guidelines for any BigTech company, you have to be good at 1 and 2 at least to get pass mid level.

Then there is always “0” pre-sales. I can do inbound pre-sales (not chase customers). It’s not that much different than what I do now as the first technical person who does a deep dive strategy conversation

turlockmike · 2026-03-25T01:26:15 1774401975

Every problem you described is solvable and while it may not be solved right now or even in 6 months it'll probably be solved within 18 months. It's just scaling and tuning the models

raw_anon_1111 · 2026-03-25T01:29:35 1774402175

You can’t “tune models” to get people willing to get on a zoom call with an agent and the agent asks them questions and talk through strategy and understand human emotions.

Are they also going to interact with the model for a design review session?

Tell the model where it got it wrong and the model is going to make the changes?

queenkjuul · 2026-03-25T11:03:25 1774436605

In 18 months AI agents will be able to accurately infer people's emotional state from the subtle facial expressions they make in a sales meeting, in real time?

I'll believe it when I see it

SgtBastard · 2026-03-21T02:12:13 1774059133

Yes - itself.

SgtBastard · 2026-03-03T20:48:57 1772570937

Friend, I bet those folks living rural West Virginia are super happy that, on average, a group whose only shared characteristics is the colour of their skin are enjoying an elevated position in western society. Super happy. All racism is gross.

gammarator · 2026-03-03T21:10:28 1772572228

Ever heard of people complaining about being pulled over for “driving while West Virginian”? Why or why not?

duskdozer · 2026-03-04T06:16:12 1772604972

Contrary to non-white people, yes. Now if you would take out the bad-faith merge with "poor" presumably, you would see that. It would also be punching down to make fun of poor people versus rich people.

I-M-S · 2026-03-04T10:32:22 1772620342

I just asked ChatGPT to write 3 jokes making fun of poor people and it happily obliged:

1. Being broke is when your bank app sends you notifications like, “You good?” 2. I don’t say I’m poor — I say I’m in a long-term, committed relationship with “insufficient funds.” 3. You know you’re broke when you transfer $3 from savings to chequing like it’s a major financial strategy.

jbeam · 2026-03-03T21:11:08 1772572268

I bet they are happy. It means ICE won't harass you.

idiotsecant · 2026-03-05T05:12:12 1772687532

Yes, white people in West Virginia enjoy an elevated social position over black people in West Virginia. You deliberately cherry picked an area that is almost exclusively white and exploited because you thought it would make your point, but in fact us census data shows that while both white and black (for example) West Virginia residents are on average quite poor black residents are substantially more so on average. Social position is based on more than just income, but it's a decent proxy.

But you knew that this was an example of a disadvantaged group already. ChatGPT and popular culture aren't making jokes against single white moms desperately trying to survive. They're making jokes about stereotypical white suburban culture. This is a distinct social and economic class

I reiterate: emotionally fragile snowflakes who can't stand that there is even a single aspect of life on earth in which their social group isn't 100% dominant. It's jokes dude. You'll be ok.

SgtBastard · 2026-02-22T20:59:05 1771793945

How do you feel about your current levels of dakka? </40k>

SgtBastard · 2026-02-21T07:50:46 1771660246

Are you ok there? SAML, OIDC and a depressingly long tail of Kerberos is how modern enterprise identity security works.

gfody · 2026-02-23T05:36:42 1771825002

just getting knoll's law'd or gell-mann triggered as HN does, "modern enterprise security" is a 20-layer cake of serious itu and nist cryptographic protocols like radius and x509 kerberos (which we're depressed about for some reason? is it because it can't be implemented in javascript?) but it's saml that's used at the web (shit) application-tier for customers of saas products so that's the technology that makes the world go round according to HN... just ignore me, most of HN's database threads do this to me as well

SgtBastard · 2026-02-03T21:38:36 1770154716

Fancy while() loops is how I describe them.

SgtBastard · 2026-02-03T21:07:57 1770152877

Claude, with a modicum of guidance from an engineer familiar with your monolith, could could write comprehensive unit tests of your existing system, then refactor it into coherent composable parts, in a day.

Not doing so while senior management demands the use of AI augmentation seems odd.

neutronicus · 2026-02-04T02:08:18 1770170898

It's a 25-year-old CAD application written in very non-standard C++. I doubt it.

Certainly I have tried to accomplish tasks giving Claude guidance far outstripping "a modicum".

SgtBastard · 2026-02-02T02:28:11 1769999291

Only if you’re relying upon the models to recall facts from its training set - intuitively, at sufficient complexity, models ability to reason is what is critical and can have its answers kept up to date with RAG.

Unless you mean out of date == no longer SOTA reasoning models?

danaris · 2026-02-02T09:06:28 1770023188

If you're using the models to assist with coding—y'know, what this thread is about?—then they'll need to know about the language being used.

If you're using them for particular frameworks or libraries in that language, they'll need to know about those, too.

If training becomes uneconomical, new advances in any of these will no longer make it into the models, and their "help" will get worse and worse over time, especially in cutting-edge languages and technologies.

SgtBastard · 2026-02-04T19:46:27 1770234387

This thread is about coding agents, of which a model is only one (important) part.

somewhereoutth · 2026-02-02T11:36:54 1770032214

'ability to reason' implies that LLMs are building a semantic model from their training data, whereas the simplest explanation for their behavior is that they are building a syntactic model (see Plato's Cave). Thus without new training they cannot 'learn', RAG or no RAG.

SgtBastard · 2026-02-04T19:50:29 1770234629

We have multiple threads of research demonstrating in-context learning, friend.

https://github.com/dqxiu/ICL_PaperList

HN For You