
I wonder why they fail this specific way. If you just let them do stuff everything quickly turns spaghetti. They seem to overlook obvious opportunities to simplify things or see a pattern and follow through. The default seems to be to add more, rather than rework or adjust what’s already in place.


I suspect it has something to do with a) the average quality of code in open source repos and b) the way the reward signal is applied in RL post-training - does the model face consequences of a brittle implementation for a task?

I wonder if these RL runs can extend over multiple sequential evaluations, where poor design in an early task hampers performance later on, as measured by amount of tokens required to add new functionality without breaking existing functionality.


Yeah, I've been wondering if the increasing emphasis on coding RL is going to draw models towards very short-term goals, relative to just learning from open source code in the wild.


To me this seems like a natural consequence of the next-token prediction model. In one particular prompt you can’t “backtrack” once you’ve emitted a token. You can only move forwards. You can iteratively refine (e.g. the agent can one-shot itself repeatedly), but the underlying mechanism is still present.
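The append-only constraint is easy to see in a toy autoregressive sampling loop (everything here is a stand-in I made up, not any real model’s API):

```python
import random

def sample_next(vocab, context):
    # Stand-in for a real model: an actual LLM would compute a
    # probability distribution over the vocabulary given the context.
    return random.choice(vocab)

def generate(vocab, prompt, max_tokens):
    tokens = list(prompt)
    for _ in range(max_tokens):
        # The only available operation is append: once a token has
        # been emitted there is no way to revise or delete it.
        tokens.append(sample_next(vocab, tokens))
    return tokens

out = generate(["a", "b", "c"], ["start"], 5)
```

Any refinement then happens by running the whole loop again over its own output, not by editing in place.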

I can’t speak for all humans, but I tend to code “nonlinearly”, jumping back and forth and typically going from high level (signatures, type definitions) to low level (fill in function bodies). I also do a lot of deletion as I decide that actually one function isn’t needed or if I find a simpler way to phrase a particular section.

Edit: in fact thinking on this more, code is _much_ closer to a tree than sequence of tokens. Not sure what to do with that, except maybe to try a tree based generator which iteratively adds child nodes.
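A toy sketch of that tree idea (all names hypothetical): keep the program as a tree of open nodes and grow it by expanding whichever node you like, instead of appending to the end of a token stream:

```python
class Node:
    def __init__(self, kind):
        self.kind = kind          # e.g. "module", "signature", "body"
        self.children = []

def expand(node):
    # Stand-in for a model call: propose child nodes for an open node.
    if node.kind == "module":
        return [Node("signature"), Node("body")]
    if node.kind == "body":
        return [Node("stmt"), Node("stmt")]
    return []  # other kinds are leaves

def grow(root):
    # Unlike token-by-token generation, any open node can be expanded
    # (or pruned) regardless of where it sits in the final file.
    frontier = [root]
    while frontier:
        node = frontier.pop(0)
        node.children = expand(node)
        frontier.extend(node.children)
    return root

tree = grow(Node("module"))
```

This also matches the high-level-first workflow: the “signature” node exists before its “body” is filled in, and deleting a subtree removes a whole function cleanly.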


This would make sense to me as an explanation when it only outputs code. (And I think it explains why code often ends up subtly mangled when moved in a refactoring, where a human would copy paste, the agent instead has to ”retype” it and often ends up slightly changing formatting, comments, identifiers, etc.)

But for the most part, it’s spending more tokens on analysis and planning than pure code output, and that’s where these problems need to be caught.


I feel like planning is also inherently not sequential. Typically you plan in broad strokes, then recursively jump in and fill in the details. On the surface it doesn’t seem to be all that much different than codegen. Code is just more highly specified planning. Maybe I’m misunderstanding your point?


All it does is generate soup. Some of which may taste good.

There is no thinking, no matter what marketing tells you.


LLMs are next token predictors. Their core functionality boils down to simply adding more stuff.


They do what you tell them to. If you regularly tell them to look for opportunities to clean up/refactor the code, they will.


I also expected hardware to be involved. But in the context of a list of tutorials on how to use this live coding tool, the title makes sense.


And at the same time, the fastest growing consumer product of all time is called ”ChatGPT”.


Perhaps if the product is compelling enough, the name doesn’t matter - and conversely, if the product is borderline, it had better have a great name.


ChatGPT is a great name though: you “chat” with the “GPT”, so it’s self-informing (even if you don’t know what a GPT is), and it’s four syllables that roll off the tongue well together.

RSS has no vowels and no information, and looks like an alphabet-soup acronym you might see at the doctor’s office or on an HR onboarding form at a corpo.


Randos are just calling it "Chat" now.

"I'll ask Chat about x!"


In Japan it's now known colloquially as 「チャッピー」 ("Chappy" or "Chappie"). High praise that it has received such a shortened and personified version so quickly.


It’s the new ”I looked it up on wiki”.


I've heard 'just ai it' from high schoolers.


As a European, my impression is that things named something something ”Euro” tend to be cheap and low quality. I don’t think it’s possible to build a positive consumer brand around ”Eurosky”. I support the cause though - we probably need to find a catchy word like ”Brexit” or ”enshittification” to make it salient.


This is almost universally true for every national identity (or however we want to widen the term to include Euro).

If you have a good product, you usually lead with that. "Made in X" becomes one bullet point in the list of things that make you great. If you lead with "made in X" or even make that your entire brand, that's a sign that you probably don't have much else to bring to the table.

The only real exceptions are foods and beverages. And even there it's questionable.


> Eurosky is a pan-European initiative spearheaded by a coalition of entrepreneurs, technologists and civil society organizations

A Brit, a Belgian and a German by the looks of their profiles, which are just their LinkedIn pages.

Posting this to HN feels like some guys trying to do "growth hacking" with Brusselian characteristics.

Honestly I even propose this conjecture: If you are in Europe you will learn about any truly European social media from some other source long before it appears on HN.


Elevator pitch could be "Wirecard for Social Media".


When I read "Eurosky", Skyshield immediately came to mind. Sounds like a military project.


For the record, now it has changed again, to ’Meta’s AI smart glasses and data privacy concerns’, which is even more milquetoast.

Parent and another comment reacting to this change have also been (artificially, I must assume) sunk from top to below gems like ’Too funny that the subcontractor working for meta is “sama”’.


> Be the change, my man. Try to make a podcast.

This might be the funniest thing I’ve read today.


Are you asking what I do or are you asking for advice on what you should do?


Yes


Sounds like neither. More like roll a die and press the up button.


I had been considering ditching everyday ChatGPT use in favor of Claude anyway, but hadn’t gotten around to it mostly out of habit. Now I have a good reason to do it.


Same, I had put Claude in my metaphorical shopping cart about two weeks ago but I already had some inertia with ChatGPT + Codex and figured it wouldn't be better enough to justify changing.

That has changed, so I canceled my ChatGPT membership and signed up for Claude. I still have five bucks of credit I bought a year ago for the OpenAI API that I do not believe I can get refunded, so some of my apps are going to have to stick with OpenAI until those credits run out, since I'm not going to just donate five bucks to them.

Playing with it now, I honestly can't tell too much of a difference, which as far as I am concerned is a good thing.


With this amount of competition it's almost weird to be paying anyone anything when one can just switch between free tiers of GPT, Claude, Gemini, Kimi, Qwen, Deepseek, Le Chat and an endless firehose of local models. The more your usage is randomly spread out, the less each provider can presumably profile you too as nobody has the full picture.


You should also consider ollama and local models.


Consider carefully the usage limits of both services before deleting your account (as you cannot create a new one later with the same email). Claude's €20/month sub offers very little and this has unfortunately kept me from switching when I tried earlier this month.


I get a fair amount of use out of it. I'm not using it for professional software development, just hobby stuff that I don't want to write the boring parts of. For 20 bucks a month that seems pretty reasonable.


I had been using both, ChatGPT mostly for chats and Claude mostly for code. Now I cancelled the ChatGPT subscription and turned on extra usage in Claude instead.


Consider that "little" is very subjective here, I find it to offer more than enough.


Say more. What did you see?


Apple is the only place I've ever worked where I really feel I'll get summarily fired for saying the wrong thing during a meeting or in my pod (if a manager overhears) - the culture is that draconian. So don't hold your breath waiting for someone to tell you things they're under NDA about (or just general litigation pressure).


Their hardware and integration between hardware and software is extremely good, still. The software on its own has gone downhill for a long time.

