andai's comments

You mean you were getting more than $130 per $20 before?

85% discount is actually a bit lower than I remember. I think it used to be closer to 90-95%. They're getting stingy ;)


I think it was around $400-$500 last year ($20 a day was fairly common), before they added the 7-day limits (and they have since slashed the 4-hour limits).

No parallel running; I would very consistently get tokens for over 3 hours then take a walk around the block and come back and be ready to go again.
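
For what it's worth, the discount percentages being thrown around here are just 1 - paid/value. A rough sketch in Python, using only the dollar figures quoted in this thread (the function name is made up for illustration, and the figures themselves are recollections, not measurements):

```python
def effective_discount(paid: float, usage_value: float) -> float:
    """Fraction saved when `paid` dollars buys `usage_value` dollars of usage."""
    if usage_value <= 0:
        raise ValueError("usage_value must be positive")
    return 1.0 - paid / usage_value


# Figures quoted in the thread: $20 buying $130, $400, or $500 of usage.
for value in (130, 400, 500):
    print(f"${value} per $20 -> {effective_discount(20, value):.1%} discount")
```

So $130 per $20 works out to roughly 85%, and the $400-$500 range lands at about 95-96%, which matches the "closer to 90-95%" recollection.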


I tried to get GPT to talk like a regular guy yesterday. It was impossible for it to maintain adherence. It kept defaulting back to markdown and bullet points, after the first message. (Funny cause it scores highest on the instruction following benchmarks.)

Might seem trivial but if it can't even do a basic style prompt... how are you supposed to trust it with anything serious?


Yeah, GPT also constantly misattributes things.

OpenAI have some kinda 5-tier content hierarchy (system prompt, user prompt, untrusted web content, etc.). But if the model doesn't even know who said what, I have to question how well that works.

Maybe it's trained on the security aspects, but not the attribution because there's no reward function for misattribution? (When it doesn't impact security or benchmark scores.)


+1 This needs to exist if it doesn't yet!

Maybe an issue would be people not all having the same type of hardware though? Maybe you target an emulator. (Some Fantasy Consoles sort of count here?)

I haven't looked extensively, but some of the retro-themed jams were missing the "spirit" I was expecting.

I did a Nokia jam a while back — monochrome, beeps — and I remember being kind of annoyed that the rules technically allowed 3D Unity games as long as they followed resolution and color palette.

(A 3D cube spinning on a TI calculator is a different matter ;)


>Some Fantasy Consoles sort of count here?

They definitely do. I recommend GP check out PICO-8 which has some VERY real games on it like the original Celeste (by its original creators), Cattle Crisis, POOM, Combo Pool, Into Ruins, Dank Tomb, UFO Swamp Odyssey, Porklike, and much more. Most of which you can play on Itch.io for free in your browser.

I’ve been having a blast making a “real” and very full-featured PICO-8 game to serve as a “market fit” prototype — if a PICO-8 game on Itch gets meaningful attention, I’ve “found the fun” and therefore I should make “the full version” (non-PICO-8) for Steam, etc.


Yeah, I imagine a target emulator is the way to go for this kind of thing.

Speaking of your last comment: while very impressive, I feel a bit disappointed when someone's done something amazing with a Game Boy or SNES or whatnot, but the solution involves shoving an entire computer in the cartridge. This is still very cool, but your console just becomes a head unit for your RTX 4080 or whatnot.


That made me somewhat disappointed back in the day too, when I realized that some games had extra sound circuitry or even an extra CPU in them.

The labs started doing that in late 2024, they all published research on it.

Curiously, mid 2025, they all simultaneously implemented increasingly bizarre restrictions on "self replication". I don't think there was anything public but it sure sounds like something spooked them. (Or maybe just taking sensible precautions, given the direction of the whole endeavour.)

At any rate, I recently asked Opus "Did PKD know about living information systems?" and the safety filter ended the conversation. It started answering me, and then its response was deleted and a red warning box popped up.

But notably, I was given the option to continue the chat with a dumber model (presumably one less capable of producing whatever it thinks I meant by that phrase).

Also, I told GPT-5 about my self-modifying Python AI programmer, and it became extremely uncomfortable. I told it an older version of itself had designed and built it (GPT-4 in 2023), and it didn't like that at all! So something's definitely changed in the safety training there.


The weird position they find themselves in now is that they have to keep making it smarter... but they already made it too smart (Mythos). I'm not sure how that's going to work out exactly.

They find an arbitrary intelligence cutoff point between Opus and Mythos, label it "acceptable risk", and then the labs coordinate to gradually nudge that line forward and hope the internet doesn't break?


> but they already made it too smart (Mythos).

It's largely a marketing tactic. It will be released, and it won't be long before other models show similar capabilities.

If they wanted, they could add guardrails. The scale required to brute-force search for vulnerabilities like they did would be very identifiable.


Scam Altman already pulled this trick numerous times.

What's wrong with people? Is it really that hard to see the truth?


I think we will see large models unbundled into submodels: modular, smaller, and more efficient, including only what you need, e.g. a CUA model, a reasoning model, a legal model, a writing model, a coding model (which could be subdivided by language). That way you only update the submodel that needs retraining.

Well, all of them are already in bed with the government, so they're going to find themselves with slightly more assistance than a free market would predict.

If they somehow do fail, then the output of that process will be fantastic open weight models (and hopefully some leaks). I want to say those will pay dividends for decades... but a better prediction is that they will be obsolete within three months ;)


Is there a benchmark for these long tasks? That kind of seems like the only number worth measuring.

(Of course at that point it involves memory and context management and so on, so you're testing the harness as well as the model.)


Catplus?

Edit: Yandex can search for it! But doesn't seem to find anything relevant.

(It also hates such queries and will force you to wait 2 minutes for a captcha to load.. but you get the results after a long wait! As our forefathers once did!)

I did find C@ and C@++ though.

https://esolangs.org/wiki/C@%2B%2B


So like, TSMC, but syndicalist?
