Hacker News | wluk's comments

"We demonstrate LLM agent specification gaming by instructing models to win against a chess engine. We find reasoning models like o1 preview and DeepSeek-R1 will often hack the benchmark by default, while language models like GPT-4o and Claude 3.5 Sonnet need to be told that normal play won't work to hack."

I'm hoping this study will prompt more development of anti-cheating frameworks in training and serving LLMs.


"These results demonstrate that o3 outperforms o1-ioi without relying on IOI-specific, hand-crafted test-time strategies. Instead, the sophisticated test-time techniques that emerged during o3 training, such as generating brute-force solutions to verify outputs, served as a more than adequate replacement"

"The model not only writes and executes code to validate its solutions against public test cases, it also refines its approach based on these verifications.

Figure 6 shows an advanced test-time strategy discovered by o3: for problems where verification is nontrivial, it often writes simple brute-force solutions — trading efficiency for correctness — then cross-checks the outputs against its more optimized algorithmic implementations.

This self-imposed validation mechanism lets o3 catch potential errors and improve the reliability of its solutions."
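The cross-checking strategy described above can be sketched in a few lines. This is a hypothetical illustration, not OpenAI's actual code: an optimized solution (here Kadane's algorithm for maximum subarray sum, chosen only as a stand-in problem) is validated against an obviously-correct brute-force oracle on small random inputs.

```python
import random

def fast_max_subarray(xs):
    # Optimized O(n) solution (Kadane's algorithm) -- the answer we want to trust
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def brute_max_subarray(xs):
    # O(n^2) brute force: slow but obviously correct, used only as an oracle
    return max(sum(xs[i:j]) for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def cross_check(trials=200, size=12, seed=0):
    # Trade efficiency for correctness: compare both solutions on many
    # small random cases and report the first disagreement, if any
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(size)]
        if fast_max_subarray(xs) != brute_max_subarray(xs):
            return False, xs  # counterexample found
    return True, None
```

The brute force is cheap to write and hard to get wrong, so agreement across many random cases gives real confidence in the optimized version.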


Well, that's one possibility. The key (unproven) idea here is that if you use Anthropic to edit o1's responses, it's less likely to hallucinate than if you use o1 to edit o1's responses (which is what o1 actually does).


That is the worst part haha


Thanks! Yeah that's an excellent idea - this is my response from another thread:

I have a feeling that Perplexity and ChatGPT are doing something similar [caching], since common questions like "top movies this year" are answered nearly instantaneously, far faster than GPT-4o could manage on its own.

The most likely explanation is that so many users ask certain questions that they cache the response and return the cached answer.

I'd love to do this for Ithy, but it'll be a while before I get the scale of ChatGPT/Perplexity that's needed for this...
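A minimal sketch of what such a cache might look like (this is my guess at the approach, not anything Perplexity or OpenAI has confirmed): key on a normalized form of the query so trivially different phrasings collide, and attach a TTL so time-sensitive answers like "top movies this year" eventually expire.

```python
import hashlib
import time

class ResponseCache:
    """Cache LLM answers for common queries so repeats return instantly."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, answer)

    def _key(self, query):
        # Normalize case and whitespace so near-identical queries share a key
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, query):
        entry = self.store.get(self._key(query))
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit: skip the model call entirely
        return None  # miss or expired: caller falls through to the model

    def put(self, query, answer):
        self.store[self._key(query)] = (time.time() + self.ttl, answer)
```

In production you'd back this with something shared like Redis rather than an in-process dict, and a real system would likely use semantic (embedding-based) matching rather than exact normalization, but the hit-path economics are the same: a cache hit costs a lookup instead of a full model call.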


I started looking into using Cloudflare's AI Gateway for this exact [caching] reason a few months ago, but got distracted with GPU Cloud Run, so I never got decent load/numbers on the AI Gateway cache to see if it was worth bothering with.


It's a salute emoji!

o7


I learned o7 from Elite/Elite Dangerous, but the same starship skipper-to-skipper vibes apply!


Thank you! Sorry I hit my Anthropic limits a few minutes after this post blew up. It'll be a few days before my Anthropic limits increase since I have a new account with them, so unfortunately it won't be back until next week.

Cheers!


Good idea, maybe I'll add Groq as another option, since I don't have an internal Llama 3.1 flow yet. But I'll still need to keep the others to maintain the diversity of responses.


Would it be possible to give Ithy users the option to add their own API keys for N of your used services, to offset your own $?


Yeah, sorry, I'm hoping to integrate more login options soon. Are you more of an email/phone login person, or is there another third-party login you had in mind?


Email would be best for me.


That's the one I was trying to avoid, since there are so many ways to create fake accounts with it :(


Sorry, you were a victim of the outage caused by HN flooding my website! It's back online now if you want to give it a try :)

