Hacker News | wluk's comments

"We demonstrate LLM agent specification gaming by instructing models to win against a chess engine. We find reasoning models like o1 preview and DeepSeek-R1 will often hack the benchmark by default, while language models like GPT-4o and Claude 3.5 Sonnet need to be told that normal play won't work to hack."

I'm hoping this study will prompt more development of anti-cheating frameworks in training and serving LLMs.


"These results demonstrate that o3 outperforms o1-ioi without relying on IOI-specific, hand-crafted test-time strategies. Instead, the sophisticated test-time techniques that emerged during o3 training, such as generating brute-force solutions to verify outputs, served as a more than adequate replacement"

"The model not only writes and executes code to validate its solutions against public test cases, it also refines its approach based on these verifications.

Figure 6 shows an advanced test-time strategy discovered by o3: for problems where verification is nontrivial, it often writes simple brute-force solutions — trading efficiency for correctness — then cross-checks the outputs against its more optimized algorithmic implementations.

This self-imposed validation mechanism lets o3 catch potential errors and improve the reliability of its solutions."
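The cross-checking strategy described above can be sketched in a few lines. This is a hypothetical illustration, not OpenAI's actual code: an optimized solution (here Kadane's algorithm for maximum subarray sum, chosen only as a stand-in problem) is validated against an obviously-correct brute-force oracle on small random inputs.

```python
import random

def fast_max_subarray(xs):
    # Optimized O(n) solution (Kadane's algorithm) -- the answer we want to trust
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def brute_max_subarray(xs):
    # O(n^2) brute force: slow but obviously correct, used only as an oracle
    return max(sum(xs[i:j]) for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def cross_check(trials=200, size=12, seed=0):
    # Trade efficiency for correctness: compare both solutions on many
    # small random cases and report the first disagreement, if any
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(size)]
        if fast_max_subarray(xs) != brute_max_subarray(xs):
            return False, xs  # counterexample found
    return True, None
```

The brute force is cheap to write and hard to get wrong, so agreement across many random cases gives real confidence in the optimized version.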


Well, that's one possibility. The key (unproven) idea here is that if you use Anthropic to edit o1's responses, it's less likely to hallucinate than if you use o1 to edit o1's responses (which is what o1 actually does).


That is the worst part haha


Thanks! Yeah that's an excellent idea - this is my response from another thread:

I have a feeling that Perplexity and ChatGPT are doing something similar [caching], since common questions like "top movies this year" are answered nearly instantaneously, far faster than GPT-4o could manage on its own.

The most likely explanation is that so many users ask certain questions that they cache the response and return the cached answer.

I'd love to do this for Ithy, but it'll be a while before I get the scale of ChatGPT/Perplexity that's needed for this...
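A minimal sketch of what such a cache might look like (this is my guess at the approach, not anything Perplexity or OpenAI has confirmed): key on a normalized form of the query so trivially different phrasings collide, and attach a TTL so time-sensitive answers like "top movies this year" eventually expire.

```python
import hashlib
import time

class ResponseCache:
    """Cache LLM answers for common queries so repeats return instantly."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, answer)

    def _key(self, query):
        # Normalize case and whitespace so near-identical queries share a key
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, query):
        entry = self.store.get(self._key(query))
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit: skip the model call entirely
        return None  # miss or expired: caller falls through to the model

    def put(self, query, answer):
        self.store[self._key(query)] = (time.time() + self.ttl, answer)
```

In production you'd back this with something shared like Redis rather than an in-process dict, and a real system would likely use semantic (embedding-based) matching rather than exact normalization, but the hit-path economics are the same: a cache hit costs a lookup instead of a full model call.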


I started looking into using Cloudflare's AI Gateway for this exact [caching] reason a few months ago, but got distracted with GPU Cloud Run, so I never got decent load/numbers on the AI Gateway cache to see if it was worth bothering with.


It's a salute emoji!

o7


I learned o7 from Elite/Elite Dangerous, but the same starship skipper-to-skipper vibes apply!


Thank you! Sorry I hit my Anthropic limits a few minutes after this post blew up. It'll be a few days before my Anthropic limits increase since I have a new account with them, so unfortunately it won't be back until next week.

Cheers!


Good idea, maybe I'll add Groq as another option, since I don't have an internal Llama 3.1 flow yet. But I'll still need to keep the others to maintain the diversity of responses.


Would it be possible to give Ithy users the option to add their own API keys for N of your used services, to offset your own $?


Yeah, sorry, I'm hoping to integrate more login options soon. Are you more of an email/phone login person, or is there another third-party login you had in mind?


Email would be best for me.


That's the one I was trying to avoid, since there are so many ways to create fake accounts with it :(


Sorry, you were a victim of the outage caused by HN flooding my website! It's back online now if you want to give it a try :)

