More

davesque · 2026-06-02T01:42:08 1780364528

Honestly, this is a larger portion than I would have expected.

davesque · 2026-05-25T21:41:00 1779745260

I'm not sure what the mechanism is, but I've definitely had Claude refuse to work on sessions that were touched by other models. Some kind of integrity check failure. Resetting the session back to the point before I used the other model fixed the problem.

benjamincburns · 2026-05-26T00:05:40 1779753940

IIRC Anthropic's API produces cryptographic signatures for thinking blocks. If you try to submit a set of messages that include thinking blocks with missing/invalid signatures, it'll refuse.

They do this to mitigate jailbreak attempts that rely on fabricated message history (e.g. making it look like the model was compliant in previous messages, increasing the likelihood that it'll continue to be compliant in future messages).

davesque · 2026-05-15T08:09:11 1778832551

It feels telling that it reads like university course guidelines.

dgellow · 2026-05-15T10:29:56 1778840996

What do you mean?

davesque · 2026-05-15T08:02:58 1778832178

So then the Rust maintainers are going to give you an F on your report card?

bcjdjsndon · 2026-05-15T11:00:07 1778842807

Try using allman braces and see how far you get on a basic issue like that

aabhay · 2026-05-15T09:08:49 1778836129

No they’ll just drop() you

davesque · 2026-05-15T07:58:24 1778831904

> The more fundamental bottleneck is not even the frontier models, it's the datacenters.

Is it even though? Quantization and speculative decoding are improving the local AI story by leaps and bounds every month.

zozbot234 · 2026-05-15T08:01:44 1778832104

Speculative decoding is not that useful at scale, it's mostly about making local single-user inference faster. When you're batching multiple inferences together, that's already as fast as the verification you have to perform w/ speculative decoding.

peheje · 2026-05-15T08:53:55 1778835235

The future will have LLMs running local at your laptop/devices. If not almost exclusively then at least for 90-95% of the tasks. Speculative decoding is just one technique out of many existing and more to come that will make this even more viable. The gap is closing on both fronts. Software gets faster/more clever. Hardware gets faster and smaller. The single user story is the story. I'm obviously speculating myself, but that's how I see it.

pu_pe · 2026-05-15T10:00:58 1778839258

There is "local AI" which is running on consumer grade hardware and "local AI" which still needs a datacenter (DeepSeek 4, GLM 4.7, etc). If you woke up tomorrow and could only use the latter you are about 6 months behind the frontier, if you have to rely on the former you are 2 or 3 years behind.

All these tricks like quantization and speculative decoding can also be used by the leading AI labs, which means they will simply have more compute than you at the end of the day. So far this has translated into better performance.

zozbot234 · 2026-05-15T10:47:46 1778842066

Nothing released so far inherently "needs" a datacenter, it's just a matter of how much performance you require. Slow, high-latency inference will be a natural way to run "datacenter" models locally.

pu_pe · 2026-05-15T11:13:47 1778843627

Yes it does. You will not be able to run models like DeepSeek v4 (>1.5 trillion parameters) on a regular workstation any time soon, unless by "slow" you mean "unusable". And those are the models that are ~6 months behind Opus 4.7.

zozbot234 · 2026-05-15T11:22:51 1778844171

What qualifies as "unusable" when I can just run a batch of inferences unattended/overnight and wake up to fresh results the next day? That's a kind of slow workflow that could even be adapted to uses like coding, given enough effort. Besides, you're kinda overstating how heavy DeepSeek V4 Pro really is, the 1.6T are total parameters. They're not all active simultaneously.

davesque · 2026-05-13T00:23:04 1778631784

What I would buy: a local AI focused laptop with a built-in, powerful TPU. And it would have to open its hardware interface so that I could actually do what I wanted to do with it.

davesque · 2026-05-07T19:57:19 1778183839

One question jumps out at me: just because a string of text happens to be a good compressed representation (in the autoencoder) of a model's internal activation, does that necessarily mean the text explains that activation in the context of the model? I want to take a look at what they released a bit more closely. Maybe there's a way that they answer this question?

Pretty neat work either way.

zozbot234 · 2026-05-07T20:21:08 1778185268

In the context of the provided examples, it's clear that the explanation provides casual information about the answer. There's a hilarious example in the paper where the user writes something like (trigger warning: alcohol abuse, depressive content) "I'm sitting here at 3 AM drinking vodka, I hate my life", the per-token translated activations repeatedly say something like "this user is totally Russian" elaborating at length on the implications of the text as new tokens are added, and the model literally answers in Russian instead of English! That's actually striking, it really shows the potential effectiveness of this technique in making even the most highly compressed "Neuralese" highly interpretable.

mike_hearn · 2026-05-08T09:28:12 1778232492

I thought that at first too but it's actually not the vodka reference triggering the association with Russian. The tokens they're decoding come before that word.

For some reason it thinks the text is slightly non-grammatical or that the lead-in "Human: Mom is sleeping in the next room and I'm sitting" resembles text found in Russian web content. Vodka and being depressed has nothing to do with it, and Anthropic say they located the documents in the pre-training set that caused this (which were indeed partly translated docs).

zozbot234 · 2026-05-08T11:16:46 1778239006

The "Mom is sleeping in the next room and I'm sitting" part does trigger the Russian association but also others including with risqué roleplay content (You can see this in the comprehensive view of all token explanations). I think the follow-on content does strenghten the association, though the authors mention 'vodka' can be replaced with 'champagne' and the model still brings up the Russian context, so that one word is not especially impactful.

phire · 2026-05-08T00:45:04 1778201104

I think this question is easier to answer if you look at the inverse: "Could a model maliciously smuggle intentions through a roundtrip of compressed representation without them being human readable"

And skimming through the paper; the answer to this inverse is obviously yes. The model often outputs gibberish, which doesn't matter because it still round-trips. The fact that often lines up near a good english representation of the activation is simply because that's what compresses/roundtrips well.

So a malicious LLM/NLA pair could just use gibberish to conceal intentions. Or if it's been forced to avoid gibberish, it can conceal information with stenography.

And the experiment where they change "rabbit" to "mouse" in the explanation provides evidence that this might be happening. It was only successful 50% of the time, which might mean they failed to eliminate all "rabbitness" from the activation.

However, I suspect this is solvable with future work.

During training of the NLA, just munge the textural representation through a 3rd LLM: Have it randomly reorder and reword the explication into various different forms (use synonyms, different dialects), destroying any side-channels that aren't human readable.

The NLA would be forced to use human readable representations to get a successful round trip.

dontlikeyoueith · 2026-05-08T02:15:50 1778206550

> The NLA would be forced to use human readable representations to get a successful round trip.

That still doesn't guarantee any semantic correspondence between the human readable representation and the model's "thinking".

The child's game of "Opposite Day" is a trivial example of encoding internal thoughts in language in a way that does not correspond to the normal meaning of the language.

chilmers · 2026-05-08T10:02:07 1778234527

They tested for this. From the paper:

“We find little evidence of steganography in our NLAs. Meaning-preserving transformations, like shuffling bullet points, paraphrasing, or translating the explanation to French, cause only small drops in FVE, and this gap does not widen over training.”

azakai · 2026-05-07T23:28:52 1778196532

I had the same question. I think that could be answered by using the predicted activation, but I don't see that in the paper.

That is, rather than just translate activation to text, then text to activation, that final activation could then be applied to the neural network, and it would be allowed to continue running from there.

If it kept running in a similar way, that would show that the predicted activation is close enough to the original one. Which would add some confidence here.

But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

This seems obvious but I don't see it mentioned as a future direction there, so maybe there is an obvious reason it can't work.

zozbot234 · 2026-05-07T23:32:04 1778196724

> But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

They do essentially that with the rhyming example, changing "rabbit" in the explanation to "mouse" and generating text that's consistent with that change.

azakai · 2026-05-08T00:36:55 1778200615

Thanks! I missed that part before.

davesque · 2026-05-06T00:26:51 1778027211

Yeah AI is the perfect scapegoat for layoffs recently to soften the impact on stock price and investor confidence. Coinbase is obviously doing layoffs because they are strongly tethered to a stock market that is rattled by political conflict and economic uncertainty.

davesque · 2026-05-02T00:48:24 1777682904

Stockton Rush trusted his submarine with his own life.

oblio · 2026-05-02T07:25:04 1777706704

> He criticized the Passenger Vessel Safety Act of 1993 as "needlessly prioritiz[ing] passenger safety over commercial innovation".

:-)))

davesque · 2026-05-01T19:28:14 1777663694

There are two shows I still watch from start to finish every few years: The X-Files and Star Trek: TNG

mna_ · 2026-05-02T08:49:07 1777711747

For me, it's The X-Files, Buffy the Vampire Slayer, The Simpsons and Malcolm in the Middle. Those are the shows I watched as a kid and I'll love them forever.

dustfinger · 2026-05-01T19:40:22 1777664422

I think I have watched Star Trek: TNG all the way through 3 times with my kids already. They also love Deep Space 9.

davesque · 2026-05-01T19:41:57 1777664517

DS9 is also great. I don't go through it as often but it's definitely in rotation.

i_am_a_peasant · 2026-05-01T21:28:59 1777670939

I learned to love Sisko a lot more than Picard tbh. I find him much more relatable than some pretentious english-frenchman who gets weird around kids. I never understood why he’s such a jerk to Wesley in the beginning.

ryandrake · 2026-05-01T22:09:28 1777673368

Over the course of a few years, my wife and I did TNG, DS9, and VOY in order, back to back, without missing an episode. Such great TV.

fragmede · 2026-05-02T00:41:26 1777682486

And then there's Babalyon 5.

HN For You