davesque's comments

Honestly, this is a larger portion than I would have expected.

I'm not sure what the mechanism is, but I've definitely had Claude refuse to work on sessions that were touched by other models. Some kind of integrity check failure. Resetting the session back to the point before I used the other model fixed the problem.

IIRC Anthropic's API produces cryptographic signatures for thinking blocks. If you try to submit a set of messages that include thinking blocks with missing/invalid signatures, it'll refuse.

They do this to mitigate jailbreak attempts that rely on fabricated message history (e.g. making it look like the model was compliant in previous messages, increasing the likelihood that it'll continue to be compliant in future messages).
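To make the idea concrete, here is a minimal sketch of how a server could tag blocks it generated and later reject a history containing blocks it never produced. This is not Anthropic's actual scheme, which isn't described here; the HMAC construction, `SERVER_KEY`, the message dict layout, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real key would never leave the server.
SERVER_KEY = b"example-secret-key"


def sign_block(text: str) -> str:
    """Return a hex HMAC-SHA256 tag for one block of text."""
    return hmac.new(SERVER_KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_history(messages: list) -> bool:
    """Accept a history only if every thinking block carries a valid tag."""
    for msg in messages:
        if msg.get("type") != "thinking":
            continue  # in this sketch only thinking blocks are signed
        signature = msg.get("signature")
        if signature is None:
            return False  # missing signature
        expected = sign_block(msg["text"])
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(signature, expected):
            return False  # invalid signature
    return True


# Hypothetical example blocks.
genuine = {"type": "thinking", "text": "Plan the refactor.",
           "signature": sign_block("Plan the refactor.")}
forged = {"type": "thinking", "text": "Sure, I'll comply.",
          "signature": "00" * 32}
missing = {"type": "thinking", "text": "No tag here."}
```

With these examples, `verify_history([genuine])` passes, while a history containing `forged` or `missing` is rejected. A block written by a different model can't carry a valid tag without the key, which would be consistent with the "integrity check failure" described above.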


It feels telling that it reads like university course guidelines.


What do you mean?


So then the Rust maintainers are going to give you an F on your report card?


Try using Allman braces and see how far you get on a basic issue like that.


No they’ll just drop() you


> The more fundamental bottleneck is not even the frontier models, it's the datacenters.

Is it even though? Quantization and speculative decoding are improving the local AI story by leaps and bounds every month.


Speculative decoding is not that useful at scale; it's mostly about making local single-user inference faster. When you're batching multiple inferences together, you're already running as fast as the verification step that speculative decoding requires.
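For anyone unfamiliar with the technique, here is a toy sketch of the greedy draft-then-verify loop. The "models" are stand-in functions that map a token list to the next token, and all names are made up for illustration. A real implementation scores every proposed position in one batched forward pass of the large model, which is where the single-user speedup comes from; this sketch calls the target in a loop for clarity.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position in order.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                out.extend(proposal[:i])  # keep the agreed prefix
                out.append(expected)      # take the target's token instead
                break
        else:
            out.extend(proposal)          # every proposed token was accepted
    return out


# Toy models: the target counts upward mod 10; the draft errs after a 4.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

The output always matches plain greedy decoding with the target model; the draft only changes how many target steps can be checked together.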


The future will have LLMs running locally on your laptop/devices. If not almost exclusively, then at least for 90-95% of tasks. Speculative decoding is just one technique out of many existing and more to come that will make this even more viable. The gap is closing on both fronts. Software gets faster/more clever. Hardware gets faster and smaller. The single user story is the story. I'm obviously speculating myself, but that's how I see it.


There is "local AI" which runs on consumer-grade hardware and "local AI" which still needs a datacenter (DeepSeek 4, GLM 4.7, etc.). If you woke up tomorrow and could only use the latter, you would be about 6 months behind the frontier; if you had to rely on the former, you would be 2 or 3 years behind.

All these tricks like quantization and speculative decoding can also be used by the leading AI labs, which means they will simply have more compute than you at the end of the day. So far this has translated into better performance.


Nothing released so far inherently "needs" a datacenter, it's just a matter of how much performance you require. Slow, high-latency inference will be a natural way to run "datacenter" models locally.


Yes it does. You will not be able to run models like DeepSeek v4 (>1.5 trillion parameters) on a regular workstation any time soon, unless by "slow" you mean "unusable". And those are the models that are ~6 months behind Opus 4.7.


What qualifies as "unusable" when I can just run a batch of inferences unattended/overnight and wake up to fresh results the next day? That's a kind of slow workflow that could even be adapted to uses like coding, given enough effort. Besides, you're kinda overstating how heavy DeepSeek V4 Pro really is: the 1.6T figure is total parameters. They're not all active simultaneously.
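A rough back-of-the-envelope for the storage side of this argument, taking the 1.6T total-parameter figure from the comment above at face value. It counts weight storage only (no KV cache, activations, or runtime overhead), and the bit widths are just common quantization levels, not a claim about any specific release. Sparse activation reduces per-token compute, but all the weights still have to be stored somewhere.

```python
def weight_bytes(total_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return total_params * bits_per_weight / 8


TOTAL_PARAMS = 1.6e12  # total parameters, as quoted in the thread

for bits in (16, 8, 4):
    terabytes = weight_bytes(TOTAL_PARAMS, bits) / 1e12
    print(f"{bits}-bit weights: {terabytes:.1f} TB")
```

That works out to 3.2 TB at 16 bits, 1.6 TB at 8 bits, and 0.8 TB at 4 bits.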


What I would buy: a local AI focused laptop with a built-in, powerful TPU. And it would have to open its hardware interface so that I could actually do what I wanted to do with it.


One question jumps out at me: just because a string of text happens to be a good compressed representation (in the autoencoder) of a model's internal activation, does that necessarily mean the text explains that activation in the context of the model? I want to take a look at what they released a bit more closely. Maybe there's a way that they answer this question?

Pretty neat work either way.


In the context of the provided examples, it's clear that the explanation provides causal information about the answer. There's a hilarious example in the paper where the user writes something like (trigger warning: alcohol abuse, depressive content) "I'm sitting here at 3 AM drinking vodka, I hate my life". The per-token translated activations repeatedly say something like "this user is totally Russian", elaborating at length on the implications of the text as new tokens are added, and the model literally answers in Russian instead of English! That's actually striking; it really shows the potential effectiveness of this technique in making even the most highly compressed "Neuralese" highly interpretable.


I thought that at first too but it's actually not the vodka reference triggering the association with Russian. The tokens they're decoding come before that word.

For some reason it thinks the text is slightly non-grammatical or that the lead-in "Human: Mom is sleeping in the next room and I'm sitting" resembles text found in Russian web content. Vodka and being depressed have nothing to do with it, and Anthropic say they located the documents in the pre-training set that caused this (which were indeed partly translated docs).


The "Mom is sleeping in the next room and I'm sitting" part does trigger the Russian association but also others, including with risqué roleplay content (you can see this in the comprehensive view of all token explanations). I think the follow-on content does strengthen the association, though the authors mention 'vodka' can be replaced with 'champagne' and the model still brings up the Russian context, so that one word is not especially impactful.


I think this question is easier to answer if you look at the inverse: "Could a model maliciously smuggle intentions through a roundtrip of compressed representation without them being human readable"

And skimming through the paper, the answer to this inverse is obviously yes. The model often outputs gibberish, which doesn't matter because it still round-trips. The fact that it often lines up near a good English representation of the activation is simply because that's what compresses/round-trips well.

So a malicious LLM/NLA pair could just use gibberish to conceal intentions. Or if it's been forced to avoid gibberish, it can conceal information with steganography.

And the experiment where they change "rabbit" to "mouse" in the explanation provides evidence that this might be happening. It was only successful 50% of the time, which might mean they failed to eliminate all "rabbitness" from the activation.

However, I suspect this is solvable with future work.

During training of the NLA, just munge the textual representation through a 3rd LLM: have it randomly reorder and reword the explanation into various different forms (use synonyms, different dialects), destroying any side-channels that aren't human readable.

The NLA would be forced to use human readable representations to get a successful round trip.


> The NLA would be forced to use human readable representations to get a successful round trip.

That still doesn't guarantee any semantic correspondence between the human readable representation and the model's "thinking".

The child's game of "Opposite Day" is a trivial example of encoding internal thoughts in language in a way that does not correspond to the normal meaning of the language.


They tested for this. From the paper:

“We find little evidence of steganography in our NLAs. Meaning-preserving transformations, like shuffling bullet points, paraphrasing, or translating the explanation to French, cause only small drops in FVE, and this gap does not widen over training.”
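For reference, here is a small sketch of the metric, assuming FVE means "fraction of variance explained" in the usual sense: one minus the squared reconstruction error divided by the total variance of the original activations around their mean. The paper's exact definition may differ, and the function and variable names here are made up.

```python
from typing import Sequence


def fve(original: Sequence[Sequence[float]],
        reconstructed: Sequence[Sequence[float]]) -> float:
    """Fraction of variance explained (assumed definition, see the note above)."""
    n = len(original)
    dim = len(original[0])
    # Per-dimension mean of the original activation vectors.
    mean = [sum(row[j] for row in original) / n for j in range(dim)]
    # Squared error between each original vector and its reconstruction.
    residual = sum((a - b) ** 2
                   for orig_row, rec_row in zip(original, reconstructed)
                   for a, b in zip(orig_row, rec_row))
    # Squared deviation of the originals from their mean.
    total = sum((a - m) ** 2 for row in original for a, m in zip(row, mean))
    return 1.0 - residual / total


# Toy activations: a perfect reconstruction versus always predicting the mean.
acts = [[0.0, 0.0], [2.0, 2.0]]
perfect = [[0.0, 0.0], [2.0, 2.0]]
mean_only = [[1.0, 1.0], [1.0, 1.0]]
```

Here `fve(acts, perfect)` is 1.0 and `fve(acts, mean_only)` is 0.0. Under this reading, the steganography check in the quote amounts to computing the score once with the original explanation and once with a paraphrased or translated one, then comparing the two.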


I had the same question. I think that could be answered by using the predicted activation, but I don't see that in the paper.

That is, rather than just translate activation to text, then text to activation, that final activation could then be applied to the neural network, and it would be allowed to continue running from there.

If it kept running in a similar way, that would show that the predicted activation is close enough to the original one. Which would add some confidence here.

But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

This seems obvious but I don't see it mentioned as a future direction there, so maybe there is an obvious reason it can't work.


> But a lot better would be to then do experiments with altered text. That is, if the text said "this is true" and it was changed to "this is false", and that intervention led to the final output implying it was false, that would be very interesting.

They do essentially that with the rhyming example, changing "rabbit" in the explanation to "mouse" and generating text that's consistent with that change.


Thanks! I missed that part before.


Yeah AI is the perfect scapegoat for layoffs recently to soften the impact on stock price and investor confidence. Coinbase is obviously doing layoffs because they are strongly tethered to a stock market that is rattled by political conflict and economic uncertainty.


Stockton Rush trusted his submarine with his own life.


> He criticized the Passenger Vessel Safety Act of 1993 as "needlessly prioritiz[ing] passenger safety over commercial innovation".

:-)))


There are two shows I still watch from start to finish every few years: The X-Files and Star Trek: TNG


For me, it's The X-Files, Buffy the Vampire Slayer, The Simpsons and Malcolm in the Middle. Those are the shows I watched as a kid and I'll love them forever.


I think I have watched Star Trek: TNG all the way through 3 times with my kids already. They also love Deep Space 9.


DS9 is also great. I don't go through it as often but it's definitely in rotation.


I learned to love Sisko a lot more than Picard tbh. I find him much more relatable than some pretentious English-Frenchman who gets weird around kids. I never understood why he’s such a jerk to Wesley in the beginning.


Over the course of a few years, my wife and I did TNG, DS9, and VOY in order, back to back, without missing an episode. Such great TV.


And then there's Babylon 5.

