Exactly. Being precise about logarithmic vs linear utilizations is key here. I tried making a similar point about the inefficiency of IEEE-754 redundant NaN encodings here: https://arxiv.org/pdf/2508.05621
Having ~a quadrillion redundant bitstrings all mapping to NaN sounds pretty bad, but logarithmic/information utilization-wise, this is actually not too bad.
I've been looking at the 8087 NaN circuitry lately. Having 2^53 (or whatever) values for NaN was supposedly a feature: "the large number of NAN values that are available, provide the sophisticated programmer with a tool that can be applied to a variety of special situations." For example, the different NaN values could hold debugging information to track down errors.
You'll see some scripting languages (ab)use this. Where the native "number" type is a 64 bit float and only one NaN bit pattern is a real NaN. The others smuggle a pointer to an object in the lower bits. This way you don't spend any memory overhead indicating if a given variable contains a primitive or an object.
I feel like many of the comments missed the point or didn't read the article. What I believe this article is stating (and I've read this many times during my PhD for various reasons), is that the input data distributions affect how many transistor state changes there are during multiplication. Since these events are a large portion of energy loss/heat generation, the clocks won't be throttled as much for certain data patterns.
There was a workshop paper from SC24 that did more experiments around this I believe. I can't find it now though.
Yeah I'm not sure how to put this into words either, but I feel you. Do you know of any good AI "thought-leader" that's sensible and not AI-pilled or a complete doomer? Some reasonable middle-ground that helps us navigate an era of immense technological innovation?
Having ~a quadrillion redundant bitstrings all mapping to NaN sounds pretty bad, but logarithmic/information utilization-wise, this is actually not too bad.
reply