Hacker News | ottaborra's comments

Man, I don't get it. Was I the only one put off by the video game portion of the story? It seemed so out of place, and enough for me to go "rubbish" and toss the series aside.


This sort of hubris has marked the beginning of the fall for many companies. I am now fearful for Nvidia.


Why someone like Henri Poincare or Polya was not included in this list is beyond me. Both were interested in figuring out how mathematicians did what they did, and both had brilliant insights.


by taking steps to verify everything that was said


Read the Butlerian Jihad


So it's extended universe.


It's alluded to in the original, to explain why they're doing navigation the hard way.


Right. But none of what was said earlier is in there.


This makes me wonder. Is deep learning as a field an empirical science purely because everyone is afraid of the math? It has the richness of modern-day physics, but for some reason most of the practitioners seem to want to keep thinking of it as the wild west.


No, there are many very mathematically inclined deep learning researchers. It's an empirical science because the mathematical tools we possess are not sufficient to describe the phenomena we observe and make predictions under one unified theory. Being an empirical science does not mean that the field is a "wild west". Deep learning models are amenable to repeatable, controlled experiments, from which you can improve your understanding of what will happen in most cases. Good practitioners know this.


>It's an empirical science because the mathematical tools we possess are not sufficient to describe the phenomena we observe and make predictions under one unified theory.

To me, deep learning is actually itself a [long-awaited] tool for making good progress toward mathematical foundations of the empirical science of cognition, one with well-established (and simple, at that) math underneath: gradient-based optimization and vector-space representation and compression.
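To illustrate how simple that underlying machinery really is, here is gradient-based optimization boiled down to a toy least-squares fit. The data, true weight, and learning rate are all made up for illustration:

```python
import numpy as np

# Toy example: recover w in y = w*x by gradient descent on mean squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x  # true weight is 3, no noise

w = 0.0
lr = 0.1
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean((w*x - y)^2)
    w -= lr * grad

print(round(w, 3))  # converges toward 3.0
```

The same update rule, scaled up to millions of parameters and a composed nonlinear function, is essentially all that training a deep network does.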

In the '90s there were works showing, for example, that the Gabor filters in the first layer of the biological visual cortex are optimal for the kind of feature-based image recognition we perform. And, as it happens, the convolution kernels in the first layers of deep visual NNs also converge to Gabor-like filters. I see [signs of] similar convergence in the other layers (and all those semantically meaningful vector operations in the embedding space of LLMs are also very telling).

Proving optimality or anything similar is much harder there, yet to me those "repeatable controlled experiments" (i.e. stable convergence) are a strong indication that it will hold: something does drive that convergence, and when there is such a drive in a dynamic system, you naturally end up asymptotically ("attracted") near something either fixed or periodic. That would be a (or even "the") mathematical foundation for understanding cognition. (Divergence from real biological cognition, i.e. the emergence of a completely different yet comparable type of cognition, would also be a great result, if not an even greater one.)
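For concreteness, a Gabor kernel of the kind described above (an oriented sinusoid under a Gaussian envelope) can be generated directly. This is a sketch with illustrative parameter values, not filters taken from any cortex model or trained network:

```python
import numpy as np

# Build a 2D Gabor kernel: Gaussian envelope times a cosine carrier.
# All parameter defaults here are illustrative choices.
def gabor(size=9, sigma=2.0, theta=0.0, lam=4.0, psi=0.0, gamma=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

k = gabor()
print(k.shape)  # (9, 9)
```

Plotting first-layer kernels of a trained vision CNN typically shows patterns resembling these for various orientations theta and wavelengths lam.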


The main point you're making is fair

The only gripe I have is with:

> Being an empirical science does not mean that the field is a "wild west"

I think what you meant to say is: "Being an empirical science does not *necessarily* mean that the field is a 'wild west'"

you clearly haven't seen the social sciences

> Good practitioners know this

sure?

Edit: Removed unnecessary portions that wouldn't have continued the conversation in any meaningful way


I think the necessarily is clearly implied from context.


A little bit of A and B. You can do a lot with very little math beyond linear algebra, calculus, and undergraduate probability, and that knowledge is mainly there to provide intuition and to formalize the problem you're solving a bit. You can also churn out results (including very impressive ones) without doing any math.

A result of the above is that people are empirically demonstrating new problems and solving them very quickly — much more quickly than people can come up with theoretical results explaining why they work. The theory is harder to come by for a few reasons, but many of the successful examples of deep learning don’t fit nicely into older frameworks from, e.g., statistics and optimal control, to explain them well.


My predictions:

1. state-space models make transformer-based models obsolete

2. CUDA killer gets set loose by AMD

3. The world's first successful head transplant takes place

4. Children of Dune gets greenlit

5. Lex Friedman retires from interviewing


There's no such thing as a "head transplant". The person receiving the transplant is the one whose head it is. The transplant they receive is the rest of the body. Therefore, "full body transplant" is a more accurate term.


Respectfully, I don't think you or anyone else alive right now actually knows how this would go down. The number of interactions between the organs, central nervous system, endocrine system, and brain means I'm not sure anybody involved would remain "them".


> state-space models make transformer-based models obsolete

We will see pretty soon whether they work at large scale. I hope they will, but they might not. There are models that outperform more advanced ones at smaller scales, and I haven't heard how Mamba performs at GPT scale.


Which Lex? Friedman or Fridman? They both do podcasts and interviews.


Why do you think Lex will retire? Did he indicate something?


State space?


See https://srush.github.io/annotated-s4.

I also think state-space models will make a comeback.
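For anyone wondering what these models actually compute, here is a minimal sketch of the discretized linear state-space recurrence that S4-style models are built on (x_k = A x_{k-1} + B u_k, y_k = C x_k). The matrices below are random placeholders, not a trained or HiPPO-initialized model:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 16                        # state size, sequence length
A = rng.normal(size=(N, N)) * 0.1   # scaled down so the recurrence stays stable
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))

u = rng.normal(size=T)              # input sequence
x = np.zeros((N, 1))
ys = []
for k in range(T):
    x = A @ x + B * u[k]            # state update
    ys.append(float(C @ x))         # readout

print(len(ys))  # 16
```

Because the recurrence is linear and time-invariant, it can equivalently be computed as a convolution over the whole sequence, which is what makes these models fast to train compared with a step-by-step RNN.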



did you mean kernel regression rather than kernel smoothing? I ask because: https://d2l.ai/chapter_attention-mechanisms-and-transformers...

Quoting from a previous section

> The attention mechanism allows us to aggregate data from many (key, value) pairs. So far our discussion was quite abstract, simply describing a way to pool data. We have not explained yet where those mysterious queries, keys, and values might arise from. Some intuition might help here: for instance, in a regression setting, the query might correspond to the location where the regression should be carried out. The keys are the locations where past data was observed and the values are the (regression) values themselves
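The analogy in that quote can be made concrete with Nadaraya-Watson kernel regression, which is attention with a Gaussian kernel: the query is where we predict, the keys are observed locations, and the values are the observed targets. The data points below are made up for illustration:

```python
import numpy as np

keys = np.array([0.0, 1.0, 2.0, 3.0])    # observed x locations
values = np.array([0.0, 1.0, 4.0, 9.0])  # observed targets (here y = x^2)
query = 1.5                               # location where we regress

w = np.exp(-0.5 * (query - keys) ** 2)    # Gaussian kernel scores
attn = w / w.sum()                        # normalize, like softmax over scores
pred = attn @ values                      # attention-weighted sum of values
print(round(float(pred), 2))
```

Replace the Gaussian score with a scaled dot product of learned projections and this becomes the attention layer of a transformer.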


Beautiful piece of math. Similar work: Traveling Words: A Geometric Interpretation of Transformers https://arxiv.org/abs/2309.07315


That was an interesting abstract, looking forward to reading it!


Would this kind of understanding aid in extracting 'skills' from an LLM by identifying the relevant topology and isolating that part? Then we could have a toolbox of skills to assemble into what we needed?

