Man, I don't get it. Was I the only one put off by the video game portion of the story? It seemed so out of place, enough for me to go "rubbish" and toss the series aside.
Why someone like Henri Poincare or Polya was not included in this list is beyond me. Both were interested in figuring out how mathematicians did what they did, and both had brilliant insights.
This makes me wonder: is deep learning as a field an empirical science purely because everyone is afraid of the math? It has the richness of modern-day physics, but for some reason most of the practitioners seem to want to keep thinking of it as the wild west.
No, there are many very mathematically inclined deep learning researchers. It's an empirical science because the mathematical tools we possess are not sufficient to describe the phenomena we observe and make predictions under one unified theory. Being an empirical science does not mean that the field is a "wild west". Deep learning models can be subjected to repeatable, controlled experiments, from which you can improve your understanding of what will happen in most cases. Good practitioners know this.
>It's an empirical science because the mathematical tools we possess are not sufficient to describe the phenomena we observe and make predictions under one unified theory.
To me, deep learning is actually itself a [long-awaited] tool (with well-established, and simple at that, math underneath: gradient-based optimization, vector-space representation and compression) for making real progress toward mathematical foundations of the empirical science of cognition.
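To be concrete about how simple that underlying math is, here is a minimal sketch of gradient-based optimization on a least-squares problem; everything here is illustrative, nothing is from a specific library's API beyond numpy itself:

```python
import numpy as np

# Gradient descent on a mean-squared-error loss: the "simple math underneath".
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # inputs
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy targets

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)     # gradient of the MSE loss
    w -= lr * grad                            # the entire "learning" step

print(w)  # converges close to w_true
```

Deep nets replace the linear model with a stacked nonlinear one and compute the gradient by backpropagation, but the optimization loop is essentially this.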
In the '90s there was work showing, for example, that Gabor filters in the first layer of the biological visual cortex are optimal for the kind of feature-based image recognition we perform. And, as it happens, the convolution kernels in the first layers of deep visual NNs also converge to Gabor-like filters. I see [signs of] similar convergence in the other layers (and all those semantically meaningful vector operations in the embedding space of LLMs are also very telling). Proving optimality or anything similar is much harder there, yet to me those "repeatable controlled experiments" (i.e. stable convergence) provide a strong indication that it will hold: something drives that convergence, and when there is such a drive in a dynamical system, you naturally end up asymptotically "attracted" near something either fixed or periodic. That would be a (or even "the") mathematical foundation for an understanding of cognition. (Divergence from real biological cognition, i.e. the emergence of a completely different yet comparable type of cognition, would also be a great result, if not an even greater one.)
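You can eyeball the Gabor claim yourself by dumping the first-layer convolution kernels of a pretrained network. A quick sketch, assuming torchvision and matplotlib are installed (the `AlexNet_Weights` loading API is from recent torchvision versions):

```python
import matplotlib.pyplot as plt
from torchvision.models import alexnet, AlexNet_Weights

model = alexnet(weights=AlexNet_Weights.DEFAULT)
kernels = model.features[0].weight.detach()      # first conv layer: (64, 3, 11, 11)
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())  # rescale to [0, 1]

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, k in zip(axes.flat, kernels):
    ax.imshow(k.permute(1, 2, 0).numpy())        # reorder to (H, W, RGB) for display
    ax.axis("off")
plt.show()  # many kernels look like oriented, Gabor-like edge patches
```

The same exercise on other trained vision nets tends to show a similar gallery of oriented edge and color-blob detectors, which is exactly the stable-convergence point above.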
A little bit of A and B. You can do a lot with very little math beyond linear algebra, calculus, and undergraduate probability, and that knowledge is mainly there to provide intuition and to formalize the problem you're solving a bit. You can also churn out results (including very impressive ones) without doing any math.
A result of the above is that people are empirically demonstrating new problems and solving them very quickly, much more quickly than people can come up with theoretical results explaining why the solutions work. The theory is harder to come by for a few reasons, one being that many of the successful examples of deep learning don't fit nicely into the older frameworks, e.g. from statistics and optimal control, that would otherwise explain them.
There's no such thing as a "head transplant". The person receiving the transplant is the one whose head it is. The transplant they receive is the rest of the body. Therefore, "full body transplant" is a more accurate term.
Respectfully, I don't think you or anyone else alive right now actually knows how this would go down. Given the number of interactions between the organs, central nervous system, endocrine system, and brain, I'm not sure anybody involved would remain "them".
> state-space models make transformer based models obsolete
We will see pretty soon whether they work at large scale. I hope they will, but they might not. There are models that outperform more advanced ones at smaller scales, and I haven't heard how Mamba performs at GPT scale.
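For anyone unfamiliar with what's being compared here, the appeal of state-space models is a linear recurrence that processes a sequence in O(n) time with constant memory, versus attention's O(n²). A toy, heavily simplified skeleton (real Mamba uses input-dependent "selective" parameters and a hardware-aware parallel scan; none of this is Mamba's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 4, 2
A = 0.9 * np.eye(d_state)              # state transition (kept stable)
B = rng.normal(size=(d_state, d_in))   # input projection
C = rng.normal(size=(1, d_state))      # readout

x = rng.normal(size=(10, d_in))        # a length-10 input sequence
h = np.zeros(d_state)
ys = []
for x_t in x:                          # one pass, constant memory
    h = A @ h + B @ x_t                # h_t = A h_{t-1} + B x_t
    ys.append(C @ h)                   # y_t = C h_t
print(np.array(ys).shape)              # (10, 1)
```

The open question the parent raises is whether this fixed-size hidden state can match attention's ability to recall arbitrary earlier tokens once you push to GPT-scale training.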
> The attention mechanism allows us to aggregate data from many (key, value) pairs. So far our discussion was quite abstract, simply describing a way to pool data. We have not explained yet where those mysterious queries, keys, and values might arise from. Some intuition might help here: for instance, in a regression setting, the query might correspond to the location where the regression should be carried out. The keys are the locations where past data was observed and the values are the (regression) values themselves
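The quoted regression intuition is essentially Nadaraya-Watson kernel regression, and it fits in a few lines. A minimal sketch (the function name and the Gaussian similarity score are my choices for illustration, not the quoted text's notation):

```python
import numpy as np

# Keys = locations where data was observed, values = the observed y's,
# query = the location where we want the regression carried out.
keys = np.linspace(0, 5, 50)
values = np.sin(keys) + 0.1 * np.random.default_rng(0).normal(size=50)

def attend(query, keys, values, width=0.5):
    scores = -((query - keys) ** 2) / (2 * width**2)  # similarity of query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax -> attention weights
    return weights @ values                           # weighted average of the values

print(attend(2.5, keys, values))  # roughly sin(2.5) ~ 0.6
```

Transformer attention replaces the fixed Gaussian similarity with learned projections and dot products, but the "weight the values by how well each key matches the query" structure is the same.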
Would this kind of understanding aid in extracting 'skills' from an LLM by identifying the relevant topology and isolating that part? Then we could have a toolbox of skills to assemble into what we needed?