> One way to think of the bitter lesson as it applies to generative models is that ~all data carries some information about the structure of reality
Completely agree. It might not have come across, but what I'm pointing out in the post is that the data, as it is currently encoded for these models, is needlessly lossy. Tokens do not reveal all the information we have at our disposal.
In natural language, that's fine, because it's quite loose in structure.
But if our domain is heavily structured (like modern programming languages are), why reveal only snippets of linearised syntax of that structure to the model? Why not reveal the full structure we have at our disposal?
> and architectures that let you train on more data are better because they learn better underlying world models.
By this argument, wouldn't we conclude that training on chess using the game structure wouldn't work either, since that'd be a model that uses less data?
A coproduct in the category Set is a disjoint union of sets, i.e. A + B + C where A, B, C are sets.
We can think of this coproduct as involving two choices:
1) a choice of which component of the coproduct we're interested in (first, second, or third)
2) a choice of an element of that component
That is, `A + B + C` is isomorphic to the dependent pair `(i : Fin 3 ** D i)`, where `Fin 3` is a set with three elements, `D : Fin 3 -> Type`, and `D(0) = A`, `D(1) = B`, `D(2) = C`.
Then, the idea is: why index by a finite set? If you replace `Fin 3` by some arbitrary set, you start to be able to model a very general notion of a dependent type.
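Here's a minimal sketch of the finite case in Idris 2 (the concrete component types are illustrative choices of mine, not anything from the post):

```idris
import Data.Fin

-- Three arbitrary component sets, chosen purely for illustration.
A : Type
A = Nat

B : Type
B = Bool

C : Type
C = String

-- The indexing family: each element of Fin 3 names one component.
D : Fin 3 -> Type
D FZ           = A
D (FS FZ)      = B
D (FS (FS FZ)) = C

-- An element of A + B + C is exactly a dependent pair: first choose the
-- component (i : Fin 3), then choose an element of that component (D i).
inA : (i : Fin 3 ** D i)
inA = (FZ ** 42)

inB : (i : Fin 3 ** D i)
inB = (FS FZ ** True)
```

Swapping `Fin 3` for an arbitrary index type, and `D` for an arbitrary family over it, gives exactly that general dependent pair.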
On existing techniques: the Type-Constrained Generation paper is discussed in the blog post (under Constrained Decoding), and I'd group typed holes in the same bucket.
The problem with those methods is that they're inference-time: they don't update the weights. Constrained decoding prevents the model from saying certain things without changing what the model wants to say. This becomes more problematic the more complex your type system gets, even before taking into account that type inference is undecidable for many of these systems.
Meaning: if I give you a starting string, then in the presence of polymorphism and lambdas you can't always tell whether it completes to a term of a particular type.
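As a toy illustration of that point (illustrative code, not any particular decoder's actual API): constrained decoding only filters and renormalises the model's output distribution; the scores, and the weights that produced them, are untouched.

```idris
import Data.List

-- Toy model of constrained decoding over a next-token distribution:
-- drop the tokens the checker forbids and renormalise what's left.
-- The distribution itself (what the model "wants" to say) is unchanged.
constrain : (valid : tok -> Bool) -> List (tok, Double) -> List (tok, Double)
constrain valid dist =
  let kept = filter (valid . fst) dist
      z    = sum (map snd kept)
  in  map (\p => (fst p, snd p / z)) kept
```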
On the syntactic difference: I'd gently reframe. The question isn't whether syntactically different programs are semantically equivalent; it's that, regardless of which form you pick, the existing methods don't let the model learn the constructor choice.
Thank you for your reply. FTR, I find the subject very interesting and I hope there will be more work on this line of approach.
>The problem with those methods is that they're inference time
I agree, I just thought it was missing some prior art (not affiliated with these papers :-P)
What's not clear to me at all: is this a draft of a research idea, or is there already an implementation coming in a later post?
It seems to me that such an idea would be workable for a given language with a given type system, but that there would be a black-magic step in training a model that works in a language-agnostic manner. Could you clarify?
There is an existing implementation validating this idea, and the plan is to make it publicly available at some point.
> It seems to me that such an idea would be workable on a given language with a given type system, but it seems to me there would be a black magic step to train a model that would work in a language-agnostic manner.
That's correct. The blog post alludes to infrastructure building as a necessary component of making that happen for that exact reason. I.e. while it's "easy" to generate a dependent pair in this way, generating an entire dependently typed AST is much more difficult. On the positive side, this is more of a software engineering effort rather than a research one.
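To give a feel for the gap: a dependent pair is a single node, while an AST threads typing constraints through every constructor. A minimal intrinsically typed expression AST in Idris 2 (purely illustrative, not the implementation mentioned above) looks like this:

```idris
-- Object-language types.
data Ty = TNat | TBool

-- An intrinsically typed AST: the index constrains which constructors are
-- available at each node, so generating a term means making a type-correct
-- constructor choice at every step, not just at the root.
data Expr : Ty -> Type where
  NLit : Nat  -> Expr TNat
  BLit : Bool -> Expr TBool
  Add  : Expr TNat  -> Expr TNat -> Expr TNat
  If   : Expr TBool -> Expr t -> Expr t -> Expr t
```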
FYI, attention wasn't originally proposed in AIAYN. Their main contribution was a fully attention-based network.
You could argue they didn't invent dot-product attention or transformers, but they definitely formalized those, so I'll leave that nitpicking to Schmidhuber lol. As for the other stuff, they say as much in the paper. It's easy to hand credit to the ones who popularized a technique rather than the many people who developed it.
Indeed, the author seems to have a misunderstanding about both undecidability and complexity as they pertain to dependent types.
Dependent types do not add complexity to our systems; they reveal it.
Case in point: here is a fully dependently-typed tensor processing framework written in Idris, which I believe matches most of the desiderata of his talk, even capturing the generalisation of arrays via Naperian functors that he mentions at one point.
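As a small illustration of "revealing, not adding" (a generic example, not the framework's actual API): the length constraint below was always a precondition of a dot product; the dependent type just states it.

```idris
import Data.Vect

-- Both vectors must have the same length n for a dot product to make sense.
-- That constraint always existed; the type now exposes it to the checker.
dot : Num a => Vect n a -> Vect n a -> a
dot xs ys = sum (zipWith (*) xs ys)
```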
"Anybody can share a sucess story of using a car to go places I couldn't have gone on foot"?
CT is a language and a tool, meaning anything you can say in the language of CT you can say in other languages.
Like a car, if you learn how to drive it (and this one has a very steep learning curve), it gets you places faster, but there's in principle nothing stopping you from walking there instead, i.e. getting to the result without specifically referring to any CT concepts.
You're completely right that the repo doesn't promote any introductory material. CT is notoriously difficult to get into, and this repository wasn't meant to be a pedagogical one, but rather a list of all the relevant papers.
Though, I'll see about remedying this. I have a different repository that curates a list of (what I consider to be, as a CS major) relatively approachable CT introductory materials https://github.com/bgavran/Category_Theory_Resources
and it might be a good idea to add a pointer to it.
Olah's blog entry is about types and algebra.
We've had types and algebra for a long time, distinct from Category Theory, which is a further abstraction over algebraic structures.
A lot of computer programmers who never studied Abstract Algebra or Type Theory think all of that is Category Theory, because Haskell was their first exposure to it, and Category Theory is a big meme in Haskell thanks to Monad.
The exponential growth in CS publications is much faster.
This repository is simply a testament that CT is slowly ramping up.
It's meant to show what kind of expressive power and breadth current CT models have, which to my knowledge isn't something that's well-known outside of our niche community.
It's also meant to suggest where things are going (the kind of chart I have in mind is this one: https://twitter.com/bgavran3/status/1422206118688956420 ), though I understand this is something that deserves a much more substantial proof.
I've implemented these in Idris 2: https://github.com/bgavran/TensorType/blob/main/src/NN/Archi...