> One way to think of the bitter lesson as it applies to generative models is that ~all data carries some information about the structure of reality
Completely agree. It might not have come across, but what I'm pointing out in the post is that the data, as it is currently encoded for these models, is needlessly lossy. Tokens do not reveal all the information we have at our disposal.
In natural language, that's fine, because it's quite loose in structure.
But if our domain is heavily structured (like modern programming languages are), why reveal only snippets of linearised syntax of that structure to the model? Why not reveal the full structure we have at our disposal?
> and architectures that let you train on more data are better because they learn better underlying world models.
By this argument, wouldn't we conclude that training on chess using the game structure wouldn't work either, since that'd be a model that uses less data?
A coproduct in the category Set is a disjoint union of sets, i.e. A + B + C where A, B, C are sets.
We can think of this coproduct as involving two choices:
1) a choice of which component of the coproduct we're interested in (first, second, or third)
2) a choice of an element of that component
That is, `A + B + C` is isomorphic to the dependent pair `(i : Fin 3 ** D i)`, where `Fin 3` is a set with three elements, `D : Fin 3 -> Type`, and `D(0) = A`, `D(1) = B`, `D(2) = C`.
Then, the idea is: why index by a finite set? If you replace `Fin 3` by some arbitrary set, you start to be able to model a very general notion of a dependent type.
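Here's a minimal sketch of the finite case in Idris 2 (the concrete component types are illustrative choices of mine, not anything from the post):

```idris
import Data.Fin

-- Three arbitrary component sets, chosen purely for illustration.
A : Type
A = Nat

B : Type
B = Bool

C : Type
C = String

-- The indexing family: each element of Fin 3 names one component.
D : Fin 3 -> Type
D FZ           = A
D (FS FZ)      = B
D (FS (FS FZ)) = C

-- An element of A + B + C is exactly a dependent pair: first choose the
-- component (i : Fin 3), then choose an element of that component (D i).
inA : (i : Fin 3 ** D i)
inA = (FZ ** 42)

inB : (i : Fin 3 ** D i)
inB = (FS FZ ** True)
```

Swapping `Fin 3` for an arbitrary index type, and `D` for an arbitrary family over it, gives exactly that general dependent pair.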
On existing techniques: the Type-Constrained Generation paper is discussed in the blog post (under Constrained Decoding), and I'd group typed holes in the same bucket.
The problem with those methods is that they're inference-time: they don't update the weights. Constrained decoding prevents the model from saying certain things without changing what the model wants to say. This becomes more problematic the more complex your type system gets, even before taking into account that type inference is undecidable for many of these systems.
Meaning: if I give you a starting string, then in the presence of polymorphism and lambdas you can't always tell whether it completes to a term of a particular type.
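As a toy illustration of that point (illustrative code, not any particular decoder's actual API): constrained decoding only filters and renormalises the model's output distribution; the scores, and the weights that produced them, are untouched.

```idris
import Data.List

-- Toy model of constrained decoding over a next-token distribution:
-- drop the tokens the checker forbids and renormalise what's left.
-- The distribution itself (what the model "wants" to say) is unchanged.
constrain : (valid : tok -> Bool) -> List (tok, Double) -> List (tok, Double)
constrain valid dist =
  let kept = filter (valid . fst) dist
      z    = sum (map snd kept)
  in  map (\p => (fst p, snd p / z)) kept
```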
On the syntactic difference: I'd gently reframe. The question isn't whether syntactically different programs are semantically equivalent; it's that, regardless of which form you pick, the existing methods don't let the model learn the constructor choice.
Thank you for your reply. FTR, I find the subject very interesting and I hope there will be more work on this line of approach.
>The problem with those methods is that they're inference time
I agree, I just thought it was missing some prior art (not affiliated with these papers :-P)
What's not clear to me at all: is this a draft of a research idea, or is there already an implementation coming in a later post?
It seems to me that such an idea would be workable for a given language with a given type system, but that there would be a black-magic step in training a model that works in a language-agnostic manner. Could you clarify?
There is an existing implementation validating this idea, and the plan is to make it publicly available at some point.
> It seems to me that such an idea would be workable on a given language with a given type system, but it seems to me there would be a black magic step to train a model that would work in a language-agnostic manner.
That's correct. The blog post alludes to infrastructure building as a necessary component of making that happen for that exact reason. I.e. while it's "easy" to generate a dependent pair in this way, generating an entire dependently typed AST is much more difficult. On the positive side, this is more of a software engineering effort rather than a research one.
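To give a feel for the gap: a dependent pair is a single node, while an AST threads typing constraints through every constructor. A minimal intrinsically typed expression AST in Idris 2 (purely illustrative, not the implementation mentioned above) looks like this:

```idris
-- Object-language types.
data Ty = TNat | TBool

-- An intrinsically typed AST: the index constrains which constructors are
-- available at each node, so generating a term means making a type-correct
-- constructor choice at every step, not just at the root.
data Expr : Ty -> Type where
  NLit : Nat  -> Expr TNat
  BLit : Bool -> Expr TBool
  Add  : Expr TNat  -> Expr TNat -> Expr TNat
  If   : Expr TBool -> Expr t -> Expr t -> Expr t
```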
FYI, attention wasn't originally proposed in AIAYN. Their main contribution was a fully attention-based network.
You could argue they didn't invent dot-product attention or transformers, but they definitely formalized those, so I'll leave that nitpicking to Schmidhuber lol. As for the other stuff, they say as much in the paper. It's easy to hand credit to the ones who popularized a technique rather than the many people who developed it.
Indeed, the author seems to have a misunderstanding about both undecidability and complexity as they pertain to dependent types.
Dependent types do not add complexity to our systems; they reveal it.
Case in point: here is a fully dependently-typed tensor processing framework written in Idris, which I believe matches most of the desiderata of his talk, even capturing the generalisation of arrays via Naperian functors that he mentions at one point.
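As a small illustration of "revealing, not adding" (a generic example, not the framework's actual API): the length constraint below was always a precondition of a dot product; the dependent type just states it.

```idris
import Data.Vect

-- Both vectors must have the same length n for a dot product to make sense.
-- That constraint always existed; the type now exposes it to the checker.
dot : Num a => Vect n a -> Vect n a -> a
dot xs ys = sum (zipWith (*) xs ys)
```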
"Anybody can share a sucess story of using a car to go places I couldn't have gone on foot"?
CT is a language and a tool, meaning anything you can say in the language of CT you can say in other languages.
Like a car, if you learn how to drive it (and this one has a very steep learning curve), it gets you places faster, but there's in principle nothing stopping you from walking there instead, i.e. getting to the result without specifically referring to any CT concepts.
You're completely right that the repo doesn't promote any introductory material. CT is notoriously difficult to get into, and this repository wasn't meant to be a pedagogical one, but rather a list of all the relevant papers.
Though, I'll see about remedying this. I have a different repository that curates a list of (what I consider to be, as a CS major) relatively approachable CT introductory materials https://github.com/bgavran/Category_Theory_Resources
and it might be a good idea to add a pointer to it.
Olah's blog entry is about types and algebra.
We've had types and algebra for a long time, distinct from Category Theory, which is a further abstraction over algebraic structures.
A lot of computer programmers who never studied Abstract Algebra or Type Theory think all of that is Category Theory, because Haskell was their first exposure to it, and Category Theory is a big meme in Haskell thanks to Monad.
The exponential growth in CS publications is much faster.
This repository is simply a testament that CT is slowly ramping up.
It's meant to show what kind of expressive power and breadth current CT models have, which to my knowledge isn't something that's well-known outside of our niche community.
It's also meant to suggest where things are going (the kind of chart I have in mind is this one: https://twitter.com/bgavran3/status/1422206118688956420 ), though I understand this is something that deserves a much more substantial proof.
I've implemented these in Idris 2: https://github.com/bgavran/TensorType/blob/main/src/NN/Archi...