Report on Adding Structure to Deep Learning

Notes taken by Patrick Reany

April 2024

From the video:

"WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE..." Machine Learning Street Talk (1 April 2024).

Transcript of the video.

You can obtain a copy of the main paper discussed in this video, by Paul Lessard and others, at

Categorical Deep Learning: An Algebraic Theory of Architectures.

A PRACTICAL TYPE THEORY FOR SYMMETRIC MONOIDAL CATEGORIES. A paper by Michael Shulman.

I would subtitle this video as "A deeper use of category theory in deep learning."

Below, I have provided some excerpts from the discussion that emphasize category theory in relation to ML.

Definitions:

ML = machine learning.

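GDL = Geometric Deep Learning = designing neural networks around the symmetries (group actions) of their data domains, such as grids, graphs, and manifolds. The paper discussed in this video proposes category theory as a generalization of GDL.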

Grok = to understand by intuition.

Canalization = a measure of the ability of a population to produce the same phenotype regardless of variability of its environment or genotype [Wikipedia].

RNN = recurrent neural network.

CT = computed tomography.

Scheme = a dialect of Lisp, a language family known for the principle "code is data."

Abduction (logical sense) = inference to "the simplest and most likely explanation" [Wikipedia]. It is frequently a leap to a conclusion, one that is as often wrong as right, and it lacks preparation in the subtle aspects of the problem at hand.

Symbolica = a startup company founded by George Morgan. He believes that the best way forward in AI is to refound it completely, replacing its statistical approach to intelligence with a symbolic one. My guess is that Mr. Morgan is running on intuition in this notion, which is not a bad thing. (Morgan himself refers to it as a "moonshot company.")

Introduction:

The purpose of this report is to pick out of the video some interesting comments and statements that pertain to the future development of AI, particularly deep learning. Dr. Paul Lessard believes that the next big breakthrough in neural-network intelligence will require a more sophisticated mathematical framework to work within, namely category theory.

The principal people in this video are Dr. Paul Lessard (principal scientist for Symbolica), Dr. Keith Duggar, and Dr. Tim Scarfe.

Warning 1: Occasional references to the programming language Haskell.

Warning 2: I have included timestamps, but they are only approximate.

Warning 3: I have mostly used the transcript of the video discussion as my source for the quotations, but the transcript is not always accurate.

So we begin:

At around 6:40, Paul is asked to describe the paper he coauthored:

...If you come from it from where I come from it, it is, oh, I'm interested in neural networks as a particular instance of morphisms in a 2-category of objects, parametric maps, and reparameterizations, and I'm interested in the kind of structure that they're interested in, but thinking about it as, you know, algebras for monads or coalgebras for comonads, stuff like this. Right? So this sort of, like, not I'm not gonna say, like, higher level of abstraction, but much more sort of 2-categorical flavor. Whereas, they're really thinking in terms of, sort of, representation theory and a sort of generalized version of representation theory.
Bruno and I, you know, Bruno first. Right? I learned what I know about neural networks through reading Bruno's thesis, and that collaboration has been really instructive. But I know we've really been thinking about this as, okay, there's a 2-categorical story here. We wanna do 2-categorical universal algebra.
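To make "morphisms in a 2-category of objects, parametric maps, and reparameterizations" a little more concrete, here is a minimal Python sketch. It assumes one common way of formalizing the idea: a parametric map from A to B is a function f(p, x) taking a parameter p and an input x; composing two parametric maps pairs up their parameters; and a reparameterization (the 2-cell) is a map r from a new parameter space into the old one, applied to the parameter slot. The class name Para and its methods are hypothetical illustrations, not code from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Para:
    """A parametric map A -> B, stored as a function (params, input) -> output.

    Hypothetical illustration: the parameter space is left implicit, and any
    Python value can serve as a parameter.
    """
    fn: Callable[[Any, Any], Any]

    def __call__(self, params: Any, x: Any) -> Any:
        return self.fn(params, x)

    def then(self, other: "Para") -> "Para":
        """Sequential composition: run self, then other.

        The composite's parameter space is the product of the two parameter
        spaces, represented here as a pair (p, q).
        """
        return Para(lambda pq, x: other(pq[1], self(pq[0], x)))

    def reparam(self, r: Callable[[Any], Any]) -> "Para":
        """Reparameterize along r: Q -> P (a 2-cell between parametric maps)."""
        return Para(lambda q, x: self(r(q), x))


# A scalar "linear layer" with parameters (w, b), computing w*x + b.
linear = Para(lambda wb, x: wb[0] * x + wb[1])
# ReLU has a trivial parameter space; its parameter is ignored.
relu = Para(lambda _unused, x: max(0.0, x))

net = linear.then(relu)                  # parameters: ((w, b), None)
print(net(((2.0, -1.0), None), 3.0))     # relu(2*3 - 1) -> 5.0

# Weight tying as a reparameterization: one scalar t is used as both w and b.
tied = linear.reparam(lambda t: (t, t))
print(tied(0.5, 4.0))                    # 0.5*4 + 0.5 -> 2.5
```

Composition multiplies parameter spaces, which is why the composite takes the nested pair ((w, b), None); the reparameterization shows weight tying expressed as a map between parameter spaces rather than as a change to the layer itself.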

Keith Duggar said [19:05]:

So I so I love that. Let's let's pause here for one second. So I love this idea. So category theory can help one design domain-specific languages.

Paul Lessard replied:

Yes. No. This has been, like, become a huge deal, like, as you know, I only know of it from, like, the homotopy type theory community and the kinds of projects that that has spawned. But as an instance of this, you know, so there's this paper, I think, by Mike Shulman called a practical type theory for symmetric monoidal categories. And what he does in this paper is say, okay, there are a bunch of type systems that are interpreted into symmetric monoidal categories, but no one uses them for proving things about symmetric monoidal categories.
And he asked the question, why? And he essentially, you know, nails down a couple of points that all of the existing type theories that are interpreted into symmetric monoidal categories fail to satisfy. And, like, the one that I found the most compelling is, like, they don't have this sets with elements flavor to them. Right? And what's prerequisite for sets with elements?
And that feeling is, well, you need tuples of terms to be in tuples of types. And that needs to be symmetric on both sides of the judgment. Right. And so he says, okay. I'm gonna design a new type system that has that property with my intended semantics in symmetric monoidal categories.

Tim Scarfe [26:30]:

So so, I mean, that that point is really interesting that some people have said to me that they think the future of software engineering is like neuroscience. And what I mean by that is we don't know how the brain works. We stick probes in and we try and kind of like, you know, figure out what's going on. A bit like the blind men and the elephant by having all of these different views on this inscrutable mountain. And, it's it's just a little bit weird, isn't it?
Right? So, you know, we're building these multi agent systems and they have all these complex dynamics and we try to apply engineering techniques by doing monitoring and alerting and having these, like, weird thresholds and just fixing problems as as as they show up. And a lot of people just say, well, this is the way things are. This is the way software is gonna go. And as I understand software engineering, because Keith and I have have written a whole bunch of complex software recently, and part of the reason for having abstraction is to create a cognitive interface.

Paul Lessard (a major goal) [30:00]:

...But so if you could make something better such that if you, say, have an LLM that you can interact with in natural language but that actually emulates a type system, right, you will end up with an LLM capable of doing higher order processing, higher order organization....

Talia Ringer's website:

https://cs.illinois.edu/about/people/department-faculty/tringer

Her coauthored paper, "Can Transformers Learn to Solve Problems Recursively?", is found at

https://arxiv.org/abs/2305.14699

[36:30]

...they can do coalgebras, however. Right? And coalgebras are things where, yeah, you can just like keep popping something off, you know, ad infinitum. You can just like query it. Hey, give me this part of this. Give me this part of this. And there's no stop to how many times I could ask for that.
Right? So I can do coalgebraic stuff too. But my sense is that maybe this is some sort of more definitional confusion than it is really a statement of the kinds of things that neural architectures can do.
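The contrast drawn here between algebras and coalgebras can be illustrated with the standard textbook examples of lists and streams; nothing below is specific to the video or to Talia Ringer's paper. An algebra for the list functor collapses a finite structure into one value (a fold). A coalgebra for the stream functor X -> A × X is a function step(s) = (a, s') that can be queried ("popped") indefinitely (an unfold). This is a minimal Python sketch; the names fold_list and unfold_stream are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterator, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
S = TypeVar("S")


def fold_list(nil: B, cons: Callable[[A, B], B], xs: List[A]) -> B:
    """Algebra for the list functor: consume a *finite* list down to one value."""
    acc = nil
    for x in reversed(xs):          # right fold: cons(x0, cons(x1, ... nil))
        acc = cons(x, acc)
    return acc


def unfold_stream(step: Callable[[S], Tuple[A, S]], seed: S) -> Iterator[A]:
    """Coalgebra for the stream functor X -> A x X.

    `step` maps a state to (observation, next state). There is no base case:
    the consumer can keep asking for the next element without limit.
    """
    state = seed
    while True:
        head, state = step(state)
        yield head


# Fold: the sum of a finite list.
total = fold_list(0, lambda x, acc: x + acc, [1, 2, 3, 4])
print(total)                        # 10

# Unfold: Fibonacci numbers as an infinite stream; take as many as we like.
fib = unfold_stream(lambda ab: (ab[0], (ab[1], ab[0] + ab[1])), (0, 1))
print(list(islice(fib, 8)))         # [0, 1, 1, 2, 3, 5, 8, 13]
```

The fold needs the whole (finite) input before it can finish, while the unfold only ever needs the current state, which is the "keep popping something off, ad infinitum" behavior described in the quote.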

Tim Scarfe [52:30]:

Yeah. I mean, another thing was when I when I first read your paper, I assumed that there was some kind of I guess I visualized it as a kind of compiler. So now it's a it's a bit like imagine we had something like JAX and we did to JAX what TypeScript did to JavaScript. So, you know, we add typing to it, we add compositional semantics and so on. And and what we've done is is we've now created this whole, like, interface, basically, to understand how to build the next generation of of neural networks.
And then the system is a bit like a compiler. So it will compile down to neural network modules that have, you know, GDL constraints or algorithmic constraints and so on. And in the future, maybe even symbolic, networks that that and it would just it would just automagically do all of this stuff because what we wanna get to is we need to get away from the alchemy. Right? We need we need this compiler to kind of do the hard work for us so that, you know, just all of these problems that we're seeing at the moment just are a thing of the past.
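Tim's analogy of doing "to JAX what TypeScript did to JavaScript" can be made concrete in a very small way: attach a type to each module and refuse to compose modules whose types do not match, before any numbers flow through them. The sketch below is hypothetical: the Layer class and compose function are illustrative names, it uses plain Python rather than JAX, and its "types" are only vector dimensions, far less than the compositional semantics discussed in the video.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Layer:
    """A module typed by its input and output dimensions (hypothetical)."""
    name: str
    dim_in: int
    dim_out: int
    fn: Callable[[List[float]], List[float]]


def compose(f: Layer, g: Layer) -> Layer:
    """Typed composition g . f, rejected at build time if the types don't line up."""
    if f.dim_out != g.dim_in:
        raise TypeError(
            f"cannot compose {f.name}: {f.dim_in}->{f.dim_out} "
            f"with {g.name}: {g.dim_in}->{g.dim_out}"
        )
    return Layer(f"{g.name}.{f.name}", f.dim_in, g.dim_out,
                 lambda x: g.fn(f.fn(x)))


# A fixed 3 -> 2 projection and a 2 -> 2 elementwise ReLU.
proj = Layer("proj", 3, 2, lambda x: [x[0] + x[1], x[2]])
relu = Layer("relu", 2, 2, lambda x: [max(0.0, v) for v in x])

model = compose(proj, relu)
print(model.name, model.dim_in, model.dim_out)   # relu.proj 3 2
print(model.fn([1.0, -3.0, 4.0]))                # [0.0, 4.0]

try:
    compose(relu, proj)   # 2 -> 2 followed by 3 -> 2: a type error
except TypeError as err:
    print("rejected:", err)
```

The design choice is that the error surfaces when the model is assembled, not when data is first pushed through it, which is the kind of "compiler doing the hard work" the quote is reaching for.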

Paul Lessard [54:00]:

And here, I do mean reason, right, in the sense of coming up with reasons for things. Right? Why does this do this? How does this work? Those kinds of things. Right? So you abstract away this sort of, like, essentially analytic detail to this to a problem, and then suddenly it becomes tractable. It becomes conceivable. I would say that what really category theory does is it abstracts the idea of a semantics. Right?

Haskell reference [55:30].

Keith Duggar [57:30]:

And it's fun because, you you know, it's really enjoyable when you come across a topic that that blows your mind. And and the way, like, I I kind of quantify a mind blowing thing is is if I understand it the day that I learn it, and then somehow the next day, totally no longer understand it. It's kinda like it was kinda mind blowing.

Paul Lessard [59:30]:

You know, type theory is the core of, like, you know, a syntactic treatment of a synthetic mathematics. Right? At least that's, you know, the the weird, you know, not fuzzy, but, like, rarefied and unbelievably abstract world that, you know, through which I discovered type systems.

Paul Lessard [1:04:00] (The preparation for problem solving.):

I mean, the thing that is sort of that about category theory is actually, you know, Grothendieck had this statement that was like, if it's not obvious, you're not ready to work on the problem yet. So and, you know, he had this whole, you know, principle of, like, theory building for problem solving. And one of the aspects of theory building for problem solving is you develop a vast library of equivalent or at least related representations of similar or the same concept. And you solve a problem by finding the one in which it's obvious. Right?
It's sort of it's a shamanic mathematics. It's a mathematics about understanding something and then not really having to do that much. Right? The point is, think about this in the right way, and it becomes trivial. This is or not trivial or at least, like, clear.

At 1:06:30, Paul claims that the copyright-infringement suits against the LLM companies will all succeed. But which one of us has not gained knowledge by being trained on copyrighted material? Are we then faulting the LLMs just because they're more efficient at doing so? Even after I listened carefully to Paul's argument, I'm not sure what to make of his claim that "data is code." Isn't raw data only "code" after being "compiled" into it? Surely a pile of bricks isn't the same thing as a brick house, unless you're a gecko, right?

Tim Scarfe [1:14:00]:

Mhmm. Cool. Now, so so you've written this paper. Right? If you were to give me a one minute elevator pitch of the paper, what would it be?

Paul Lessard:

Title of the paper is Categorical Deep Learning: An Algebraic Theory of Architectures. One thing that like the, you know, hardcore category theorist amongst us might take objection to is that the words algebraic theory actually have a formal meaning in category theory, and that's not actually exactly what we mean. Right? An algebraic theory to a category theorist is like a Lawvere theory. It's a it's one of these syntactic categories of, like, the allowed manipulations and stuff like that.
Whereas, what we mean is specifically architecture as being the semantics of this, sort of, these universal algebraic notions. Mhmm. So that's the title of the paper. I've gotten distracted by the statement that we said algebraic theories. Maybe that's not the best name, but it's a pretty good name.
Why is it a good name? Because the point is that we wanna say that all of the things that have been studied in GDL and significantly more examples, all of them are in fact instances of algebraic structures, be they structure preserving maps, i.e., morphisms between algebras for monads, or structure maps of algebras for monads, or the appropriate dual constructions. Say, these are the sort of these, like, universal structures and then we're interpreting them into a 2-category whose objects are vector spaces, whose morphisms are parametric maps.
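One standard instance of "structure preserving maps, i.e., morphisms between algebras for monads" comes from group actions. For a group G, the functor X ↦ G × X carries a monad structure; its algebras are exactly sets equipped with a G-action, and the algebra morphisms are exactly the G-equivariant maps, which is the property GDL builds into layers such as convolutions. The Python sketch below assumes the cyclic group Z_n acting on length-n vectors by cyclic shift and checks that a circular convolution commutes with that action. The function names and kernel values are illustrative choices, not taken from the paper.

```python
from typing import Callable, List

Vec = List[float]


def shift(k: int, x: Vec) -> Vec:
    """Action of k in Z_n on a length-n vector: cyclic shift right by k.

    As a function (k, x) -> x', this is the structure map G x X -> X of an
    algebra for the monad G x (-).
    """
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]


def circular_conv(kernel: Vec, x: Vec) -> Vec:
    """Circular convolution: y[i] = sum_j kernel[j] * x[(i - j) mod n]."""
    n = len(x)
    return [sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]


def is_equivariant(f: Callable[[Vec], Vec], xs: List[Vec]) -> bool:
    """Check f(k . x) == k . f(x) for every shift k and every sample x,
    i.e. that f behaves as a morphism of algebras on these samples."""
    return all(f(shift(k, x)) == shift(k, f(x))
               for x in xs for k in range(len(x)))


def layer(x: Vec) -> Vec:
    """A fixed convolutional 'layer' with an illustrative kernel."""
    return circular_conv([1.0, -1.0, 0.5], x)


samples = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, -1.0, 2.5, 0.0, 1.0]]

print(is_equivariant(layer, samples))   # True
# A position-dependent map (scale each entry by its index) is not equivariant.
print(is_equivariant(lambda x: [i * v for i, v in enumerate(x)], samples))  # False
```

A numerical check on samples is only evidence, not a proof; here equivariance also follows directly from the formula, since both sides reduce to the sum of kernel[j] * x[(i - k - j) mod n].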

The remaining part of the video is a discussion of the category-theory preliminaries one needs in order to understand Paul's paper.