Transcript of EP 316 – Ken Stanley on the AI Representation Problem

The following is a rough transcript which has not been revised by The Jim Rutt Show or Ken Stanley. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guest is Ken Stanley. Ken is leading the open-endedness work at Lila Sciences on the quest for scientific superintelligence. He previously led the open-endedness team at OpenAI. Prior to that, he was professor of computer science at the University of Central Florida. He was also cofounder of Geometric Intelligence, which was acquired by Uber to create Uber AI Labs. Welcome, Ken.

Ken: Thank you, Jim. Really glad to be here.

Jim: Yeah, it’s great to have you back. This is Ken’s third appearance. Back in EP 137, we did a pretty deep dive into neuroevolution, one of my favorite topics. And back in EP 130, we talked about Ken’s very interesting book, “Why Greatness Cannot Be Planned,” where he lays out the idea of open-endedness. I’ve been encouraging some of my friends in the AI space to think about open-endedness, to read that book, and some of them are starting to do it. It’s actually a very important project. So if y’all think you know everything about AI, you don’t if you haven’t read that book. So read Ken’s “Why Greatness Cannot Be Planned.” Today, we’re gonna do something a little different. We’re gonna do a deep dive into a paper that Ken wrote along with some other folks. The paper is called “Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis.” Ken was one of four authors. The other three were Akarsh Kumar, Jeff Clune, and Joel Lehman. So let’s get started here. What the hell problem were you addressing with this paper?

Ken: That’s a good question. It’s not really about a particular problem. It’s more about an observation that was really interesting—an observation about the underlying representation inside of neural networks, which are, of course, the underlying technology behind large language models and other large models used today. If you wanna kind of situate it next to a problem, it’s really the problem of representation. What is the difference between a good representation and a bad representation of the world? And there’s a nuance to that, which this paper gets into, which is this idea that you can actually be good at answering questions or good at solving a task, giving the right answer, but actually have bad representation. The paper explores in depth what the nature of that bad representation might be when, on the surface, you seem to be doing well. Given that nature, it explores what the implications might be. The underlying issue that’s really deep and fundamental is that it may be the case that the large language models that everybody is now going crazy over suffer from this problem, which we called fractured and entangled representation, or FER for short. And so if they do, despite the fact that they are doing amazing things, we’ll eventually confront the underlying hidden problem of this fractured and entangled representation.

Jim: Now why do we care that under the surface these deep learning nets are horrific balls of yarn all tangled up? Why does that matter if they do the job?

Ken: We don’t know for sure the extent of the tangled mess. The paper shows hints that there could be this problem. There’s good reason to believe it may be the case. I guess if I had to make a bet, I’d say it probably is the case. But we still need more evidence to be sure. So this is really like a call to action to really look into this very carefully.

But if they do have this problem, which seems likely, why should we care? It’s a great question. If they get the answers right, if they help you with whatever you need help with, what do I care what’s going on underneath the hood? It can be a mess as far as I’m concerned. But there are several things that can be affected by having a mess underneath the hood. These things, which may not be immediately obvious, are eventually gonna bite you.

One of those things is creativity. It’s an interesting thing to think about creativity because if you think about it, that’s one of the areas, maybe one of the few areas where there’s still a lot of uncertainty about whether we’re actually making a lot of progress. Scaling is doing some really impressive stuff, but is it actually getting us commensurate levels of creativity as we go up in scale? While some people may feel that the networks are creative, other people would say they’re not actually very creative. They do derivative creativity. In other words, if you ask for a new bedtime story, you’ll get a bedtime story that no one’s ever told before—so in that sense, creative. But it’s not really anything new. It’s not like a new genre of literature. It’s not gonna win a literary prize. They don’t tend to do this kind of transformational creativity that humans at our best do.

This ability to be extremely creative is clearly related to your underlying representation of the world. How you represent the world ultimately determines the kinds of things that you can imagine. Your imagination is a function of your underlying representation. So if your underlying representation is a mess, your imagination will be bottlenecked in terms of what you can imagine. When people say things like, “Oh, it’s winning all these prizes like the Math Olympiad,” well, look, so where is all the new math? Maybe it’s not like a human underneath the hood representationally. There are a couple of other areas this can affect. I don’t mean to give a long laundry list, so maybe I’ll just give one other example: continual learning.

Jim: No, that’s important. This is important.

Ken: Okay, okay, yeah. So continual learning is the next one, and continual learning means the ability to keep learning: what you would imagine happening if you wanted to keep adding more knowledge. And this is, by the way, what people do. We do continually learn throughout our lifetime. It’s not like we’re baked and then you just put us out into the world and we don’t learn anymore, which is kind of the life of an artificial neural network or a large language model. But we keep learning throughout our lifetime.

To facilitate that, it’s easier to learn if you represent the world in a parsimonious way than if your representation of the world is a mess. Let’s give a concrete example of what it means to represent the world in a messy way. Imagine that I know a lot about faces—like, I can recognize faces, I’ve seen a lot of faces, I can recognize my friends, I could even describe what a face is—but I don’t realize that they’re symmetric. So I seem functionally fine: I know a face when I see it, I know my friends, I can recognize them. But I actually don’t realize that faces are symmetric.

This is a representational issue. What could that do to me? Well, it’s going to hurt my ability to draw any new faces. How can I imagine a face if I don’t actually respect this principle of symmetry? It’s also going to hurt my ability to learn new kinds of faces, like if I’m introduced to a new class of faces, maybe a new kind of creature or something like this, because the representation of that new class would draw from what I understand about faces. And if I don’t understand that they’re symmetric, I’m going to be reinventing the wheel over and over again.

What that means is I can learn the new stuff, but it will be more expensive because I don’t have the right basis for understanding the underlying symmetries and regularities in the world. What you would predict if you have this bad kind of representation is that it will become increasingly expensive the more you learn because everything is just in this jumbled, fractured, and tangled mess.

We may be seeing that in the fact that it is so unbelievably astronomically expensive to get to where we are. We have people seriously proposing things like spending half a trillion dollars or trillions of dollars to train next-generation models. One school of thought might be, well, that’s what it takes. Another school of thought though might be that maybe there’s something wrong here that it costs this much when you consider the energy and compute costs of a human brain and its ability to get to this level.

There’s always a lot of controversy in an argument like this, but what the paper offers that’s new is that we actually have evidence of this kind of problem. So it’s not just talking about “maybe there’s this issue with representation”—kind of wishy-washy thing. It’s that here are pictures, you can look at them, they’re concrete, you can see SGD or stochastic gradient descent, which is the algorithm that’s used to train these models, producing this terrible mess. And so you can understand that it actually exists in the real world. This is a real problem. And so now it’s not just speculation.

The third thing is generalization. That can also be affected by having this FER. So there are a lot of things that can be affected even though you keep getting answers right and right and right. You can actually look like you’re perfect inside the distribution that you’ve been trained on and still have horrible FER under the hood, which is a really interesting situation, because it means benchmarks will not actually reveal it. And we’re obsessed with benchmarks. So that’s what’s so intriguing about the paper—this idea that we might not be able to see the problem with the whole paradigm of testing that we’re using now to evaluate these models.

Jim: Very interesting. And while I was reading this litany of issues with the hairball, the thing that came to mind immediately was the issue of modularity. My home area is in evolutionary computing, and I’ve had many conversations with people, including several with my good friend John Holland, rest in peace, on the difficulty of achieving modularity, and on the power of modularity, particularly multi-level modularity, once you achieve it. And of course, one of the reasons life is so spectacularly generative is that it’s modular multiple levels down. And you can sometimes just hack a tiny bit of the genetic code and move quite a large distance in phenotype space. If you have just a hairball of 1967 Fortran code and try to do genetic programming on it, you’re just going to get horrific little moves at best or total noise most of the time. So tell me how you think lack of modularity relates to the FER result.

Ken: Yeah. Modularity is definitely relevant here. And the vocabulary here, for listeners outside of this area, maybe even inside the area, may sound a bit jargony, so I just want to make clear what we mean when we say fractured and entangled representation. The two words, “fractured” and “entangled,” are actually describing some specific things, and in one case relating directly to modularity.

So fractured means that there are things that we should be representing with the same information, but they become fractured into separate parts of the network. That’s like the two sides of a face. The left side of your face and the right side of your face are clearly not represented by completely disjoint sets of DNA or genes. That would be crazy. There is, to a large extent, shared information in the left and right side of your face and left and right side of your body for that matter. Obviously, there are exceptions like the heart is on one side and so forth. But just because the heart is on one side doesn’t mean that your left arm is from an entirely different set of DNA than your right arm. So information is reused. Fracture means that we fail to reuse when we should.

Now entanglement, that’s really related to modularity. Entanglement means that things that should be separate somehow got mixed up with each other. So it’s like if I somehow extended the size of your arm, then your nose would grow. This is not a correlation that we necessarily want. There’s no particular important correlation between your nose and your arm. If that was the way it worked, it would be weird.

In the space of neural networks and representations in neural networks, it would mean that you have some strange underlying correlations in your belief system that we aren’t really able to see, but that will come out in your behavior. You believe things are related to each other that should not be related at all to each other, and that’s an entangled representation.

The thing about entanglement is that it’s kind of the opposite of modularity. In modularity, you separate things out that are actually separate, and you have a plug-and-play kind of situation, which is very convenient for building things up. That’s what’s so convenient about modularity.

When we talked in the paper about the opposite of fractured and entangled representation, we called that a unified factored representation. Unified is the opposite of fractured, but factored is the word we used as the opposite of entangled. Factored is basically saying modular. It’s factored into its underlying factors, and they’re separated from each other appropriately.

So it’s like in a face, the eyes are separated from the nose or separated from the mouth the way that I represent it inside under the hood in my brain. I understand this. I’m not confused that maybe the eye is actually connected to the mouth in some way. That’s not something that I believe because it’s modular. I understand that those are separately represented components.

We see that literally visually in the paper, which is what’s so interesting about these visualizations—it’s not just a theoretical concept, but you can actually look and see that there’s entanglement in there viscerally without any kind of technical knowledge of the field. The question is, how concerned should we be? And, of course, it seems to me we should be pretty concerned because modularity has innumerable benefits across all kinds of disciplines like human engineering, but also to evolution to your point there. We see it all over the place. Modularity is a very important principle. And so the loss of it to this kind of entanglement probably has costs.

Jim: Yeah. Especially in terms of the developmental dynamics. Because I was just thinking about this as you were talking, this fractured element. You have things that are not relevant that are brought into the picture. So then you have to have other things that are canceling out the nonrelevant, something like that. So the—yeah, there’s a compounding of things that have to be nudged to get what you want. While if you had—

Ken: Exactly.

Jim: Yeah. A more modular composable system, you could make moderate changes in here and not this horrible set of offsetting errors and counter errors and fixes for errors, et cetera. Makes a lot of sense.

Ken: Yeah. And it’s a really good way to describe it because that helps, I think, intuitively to see why this could get more and more expensive. It’s like the more you have this, the more you have to deal with the fact that you’ve done that. So the next layer of stuff you add on top of that is even worse and more expensive because how are you going to make sense of the world when you have all of this entanglement? If you’re now trying to build more knowledge on top of a really bad foundation, it just gets worse and worse and more and more expensive to rectify.

It’s interesting to me because I see some people who have responded to our paper, which is understandably polarizing in some ways. Some people are upset to hear this claim, and other people think it’s exactly the problem they’ve been trying to articulate. But among the detractors, sometimes I hear, “The FER looks terrible, but maybe it’s actually good and we just don’t understand. Maybe there’s something good about this fractured and entangled stuff. That’s just the way neural networks are, and they work really well. So maybe you, the authors, don’t really understand how representation works. Eyeballing it, it looks bad, but you just don’t get it. There’s something really great happening that we just can’t grok.” But I think that’s just completely implausible because of this modularity argument. I mean, unless you actually think that the world works better if you’re entangled, how can you adopt that view? This is a horrible organizational principle. If you think there’s some secret that makes it work better, I think you’re on pretty shaky ground and really have to think carefully about what your argument is for why that would actually be good.

Jim: Yeah. As I thought about it intuitively, I agree with you as it turns out. Right. Now, let’s go on to why this is more than just Ken and his buddies speculating around the lunch table, which is fun but isn’t science. Right? It’s the start of science. It’s the open-ended search part, hypothesis generation. One of the great skills in science—actually, one of the things I’ve discovered in my years hanging out with great scientists—is that the really greatest ones sometimes are just great at hypothesis generation. But so anyway, you have a hypothesis. Now tell us about the work you did to validate the hypothesis.

Ken: Right. So it’s an interesting story. It’s a slightly long story. So maybe you’ll indulge this because it goes back a ways to explain how we came to this.

Jim: Absolutely. Tell the whole thing.

Ken: Okay. So, one thing that’s interesting about this story is it isn’t the case that we first came up with this hypothesis and then sought out a validation of it. It’s actually the other way around. We saw evidence of it that was not expected and then tried to make sense of that evidence after the fact. So it was kind of the serendipitous and surprising encounter with this evidence that really triggered this paper ultimately.

And so what I’m gonna tell you is where did that evidence come from? Like, how did we stumble upon this issue? And it really goes back to this Picbreeder experiment. This may seem like it’s a bit of a digression, but it’s not really—it ties back in.

So, many years ago, like fifteen years ago or so, we were thinking about how we could make systems that would be open-ended. That’s the field that I’m interested in—open-endedness in artificial intelligence. Open-endedness means systems that continually discover and create novel and interesting artifacts, indefinitely, without bound. It’s sort of inspired by stuff like biological evolution: the tree of life begins with, presumably, a single cell, and then over a billion years or more, you have the whole tree of life created in a single run. This is an open-ended system. It’s divergent. It doesn’t just converge to a solution to a problem. It just keeps on popping out stuff, new species in this case. It invented photosynthesis. It invented the flight of birds. It invented human intelligence.

That’s an open-ended system. Civilization is the other big open-ended system, like human civilization. If you go back, you have things like the wheel or fire—they were invented thousands of years ago. And in this case, you wait thousands of years and you get a different kind of tree, the tree of every idea that’s ever been had and all the inventions we’ve ever made. And, of course, it’s very impressive because now we have computers and we have space stations and we have AI. So, like, a lot of innovation has happened and it’s divergent just like evolution. It doesn’t end. It just keeps on popping out. Like, you’ll come back in a hundred years, it’ll be more interesting. It won’t just converge to a final solution, whatever that would be.

These are open-ended systems, and I’ve always been interested in how you build them in an autonomous manner. How do you build an AI, in effect, that can do that? And I’ve always thought that this is fundamental to human intelligence because, to me, our greatest trick is really our ability to produce endless innovation. Like, you may get a good test score on your standardized test. You maybe got a good SAT score. No one will remember that about you. No one cares. It’s not in the history books. The only thing we remember is the legacy of our ideas.

So that’s the thing that I wanna try to understand from an algorithmic perspective. We face this challenge though. How do you do that? We don’t know how to do that. Obviously, this is not something we’ve achieved yet. It doesn’t happen autonomously in a computer. Only humans have been able to do that even up to today.

So the question is how do you approach this problem? One idea that we had was maybe we could cheat and create a system where, because humans do have this open-endedness property, we can actually let the humans drive the open-ended exploration process, but in an artificial system or an artificial space. You may say, well, what are you gonna achieve then? Obviously, if the humans succeed, you just cheated because it’s the humans. You haven’t actually programmed anything useful. But my thought was if the humans do actually create a new open-ended process inside of our system, we’ll learn something about open-endedness.

Like, if you think about evolution, we have to go dig in the dirt to get the fossils to actually understand what happened. It’s very hard. It’s arduous. It’s expensive. The history of ideas, most of it’s not even written. Like, we don’t know how somebody thought of the thing they thought of or what fed into that idea. So we don’t have all the information. If I could create some artificial autonomous open-ended process, then I could actually have all the information about how everything happened, and maybe we’ll learn some lessons.

That was the thought. So that’s why we created this thing called Picbreeder, which was a system for exploring the space of images. Underneath the hood of Picbreeder, there were actually little neural networks that encoded these images. It’s nothing like modern generative AI though. It’s not like DALL-E, where you have the single large model that generates huge numbers of images based on prompts and was trained on all the images in the universe. These were trained on no images at all, and they’re tiny neural networks. There’s no training data.

Rather, what it is is a breeding system. So you basically look at a bunch of blobs, and then you can pick a blob and say, I like that blob, I want it to have children. And we’ll perturb the blob slightly by perturbing the underlying network that represents it, and then we’ll show you its children. So you’re basically just doing breeding, evolutionary breeding. Some people call it evolutionary art or genetic art.

But what we did that was unique is that we put that system onto the World Wide Web so that we could crowdsource the world and people could continue to evolve from what other people were able to discover. So you could find an interesting image, you could publish it back, and then the neural network attached to that image would be published to the site with the image. Then someone else could find your image and continue to breed it. And so you get a kind of artificial tree of life. A new phylogeny is emerging because of the people on the site.
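
To make the mechanics concrete, here is a minimal sketch of that breed-and-select loop. It is not the actual Picbreeder code (the real system evolves NEAT networks whose topologies grow over time and uses a web interface for human selection); it just assumes a toy fixed-topology network mapping (x, y) coordinates to a brightness value, with children produced by random weight perturbation and a callback standing in for the human picking a favorite:

```python
# Toy sketch of Picbreeder-style breeding (not the real system, which evolves
# NEAT networks with growing topologies and lets a human pick via a web UI).
# A tiny fixed-topology network maps (x, y) -> brightness; children are random
# weight perturbations of the parent; selection is delegated to a callback.
import numpy as np

RES = 64  # rendered image resolution

def random_genome(hidden=8, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    return {"w1": rng.normal(0, 1, (2, hidden)), "b1": rng.normal(0, 1, hidden),
            "w2": rng.normal(0, 1, (hidden, 1)), "b2": rng.normal(0, 1, 1)}

def render(genome, res=RES):
    # Evaluate the network at every pixel coordinate in [-1, 1] x [-1, 1].
    xs, ys = np.meshgrid(np.linspace(-1, 1, res), np.linspace(-1, 1, res))
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
    h = np.tanh(coords @ genome["w1"] + genome["b1"])
    return np.tanh(h @ genome["w2"] + genome["b2"]).reshape(res, res)

def mutate(genome, sigma=0.3, rng=None):
    # A child is just the parent with Gaussian noise added to every weight.
    rng = np.random.default_rng() if rng is None else rng
    return {k: v + rng.normal(0, sigma, v.shape) for k, v in genome.items()}

def breed(parent, pick, n_children=8, generations=20):
    # `pick` stands in for the human user: given a list of images, it returns
    # the index of the child that should become the next parent.
    for _ in range(generations):
        children = [mutate(parent) for _ in range(n_children)]
        parent = children[pick([render(c) for c in children])]
    return parent

# Example "user" that just favors the brightest child (a real user would look).
final = breed(random_genome(), pick=lambda imgs: int(np.argmax([im.mean() for im in imgs])))
```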

And this worked—it was surprisingly robust that we really did, I think, get an open-ended process going, which is really an achievement because I think there’s almost no other example at all of an artificial open-ended artifact in the world. It’s like the only one, which means that it’s a very important treasure trove of data. I think it’s highly underrated how important this experiment is. Most people think it’s a toy. You know, it’s like you go in and evolve some pictures. It’s kind of fun, but it wasn’t meant to be a toy.

Inside of the system, we did discover in the end some very deep principles. So it was true that by implementing an open-ended system like this, we learned some really deep facts. The first one was not what this paper was about, but it’s important I think for context just to mention it. It’s already been spoken about for years and years. So it’s almost over-spoken about—this concept of the objective paradox, “Why Greatness Cannot Be Planned,” which is my book with Joel Lehman. This idea that sometimes you can only find things by not looking for them. Or in other words, sometimes having an objective can actually stop you from making discoveries, including the objective itself.

This is something that we noticed in Picbreeder. That’s where we got this insight from because we noticed that when people would find meaningful images, like people found butterflies, skulls, cars—this is really amazing when you consider it was starting out with random blobs. Like you wouldn’t necessarily predict this would happen. Remember, this is not a system that was trained on any data, knows nothing about our world. And people kept finding these, and these are extremely rare needles in a haystack of everything that you could find.

So it begs explanation. And what we saw was that when people found these meaningful images, they were not looking for them. Or at least most of the way along that trajectory of search or breeding, almost the entire trajectory, they were not looking for the image that was ultimately found. And it led to this understanding that sometimes it’s better not to have an objective in order to find interesting things. This is such a fundamental point that it’s been discussed, like I said, all over the place at this point. We wrote a whole book about this. And so I don’t wanna belabor that point about it, but it’s an interesting point that we found.

But there was another dangling lesson in Picbreeder that is much less known, less talked about, and basically almost nobody knew about it. But we noticed it and it’s very intriguing and interesting, which is that it was also the case that the underlying representations in the neural networks that represented these images were unbelievably good. Just crazy good. Like, they looked like an engineer had built these neural networks by hand.

So, like, inside the network for the skull (because there was a skull, somebody found a skull), I could find a single connection that would open and close the mouth of the skull. So it was like a modular decomposition of how you would think of a skull. You know, you can actually manipulate individual components of this image in a modular way, and they’re mapped to individual dimensions in the search space. And so this is a remarkable thing. And we were seeing it over and over again in all these images, that they have this kind of special representation, which we don’t see in normal neural networks. That’s not the kind of thing we see.

And so we also noticed that algorithms inspired by the objective paradox observation, like novelty search, also led to better representations, more compact representations. And we spent years just not really doing anything about this observation, until Joel Lehman and I, years later, tried something just to see what would happen: what if we used SGD to try to reproduce some of the Picbreeder images?

Jim: For our audience, define SGD for the folks.

Ken: Right. So if you recall, SGD stands for stochastic gradient descent, which is the conventional training method for neural networks back then and today. We’re using SGD to train large language models today. It’s sort of like the golden solution to training a network that everybody’s converged on. It’s hardly questioned that we should use SGD—it’s basically just the only thing in town.

Everybody used SGD, except Picbreeder didn’t use SGD. That’s what makes this interesting. Picbreeder is this wildly different way through evolution. When you would ask an image to have a child, it would just perturb weights randomly. There’s no SGD happening. It’s not figuring out the direction in weight space that would optimize it towards some goal—that’s what SGD does. It would just randomly perturb, and if you like it, you could pick it and it would become the parent of the next generation.

So we had these amazing representations that came from that really weird, exotic process, but they begged explanation. Let’s try to see if we can reproduce it with SGD. We’ll take, for example, the skull and make it a target for a regular neural network—like a multilayer neural network, or an MLP, some people would call it a multilayer perceptron—and we’ll train it through SGD and see if it can match the skull. It turns out that if you train it with a big enough network (that’s really important: it has to be big, much bigger than the Picbreeder network), it can almost perfectly reproduce any Picbreeder image.
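
For contrast, here is a rough sketch of the SGD side of that comparison, under illustrative assumptions: the target is a synthetic ring rather than an actual Picbreeder skull, and the network size and learning rate are hand-picked for the sketch, not the paper’s settings. An ordinary MLP is trained by stochastic gradient descent to map each (x, y) coordinate to the corresponding target pixel.

```python
# Rough sketch of the comparison: train a plain MLP with stochastic gradient
# descent to map (x, y) coordinates to the pixels of a fixed target image.
# The target here is a synthetic ring, not an actual Picbreeder skull, and the
# network size / learning rate are illustrative, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
RES = 64
xs, ys = np.meshgrid(np.linspace(-1, 1, RES), np.linspace(-1, 1, RES))
X = np.stack([xs.ravel(), ys.ravel()], axis=1)                # pixel coordinates
r2 = xs**2 + ys**2
y = ((r2 > 0.3) & (r2 < 0.6)).astype(float).ravel()[:, None]  # target pixel values

H = 256                                                       # hidden units (much bigger
W1 = rng.normal(0, 0.5, (2, H)); b1 = np.zeros(H)             # than a Picbreeder genome)
W2 = rng.normal(0, 0.5, (H, 1)); b2 = np.zeros(1)
lr, batch = 0.05, 256

for step in range(20_000):
    idx = rng.integers(0, X.shape[0], batch)                  # random mini-batch = "stochastic"
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W1 + b1)                                 # forward pass
    pred = h @ W2 + b2
    # Backpropagate the mean-squared-error gradient by hand.
    d_pred = 2 * (pred - yb) / batch
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(0)
    d_h = d_pred @ W2.T
    d_pre = d_h * (1 - h**2)                                  # tanh derivative
    dW1 = xb.T @ d_pre
    db1 = d_pre.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

# The output should roughly match the target, but matching the target says
# nothing about whether the hidden layer decomposes it sensibly -- that
# internal structure is what the paper actually inspects.
recon = (np.tanh(X @ W1 + b1) @ W2 + b2).reshape(RES, RES)
```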

That wasn’t a surprise to us necessarily. The issue wasn’t whether it could actually produce these images. The issue was, what does it look like underneath the hood? Does it look like Picbreeder when you train these conventional networks to output the same images as Picbreeder? We noticed that it looked nothing like Picbreeder. It looked like absolute spaghetti—a hairball, as you put it. It was horrible. That was a really intriguing discovery. I didn’t know exactly what it meant, but it was clear there are these two different types of representation, and there’s something much worse about the SGD one.

This was years ago that Joel and I made this observation. We never got our act together to write a paper about it, but it was always dangling in my head for years that this is very significant, because it’s a story that says what is important in intelligence is not just where you get to, but how you got there. That is not usually the story. We talk about where we got—like, we got to a point where the network will score some perfect score on this math Olympiad or some gold score. That’s where we got, not how we got there. But what we see underneath the hood is that how you got there has an enormous impact on the underlying way that you represent that solution.

The question is, does it matter? How does it matter? How much does it matter? I was sort of sitting on this for years, not really digesting it or thinking about it thoroughly, until maybe one or two years ago, when it thrust itself back into my head. I started thinking it’s just a shame that no one has actually seen this point. This is a very important issue. It needs to be explained and researched, and we need to know the implications for the field.

Think about it more intuitively, because a lot of this may sound like abstract mathematical stuff. Two people both score 100 percent on the Math Olympiad or some crazy exam, a PhD-level exam as the frontier labs like to call it. Imagine you get two people who both score a perfect score. One of those people invented all the mathematics themselves and then scored 100 percent. The other person learned from textbooks and teaching and doing exercises. Do you think those two people are going to have similar mathematical careers? They both scored 100. Which one do you predict will be a better mathematician?

We see this literally in these math exams. It happens all the time that there are people who ace them and do really well and go on to do nothing of note in mathematics, and others who ace them and go on to become great mathematicians. In other words, the test is not distinguishing between these two archetypes. The difference between these people is underneath the hood. It’s not in the answers to the test. It’s something deeper inside of how they represent the world. Here we see evidence that how you represent the world is a function of how you got to your level of expertise.

That was gelling in my head, and so that led us to propose this hypothesis, which we called the fractured entangled representation hypothesis. This was an attempt to crystallize this observation in a hypothesis, because before it was just an observation. The hypothesis is that the conventional way of training—when you’re training towards an objective with SGD, or perhaps even with other algorithms that are objective-driven—will, we predict, get you this kind of fractured, entangled representation underneath the hood. Therefore, representational optimism, the belief that representations just keep getting better with scale and data, may actually be worth questioning.

Jim: You were talking about scaling, right? This is the optimism aspect. Is there a belief, formally or explicitly stated, that scaling will not only produce better results, though at a logarithmic kind of rate, but will also produce better representations? And where does that come from if it exists?

Ken: I think that there is an implicit belief. I don’t know if I’d go as far as calling it a theory because it’s not often articulated explicitly. But we did try to do some digging as we wrote the paper to try to see the extent to which this belief has been written down, what we called representational optimism. We found some places—like, we quoted directly in our introduction. We quoted, I think, the original GPT-3 paper, which pretty explicitly states this belief. But especially a belief like this—there’s not a real theoretical explanation of why it would get better. I think what it is is just that given that performance results continue to improve, people just think it must be the case that representation is improving. And people think representation and performance are one and the same thing. So if you’re getting higher scores, you have better representation. It’s like no one would even question that. That’s just common sense. So I think that’s why you don’t see people really picking this apart or getting deeper into the question of what do we really mean when we’re saying that representation is improving.

Jim: Gotcha. So basically, it’s just an assumption that probably nobody has ever really tested, and this is the first test.

Ken: Yeah. Assumption’s the right word.

Jim: Yeah, “assumption” is the word. You know, we think it’s all good and wonderful, so therefore all the wonderfulness must be correlated, right? Well, it could be anti-correlated. It could be the hairballs getting worse with every generation. And that’s a question I’m going to ask a couple of questions from now: is there a way we could perhaps statistically identify hairballness? That should be interesting. But first, talk a little bit about a clever thing you did, which demonstrates pretty conclusively what you’re talking about: this idea of weight sweeps, what it was and what it showed.

Ken: Oh, yeah. One thing that is really fortuitous about the fact that we noticed this in Picbreeder, which may not be obvious, is that because Picbreeder is set up such that you have neural networks that output two-dimensional images, it means that you can actually see the intermediate images that are being computed inside the network as it’s building up the representation. This would only be true for a network that outputs 2D images of this type. And so it allows us to literally see representation visually.

We could see these internal nodes and how they were decomposing, for example, the image of the skull piece by piece as you move up the network. And we could see that for both the Picbreeder and the SGD networks that we trained as comparisons. It basically leaves you with this hint that something might be wrong because you can see the contrast is stark. If you go to this paper and you see the highlight showcase image in the paper—let me see, it’s figure five—you’ll see that it compares the underlying representations of the Picbreeder network with the SGD network. You don’t even have to be a scientist or an expert. You just look at it and you know there’s a big thing going on here.
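
The reason this kind of inspection is even possible is that every node in such a network is itself a function of (x, y), so any hidden node can be rendered as its own image. Here is a toy sketch of that idea, using a made-up little network rather than an actual Picbreeder genome or the paper’s code:

```python
# Sketch of why these networks are so inspectable: every hidden node maps
# (x, y) to a value, so each one can be rendered as its own image. This uses
# a made-up toy network, not an actual Picbreeder genome.
import numpy as np

RES = 64
xs, ys = np.meshgrid(np.linspace(-1, 1, RES), np.linspace(-1, 1, RES))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)

rng = np.random.default_rng(1)
W1 = rng.normal(0, 1.5, (2, 6)); b1 = rng.normal(0, 1, 6)   # six hidden nodes
W2 = rng.normal(0, 1.0, (6, 1)); b2 = np.zeros(1)

hidden = np.tanh(coords @ W1 + b1)                          # shape (RES*RES, 6)
output = np.tanh(hidden @ W2 + b2).reshape(RES, RES)        # the final image

# Each column of `hidden` is the picture computed by one internal node. The
# paper's figures lay these out layer by layer: in the Picbreeder networks the
# intermediate images tend to look like meaningful parts (symmetries,
# outlines), while in the SGD-trained networks they look like arbitrary
# fragments that only make sense once everything is summed together.
hidden_images = [hidden[:, i].reshape(RES, RES) for i in range(hidden.shape[1])]
```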

So it’s a hint that there’s an issue. But we wanted to understand more about the issue. We don’t want just this one hint—we want to understand what are the downstream implications of this underlying structure that we’re seeing, which looks weird in the SGD network. What does the weirdness mean? What are the implications?

One thing that you can do, which we knew you could do because of playing with Picbreeder networks in the past, is what’s called weight sweeps. You can take individual weights from one of these networks and just change the value of that single weight by a little bit, and then a little bit more, and a little bit more, and see how that changes the image. We’re sweeping across one connection in this giant neural network and seeing what is the implication of just sweeping across that one connection.

This is related to what I said about the fact that a single weight in the Picbreeder skull network would open and close the mouth. We already knew this had this really cool property, because that’s evidence of modular decomposition. You can see modular decomposition when you have the ability to make changes through single weights that still make sense inside of this image.
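
Here is a small sketch of what a weight sweep amounts to, again with a toy stand-in network rather than the actual Picbreeder or SGD networks from the paper: hold everything fixed, step a single chosen connection across a range of values, and render the image produced at each setting.

```python
# Sketch of a weight sweep: keep every weight fixed except one connection,
# step that single value across a range, and render the image at each step.
# Toy stand-in network; the paper does this on the real Picbreeder and
# SGD-trained networks.
import numpy as np

RES = 64
xs, ys = np.meshgrid(np.linspace(-1, 1, RES), np.linspace(-1, 1, RES))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)

rng = np.random.default_rng(2)
params = {"W1": rng.normal(0, 1.5, (2, 6)), "b1": rng.normal(0, 1, 6),
          "W2": rng.normal(0, 1.0, (6, 1)), "b2": np.zeros(1)}

def render(p):
    h = np.tanh(coords @ p["W1"] + p["b1"])
    return np.tanh(h @ p["W2"] + p["b2"]).reshape(RES, RES)

def weight_sweep(p, layer, index, values):
    # One rendered frame per setting of the single chosen weight.
    frames = []
    for v in values:
        q = {k: a.copy() for k, a in p.items()}
        q[layer][index] = v                   # change exactly one connection
        frames.append(render(q))
    return frames

# Sweep one hidden-to-output connection from -2 to 2. The claim under test:
# in a unified, factored network the frames stay coherent (a mouth opening
# and closing), while in a fractured, entangled one they fall apart.
frames = weight_sweep(params, "W2", (3, 0), np.linspace(-2, 2, 9))
```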

We thought, since there’s so much evidence of modular decomposition through weight sweeps in the Picbreeder network, we can easily just depict those in the paper and that gives further evidence of how good it is. But then on the other side, let’s just do the same thing for the SGD network and see what we get. We weren’t sure what we would see because we’d never done that before.

The nature of the mess was a new thing. What you see is deformations that are terrifying, basically. With the Picbreeder network when you do these sweeps, something like the mouth gets bigger, the mouth starts smiling or stops smiling, or the nose gets bigger, or the eyes start winking—you see things that are still skulls. But with the SGD weight sweeps, what you see is a giant dent gets taken out of the cheek on one side. The eye disappears completely. The face is no longer a face—just absolute distortion.

It shows that it doesn’t have the decomposition. It’s just completely missing the structure of knowledge, the structure of reality. I say the structure of reality because, let’s use this as a metaphor for an LLM—it’s a complete impostor and facade. It has no idea what a skull is even though it looks like a skull on the surface. Underneath the hood, it’s not decomposed in any relationship to the skull. These sweeps help to support this view and make it harder for a skeptic of our paper to dismiss it easily. A skeptic would like to say, “It’s true it looks really weird, but it’s actually good. Just trust me, there’s something you don’t understand.” But then explain that to me with those weight sweeps, how that could be the case. These weight sweeps are damning and really concerning.

To add even more to why the weight sweeps are important, what they really show is the adjacent space around where you are in the search space. There’s this giant multidimensional search space that’s encompassed by all the weights in the network. What we’re doing when we perturb weights is we’re just looking at what’s nearby. And you’re seeing that this kind of envelope of possibilities around the skull—they’re all still skulls, as you would hope. You would hope that things near skulls look like skulls. That means there’s actually a principled structure to the space of possibilities that you’re describing. But in this SGD network, it’s not true. The things that are near the skull are not just not skulls—they’re not anything. They’re absolute garbage. So if you imagine search going through something like that—like, I want to go from the skull to something else—to continue to learn, I will have to wade through all that garbage to get to the thing that I care about. There’s no way around it, so it’s gonna add extra expense at a minimum.

Jim: Yeah. As we talked about before, if it’s gotten to a point that meets the formal requirement by seventy-seven kludges all added together, canceling each other out, the ability to do anything with that as a generalization is obviously not gonna be easy. I wonder—I assume it must be related, maybe it is. You tell me. I remember back in the day when the early convolutional networks were being used for identifying images and such. And we often found that a network would identify “cow, cow, cow,” but then you could adjust just a few pixels and it would say “dump truck.” Is that somehow related to the fact that the place in the space is something that just happens to work from a bunch of cancellations and add-ons versus a principled, structured place? Do you remember those old experiments where people would just do slight perturbations of an image and then the classifier would give some completely off-the-wall answer even though, by eye, it looked almost identical to the original?

Ken: Yeah. Those are like those adversarial images. And I think it is related to this. It’s a symptom of fracture and entanglement that you would have these very bizarre discontinuities in your belief space, where a single pixel can make you think that something is a school bus. It makes sense: if you think about fractured and entangled representation, you would predict that. And so what you’re really making is a prediction about the opposite kind of representation. With the unified factored representation, you’ve just proposed a hypothesis that it will suffer less from adversarial examples. And so that remains to be shown, but it’s interesting because all kinds of things follow from this. You can imagine all kinds of hypotheses about the benefits of this type of representation, and that would be one.

Jim: Let’s get to that next. Now, this is a pretty small-scale example in the scheme of things. Like, how many parameters did the SGD model have, for instance?

Ken: Right. That’s a very good point and important as a concession that this is small scale. So this is why we cannot just immediately jump out and say LLMs are devastated or something. Like, this is a way smaller scale. So we’re talking about—there are several different images in the paper, so there’s not just one scale because each one has a different size, but roughly on the order of dozens to hundreds of connections, which is ridiculously small. I mean, we’re talking about billions in the case of LLMs. So this is a totally different world in terms of scale.

And so it leaves room for speculation that somehow this gets fixed in this giant high-dimensional world of the LLM. But it also leaves room for concern that it’s not fixed. And so in part of our paper, we were trying to give indirect evidence that there is evidence of FER inside of large models as well. We referred back to the literature. We also came up with some new examples to show evidence of fracture. It generally tends to be indirect because in LLMs, we can’t look directly inside the way we can with the Picbreeder networks, because they don’t just output 2D images. So it’s harder to look at an internal hidden node and know what it is. But we can get indirect evidence. So we tried to do that, and we did share a kind of laundry list of indirect evidence of this fractured and entangled representation. But I would still say it’s not proof, and also it doesn’t really give you a sense of the extent. Like, that still remains a mystery. How bad is it? Is it as bad as what we see in these small Picbreeder images? That would be really bad news if it is like that. But we can’t say—we don’t have enough evidence yet to be sure that it is. And so the degree of FER in large models today remains an open question.

Jim: Yeah. As you mentioned, with large general-purpose models it’s essentially impossible to understand, layer by layer, what’s going on. Did you guys give any thought to whether there’s a statistic, a network architectural statistic, that could be calculated for an FER network versus one of the unified networks? You know, this is classic work that people like Mark Newman do: coming up with network statistics that say, here’s the hairball number between zero and one, and this one is a point one, this one’s a point seven—a worse hairball.

Ken: Yeah. I don’t have an answer to an ideal statistic for this, but I’m being careful with my words because you can imagine some things that approach addressing this statistically. Like, for example, some kind of notion of compression can give you a sense of the efficiency of representation and maybe is indirect evidence of the unified factored type. And so you can imagine trying to create some measures like this.

But the reason I want to be really careful and not just say that’s it is because I think it’s clear that this is a more nuanced and subtle issue than simply compression. Compression doesn’t tell you how you factored things out. This is clearly about more than just how compressed it is; it’s also about what is factored out and how. And it might even be the case that some factorizations, some modular decompositions, are actually less compressed, but they are better because they factor things out in a way that is more amenable to searching through interesting derivatives or creative artifacts that would be built on top of that thing.

So you don’t want to just oversimplify it to an issue of compression. Compression speaks to some degree to fracture because it means that you are reusing information if you’re highly compressed, which means you probably have less fracture. But it’s not the full story, especially because of the entanglement part of this. And so I don’t think we yet have a great measure of FER that can be generally applied.

I also know that some people are irritated by that. People who’ve read the paper, or will read the paper, are like, “Where’s the measure? You need a measure. What’s going on here? We’re scientists. We need objective measures.” I just want to say to that sentiment, which I think is legitimate, that it would be better to have a measure than not. But I also want to say, please, if you look at these pictures, you don’t need a measure to know there’s an issue here. Don’t go so far with your measure obsession as to just dismiss what’s standing right in front of your face. That would be missing the forest for the trees. The measure would be nice, but what is here is worth sharing now rather than waiting for a measure. That’s why we came out with it even though we don’t have a perfect measure. It’s worth having this discussion. This is an important issue. Now feel free to go off and try to figure out a good measure.
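
As one crude illustration of the compression intuition (not a measure from the paper, and, as comes up next, sensitive to details like how the weights are ordered and quantized), you could serialize a network’s weights and see how much a generic compressor shrinks them; heavy reuse compresses well, independent values do not:

```python
# Crude illustration of using off-the-shelf compression as a (very imperfect)
# proxy for redundancy in a set of weights. Not a measure from the paper; the
# result depends heavily on quantization and on how the weights are ordered.
import zlib
import numpy as np

def compression_ratio(weight_arrays, decimals=3):
    flat = np.concatenate([w.ravel() for w in weight_arrays])
    raw = np.round(flat, decimals).astype(np.float32).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)   # smaller = more compressible

rng = np.random.default_rng(0)
# Hypothetical stand-ins: one weight vector with heavily reused (tiled)
# structure versus one of independent random values of the same size.
reused = np.tile(rng.normal(0, 1, 100), 50)        # lots of shared structure
independent = rng.normal(0, 1, 5000)               # no reuse at all

print("reused:     ", compression_ratio([reused]))       # compresses well
print("independent:", compression_ratio([independent]))  # compresses poorly
```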

Jim: I’m unfortunately gonna spend some time on that rabbit hole. By the way, for the readership, a link to the paper as always will be on the episode site at jimruttshow.com. And so even if you don’t want to read the paper, you could at least look at the pictures, and the story that he’s telling here is absolutely compelling. It’s like the difference between a lightning bug and lightning. It’s one of those kinds of qualitative differences. So, take a look at the paper. Even if you’re just gonna look at the pretty pictures, you’ll actually get something out of it. A dumbass question when you talk about compression: Did you try running LZ on the two networks and see which one had a higher compression rate?

Ken: No. We didn’t do that. I mean, I would have some questions about how to order the weights of the network before inputting it into the compression algorithm to get a meaningful answer out of that. But there probably is some way to think that through. And I think people have done things like that with grokking, for example. I think there’s been experiments to see if the level of compression changes after grokking has been observed in a neural network. And grokking is relevant to this paper because you might think that grokking is—

Jim: What is grokking in this case?

Ken: So grokking gets talked about in AI these days. It’s this idea that, for a while, a network—say a network that’s learning how to do addition—will very idiosyncratically memorize kind of random things, like it knows how to do three plus five, but it gets three plus six wrong. And then suddenly this inflection happens where it shoots up in ability really fast. It’s like it grokked it. It just got it. And kids seem to do this too. It’s like suddenly the kid understands what addition is. Before that, they just don’t really get it, even though they’re trying to remember what you’re saying.

This seems to happen in networks, which is really intriguing. And so there’s a lot of work on grokking and people trying to say what actually happens underneath the hood when it groks. And you might even hypothesize that grokking leads you from FER to UFR, like maybe the network actually becomes more unified and factored after grokking. That’s just a hypothesis though. We don’t yet have evidence of this, but it might be the case.

Some people have been just independently interested in what the phenomenon of grokking means, and I believe that some of that work has involved trying to measure compression to see if there is some amount of compression that happens during a grok event. And I think there is evidence of that, I believe. I hope I’m not misstating because I’m not an expert on the whole literature, but I believe I remember hearing this.

So you might look to grokking as a potential remedy or mitigation here. But there’s one thing to remember, I think, for people who are just like, “Oh, this isn’t a problem. It’s just grokking, so everything will be fine. It’ll look just like Picbreeder if we just let it grok.” Something that you need to keep in mind, though, is that it’s way better if, when you learn, you end up with a nice representation in the first place, rather than always ending up with a horrible mess that you then have to clean up. Even if you could clean it up, it’s still really expensive, especially at a multi-billion-parameter scale. So it’s not necessarily the savior for this kind of thing. You could even imagine that at scale, you’re just gonna cross a threshold where it’s just intractable at the level of all human knowledge. And so FER will still reign supreme even though there’s some degree of fixing and cleaning going on. We don’t know the degree of any of these things in reality, empirically. So it’s all speculative and very interesting to look into.

Jim: How would one think about scaling up this result, or how would we find natural experiments, things other people have already done, that could test it at scale? For instance, there have been people who have used evolutionary techniques instead of SGD to train quite large nets. And there were a few papers a few years ago showing that, at least for some class of problems, the performance wasn’t that different. Maybe evolution was a little worse, but not terrible. Could you go look at those examples and see if they have a statistically different structure to them? But of course, you need the statistic first. Or could you take what you’ve done, scale up the approach, and do it in a broader domain? I mean, as you say, this could be huge. Could we end up spending 80 percent of human electricity on detangling hairballs, for instance?

Ken: Right. That’s very true.

Jim: Yeah. It’s possible that this could be the implication of what you’re saying. This could be huge. This could be one of the most gigantic results ever in this space, or it might not. But because it might be, it would seem like it’d be worth doing some fairly serious experiments to see. So if someone handed you a check for $200 million, what would you do to attack this problem?

Ken: Yeah. I mean, it is totally the right framing. This could be huge, but I also agree it could not be. It’s not proven yet. But for those who think that this might be huge, this presents a really cool quest to go down, which is the question you’re asking: how do we get UFR? It’s a systematic issue. Is there a method? Is there a way to encourage UFR in large neural networks?

If you basically don’t like the paper and think that it’s making a big mountain out of a molehill, obviously, you’re not going to care about this. But for those who actually believe this is something important, this is a real opportunity to do something very significant. I’m already getting emails from people saying, “Well, how do I get the network to make UFR?” I’m like, well, that’s basically the trillion-dollar question. If I knew that, I would just do that. But obviously, I have speculations.

There are two questions: Does it matter that much? And if it matters, can we do this? I think we’ll find out if it matters. And if it does, we will do it. I think it can be done. So the question is how? What we know right now is a couple things. We know Picbreeder did it—that’s the only known example of producing UFR explicitly. We know how Picbreeder works, so we know some of the components of the system. And we know that SGD on these image targets doesn’t do it.

The question is, what can we infer from that knowledge about making a more general hypothesis about how, using modern compute and network sizes, we can actually get things that have this property? We can speculate that the ingredients of Picbreeder might be critical. For example, Picbreeder is naturally evolutionary, so maybe we should look at that. It’s also true that Picbreeder is open-ended—there’s not a final objective. Maybe it should be open-ended rather than just aiming at one thing.

That’s maybe part of the explanation for why you have good representation, which by the way applies also to DNA. Evolution on Earth doesn’t have a single final objective, like “create a bug” or something. That’s partly why DNA has the nice properties that you alluded to much earlier in the conversation—it’s also open-ended.

There are all kinds of options on the table. There are regularizations. There’s NEAT (NeuroEvolution of Augmenting Topologies) inside Picbreeder, which means the networks are growing over time. That’s not usually part of the conventional way of training neural networks today. Is that essential to the fact that we ended up with UFR?

It could be that there’s a higher-level unifying explanation which is more general than Picbreeder itself. So you don’t even need any of these components at all—there’s a more general explanation which Picbreeder just happens to be an instance of. Then we would want to understand the more general theory, and we would be able to create many different types of algorithms that have the property of producing UFR in the end.

I think there’s a lot of options on the table, and we can draw inspiration from what we know right now, which is not enough to make a conclusive statement yet. But the fact that it does exist and did happen—that’s the miracle here. We’re not just talking about some theoretical idea. Literally, there are thousands of Picbreeder networks that have this property, and we know how it happened. We’ve seen it.

I think we’re going to figure it out sooner or later. I’m guessing there’s more than one way to do it. One reason is because evolution seems to give us that ability as humans. I’m guessing my brain has better UFR than, say, a large language model. It’s not a binary thing like you have it or you don’t—it’s like a continuum probably. If it was this special sauce, like one very specific thing, there’s only this one crazy obscure algorithm that produces UFR, then we would have had to hit an enormously improbable jackpot to have it in evolution. So it makes me think there’s probably an attack surface that’s a lot bigger. There are many different ways into something like this, which makes me optimistic that there’s lots of opportunity on the table right now. Many people can make important discoveries in this if you believe that this is a path worth exploring.

Jim: Well, I want to thank Ken Stanley here. As soon as I read this paper, I said, “This might be important.” And so I reached out to Ken and, like you, I’m not sure. But when you see something that could be this fundamental, it’s worth digging into. So thank you, Ken, for coming in here and explaining it very well, extremely well. People should take a look at the paper, those who have a little bit of background or just want to look at the pretty pictures. And I hope you got somebody’s attention here through the other work that you’re doing to help provide some funding to do this thing at scale because it’s worth a real effort. This could save unbelievable amounts of money, electricity, and slow down the heat death of the universe a little bit if we can find ways that are fundamentally better aligned with what we’re trying to do rather than creating unbelievable hairballs that by pure luck and brute force happen to work.

Ken: Yeah. And also, remember, it’s not just about saving energy, which is obviously very important, but better creativity, better continual learning, better generalization. There’s all kinds of possible rewards and dividends from doing this. So yeah, thank you so much for giving me an opportunity to talk about this here. I really appreciate this forum and getting the word out more. It’s been really great being on the show, and I like the technical level of the questions to give me a real chance to try to make this case. So I really appreciate it.

Jim: You did a great job too.

Ken: Thanks a lot.