Transcript of EP 281 – Jeff Hawkins and Viviane Clay on the Thousand Brains Theory

The following is a rough transcript which has not been revised by The Jim Rutt Show, Jeff Hawkins, or Viviane Clay. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guests are Jeff Hawkins and Viviane Clay. Jeff’s a scientist with a lifelong interest in reverse engineering the neocortex. In 2002, he founded and directed the Redwood Neuroscience Institute. In 2005, he founded Numenta, where he led the research resulting in the Thousand Brains Theory of the neocortex. Last year, he created the non-profit Thousand Brains Project to turn the neuroscience theory into a new sensorimotor AI technology. Earlier in his career, he co-founded two companies, Palm and Handspring, two companies us oldsters remember well, where he designed products such as the Palm Pilot and the Treo, which was one of the very first smartphones.
Jeff has written two books about the brain: On Intelligence in 2004 with Sandra Blakeslee, and A Thousand Brains: A New Theory of Intelligence, which is what we’re going to be talking about today. Welcome, Jeff.

Jeff: Thanks Jim. It’s a pleasure to be here.

Jim: This will be fun. And we also have Viviane Clay. She received her doctorate in cognitive computing at the University of Osnabruck in Germany, where she focused on sensorimotor learning as a key aspect of artificial and human intelligence. At Numenta, she continues to pursue her interest in sensorimotor intelligence and the brain, and has been working on the Thousand Brains Project since its inception. Welcome, Viviane.

Viviane: Hi there. Thanks for having us.

Jim: Regular listeners know I’m more or less obsessed with the brain and its various processes and possibilities and emergent effects, et cetera, so this should be a very interesting conversation. A lot of the work of Numenta and the new thing, which is called the Thousand Brains Project, seems to have its basis in the work of Vernon Mountcastle.

Now interestingly, I read Mountcastle’s magnum opus Perceptual Neuroscience, I don’t know, around 2013, something like that, and it actually had a pretty interesting effect on me. And so I’m going to let Jeff and Viviane talk a little bit about Mountcastle’s work, specifically in the area of mini columns and macro columns, and maybe you can distinguish a little bit which of those you’re using in your work.

Jeff: So Mountcastle was a real key figure in neuroscience in the last century. His main focus was the neocortex, which is the big wrinkly thing you see if you opened up a head. It’s about 70% of the volume of a human brain, and it’s the part of the brain we all think of as intelligence.

My neocortex is speaking, your neocortex is understanding my speech and language and our understanding of the world and how we manipulate the world as an intelligent species. It’s mostly the neocortex.

And what Mountcastle had was a singularly brilliant suggestion. I don’t know what else to call it. He pointed out that the different parts of the neocortex do different things, but the circuitry in the different parts looks remarkably the same. And he speculated that the neocortex got big in humans by just replicating a basic element over and over and over again.

And so he proposed that almost everything we think of as intelligence, everything from our seeing, our hearing, our language, how we manipulate objects with our hands, how we think about the world, how we make podcasts, et cetera, that’s all being built on the same basic algorithm that’s been replicated a couple hundred thousand times in your head.

And it’s kind of hard to believe at first, but we now know it’s absolutely true and our work has been trying to understand what that algorithm is. You mentioned the word column, Jim. That’s a term that’s often referred to. If you think about the neocortex, it’s this sheet of cells.

It’s the size of a large dinner napkin, maybe two or three millimeters thick. But these units of replication, you can think of them as about a millimeter square in area. So each one is maybe like a little grain of rice. And so your brain is composed of maybe 150,000 of these little grains of rice stacked next to each other. Those are the cortical columns, and each one does the same thing.

So we have many columns dedicated to vision and many columns dedicated to touch and so on. And our work has been to discover what those columns do and how they do it in great detail, more detail than we’ll have a chance to get into today. But we’ve made a lot of progress on that.

And now Viviane is taking that neuroscience technology and we’re applying it to a new form of AI. It’s very different than the stuff that’s out there today. And I don’t know, Viviane, you want to make a few comments on that approach to AI?

Viviane: Yeah. So basically this pretty simple idea from Mountcastle that there’s this general computational unit in the brain is a really neat idea when you’re starting to implement an AI system because you can basically focus on implementing this general computational unit in a really nice way. And then you can simply replicate that unit many, many times, or you can use the exact same unit for a range of different sensors or actuators to interact with the world.

We call this unit the learning module, and we have that implemented in code. It’s open source and people can come and take a learning module and then connect sensor modules to it. You can connect a camera to it, vision, you can connect a touch sensor to it. You can connect sensors that humans don’t have, like a LIDAR or ultrasound.

And since it’s such a general algorithm, it should be able to deal with all of these modalities and model them in a general purpose way. And to do that, we put together this cortical messaging protocol, which is basically a communication protocol we define of what’s the interface of this learning module.

And as long as sensors plug into that interface, you can plug in whatever you want. You can scale the system, you can have learning modules talk to each other, you can stack them hierarchically to learn compositional models. There’s lots of possibilities.
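
To make that concrete, here is a minimal sketch in Python of what such an interface could look like. All of the names here (State, SensorModule, LearningModule, observe, step) are made up for illustration and are assumptions, not the project’s actual open-source API; the point is only that any sensor producing messages in one common format can drive the same learning algorithm.

```python
from dataclasses import dataclass


@dataclass
class State:
    """A message in a hypothetical cortical-messaging-protocol style format:
    a feature observed at a location, shared by all senses."""
    location: tuple        # where the sensed patch is, e.g. (x, y, z)
    orientation: tuple     # how the sensed patch is oriented
    features: dict         # modality-specific features (color, curvature, texture, ...)


class SensorModule:
    """Wraps any sensor (camera patch, touch sensor, LIDAR, ultrasound) and
    converts its raw readings into protocol-conformant State messages."""
    def observe(self, raw_reading, pose) -> State:
        raise NotImplementedError


class LearningModule:
    """The repeated computational unit, modeled on a cortical column.
    It only ever sees State messages, so any sensor can be plugged in."""
    def __init__(self):
        self.object_models = {}   # learned models: object id -> features at locations

    def step(self, observation: State) -> State:
        # Update hypotheses about what object is being sensed and where,
        # then emit an output State that other learning modules (or a motor
        # system) can consume, because outputs use the same protocol as inputs.
        ...
```

Because the inputs and outputs share one message format, modules can be replicated, connected laterally, or stacked, as described above.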

Jeff: One way to think about that is, we see that mammals all have a neocortex, and there’s a lot of variety in mammals. Different animals have different types of eyes and different types of sensors, and humans have a lot of different sensors too. There’s a lot of variety in how animals interact with the world, different body types, different sensors, but they’re all based on the same basic algorithm [inaudible 00:06:24] thing.

It was like any animal that has a neocortex, it works the same way. And so you can have electric field sensors and you can have infrared eyes and you can do this and do that, and you can have bodies with different types of body parts in the whole system. So yeah. It’s like the variety we see in nature, we’re going to see that in AI as well.

Jim: The other interesting thing is that we know in humans, for instance, that if you have a deficit in one area, parts of the neocortex can be reprogrammed essentially. If you’re blind, parts of the neocortex that would normally be used for vision, higher-level vision processing, often get repurposed for hearing or touch or both.

And then we have some interesting experiments where people have figured out how to have the thigh be able to “see,” quote-unquote, using imaging technologies. And there are lots and lots more of those kinds of examples, which I’m sure you guys are far more familiar with than I am.

Jeff: Yeah. That’s more evidence for Mountcastle’s surprising discovery. People say, “Oh, that’s the visual cortex. It does seeing.” Well, if you’re blind, the visual cortex decides to do something else. It says, “Okay. I’m going to do touch.” That’s this incredibly powerful idea. And that’s one of the proof points in some sense for Mountcastle’s idea, which, by the way, there are still people today who just can’t believe is true, that there’s this one algorithm [inaudible 00:07:42], but we’ve made enough progress on this, as Viviane says, that her team has actually implemented it. We’re building this stuff.

Jim: I would say based on my reading, probably most neuroscience people agree sort of, but not necessarily in the strong form that you guys do.

Jeff: Right. Well, there’s always a gray area, right? Obviously Mountcastle said there’s this common algorithm, but nature tweaks these things, right? So if you look at the neocortex, you’ll see differences in different parts, but the vast majority is the same.

And so the key as a scientist is you want to figure out what the commonality is between all these things. And then later you can come back and tweak it a bit. For example, there’s some extra machinery in primate visual cortex that you don’t see in non-primate visual cortex. You see it in monkeys and humans, but you don’t see it in dogs and cats and rats. But they all see.

And so that extra machinery isn’t really essential for seeing. It might be essential for some of the things we do. That’s the big example people usually point to, these extra layers in V1, which is part of the neocortex.

So we want to understand that, but that doesn’t hold us back from Mountcastle’s idea. So if you want to look for the differences, you’ll find them. But if you want to look for the commonality, you’ll find it too. So we’ve looked for the commonality. I think that’s obviously the place to start: understand the basic thing that’s going on everywhere.

And even in our work, even Viviane’s team, we can tweak this stuff. We’re not modeling biology 100%. If we were doing an infrared sensor or a LIDAR sensor, we might say, “Oh, let’s do a little tweak on it here, a tweak on the idea.” But it’s the same basic idea, you just tweak it a little bit.

Jim: Yeah. The other sort of base idea we should lay on the table before we dig into more specifics: you mentioned layers a few times. It’s generally known that the cortex consists of six layers. And as I recall, Mountcastle actually specified different communication modalities and processes going on in the different layers.

Was it layer five that was used for long range communication? I don’t remember the details, but maybe you could lay out for me the layers and what they do in some sort of general sense.

Jeff: Right. So this is a very complex topic, but the reason they say there are six layers goes back to 1900, 120 years ago. When they first could look at this stuff under a microscope, they saw layers, and some people said, “I see six layers.”

We still use that terminology, but it’s really incorrect. What they were basically saying is that within the thickness of the neocortex, which is about three millimeters, you see different cell types and different cell bodies and different cell sizes, and they form this laminar structure, but there are many more than six cell types.

And so now sometimes people break the layers into parts, like layer six has actually got five different types of cells in it, and layer five has two different types of cells. And sometimes people say layers two and three are one layer.

So this layering idea is a rough roadmap, but what’s really important is to identify the different cell types and what they do.

Now we can say what they do. When neuroscientists talk about what these cells do, they’re usually saying, “Oh, well, these connect to these cells and they fire under these conditions.” But very, very few people have any idea what the functional aspects of these cell types are and how they work together to actually create what we conceive of as intelligence.

This is what we have done. It’s pretty unique actually. And so when we talk about a cell type, like Mountcastle might say, “Oh, layer five cells are motor outputs.” But we’re asking, “Okay. What’s the representation of that motor output? How’s it dealing with changes in orientation of your body to the world? What kind of models do you have in your head that generates these motor outputs?”

If I want to touch the power switch on my smartphone, I have to know where my finger is and how to move it to that switch. And how does the brain calculate that? So these are the things that most neuroscientists don’t deal with. But anyway, you can think of these cell types and these layers and they have this incredible complex connectivity to other cell types and other places of the brain.

So our team, as you know, Jim, there are thousands and thousands of neuroscience papers that have talked about these topics generated over decades of research, and it’s a very daunting field to get expert in, but you can if you spend a lot of time reading these papers.

And so now in our team, we have a sense of what these layers do. We have proposals, actual very specific proposals, about what many of these cell layers are doing, how they work, what their function is, and how they work together to create our perception of the world.

And so that’s part and parcel of what we do from a neuroscience point of view. And from an AI point of view, we have to emulate those functions.

Jim: So essentially you guys have been working on how to reduce to a formal description what’s going on in these micro columns, or actually in these columns. These aren’t Mountcastle’s micro columns, these are the macro columns.

Jeff: There is some very confusing terminology here. I usually avoid it if I can, but you brought it up, so here we go.

There are two types of columns, at least two types of columns, that people talk about. The two major ones, I’ll just call them columns and mini columns. Sometimes they’re called micro columns, but that’s the same thing as a mini column. If a listener wants to visualize this:

A mini column is a very, very skinny little thing, and it’s like a teeny little thread that goes vertically, goes across the three millimeters of the cortex. It contains about 100 to 120 cells. They are part of the genesis of the cortex, when you’re in your mother’s womb and your brain is growing, the cells form along these little strings and it’s part of how the brain develops.

And sometimes they’re visible. So if you look under a microscope, you can see these skinny things, they’re like 30 to 50 microns wide. That’s 30 to 50 thousandths of a millimeter.

Mountcastle said, “That’s the primary replicable unit of the cortex.” But then he went on to say, “But that’s not functional. You have to take a bunch of the mini columns together, several hundred.” And that makes a larger column. And that’s the column we were talking about. That’s what Viviane was referring to as a learning module.

That larger column contains several hundred mini columns. It contains maybe 50,000 to 100,000 neurons, something like that. And Mountcastle said, “Together, those mini columns work together to do something important.”

So the mini column, it’s like the basic unit of a column, but it doesn’t do much on its own. You have to have a bunch of them, and then you have a column then that does something. So it’s a very confusing terminology, especially if you read the neuroscience literature. They use different terms for these over the years.

Jim: So to be clear, when you’re talking about columns, you’re typically talking about the macro columns?

Jeff: Right. The larger column. Sometimes we refer to it as a macro column, but most people just call it a column. A column by Mountcastle’s definition: imagine the pattern of information coming from your eyes, where your eye has a retina and the retina is really big.

It looks at the entire visual space in front of you. A column looks at just a small patch of that. So the input to one column is one small patch of the retina, or if it was coming from your skin, the input to a single column is a small patch of your skin.

And then what Mountcastle showed is if you move across the cortex, the next column over gets input from a different patch, and the next column over gets input from a different patch. So each column is getting information from a small part of your sensory world. It’s like looking at the world through a skinny straw, or touching it with just the tip of your finger.

And what we’ve discovered is that the way these columns work is they can integrate or collect input over time. So if your finger’s moving and touching something, the same patch will get different inputs as it moves, but the brain keeps track of where it is in space.

So even a single column, as it moves through the world sensing different things, can build models of large three-dimensional structures. So you could learn the shape of an object just by moving a single fingertip around on the surface of the object. And that’s what a single column does.

And the same thing happens with vision. Just imagine looking at the world through a straw. You can’t see much, but you can still learn what things look like just by moving the straw around. That’s what columns do.
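
As a rough illustration of that description, one can think of a single column’s learning as accumulating features at locations as the sensor moves, and of recognition as checking which stored model is consistent with a new sequence of (location, feature) observations. This is a toy sketch under that assumption, not the actual learning algorithm, and all names in it are hypothetical:

```python
class SingleColumnModel:
    """Toy model of one column: learn an object as features stored at
    locations in the object's own reference frame, gathered by moving
    a single sensor (a fingertip, or a small retinal patch) over it."""

    def __init__(self):
        self.models = {}   # object_name -> {location: feature}

    def learn(self, object_name, movements):
        # movements: iterable of (location_on_object, sensed_feature) pairs
        self.models[object_name] = {loc: feat for loc, feat in movements}

    def infer(self, observations):
        # Keep only the objects consistent with everything sensed so far.
        candidates = set(self.models)
        for loc, feat in observations:
            candidates = {name for name in candidates
                          if self.models[name].get(loc) == feat}
        return candidates


column = SingleColumnModel()
column.learn("mug", [((0, 0), "curved wall"), ((1, 0), "handle"), ((0, 1), "rim")])
column.learn("bowl", [((0, 0), "curved wall"), ((0, 1), "rim")])
print(column.infer([((0, 0), "curved wall")]))            # {'mug', 'bowl'} -- ambiguous
print(column.infer([((0, 0), "curved wall"), ((1, 0), "handle")]))   # {'mug'}
```

With enough movements the candidate set shrinks to one object, which mirrors the “moving a straw around” description above.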

Jim: Now as you both know, in fact from a conversation we had many years ago, vision and the visual stack is probably the most heavily studied part of cognitive neuroscience, and we know we have a V1, a V2, et cetera, with signals going up the stack. How do you fit Mountcastle’s columns into thinking about the classic visual cascade? Where does it fit, or doesn’t it?

Viviane: Similar to how you could plug any type of sensor into a cortical column or a learning module, you could also feed the output of one column into the input layer of the next column. That’s how the hierarchy in the neocortex is typically defined: output from layer three of one column goes out and into layer four, which is the typical input layer of a cortical column one hierarchical level up.

And there’s a certain hierarchical organization like that in the neocortex, and we also allow for that in our implementation, but there are also a lot of connections that are not strictly hierarchical like that. We have direct sensory inputs to higher regions in the hierarchy.

You have lateral connections between different modalities or neighboring columns, and you also have feedback connections that bias the lower-level columns about what they should be expecting and what they might be sensing. And then you have motor outputs from every column as well, but that’s another big topic.
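
A hedged sketch of how that wiring could be written down, purely for illustration: lower modules feed higher ones (objects composed of objects), while non-hierarchical connections, lateral, feedback, and direct sensory input to higher regions, carry the same kind of message. The module and connection names below are assumptions, not the project’s actual API.

```python
# Illustrative wiring only; every name here is hypothetical.
modules = {
    "low_vision": "learning module driven by a small retinal patch",
    "low_touch":  "learning module driven by a fingertip",
    "high_level": "learning module modeling objects composed of lower-level objects",
}

connections = [
    ("hierarchical", "low_vision", "high_level"),  # layer-3-like output -> layer-4-like input, one level up
    ("hierarchical", "low_touch",  "high_level"),
    ("sensory",      "retina",     "high_level"),  # not strictly hierarchical: direct sensory input to a higher region
    ("lateral",      "low_vision", "low_touch"),   # neighboring / cross-modality columns can vote
    ("feedback",     "high_level", "low_vision"),  # bias lower columns toward what to expect
    ("feedback",     "high_level", "low_touch"),
]

for kind, src, dst in connections:
    print(f"{kind:12s} {src} -> {dst}")
```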

Jeff: In our world, in our understanding now of how the cortex works, even a primary input region… You mentioned V1, Jim, which is one of the regions that gets input from the retina. By the way, V2, which is hierarchically higher than V1, also gets input from the retina. Viviane just mentioned it. These units work in parallel.

It’s typically assumed in the classic view that V1 calculates the little features and then passes those little features into the next region, which builds bigger things, and so on. In the Thousand Brains Theory, one of the great things we discovered is that because of the sensorimotor integration, even a single column, even in V1, can learn complete objects.

There’s a limit to it, but they can learn because they integrate movement over time, sensations over time. So the output of a column and the output of a region are complete objects. And so when you go up the hierarchy, you’re learning the compositional structure of the world.

Objects composed of other objects. Your listeners right now, if they just look around whatever room they’re in, they see objects in relationship with other objects: chairs in relationship to tables, lights in relationship to other things, words in relationship to pages in a book. The whole world is structured this way, objects as parts of other objects.

And the hierarchical connections that neuroscientists are all familiar with are really not about building up to some point where you have a final model of something. It’s really about learning objects composed of other objects, and there are models of complete objects everywhere in the cortex, in every column, in every region.

And this is flipping around the whole view of how the cortex functions hierarchically, but there’s tremendous evidence for it. Once you understand how it works, you can find all this evidence. Yeah, this is what’s going on. All those connections Viviane just mentioned can be understood in this context.

She and I are writing a paper on this right now going into detail about how these connections work, the hierarchical connections and the non-hierarchical connections.

Jim: Yeah. I know there are considerable results coming in along the lines you were just talking about. There’s a surprising amount of long-term perceptual memory. It used to be thought that all memory of objects and actions was in the episodic memory system, but apparently a fair bit is stored out there in perceptual memory.

Jeff: Right. Right. In fact, again, we’ll say even the lowest levels of cortical processing not only model the world, but, and this is another fact, every column in the neocortex, even the columns in V1, which are the primary columns, has a motor output.

One of the cell types in layer five is a motor output. That is the motor output of the cortex. Every column in the cortex, as far as we know, has a motor output. So even the lowest levels are generating behavior and modeling the world.

The term we’re using in this paper is heterarchy. It’s hierarchical in one way, but it’s non-hierarchical in another way. It’s parallel in the sense that all these different parts of the cortex are modeling objects, but it’s hierarchical because the hierarchical relations define how this object is composed of other objects.

So it’s a different way of thinking about the brain than people have had traditionally, but as you point out, there’s a lot of evidence now, once you know what to look for, and people have been discovering this.

Jim: Yeah. It’s quite interesting, and I can easily see how it could be very useful in thinking about AIs, right? It actually makes more sense in some sense. We’ll get to that later.

Now we’ve laid kind of the foundation. Now let’s go to the next step, which is a concept you talk about quite a bit: reference frames.

Jeff: Yeah. Well, one of the things we realized is that as you go about your world, your brain is constantly making predictions. In fact, my first book, On Intelligence, was all about that. Most of these predictions you’re not aware of. You’re just not aware that when you grab your coffee cup, your brain is predicting what every finger is going to feel.

And when you look at something, your brain is predicting every little detail it’s going to see. But if something changes and you know it changed, that means there was a prediction. Anyway, as we thought about prediction, what we realized is that there are many types of predictions you make that can only be made if the brain knows exactly where the sensor is.

When you grab a coffee cup with your hand, it has to know where the fingers are. In fact, it has to know where every patch of your skin is relative to that coffee cup for it to know that it’s a coffee cup and to know if anything was different.

I’m holding a coffee cup right now, and if there’s some different shape or different feel or maybe it was a little edge on it that I didn’t… I would notice it. And so my brain has to know where the fingers are, where all the sensory patches are, and how could it do that? That’s really kind of crazy thinking.

How would the cortex know where all your skin patches are and where all the parts of your retina are looking in the world? Basically, to know the location of something, you have to have some way of representing location, and a reference frame is the term that’s often used for that. You can think of a reference frame like the Cartesian coordinates we all learned in high school, X, Y, and Z. That’s a reference frame. So if you know X, Y, and Z, you know where something’s located relative to something else.
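
In coordinate terms, the prediction problem described here amounts to expressing where a sensor patch is in the object’s own reference frame. Here is a minimal sketch that treats a reference frame as nothing more than an origin plus axes; the brain’s actual mechanism, via grid-cell-like codes, is very different, and the function and variable names are illustrative only.

```python
import numpy as np

def to_object_frame(sensor_location_world, object_origin_world, object_axes_world):
    """Express a sensor's world location in an object's reference frame.

    object_axes_world: 3x3 matrix whose rows are the object's x, y, z axes
    expressed in world coordinates (assumed orthonormal here).
    """
    offset = np.asarray(sensor_location_world) - np.asarray(object_origin_world)
    return object_axes_world @ offset   # coordinates relative to the object

# A fingertip near a cup: no matter where the cup sits on the table, the same
# spot on the cup maps to the same location in the cup's own frame.
cup_axes = np.eye(3)   # cup aligned with the world, for simplicity
print(to_object_frame([0.12, 0.05, 0.02], [0.10, 0.05, 0.00], cup_axes))
# -> [0.02 0.   0.02], the fingertip's location in the cup's frame
```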

Well, it turns out, we said, “Well, how could neurons do this? They must be doing this. In fact, it must be happening at every cortical column.” We deduced this, and so like, “Wow. How could that possibly happen?”

Well, fortunately, some other people have discovered how. In a different part of the brain, there are these things called place cells and grid cells, which were discovered back in the ’90s and early 2000s, and we know a lot about how they work. And they implement a type of reference frame.

That reference frame is mostly used for knowing where your body is in an environment, like just knowing where I am in a room; even if it’s dark, I can just know where I am. So, we then speculated that the same mechanism that evolved for figuring out where your body is in a room or an environment, which has been well studied, probably exists in the cortex, because it’s a complicated mechanism and nature is going to evolve it once, not twice. And so, we guessed that in the cortex it would be the same type of cells, cells that are similar to grid cells and place cells, and they implement a reference frame. It’s very unusual how neurons do this, but they do it, and now there’s lots of evidence for that.

So now, people have discovered grid cell and place cell equivalents in the cortex everywhere they’ve looked. So, I think that’s going to be correct. So, anyway, reference frames are essential for knowing anything. When you look at the world, everything has a place, everything is at some distance, in some location. That’s because your brain internally has representations of those distances and locations. And then one thing that we learned, and this again flows from Mountcastle, is that all knowledge, everything we know about the world, is stored in reference frames, and so these reference frames provide structure for knowledge. It’s easy to imagine what the structure is when I’m looking at a coffee cup: oh, it’s the structure of the coffee cup. That tells me there’s a three-dimensional model of the coffee cup in my head, and I need a reference frame for that. And I need maybe a reference frame for figuring out the map of my house.

But it turns out that if we believe and follow what Mountcastle said, reference frames are how we represent all knowledge. Everything we know, even things that are conceptual, like mathematics and so on, is represented in reference frames. The neuroscience tells us this. It tells us this because the neurons look the same everywhere, the structure of them, the things they’re doing, and these reference frames exist everywhere. So, this is a whole way of understanding what knowledge is: knowledge is structure. It’s structured in these sort of three-dimensional reference frames, though they don’t have to be three-dimensional, they can be two-dimensional or n-dimensional, it turns out, but they have this structure to them. And so that was a big, big insight.

And as we build AI systems, the systems that Viviane and her team are building essentially represent knowledge about the world in these reference frames. And this is quite different than the AI that most people are familiar with today, which has nothing like this. Those are more statistical types of approaches. So, we’re building the first AI systems that literally have knowledge of the world, that represent the structure of the world, and it’s not just statistical hacking of lots of data, which is what AI mostly is.

Jim: And just as something I’ve been trying to encourage: let’s not use the word AI to refer specifically to large language models, or deep learning and reinforcement learning more generally. That is its own sub-discipline, people. So, don’t say AI when you mean that.

Jeff: I see. Well, okay, because most people, many people don’t know what the difference between deep learning and transformers and convolutional networks and things like that are.

Jim: The takeaway is that AI is way broader than just the things that large language models are made of.

Jeff: Right. But all the different flavors of those things you mentioned, Jim, none of them, as far as I know, has this concept of knowledge represented in reference frames. And by the way, when you have knowledge represented in reference frames, you also have to have movement, or something like movement, because you have to move through the space. And so, the Thousand Brains Theory and the Thousand Brains Project, the AI we’re building, isn’t just about representing knowledge in these reference frames, but also about systems that have to move through space, physically or even conceptually. It’s about this movement and reference frames and structure, which pretty much don’t exist in any of these other AI techniques.

Jim: Interesting. One could imagine [inaudible 00:26:53], copying, convergent evolution, however we got there, the place and grid cells from the hippocampus into the cortex, but taking the next leap, to we’ve encoded all of our knowledge into that same geometric-like architecture, stretches the brain quite a bit, at least my brain. Can you speak, at least a little bit, to how we might use something like a geometric-like encoding for things like, say, the syntax of language, or even simpler, the browsing behavior of a deer, or something like that?

Jeff: Yeah. Well, I think I’m going to extend your question and make it a little bit harder.

Jim: Okay, even better.

Jeff: What about things like democracy and mathematics, things that we think of as purely conceptual fields of intelligence? How do they relate to a sensorimotor type of system that’s physically moving through space and reference frames? And this is an interesting question, and I’ll be honest, we don’t have the answer to it; we have parts of the answer. But what’s really interesting is, I mentioned earlier, the parts of the neocortex that do those things look almost identical to the parts of the neocortex that do vision and touch and hearing, and they have the same reference frame structure to them. So, we take it as a given that those conceptual parts of knowledge, that’s how they’re represented. Then we start saying, okay, how do we understand that? In my more recent book, I talked about it a bit, I gave examples for mathematics and history, about how we might use reference frames for understanding these concepts.

History is pretty simple. When we think about history, we tend to organize it in different ways. We might think about where things occurred in space. So, what’s the history of this area, of this region, where were these different towns that interacted, and so on? That’s one type of reference frame. You might think about a reference frame of time. What was the sequence of events in time? That’s another type of reference frame: this led to that, led to this, led to that. And in mathematics, you could think about various concepts as locations in a space, and then operations you do on them as movements; you apply a Fourier transform to something, that’s a type of movement.

I move from one space of mathematics to another space of mathematics, and if I’m familiar with it, I know what’ll be there after I do that transform. So, we are still trying to figure this out, but it’s clear this is how it works. And so, we’re going forward, we’re starting off with the basics of a system that can interact with the world and understand models of objects and things, and how to manipulate things to achieve goals. And I’m confident as we go along, we’ll figure out how the more conceptual things are also implemented this way.

Viviane: And yeah, I guess also on the point of language, because I feel like that’s a pretty nice property of the system as well. For one, you mentioned syntax, and syntax is pretty much just defining where words should be relative to each other, which is using a reference frame. But then the really nice thing is for semantics, so what do words actually mean? The brain can use long-range connections between models that were learned, for instance, in visual cortex or auditory cortex. When you read a word, you can hear how the word sounds, and you can imagine what that word refers to. So language is automatically grounded in other models that you’ve learned with other modalities. So it’s not like large language models, where you start with language and you only give it language, and you just find statistical patterns in that. Instead, you actually start by interacting with the world, and language is something that comes quite late in development and is then much more grounded in previous experiences of the world.

Jim: I probably shouldn’t do this, but let’s nerd out a little bit further on where neuroscience and cognitive science come together. We talked, again, about the analogy from place cells and grid cells, and I don’t remember which is which, but one of them is essentially fixed in the actual Euclidean universe, and the other one is relative. I think it was place cells are fixed and grid cells are relative. Is there an analogy to that in this idea of reference frames as ways to represent non-geometric information?

Jeff: I’m not familiar with that way of thinking about grid cells and place cells. I’m very familiar with grid cells and place cells, but I haven’t heard that language used for them. So, I’m trying to compose an answer; I don’t know if Viviane has one.

Jim: And maybe I’m wrong, that was my reading from 10 years ago, and I recall extracting that pattern that-

Jeff: Well, I’ll tell you what, here’s another way of looking at it. Grid cells are closer to a pure reference frame; they represent the metric space of something, and they’re a background grid, if you will, over some space, a physical space, let’s say. And place cells are very specific to particular objects, particular things, particular environments. So, a place cell will fire whenever you’re at some location in a space related to some physical feature or something like that. Whereas grid cells fire sort of independent of the features; it’s just like a background reference frame. The very simple way to look at it is, when we learn something, we learn it by associating something that’s observed with some location in space. So, the very simplest way to think about it is that place cells are in some way more driven by the physical things in the space, but the underlying grid is less related to that; it’s more just like a grid.

So, if I had an environment, let’s say I went into my dining room, and I moved the features around in the dining room, a lot of cells like place cells will change, but the grid cells won’t. They’re like, yeah, it’s still the dining room. Yeah, you can move the chair over here, you can turn the table this way, but it’s still the dining room. But our brains have to be able to know that; they have to say, this is the same place, but the things in it have now moved. And by the way, that’s a lot of our work right now, we’re focused on how objects behave. Like if you want to manipulate something using a tool, we have to learn how things change over time.

If I use a simple tool, like a stapler, it has physical movements, it doesn’t always look the same, and it does things, but it’s still the stapler. So, in that sense, there’s a grid-like reference frame for the stapler, but the different components of the stapler can move relative to that reference frame into different states as it’s being used. So, that’s a very rough analogy between grid cells, the unchanging grid reference frame, and place cells, which are more related to the things that are in there. It’s not as simple as that, but…

Jim: Okay, that’s helpful. You talk in the book about how often multiple columns have to be involved to do something, and that they interact via voting or something like voting, could you talk about that a little bit?

Jeff: So, the basic theory, and Viviane alluded to this earlier when she talked about this communications protocol, the basic theory is, each column on its own can learn stuff, and each column is looking at a different part of the world. One column might be looking at the input from my fingertip, and another one, a different fingertip, and another one, part of my retina. There are hundreds of thousands of these columns, so there’s a lot of them. And they’re all building models. So where does the model of an object such as a coffee cup exist? Where is it? It turns out it’s in a lot of places. There are lots and lots of models of a particular object, like a coffee cup I’m familiar with. I have models for my touch sensors, I have touch models, I have visual models, I might even have models of how it sounds.

And so how do these talk to each other? You don’t feel like you’re 150,000 little people in your head, which is what it is in some sense. So, they work together in various ways. In my book, I mostly talked about one way, which I called voting: they all try to reach an agreement about what they’re sensing. So, you’re looking at something and maybe it’s partially obscured, you’re touching it, maybe there are some unusual things that make it difficult, but you’re only touching little pieces of it, maybe only a few fingers. How do you just instantly know what it is? All this partial input, nobody’s looking at the whole thing. Well, they can vote. They do send a signal out, and it says, hey, what’s the most likely hypothesis we have for what all of our inputs are at this point in time?
And that’s how we reach a consensus very rapidly, and that’s why we feel we have a single perception: oh, I’m holding, looking at this coffee cup. It doesn’t matter if I’m looking at it now, or looking at [inaudible 00:35:46], or touching it now, it’s still a coffee cup, because all these columns are voting, saying, yeah, here’s a coffee cup at that location, I’m touching this little part of it. And this is a large part of what we’re doing in the Thousand Brains Project. Viviane, maybe you could translate that voting into the communications protocol again.

Jim: Viviane, let’s extend that into the AI realm.

Viviane: Yeah. So, basically, like Jeff already explained before, each cortical column can learn complete models, so we designed this implementation that we call a learning module that’s analogous to the cortical column, and each learning module can learn models of complete objects. The learning module on its own can recognize complete objects, but in order to recognize them, it has to move its sensor over the object. So, if I’m sensing a coffee mug with only one finger, I have to move my finger over the coffee mug, and over time I can recognize that it is a coffee mug. But if I have multiple sensors, so I’m grabbing the entire coffee mug with my hand and all five fingers are touching it, then these five sensors could connect to five learning modules, and each of these learning modules would have a hypothesis about what they are sensing, and then they can communicate these hypotheses with each other using the cortical messaging protocol, and reach a consensus faster. So, using more learning modules basically speeds up inference.
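
One simple way to picture that voting: each learning module keeps a set of object hypotheses consistent with what its own sensor has felt so far, and the modules converge on the intersection. A toy sketch under that assumption (not the actual voting mechanism, and the object names are made up):

```python
def vote(hypotheses_per_module):
    """Each module contributes the set of objects it still considers possible;
    the consensus is whatever every module can agree on."""
    return set.intersection(*hypotheses_per_module)

# Five fingers each touch a different patch of the same unknown object:
finger_votes = [
    {"mug", "bowl", "vase"},     # finger 1: curved surface, could be several things
    {"mug", "vase"},             # finger 2: feels a rim
    {"mug"},                     # finger 3: feels a handle, nearly decisive on its own
    {"mug", "bowl"},             # finger 4
    {"mug", "bowl", "vase"},     # finger 5
]
print(vote(finger_votes))        # {'mug'} -- consensus reached without further movement
```

With one finger, the same disambiguation would require moving the sensor and sensing several patches in sequence, which is why more learning modules speed up inference.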

Jeff: Jim, if I go back to, I know you have a history in this stuff, but interestingly, early on in my career, when I was working on the brain, I would argue with people about the importance of movement, that you can’t really understand how brains work if you don’t understand how brains move in the world. And vision researchers would come back and say, well, look, I can flash an image in front of your eyes, and you don’t have time to move your eyes, and you can tell me what that image is, therefore you don’t need movement. And I didn’t know how to reply to that in the beginning, but now I know exactly how to reply to it. First of all, you can’t learn an object without moving your eyes; you can only infer it or recognize it.

And the way we can do it in a flash is what Viviane just said: our models are all learned through movement, but now we have a whole bunch of models, and if you flash an image in front of my eyes, those models each get a little piece of it, and they vote, and therefore voting allows you to recognize things quicker. But if I told you you could only look at the world through a straw, you’d still have to move the straw around to see what it is. Or, as Viviane said, if you move your finger around an object, you have to do it that way. So, people used to say, oh, vision’s not a movement sense, and oh, it absolutely is. We move our eyes three times a second, and we’re constantly… This is not an inconvenience, this is how the brain learns, and how it recognizes things, and how it works. It’s not like, oh, we move our eyes, and we have to sort of compensate for it. No, how the brain actually understands the world is through movement.

Jim: I like that, because the question of how in the world we integrate the visual inputs across saccades was always a huge question. And I never heard a reasonable answer; this at least is pointing towards a reasonable answer.

Jeff: I’m being bold, Jim, I’ll say this is the answer. There’s so much evidence now, if you know what to look for in the literature, there’s so much evidence for this. But it logically makes sense. There were these puzzles that people really puzzled over: not only do you have to integrate information over saccades, but the brain often would anticipate what it was going to see before the eyes actually got to where they were going to be. So, as your eyes are moving rapidly to fixate at a different location, the cells that are going to get input start firing correctly before there’s any input. And people say, how the hell does that happen? Well, the answer is that the brain knows the movement; it knows where in space it will be at the end of this movement. It says, I can path-integrate through time: if I move in this direction, I’ll be at this location, and the column says, oh, I know what’s supposed to be at that location, I’ll predict it. And so, your brain predicts what it’s going to see before the eyes even stop moving.
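
The predict-before-the-saccade-lands idea can be stated very compactly: if a column knows its current location in an object’s reference frame and knows the movement being made, it can look up what should be sensed at the destination before any input arrives. A toy sketch, reusing the feature-at-location picture from earlier (an assumption for illustration, not the biological mechanism):

```python
def predict_after_movement(model, current_location, movement):
    """Path-integrate: add the known movement to the current location,
    then predict the feature the sensor should find at the destination."""
    new_location = tuple(c + m for c, m in zip(current_location, movement))
    return new_location, model.get(new_location, "unknown -> prediction error")

# A learned 'stapler' model: features stored at locations in its reference frame.
stapler = {(0, 0, 0): "hinge", (5, 0, 0): "metal plate", (5, 0, 1): "top arm"}

loc, predicted = predict_after_movement(stapler, (0, 0, 0), (5, 0, 0))
print(loc, predicted)   # (5, 0, 0) 'metal plate', predicted before anything is sensed there
# If the actual sensation at `loc` then differs from `predicted`, that is a
# misprediction, and attention is drawn to it.
```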

Jim: And every once in a while it doesn’t, and of course, this goes back to the work in your first book: the columns are perpetual prediction machines, but sometimes they say, whoops, that’s not what I predicted, what the hell do I do now? Right?

Jeff: What that tells you is that your model of the world is wrong. You basically have a model of the world, and if some prediction is incorrect, that says, hmm, that’s different, that’s not what it’s supposed to be; either my model’s wrong, I didn’t learn this correctly, or something’s changed. And from a biological, from an animal’s point of view, that’s a dangerous situation, or maybe an opportunity. But if you have a misprediction about something you know, your attention is drawn to it immediately; it’s hard to not pay attention to it. It’s like, hey, wait a second, that’s not right, maybe something dangerous is happening here. That’s what the evolutionary brain would be saying.

Jim: Let’s turn this around though on probably the most cliched psychology experiment of all time, which is the inattentional blindness experiment, where four people are told to pass a basketball around back and forth amongst them and count the number of times that they each touch the ball, or something, and the experiment has somebody in an ape suit walk through the group of people passing the ball around, and 50% of the time the participants don’t even remember it. How do you reconcile your model with that well-known result?

Jeff: It’s pretty easy. We have this perception that we’re seeing all the world at once, and the reason we have that perception is because just like I was talking about a moment ago, you can look someplace and you know what you’re going to see. So, it’s like, oh yeah, this is over there, this over there, this over there, this over there. But in reality, we’re not seeing the world all at once, you’re only seeing a part of the world at once. And the model lets you have this perception that it’s all there in your head because if you want to think about it, then you’ll know it’s there.

It’s like I have a model of my, I’ll get to the basketball example in a second here, but I know what my house looks like, but actually, I can only visualize one part at a time. I can’t visualize it all at once. It feels like I know the whole thing at once, and if I walk into a room, I go, I’m seeing the whole room. No, you’re not. You’re actually seeing a part of the room and the rest of it’s in your model, and you turn, and there it will be. So, in this example you brought up, where there are people playing basketball, and they’re passing a basketball around, and the gorilla walks through, a person in a gorilla suit walks across, stands in the middle of the room, thumps his chest, and walks off again, you’ve been instructed to count the number of times the ball has been passed, so you’re following the ball every step of the way.

You’re only looking at the ball, you’re actually not looking at anything else. And so, there’s no part of your model that says, there’s a gorilla in this room. So, you have no expectation there’ll be a gorilla in this room, and so you’re only really looking at the part of the room where the ball is, and there’s nobody looking at where the gorilla is. Nope. Even though it’s coming into your eyes, clearly it’s coming into your eyes, that’s not how it works. The eyes are focused on one particular part of the world at a time, and most of the rest of the visual field is just ignored.

Jim: Just an empirical question, has there been any follow-up work with eye tracking technology to see if the saccades actually don’t go to the ape?

Jeff: I don’t know about that.

Jim: I don’t either, I’m going to have to go look that one up.

Jeff: I would be very surprised if saccades went to the ape. This only works if you give the person the instruction to count the number of times the ball has been passed, and that requires constant vigilance of following the ball. You don’t have any time to do anything else. If you didn’t tell the person to count the number of times the ball’s been passed, and just said watch this video, everyone would see the gorilla. Of course we would, there’s no question about it. It’s the fact that we’ve told you, don’t look at everything, just look at the ball, and then it’s easy to understand if you do that. It’s this idea that we perceive the whole world at once; we don’t.

Yeah, again, going back to touch, it’s really interesting. If you pick up this cup, again, if you go to pick up the cup with your hand, you’re actually only touching a very small part of the cup, yet you perceive the whole cup is there, but you’re only touching little pieces of it. So, this is a good example: you perceive a room, but you’re only visually attending to small parts of it at any point in time. And the rest of it, you think is there because if you went and looked, it would be there.

Jim: Yeah, we’re sketching it all in, right?

Jeff: Well, yeah, but you’re not sketching it in; it’s not represented until you actually move to that point, or visually imagine that point. It’s not all there at once. Imagine this one column looking at something. It says, oh, I know this is a coffee cup. Well, if I go here, I’ll see the handle. If I go here, I’ll see this. If I go here, I’ll see this. But it never perceives the whole thing at once. It never touches or senses the whole thing at once. We just have a perception that it’s there because the model is invoked, the grid cells have been invoked. Yes, there’s a coffee cup here, and you can sample it as you want. It takes a little while for most people to get used to this idea, but-

Jim: Yeah. It’s definitely… In fact, let me have you guys respond to the more traditional model. I guess the famous example is the supposed existence in the parietal lobes of very specific object neurons, right? The Jennifer Aniston or Bill Clinton neuron, where in open brain surgery, if you stimulate one specific neuron, or at least a very tiny little cluster, way smaller even than a micro column, suddenly the idea of Jennifer Aniston or Bill Clinton, or both together, which is a good scary thought, comes pounding into your brain. How does this far more holistic idea about objects relate to the known behavior of the parietal, what used to be called the object store or something, right?

Jeff: Okay, so let’s get a couple of facts about brains. Brains are made of neurons, right? The human neocortex is 18 to 20 billion neurons. When you look at how the brain represents something, it always does it by taking a set of neurons. Now, [inaudible 00:45:41] I’m going to say there are no neurons that just do Jennifer Aniston, I’ll just state that, and we’ll come back to how and why it feels that way. So, you might represent something with several thousand neurons; this might be a layer of cells in a column, which might have 5,000 cells in it. And the way the brain does this is almost always using what are called sparse representations.

Meaning if I take 5,000 neurons, maybe only 2% at any point in time will be active. So that means a hundred of those 5,000 will be active at any point in time. So most of the neurons are silent all the time. This is a known fact for most parts of the brain, the neocortex at least. And so that’s how the brain represents something. It might pick some small percentage, one or 2%, out of a big population. And it’s that set of active cells which represents something. This means if I were to look at a particular neuron and I show lots of different things, most of the time it’ll be silent and occasionally it’ll be active. But it doesn’t mean that that neuron only represents that thing, or that it’s the only neuron that represents it. It’s always a population of cells that represents something. Now, when we look at the Jennifer Aniston neurons and the Bill Clinton neurons, it’s a little bit more subtle than you just described it.

They don’t respond in all situations to Jennifer Aniston and Bill Clinton, but in many. So sometimes they don’t. Other cells respond to different presentations of them, and those cells will also respond to other things; it’s hard to know what input you’d have to give the brain to get that cell to respond to something else. So just because they show it 50 celebrities and it only responds to one, it doesn’t mean that that cell’s not responding to other things too, which it almost certainly is. You can just do the math behind this.
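
The math being gestured at here is easy to run. With 5,000 neurons and 2% activity, each representation is a set of about 100 active cells, the number of distinct such sets is astronomically large, and any single neuron necessarily takes part in many different representations. A small sketch with toy numbers and random codes (for illustration only; real cortical codes are not random draws like this):

```python
import math
import random

population = 5000
active = int(population * 0.02)      # about 100 cells active per representation

# Number of distinct sparse patterns available: C(5000, 100), astronomically large.
print(math.comb(population, active))

# Any single neuron is active in roughly 2% of randomly chosen representations,
# so a "Jennifer Aniston cell" would also fire for many unrelated things.
random.seed(0)
codes = [set(random.sample(range(population), active)) for _ in range(1000)]
neuron = 42
print(sum(neuron in code for code in codes))   # on the order of 20 out of 1000
```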

The logic and the math behind this say you can’t have single neurons that represent something such that that’s the only neuron the thing activates. So the general answer here is that it’s not really true that these cells always respond to Jennifer Aniston. They often do, but not always. And it’s also not true that they only respond to Jennifer Aniston. They would respond to other things too, even if the researchers haven’t figured out what that is. This is a virtual fact of the way neurons work. It can’t be otherwise. We could discuss that if you want. And then you can understand, we tend to over-represent things we’re very, very familiar with. So if I am a big fan… Is Jennifer Aniston on Friends? I’m not a TV person. So I-

Jim: Yeah, I never watched the show either, but she’s a pop culture person.

Jeff: Right, right. So let’s say, I think, is that right, Friends? Okay. Whatever.

Jim: I think I saw the show once, but-

Jeff: Okay. Anyway, for some people, I bet you wouldn’t find a Jennifer Aniston cell in my head because I don’t recognize her. I don’t know what she looks like, I just wouldn’t know, and you’re not going to find those. But if I spent all my life really studying her and watching all of her performances and she was a big… Then I would have a whole bunch of neurons that respond to her in different situations, memories I have of different scenes and so on. And so it’d be easier to find those because I’ve dedicated more of my cortex to remembering things about that show and her; there will be more cells, many cells, that respond to her in different situations. But if I didn’t do that, it would be very difficult. I’m sure I don’t have any Jennifer Aniston cells. But again, it’s not true that such a cell is the only thing that responds to Jennifer Aniston, it’s not true that it responds to all instances of Jennifer Aniston, and it is true that it responds to things that are not Jennifer Aniston. So it’s a bit of a myth.

Jim: Though you must have some Jennifer Aniston pattern, a hologram, because when-

Jeff: I know the name.

Jim: Well, I said, Jennifer Aniston, you knew that that was a celebrity.

Jeff: I’ve read those papers. My daughters know who Jennifer Aniston is. So I’ve heard the name. So clearly I have a representation of the name, but I don’t have a visual representation of what she looks like. So if you gave me a picture of Jennifer Aniston, you wouldn’t find any Jennifer Aniston cells.

Jim: Interesting. I think we’ve talked a fair bit about the neuro foundations and how your model differs from, but is also similar and congruent with, past models on these kinds of things as the evidence has been coming in. Let’s now switch towards the AI side of things. Jeff talks in the book about a pretty strong line between today’s deep learning styles of AI in particular and what you guys are up to. Why don’t you draw that picture for us? Then we can start talking about what you’re actually doing and what you imagine for next-gen AI.

Viviane: So yeah, our approach is quite different from deep learning, already with the principles that we mentioned before, in that it uses reference frames to learn structured models of the world. It uses a much more structured basic unit of computation, a learning module. It uses sensorimotor learning, which deep neural networks are just not made for, because they assume IID data, which means that the data is independent and identically distributed. That’s really good if you have a huge data set that you can just take and shuffle and then show to the network many, many times. But if you’re doing sensorimotor interaction, you just can’t really do that. You can’t just shuffle your experience of the world; you’re constantly generating your experiences yourself. And whatever you learn right now will influence what you experience next and how you will act next and sample the world next. So you just need a different approach for that and a different way of representing knowledge. And that’s what we’re trying with this system, and why we’re using explicit reference frames and this learning module modeled after cortical columns.
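
The IID point can be made concrete with a toy contrast: a typical deep-learning pipeline is free to shuffle a fixed dataset and replay it, whereas in sensorimotor learning the next observation depends on the action the learner just chose, so there is no fixed dataset to shuffle. A sketch of the difference (pure illustration, not either system’s real training loop):

```python
import random

# Deep-learning style: a fixed dataset, shuffled and replayed many times (IID assumption).
dataset = list(range(10_000))
for epoch in range(3):
    random.shuffle(dataset)
    for sample in dataset:
        pass                              # update weights on each independent sample

# Sensorimotor style: each action determines what is observed next,
# so experience is a dependent stream generated by the learner itself.
def environment_step(location, action):
    return location + action              # the next location depends on the chosen movement

location = 0
for step in range(100):
    action = random.choice([-1, +1])      # in a real system, chosen using the current model
    location = environment_step(location, action)
    observation = location                # what you sense is a consequence of how you moved
    # update the model with (action, observation); the ordering cannot be shuffled away
```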

Jeff: One of the advantages of this method is that the large language models and those kinds of deep learning systems require huge, huge amounts of data, but brains don’t require that, and neither does the Thousand Brains Project. We can learn something very quickly because it’s structured. I can give you a new object in the world or a new idea, and you can learn it very rapidly with just a few sensory experiences. And you don’t forget anything else. It’s not like, oh, I learned what this new tool is and now I’m going to forget all my other tools. So it’s a very quick learning, low power system, because of the way the data comes in as this sensorimotor data. It’s like I’m moving through the world, sampling it in a particular way. Viviane mentioned the IID assumption. It’s not like that. And so there are some real advantages to learning this way. Thousand Brains-based AI is going to be low power and very quick at learning, and it won’t require these huge data sets.

Jim: The IID point is really interesting and important. I hadn’t actually thought of it that way, but in reality, the universe is causal, right? You think of Pearl and his work. And at some level the brain must be modeling causality, not just coincidence or correlation or anything else.

Jeff: Well, so this gets down to our current work. I mean, just last week I was working on this. I am familiar with Pearl’s work, but I’m not using that language. I don’t find the language that he uses in the mathematics useful, because I want to think about the neuroscience. If we want to learn how to manipulate the world as humans, you want to accomplish something, Jim, with your podcast, Viviane was trying to get your audio system working today for this podcast, and so on. We have to learn how to manipulate the world, how our actions might lead to desirable outcomes, and how do you express that? What’s the language you do that with? Pearl uses mathematical language to describe those things, but I’d like to look at it in terms of, okay, we know how neurons build models of the world.

Now how do those models change? How do things change when I interact with them? How does the stapler open and close, and how do I represent that in neurons? And then how do I learn that certain actions I take lead to certain changes in the behaviors of objects? We can express it at this level. It’s not really a mathematical formalism as much as a biological mechanism. And we understand a lot of this; we’re getting very close to a very solid understanding of it all. Learning causality is learning to say: I performed something, and at some subsequent time some behavior occurred in the world, and how do I correlate those two? And if there’s a big gap in time between when I did something and when the event occurred, it’s hard to correlate them unless you already have a model of it. But how does the brain learn that stuff?
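As a rough, invented illustration of the credit-assignment problem Jeff is describing, the toy Python below logs actions and later observed changes and counts which pairs co-occur within a short window. The counting scheme is not Numenta’s mechanism; it only shows why a long delay between action and effect is hard to bridge without an existing model.

```python
from collections import Counter

# Toy log of (time, event) pairs: actions the agent took and changes it later observed.
log = [
    (0, "press_lever"), (1, "light_on"),
    (5, "press_lever"), (6, "light_on"),
    (10, "turn_knob"),  (14, "door_open"),   # long gap: harder to associate
]

MAX_GAP = 2   # only associate an action with changes that follow within this window
pairs = Counter()
for t_action, event_a in log:
    if not event_a.startswith(("press", "turn")):   # only actions on the outer loop
        continue
    for t_change, event_c in log:
        if event_c.startswith(("press", "turn")):    # only observed changes on the inner loop
            continue
        if 0 < t_change - t_action <= MAX_GAP:
            pairs[(event_a, event_c)] += 1

print(pairs)
# Counter({('press_lever', 'light_on'): 2})
# "turn_knob -> door_open" is missed because the delay exceeds the window;
# bridging such gaps is where already having a model of the object helps.
```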

So although we don’t have complete answers to this stuff, this is what we’re working on right now, and we have a lot of partial answers. We want our AI systems to be like humans, in the sense that they can solve problems independently in the world. Imagine, and I wrote about this in the book, it sounds fanciful, but I really believe it: what if we wanted to send some robots to Mars to build a habitat for humans? They’re not going to be driven by deep learning networks. They’re not going to be going back and getting more data from Earth. They’re going to have to be out there looking at things like a human would and saying, oh, I’ve got these problems, I’ve got this tool, how am I going to figure this out? Let’s do an experiment.

Let’s interact with these things and see how we can build things. That’s the kind of AI we want in the future. I think the only way to get there is the techniques we’re using, we’re developing where the sort of deep learning technologies are really great. If you have a lot of data and you know what the answer’s supposed to be and you want to just automate that.

Jim: We know that the brain doesn’t work like that at all. A bunch of things you said that I want to react to. One, you’re touching on one of my other favorite domains, Gibson’s ideas of affordances and how those may or may not emerge. I suspect you have a different model, but that was my first exposure to the concept of neuro-driven affordances. Second, this issue of learning on small data is huge. I bring this up all the time with these LLM-über-alles people who say just make them bigger and we’ll get to AGI. I go, “I don’t think so.” And here’s why: I give them an example, and it shuts them up every time.

Okay, let’s take your LLM or other statistical correlation based approach to representing intelligence and let’s hypothesize a 17 IQ 90 American, think of a male because why not. In his life, he’s had X amount of data about driving. He’s spent X hours sitting in a car driving, mostly pounding on his sister next to him in the back seat, or looking at his… I guess these days they’d all be staring at their phones hypnotized. And he also has the data from watching cars drive by. So take a data set of exactly that size, feed it to your alleged AGI, and then with no more than two weeks of real-time practice from only that data set, have it be able to drive in a way that you’re not at total menace to navigation. And by the time you have logged a thousand hours of driving, you’re okay.

And when I suggest that as a model for, well, not a model but a probe on something like AGI, the LLM guys always shut up. What do you think about that? Well, first, let’s talk about Gibson and affordances, and then maybe that little thought experiment.

Viviane: Yeah, I guess with your driving example, I would 100% agree that deep neural networks aren’t the solution for this, since it’s fundamentally a sensorimotor task, and it’s a task that humans can solve very easily. And we see this everywhere. I mean, I have a one-year-old son and he learns so quickly. He just watches me do something once, and then he imitates it and does it. Try to do that with a neural network. I work with sensorimotor AI every day, and it still amazes me what a little child can do. So I think we just need to look more closely at how the brain solves these problems, and I don’t see a path where deep neural networks could do that.

Jeff: I was curious what you think affordances are, because I have an understanding in my head of what they are, but-

Jim: In my model, and I do have a little side project trying to explain it all, human cognition or animal cognition, affordances are what I call the attributes of objects with which something can be done by an organism.

Jeff: Okay, all right, let me take that one. So we talked earlier about how objects have behaviors. I’ve used the stapler example, we use that a lot. You could think of your cell phone. If I touch my cell phone, different things happen. I have a model of this, right? I know what’s going to happen when I do these things. I know what’s going to happen if I lift the stapler’s top. I know what’s going to happen if I push it down. I have a model of how an object, how many objects in my life behave, even a door. I know how doors behave. I see a handle, I know what to do. I say, oh, if I pull on this or turn this or move this lever, it’s going to move in a certain way.

Jim: I would describe all those as affordances of the door. And I also had a couple of metaclasses of affordance, such as approach, withdraw, avoid, et cetera. Oddly, I think they’re all affordances, but maybe some people would not include them.

Jeff: I’m not going to go in that direction because I don’t think I can. But let me just tell what we think about affordances. So I may learn how a stapler works, and I’ve learned a stapler and I’ve seen that there’s two parts and there’s a hinge on it, and maybe there’s a little pad on top, which seems to be suitable for putting my hand on something like that. Now I look at a new object, which is different. It’s not a stapler, it’s something completely different. But I see something that a part of that object is similar in its structure to a part of the staple. So if I see a little pad on it that looks like the pad that was on the staple, I might say, oh, I know what that part is. That’s where I’m supposed to push on that. Or I might see something that looks like a hinge.

It’s like, oh, these two parts coming together with a, looks like a pin or something like that. I say, well, maybe there’s a hinge here. So even though I don’t know this object, I’ve never dealt with it before, I might say, oh, given locally there’s something that looks like a local part of this new object looks similar to a local part of another object, then I might assume they have the same behaviors and they may result in similar behaviors when I interact with them. So to me, that’s like I might see a light switch and it may be a different shape than all the other light switches they have, but it’s on a location where I would expect to see a light switch and maybe I don’t understand it, maybe it’s a touch switch. I’ve never seen one before, but that’s where light switches go.

So I might assume that it’s going to… This is a thing that should turn on a light, even though I don’t see a switch. So to me, an affordance is a part of a new object that is similar to another object I’ve already learned, and I transfer the behavior of the previously learned object to the new object as a hypothesis. It just naturally comes out of the neural representation. So that’s what we do. But if I present you with something completely novel, some weird thing I can’t even imagine, it doesn’t look like anything you’ve ever seen before.

What you’ll do is you’ll look at the individual pieces of it trying to find something that’s familiar, something that might say, oh, I see. Maybe this is a door here, or maybe this is a hinge, or maybe this is a latch, or maybe this is something I can turn it on. But you don’t look at the thing on hold. You’ll look at each pieces and you’re trying to feel like there’s a correspondence between this piece and something else I’ve learned elsewhere in the world. So we can actually do this exactly like this. We can model this stuff in our work and we’re going to do that. So that robot on Mars I talked about earlier.

Jim: A good thought experiment, by the way, right?

Jeff: Right. So the robot on Mars says, I have a broken tool here, I need to fix it. I don’t have the right way of fixing it. I need a screwdriver, but I don’t have a screwdriver, so let me look for something that’s kind of like a screwdriver, something with the right attributes, and maybe this nail clipper will work or something, because it has the right attributes. That’s kind of an affordance. So it’s about building models of the world, models of the objects in the world. Those models include behaviors, and then we apply subsets of those models and behaviors to new objects and new situations we haven’t dealt with before. We can understand that at a very deep level in the neuroscience, I mean, neuron-by-neuron type of thing. Our models are like that, and our tool and the AI we’re building are going to do that stuff. It doesn’t do it yet, but it will.

Jim: Let me feed back what I took from what you just said, and you can tell me if I got it right, got it wrong, or missed by three rings. Somehow our neural systems, presumably a series of columns, create not only specific known affordances for the objects that we use all the time, I know exactly what happens if I press the little red phone icon on my phone, it’ll hang up, but there’s also a generalization of affordances to things like hang up the phone or turn on the light, or probably even more general than turn on the light, go from state A to state B. And that we probably get from mining our episodic memories, maybe?

Jeff: No, we can just… Remember, we talked earlier about how you don’t perceive the whole world at once, right? So imagine I’m looking at a stapler. I don’t look at the whole stapler at once. I attend to different parts of the stapler. It seems like I’m looking at the whole stapler at once, but each column is only looking at one little part. I can do all of this with one column. Everything we’re talking about can be done with one column, or maybe a couple of hierarchical columns. But the point is that when I’m looking at a new object and I don’t know what it is yet, I don’t have a model for it.

I look at some portion of the new object, and that portion looks like a portion of something I’ve seen earlier. It says, oh, this is like a hinge on a stapler, or this is like a doorknob or something. And then, from that local observation on the object, I can say, oh, well, then I know there is a behavior associated with this new object based on this local observation. It doesn’t require episodic memory so much as it basically says I have to have models of existing things, and then, for the new things, I try to find parts of them that match the parts of the existing things I know.

Jim: Well, let’s take your other thought experiment though. The orb, you come in and there’s a shiny egg-shaped orb on your desk, and you have no clue what that is, how it got there, or anything else. You did say that you would probably try some experiments based on something to cause something to happen, presumably. I know I would.

Jeff: Yeah. If you weren’t worried about damaging yourself or something. The first thing I would do is I’d pick it up.

Jim: See how much it weighs. That’d be the very first thing I’d do.

Jeff: Is the weight evenly distributed, maybe. For example, if the weight was not evenly distributed, I might say, oh, maybe it’s just a paperweight. A paperweight has a heavy weight on the bottom, and it’s a decorative paperweight. But then I would look around it. I would look at all aspects, see if I could see anything. Maybe there’s a crack, maybe there’s a hole, maybe there’s a keyhole, I don’t know. The point is, the orb is kind of an odd example, because it’s almost featureless. When I was talking about a novel object, I was thinking of something that has some features to it. Imagine I’d never seen a Rubik’s Cube.

Jim: Oh yeah, that’s a good example, for lots of people. There was a time none of us had seen a Rubik’s Cube.

Jeff: And someone just put this thing in front of me, and what would I do? Well, first I would notice that there are cracks. It’s not just a cube; there are these cracks, and the cracks go in all different directions, which suggests that maybe these pieces are independent. And so if I see a crack, I might try to open it like a hinge, and that wouldn’t work. But I would try. I would say, oh, maybe this pops open on a hinge. Things that have hinges tend to have cracks like that. That’s an example. You would see features reminiscent of features you’ve seen on other objects, and you’d say, oh, well, maybe this is a hinge, maybe these cracks are there because this top part is separate and can move up and down, which is not true. On a Rubik’s Cube, you’d probably discover that the pieces rotate by accident. I mean, you might think, oh, maybe it’s like a jar and I can unscrew the top, but it’s not round, so I wouldn’t have assumed that either.

Jim: Or it might be static when you first see it, like some weird art object, right.

Jeff: It could be static. But then those little cracks, I would look into the cracks and say, those cracks go really far in. It’s not one piece of plastic; these are separate pieces. I mean, these are the kinds of things you would do with a new object. You’re just looking for things that are similar to other objects, and then they suggest the affordances of that object.

Jim: Okay. Let’s talk a little bit about where you guys actually are on implementing this stuff. I did take a quick cruise around your Monte project this morning and read a little bit of the documentation, though GitHub was acting bizarre this morning for some reason. So I didn’t get as far in as I would like. It wasn’t you. I tried something else on GitHub, and it was somebody is either doing a distributed denial service attack on them, or they’re just having systems problems. So tell us about the current state of play of taking this rich body of ideas and actually instantiating it as technology.

Jeff: Just before Viviane answers this question, I want to point out that Viviane is the director of this project, so she’s managing the whole team. This is her project. So maybe…

Viviane: Yeah. Yeah. I’ve been working on this project for quite a while now. Actually, we started working on this at Numenta, when the Thousand Brains Project was still under Numenta. We started about three years ago, when we decided that the theory was advanced enough that we had a pretty good idea of how we would implement it, and we wanted to see if it would actually work. So we started implementing the system, called Monty, and we figured out it actually does work, and it works quite well, at least the parts that we have implemented so far. Then last November we open-sourced the entire code base. It’s under the MIT license, so anyone can use it for any purpose. And there’s also a bunch of documentation around it, like you mentioned, and a lot of YouTube videos explaining things and going through our thought processes behind it. But essentially, it implements these general-purpose learning modules.

And a second thing, which we call sensor modules, whose main job is to translate data from a specific sensor into the cortical messaging protocol, so that they can interface with any learning module. The system is definitely still a research implementation, and we are actively working on it a lot. There are a lot of things on our roadmap, but it can already do some interesting things. Right now it’s mostly focused on sensing; we’re not doing much actual manipulation of the world yet. So, for example, we can take a little camera or a little touch sensor, and it can move over an object and then recognize that object through moving over it. And similarly, you can take multiple patches, multiple cameras or multiple touch sensors, and move them together over the object and then recognize it faster. And of course, before we recognize, we need to learn the object. That happens very fast, very efficiently. We usually show each object once. If we have a touch sensor, we can move around the entire object. Or if we have a vision sensor, we show it 14 times from different angles. And that’s enough for the system to be able to move over the object and learn everything it needs to know about it, and then recognize it from any new arbitrary viewpoint, sampling completely new points on that object, under noise and all these adversarial conditions.
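For readers who want a mental model of that flow, here is a very loose sketch, emphatically not the actual Monty API (the real interfaces are in the open-source repository): a hypothetical sensor module turns raw readings into a common message of a feature at a pose, a hypothetical learning module accumulates those messages into per-object models during a quick learning pass, and recognition matches new messages against the stored models.

```python
from dataclasses import dataclass

@dataclass
class CMPMessage:
    """Hypothetical stand-in for a cortical-messaging-protocol message:
    a sensed feature expressed at a location/orientation (pose)."""
    pose: tuple      # e.g. (x, y, z) where the patch was sensed
    feature: str     # e.g. a local surface description

class SensorModule:
    """Translates raw readings from one specific sensor into CMP messages."""
    def observe(self, raw_patch, pose):
        feature = "curved" if raw_patch.get("curvature", 0) > 0.5 else "flat"
        return CMPMessage(pose=pose, feature=feature)

class LearningModule:
    """Stores features-at-poses per object during learning; matches at recognition time."""
    def __init__(self):
        self.models = {}                     # object name -> {pose: feature}

    def learn(self, name, messages):
        self.models.setdefault(name, {}).update({m.pose: m.feature for m in messages})

    def recognize(self, messages):
        def score(model):
            return sum(model.get(m.pose) == m.feature for m in messages)
        return max(self.models, key=lambda name: score(self.models[name]))

# One quick learning pass (a few poses stands in for "a handful of views"),
# then recognition from a new observation of the same object.
sm, lm = SensorModule(), LearningModule()
mug_obs = [sm.observe({"curvature": 0.9}, (0, 0, 0)), sm.observe({"curvature": 0.8}, (1, 0, 0))]
box_obs = [sm.observe({"curvature": 0.1}, (0, 0, 0)), sm.observe({"curvature": 0.0}, (1, 0, 0))]
lm.learn("mug", mug_obs)
lm.learn("box", box_obs)
print(lm.recognize([sm.observe({"curvature": 0.85}, (1, 0, 0))]))   # -> "mug"
```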

Jeff: As Viviane points out, we’re doing a lot of research still, but this isn’t just a research project. We want to create a true AI code base. We have a team of, I don’t know, seven people now, something like that, and we’re trying to do a very professional job with the documentation, all the right protocols, how people can contribute to the project. We want to move this into, I don’t know if I want to say commercial, but I would say a practical and useful implementation. We do have components that we still haven’t implemented yet. She mentioned some of them: interacting with the world, manipulating objects. That’s something we’re working on right now; we’re working through the issues associated with that. I don’t think this is a 20-year project. It’s more like, “Okay, we’ve got the foundations in place. We need to extend it in various ways. And people should be able to get up and get working on it rapidly. Not research code, but implementations.”

Viviane: Yeah. Yeah, good point. I didn’t mean to devalue it by saying it’s a research project. I guess it’s more because we have such ambitious goals for this implementation, essentially that it can do anything the neocortex can do. So there are still some gaps, like modeling object behaviors and interacting with the world. But if you look at other AI solutions, usually people are happy if the system can just do object and pose recognition, and they don’t worry about whether it can also manipulate the world or read language or something. So yeah, it can definitely already do some cool and impressive things. We’ve already tested it on some real-world data as well, and there are already interesting applications that the system could deal with.

Jeff: If you don’t mind, Jim, an analogy might be, if you think about Linux. Now, Linux was really one guy, Linus Torvalds, but it was a multi-year effort to create a technology base, which then had a huge impact. And what we’re building is more difficult in some sense, because it’s novel. No one’s ever done this before. People don’t even understand these concepts yet. But we’re building this technology base, that I think is going to just revolutionize the world in many, many ways. This is not a minor thing. And so we take it very seriously about the role that this could play in history.

Jim: This is potentially hugely exciting if you’re right. I remember when I read Mountcastle, I said, “If this sucker is right, maybe there is a master algorithm.” Because back in the good old-fashioned AI days, people all thought, “Well, if I just get the types right, then all of AI will immediately bootstrap.” Wrong. And then the deep learning guys think that transformers and related technologies, maybe throw in a GAN here or there, will bootstrap themselves to ASI by 2027, right? Highly unlikely, in my opinion. But if Mountcastle is right, the lift to get to the general learner may not be that high. So that’s presumably your bet.

Jeff: Right, it is a bet. And from where we sit, our understanding of this stuff is pretty deep. It’s taken me years to get to the level of understanding I’m at now. So I feel pretty confident making these claims. I know it’s bold. Most people think it’s like, “Wow, you’re really putting yourself out there, Jeff.” And I think, “I don’t think so.” I think it’s pretty clear to me how this is going to play out. It reminds me of something in my past. You mentioned earlier, I worked in the handheld computing industry and started Palm and Handspring. There was a point early on in my career, where I realized that billions of people were going to have handheld computers. And this is a time that nobody had handheld computers. Most people didn’t even have cell phones. In fact, portable cell phones didn’t even exist practically. Some people had them in their cars.

Jim: It was big, right?

Jeff: Yeah, right. Well, they’re in your car these phones [inaudible 01:13:55].

Jim: In a bag. Remember the bag phone?

Jeff: Yeah, right. And so I started talking about, “In the future, people aren’t going to have phones. They’re going to have computers that also operate as a phone, but that’s not the major application. And these are going to be in your pocket, and they’ll cost under a thousand dollars.” And I would literally talk at conferences and say, “Billions of people are going to own these things.” Now, this sounded so outlandish that my marketing VP, Ed Colligan, said, “Jeff, stop saying that. You sound like an idiot.” And so I was thinking, “Oh my god, maybe I’m wrong. I don’t know.”

So at my age, I have to go with what I understand. And I think I understand the future of AI as well as I understood the future of mobile computing: this is going to happen, and these principles that we’re developing are the correct ones. I understand that from your point of view, maybe I’m right, maybe I’m wrong. It doesn’t feel that way to me. I feel like I understand this deeply enough to know that it’s going to happen. Whether we do it or not, it’s going to happen, and we’re in the best position to make it happen sooner.

Jim: That’s exciting and I’d love to see as this thing goes further along. Have you guys thought about some of the classic demonstrations of alleged general learners? For instance, being able to play Atari games, just from watching them, and you’d would be able to do that way quicker than deep learning, might be impressive. What are you guys thinking about, in terms of near term demonstrations that you’re approaching general learning?

Viviane: We spent a fair amount of time trying to think of some cool early demos. Since there are still a couple of things on our roadmap that need to be implemented, like object behaviors, which are involved in Atari games with balls moving around, we’re currently looking more at active sensing applications. For example, take ultrasound. Currently you need a very, very trained person to move the ultrasound probe to the correct locations and to interpret what it’s seeing. In remote locations, you might not have those experts, but you might have a mobile ultrasound. If you had software like Monty, it could tell any person how to move the probe, what it is actually sensing and seeing, and whether there’s a problem. So that could be an early application that is useful. Probably a little further down the line, not too long, I think we’ll be able to do a lot of cool applications actually interacting with the world, doing things and manipulating the world.

Jeff: I think what we’re doing is more akin to what Turing and Van Neumann did in the late thirties and early forties or mid-forties. They were defining the concepts of computing. What does it mean to compute? What does it mean to have algorithms and so on? And at that time, that was very theoretical work. There was almost nothing practical about it. But if you read those guys, you realize that they understood the importance of it.

Jim: Yeah, I’ve read quite a bit of Van Neumann in particular. And in fact, they even gave a talk in Budapest having all the nerve to do that, about the significance of Van Neumann, right?

Jeff: Right. So these guys knew that they were doing important things, at least that’s my impression.

Jim: Without a doubt.

Jeff: Okay. But what were the impressive demos? There weren’t really many. Even in that recent movie about Turing, The Imitation Game, they were talking about how he was trying to convince them to use a computer to replace the humans doing the computing. And they said, “Oh, the humans are better, so why do we need to run this machine?” But he was excited about the concepts of it.

In some sense, I feel like that’s what we’re doing. I feel like we’re building this foundation, and yes, people want to see demonstrations, they want to see cool demos, and we’ll get there. But to me, I want to solve the hard problems. I don’t want to be standing on a soapbox selling stuff. I want to be solving the hard problems that are going to be important for the future. And so there’s a bit of friction sometimes between me, who wants to work on these problems that don’t have really great demos, let’s understand deeply what affordances are, let’s understand deeply what behaviors are, and the need to say, “But you have to have a great demo, or otherwise no one’s going to believe you.”

Jim: So I advise a project here or there, and there’s always that tension: how much do you go deep, and how much do you add flash, right?

Jeff: I’ve never been a good flash person. Viviane’s much better at flash than I am. I mean, she knows how to make good presentations. She knows how to make good demos. And so maybe we’re a good pairing there, because I’ve always been more like, “I just want to solve these problems. I don’t care what other people think at this point in time.”

Jim: As you know, as a guy that’s built big companies, nothing great ever happens driven by one person. It’s always a team of people who bring different perspectives and different skills together. And at its best, it gels into a super organism.

Jeff: Right, right.

Jim: And it happened a couple of times in my businesses, one in particular. Of the 17 ventures I’ve been involved with, either as a founder, a director, or an investor, only one just executed its business plan without any inflection. It was profitable the first month we shipped our product and never had a negative month.

Jeff: Wow.

Jim: And that was also the one where the team just gelled into this weird hyper-being. It was very interesting and amazing, and almost impossible to replicate, alas.

Jeff: Right, right, right. But it does take a team. I mentioned earlier that our VP of marketing told me to stop talking about billions of people owning these handheld computers. At that time, no one owned them. So I stopped, and with our first product, the Palm Pilot, the first successful handheld computer, we called it an organizer. We didn’t even call it a computer. I said, “Ed, can’t we call it a computer?” He goes, “No, no, no, that’s too much. Just call it an organizer.” Well, it was very successful, and in some sense he was right. So I’m just bringing out the point that it does take multiple types of people to figure out how to introduce new technologies to the world and get people excited about them.

Jim: Well, I think this is very cool, that you are… Because again, I did see back in 2013 that if Mountcastle was right, the lift to the general learner might be less than we thought, but I didn’t do anything about it. Shame on me. But you guys are doing something about it, and you started a lot longer ago than 2013. So I’ll look forward to monitoring your progress. I still write a bit of code from time to time and play with frameworks. So once GitHub is working well, and I’m sure it’s working now, I’m going to go down there and download the code and the examples and the tutorials and play with them a little bit.

Jeff: That’s great.

Jim: And oh, by the way, as all listeners know, we will put a link to the repository on the episode page at jimruttshow.com for other people to go and play.

Jeff: Right. We welcome everybody. And there are lots of ways to contribute, with people contributing everything from code to documentation and training. There are all kinds of ways people contribute.

Jim: And I’ve even got some example problems. I have a game I wrote, that’s relatively easily learnable, and I even have an API specifically designed to see if AIs can learn it and things of that sort. But now, I have one last topic before we wrap up here. I don’t know if this was… It’s interesting, did your publisher tell you you had to do it? Or was it something you felt like you wanted to do? Which was the big swerve at the end of the book to social implications of AI, right?

Jeff: No, actually, I was told not to do that.

Jim: Okay, interesting.

Jeff: And I desperately wanted to do it.

Jim: All right, so in eight minutes, give us the Jeff Hawkins view of the future, where humanity and AI collide.

Jeff: Right. Well, you’re asking about my book A Thousand Brains. The first third of the book, or almost half the book, is about the theory, the Thousand Brains Theory. Then there’s a whole section about AI and how the theory impacts AI. And the last section of the book is about how to think about AI, intelligence, and humanity. These are topics that are very important to me. I spent a lot of time thinking about them. They’re not particularly popular; not many people do. And I really wanted to put that in the book, even though it was a falling away from the logic of the first parts of the book, because I had some ideas that I thought were worth putting down, and I didn’t know if I’d ever have another chance to put them down and get an audience like that.

But I was actually encouraged not to include that stuff in the book. They said, “Well, it’s off topic.” I was like, “Yeah, I know, but this is so important, and I may not be able to do it again. I don’t know if I’ll have a chance to write another book, so let’s get it in there.” Honestly, I wrote both of my books with the idea: what would a reader think a hundred years from now? So I tried to avoid popular cultural references or anything like that. I just tried to stick to really big concepts.

Anyway, so when I think about humans, the only thing that makes us unique is our intelligence. We’re pretty average animals in every other way. And of course, we’re not the only intelligent animal, but we certainly are dominating this planet, because we’ve broken free; our intelligence is now dominating everything. And so I worry about the future of humanity, because we’re part animal. We’ve got biological needs. We fight with each other, we scrap for food, we look for sex and all this stuff. And yet some of us like to live on this more ethereal plateau of thought and concepts and the meaning of life and the science of the universe and so on.

And so I feel there’s this tension between the intelligence part of us and our biological past. And so I mentioned I wanted to read that book of your previous person who was on your show. And so how to think about that? And I realize we will build truly intelligent machines that work on the principles of the brain. They won’t be the principles of deep learning. They will be the principles that we’re developing in the Thousand Brains Project. And those machines will be super capable. They’ll be super fast, super strong, they’ll be… And they will interact with the world. They’re not just sitting in some server someplace. They’ll be able to interact with the world. They’ve talked a little bit like robots on Mars, building stuff, that’s actually going to happen.

So what is the future of humanity? What is our future? Is our future to preserve biology? Or is it to preserve intelligence and knowledge? It doesn’t have to be either/or, you can have both, but they’re in conflict with one another. So I wrote several chapters about this: our situation on Earth is precarious in the short term, and in the long term Earth is definitely going to be uninhabitable at some point. How do we preserve the part of humanity that I care a lot about, which is not the genetic makeup of our bodies, but our knowledge about the world and our ability to understand the world? I think part of what really motivates me in what we’re doing at the Thousand Brains Project, even though it’s very far out in the future, is laying the basis on which the intelligence and knowledge part of our existence, of our beings, can survive and continue on without our biological basis.

Because our biology is probably stuck here on Earth, stuck in the dirtiness and messiness of being animals. And who knows? I’m not saying we’re going to destroy our planet, but we could; we’ve made a good start. So I had a chapter called, what was the title, Escaping Darwin’s Orbit, I think. And what I meant by that is we’re stuck in this biological realm that we have to live in. How do we escape from that? How does our intellectual part, our intelligence and our knowledge, escape from that and survive longer term? These are really far-out things. Most people say, “What the hell are you writing about? It’s nonsense.” I don’t think it’s nonsense at all. I think it’s essential to what it means to be human. And so I just had to get that in the book, even though I’ve had some people come to me and say, “Yeah, I loved your book, but that third part, what a stupid thing.” And I’m like, “No, that’s the best part of the book. What are you talking about?”

Jim: Now, for me and the audience of our show, I have guests on here talking about AI risk all the time. We’ve had shows about nothing but AI risk. So this is hardly something we never think about.

Jeff: Right. So I wrote about AI risk, but then I also wrote about the risk of being a human.

Jim: Yeah, exactly.

Jeff: What are we here for? What’s the point of all this?

Jim: One thing which, at least to me, was a relatively new concept was that the dangerous part of these thinking things is the old brain, right?

Jeff: Right.

Jim: Let’s think about this guy who sits at 1600, Pennsylvania Avenue and seems to be all lizard brain, right?

Jeff: Right.

Jim: You are actually quite eloquent on this: perhaps one of the ways to think about how we move forward with AI is to be relatively careful that we don’t embed the old brain into our artificial new brains.

Jeff: Well, some people think that that’s going to happen automatically, that if you recreate the neocortex, it’ll be like a human, and it’ll have these desires and run away and do stupid stuff, like the person you just mentioned. And I try to make the point in the book that that’s not true. The neocortex itself is a model of the world. It sits on top of the other parts of our brain, but on its own, it has no desires, it has no goals. You can give it a goal, and it’ll figure out how to achieve that goal, but on its own, I made the analogy, it’s more like a map of the world. The map of the world can be used for good purposes and bad purposes, but the map itself doesn’t have purposes. So what we’re building is that modeling system. It then has to be embedded in something else, which is somewhat independent of the technology we’re building. Someone has to assign it tasks and goals, and those are separate. We have to be careful about that part.

So I don’t see the risk in what we’re creating. There’s very real risk in that, just like there’s very low risk in creating a map. But a map could be useful by really bad people to come and maraud and steal or burn your town down or something. But on it’s own it’s not dangerous. And so I think the risks really come from how we apply AI. Not that AI itself will develop these sentient needs to dominate or be the alpha male or female or whatever. That’s external, that’s added to the systems we’re building. And people don’t understand that. So many people think that once you build an intelligent system, it will automatically just, bingo, want to do things. I don’t know if you read Bostrom’s book, Superintelligence but he went off the deep end on this stuff.

Jim: Well, there’s a huge error there, and this is something in my own work, and I talk about a lot. There’s a confusion between intelligence and consciousness. There are things that are conscious that aren’t very intelligent. There are things that are intelligent that have no consciousness at all. AlphaGo has no consciousness at all, probably, unless Tononi and friends are right, and then it’s a weird kind of consciousness, but it’s highly intelligent. And I can name some people I know that are fully conscious but aren’t very bright. And so people confuse those two. Unless it’s engineered in, there’s no reason to expect volition or anything even analogous of consciousness to arise, unless you design it in, right?

Jeff: Right. So in the human brain, there are parts of the brain that are not the neocortex which lead to emotions and aggression. The amygdala is one of the structures associated with these things, the fight-or-flight type of reaction. We’re not emulating any of that stuff, and we don’t want to. And I think it’s a responsibility for us and for everyone who uses these technologies to apply them properly. But I just really want people to realize that there’s no risk in itself in what we’re doing. These things aren’t going to get out of control, run away, take over the world. That’s nonsense. And your distinction between consciousness and intelligence is a good one. I think of the processes of the old brain as not really driven by models, not driven by the models in the cortex; they’re more model-free types of behaviors.

Jim: Alrighty. It has really been a good conversation. I enjoyed the heck out of it. Hope you guys did too.

Jeff: Yeah, it has been great.

Viviane: Yeah.

Jim: I look forward to continuing to monitor your work. Maybe I’ll have you back on when the time seems right.

Jeff: Right.