Transcript of Episode 87 – Joscha Bach on Theories of Consciousness

The following is a rough transcript which has not been revised by The Jim Rutt Show or by Joscha Bach. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guest is Joscha Bach. He’s vice president of research at the AI Foundation. Previously he’s been a research scientist at MIT and Harvard. He’s the author of the book Principles of Synthetic Intelligence, PSI: An Architecture of Motivated Cognition. The title is an interesting kind of joke at multiple levels.

Jim: And he’s published many papers. I should also point out he has one of the most interesting, and entertaining Twitter feeds I know of. @Plinz P-L-I-N-Z. Check it out.

Jim: This is the second time we’ve had Joscha on the show. Last time we talked quite a lot about AI, and we talked a little bit about his MicroPsi.

Jim: Today we’re going to kind of split to the very high side, and then get down into some details on MicroPsi. So I think we’re going to start off talking about one of the biggest questions in this field: how does mind arise from brain? And can it arise from an artificial brain? Joscha.

Joscha: I think it can obviously. And so the question is of course, what is mind? And there sometimes we need to clarify our terms. Not so much to usurp them, but to make sure that we are talking about the same thing.

Joscha: So typically when I say mind, I refer to essentially the software that runs on the brain, and that enables everything else. And what is software?

Joscha: I think that software is best understood not as a physical thing, as something that has an identity, but as a physical law. It’s a very specific physical law that says, whenever you put things together in the universe in this particular arrangement, the following thing will happen. And the following thing is a description in terms of certain macro states that have a causal structure.

Joscha: And in a sense you could say that for instance that the text processor in your computer doesn’t have an identity. It’s a physical law. And a physical law says that, when the gates in your computer are in this, and this arrangement, they can be described as a text processor. And when you interact with it, the following thing will happen when it’s currently in this, and this state.

Joscha: So the same category of things is mind. It doesn’t have an identity. It’s a principle. It’s a software that runs on your brain. And it doesn’t run on your brain because it is corresponding one to one to a configuration of atoms in your neurons, but because there is a coherent causal structure emerging over the activity of the neurons that you can use to describe it. It’s a lens to look at the activity of many neurons.

Joscha: And there is not a reason why this should not happen in other substrates that are under the same functional constraints, and can implement the same principles.

Joscha: In the same way as we can implement software on many, many different types of computer, as long as the substrate is Turing complete, you can build minds on different substrates, if you are implementing the same causal principles.

Jim: Yeah. Of course people would argue that human mind, the human consciousness, and we’ll make a distinction later about consciousness and mind, but let’s for the moment keep them together, is strongly embedded in its bodily substrate. And that, in fact some people like Antonio Damasio would argue that the actual root of our conscious being is deep in the brain stem, and is more related to our body actually than it is to our higher brain.

Jim: And if that’s the case, then while we could create artifacts that were analogous to the mind in some sense, they essentially can’t be the same if they’re not embodied in the same kind of structure that the human mind is embedded in.

Joscha: If I change the interface that you have to the universe, so you are situated in a different way, for instance you are no longer located in your body, but you are connected to sensors all over a city.

Joscha: And your incentives are aligned with the incentives of the city, and you also identify with these goals, and so on, you might turn into a city at some level, right?

Joscha: And this still means that you have a mind. Even if I change your affordances. In the same way, if I limit your affordances to things that are internal to your mind, as it happens for instance during dreams at night. It’s not that you lose your consciousness in that moment.

Joscha: And I think that you can also be perfectly conscious while playing Minecraft. And even if you are embedded in a VR that only gives you access to Minecraft, that probably has enough complexity to, without loss of generality, be sufficient to be conscious, and to form a bond with an external world.

Joscha: We can take Minecraft and implement this on a chip. We can implement this on neural tissue in principle at least, and implement this within the brain itself.

Joscha: So you never need to leave your brain, and go into your physical body, in order to have the experience of a body. Your embodiment can be entirely virtual.

Joscha: What you need to have is an implementation. The implementation needs to have a realization on the physical substrate layer of the universe. But it doesn’t mean that you would need to have a physical body that is exactly human-like. You only need to have human-similar affordances, if you want to end up with a human-like mind.

Jim: Yeah. The old brain in the bottle argument, right? But particularly your own work has put a lot of emphasis on emotional states, and emotional valences, et cetera.

Jim: And there’s a substantial school of thought that says that, the reality is that the physical reaction of emotion, such as heartbeat, and respiration, and skin tone et cetera, actually happen before the feelings, i.e. the cognitive states associated with emotion.

Jim: So, at least for humans, if you want to try to extend them to a non-body state, a brain in a bottle, you’d also have to essentially fake out the physiological, emotional responses, wouldn’t you?

Joscha: We probably notice that consciousness is not in charge of controlling our interactions with the environment. Some meditators say that consciousness is like this little monkey sitting on top of an elephant.

Joscha: And it can direct the attention of the elephant by prodding it. But eventually at the end of the day the elephant is still going to do what it wants.

Joscha: And if the monkey thinks it’s in charge, then the monkey is often in for big disappointments when it notices that it makes decisions, but the elephant is not reacting to them.

Joscha: And this elephant part of our mind seems to be much older, especially with respect to the high level concepts that the monkey is developing to analyze its own behavior.

Joscha: The attentional control system is old, but the analytical systems that the monkey employs, as a result of its analytical, and attentional reasoning, and so on, they are rather young.

Joscha: And I think that feelings are a way in which the emotions, and motivational impulses of the elephant are accessible to the analytic monkey.

Joscha: And so in this sense, you could argue that there is a lot of stuff that precedes the feeling, because they are a way of communicating between subsystems in the brain.

Joscha: It’s the perceptual system that does the majority of operations in something like a distributed neural network, with nearly continuous dynamics.

Joscha: And then there is this analytical, [inaudible 00:07:23] engine. How do you interface these two systems? You basically need to take the features that are being computed in this distributed perceptual architecture, and translate them to the localized, discrete analytical model.

Joscha: And you do this by projecting them into a space. And the only space that is consistently available is the body map. So feelings are projected into the body map to disambiguate them.

Joscha: You could also take the perceptual space just outside of the body map, but this changes all the time, and we would not self-assign it. We could also have a specific emotion space that is distinct from the body map.

Joscha: Why don’t we have feelings outside of the body? It’s a weird thing. And I suspect it’s because feelings have been implemented as an afterthought in evolution, so they had to be mapped into an existing brain region.

Joscha: So when you feel love in your heart, anxiety in your solar plexus, it’s not because your gut is involved in these computations. Your gut is completely occupied with herding bacteria.

Joscha: I think that’s simply a projection that happens. And even if you are a paraplegic, and you don’t have access to sensations in your body, you still feel emotions in your body.

Jim: Yeah. The linkage between emotions and feelings is interesting. And at one level you could say they go both ways. Again there have been some very interesting psychology lab experiments. For instance, I think one of the more famous ones has a person walk across a dangerous bridge, versus a non-dangerous bridge, and then meet an attractive member of the opposite sex, who asks them some questions for a questionnaire.

Jim: And the probability that a male asks the questionnaire person for their phone number, to ask them for a date, is way higher if they go across the risky bridge than if they go across the not-risky bridge.

Jim: So in a sense, the bodily sense of fear, or challenge, operating unconsciously, well below the conscious mind, through the body itself, seems to program the mind itself into a different state.

Joscha: Yeah. It’s interesting to speculate why that is. It may simply be that the achievement of dealing with a subjectively life-threatening situation leads to a higher sense of competence, as the self theory would predict. And this basically would lead to greater risk-taking behavior in that agent. But to see if that is actually the reason, we would need to look into this in much, much more detail.

Joscha: I’ve read about these experiments myself, and I haven’t really made up my mind what the actual cause is, and which systems cross talk, and so on.

Joscha: Why is it that we fall more easily in love after we have been in a life threatening situation? I don’t really know. Even though I can easily come up with a few hypotheses, which is the dangerous thing.

Jim: Yeah exactly. Well, let’s do it. Come up with one hypothesis. Love to hear the Joscha Bach take on that famous result.

Joscha: Yeah. So the easiest hypothesis would be, if you have basically mastered a risk, then your willingness to deal with future risks increases momentarily. Because you have a higher self-ascription of competence.

Joscha: It’s also the case that you put less value on the perception of risks that would normally fill your attention. Basically you renormalize risk. If you for instance find yourself with cancer in a hospital, and you are threatened with the possibility of death, then all the things that normally would give you existential fear, like losing your job, or losing the friendship of a person you don’t particularly like, and so on, suddenly are no longer important.

Joscha: And as a result you might start ending relationships that you’re no longer interested in, or asking somebody out that you would never have dared asking out. Because the risk is just not that high compared to the actual existential risk that you have just been facing. So it’s basically a rescaling of your perception of what’s important, and what the valence of the important things is.

Jim: That makes a lot of sense actually. And then again, I know in your architecture, and in other architectures, those states decay relatively rapidly. And that’s why it may manifest in that particular experiment, where more or less immediately after having a challenging experience, the test is given. If you came back two days later, it might not be the case.

Joscha: Yeah. So in some sense the prediction would be that for most people there is a range of emotions, that adapts to the situations that the agent is in.

Joscha: So you have roughly the same emotions, at roughly the same proportions as long as you don’t suffer physical pain, or actual threats day to day, that put you under real stress.

Joscha: As long as you are existentially safe, you feel roughly the same things most of the time. And your emotions will adapt to the range of events that happen to you. Right?

Joscha: So it’s a normalization that happens in this way. And if you introduce new events into this stream of events, or if you change the stream of events, then for instance the amount of anxiety that you experience about things, might still be the same, even though the events are quite different.

Joscha: Which means you assign a different valence to these same events, if they are in conjunction with others. There’s probably an interesting lesson there for organizing your own life.

Joscha: So for instance, many people suggest that you make sure that at all ages you introduce a lot of novelty into your own life. So you always remain young, you always store new experiences, and information, and put value on them. And that you undergo frequent changes of your environment.

Jim: Yeah. I think I’ve been pretty good at that in my life. I’m always getting into some shit or other, right? I never stand still, right?

Joscha: Congratulations.

Jim: Let’s see, what do I want to talk about next? I’d like to get your thoughts on some of the other theories of mind. I’m sure you have them. One theory that was prominent, I don’t know, 20 years ago, is that mind was essentially a brain-wide set of interlocking frequencies. Some of them phase locked, and some of them rhythm locked. Any thoughts on that theory?

Joscha: So when I was confronted with Singer’s theories about that, that consciousness is something like a frequency in the brain, which were also adopted by Christof Koch, I didn’t understand them.

Joscha: And mostly because it didn’t make functional sense. It didn’t explain how this is necessary and sufficient for producing the phenomenon.

Joscha: I thought that oscillations in the brain are necessary, because neurons are not able to hold a constant state, and communicate that constant state to each other. They need to fire.

Joscha: If they are connected in some causal structure, they need to fire in synchrony, right? There’ll have to be a wave of activation that passes through the brain. This wave will have to be periodic.

Joscha: So you will see cyclic activations at regular frequencies. And these regular frequencies, if you can pick them up on EEG, mean that large parts of your cortex are firing in sync, so they don’t disturb each other, and you can see a common signal.

Joscha: But this signal would be the result of the synchronization that leads to consciousness. It’s not that the synchronization itself causes consciousness.

Joscha: And there is, on the other hand, some interesting insight from the perspective of a neuroscientist, who thinks about: how is it possible that neurons solve the binding problem? How is it possible that they talk to each other at all?

Joscha: And then you can think of neurons as something like a neural lattice, that is acting like an ether. And the signals that are being broadcast through the brain are moving like electromagnetic signals would move through an ether. Just as a model that you can use as a crutch to imagine what’s going on.

Joscha: And if that happens, the question is, if the signals propagate through this neural lattice, how do the individual neurons tune into the different programs that are being played out in the brain in parallel?

Joscha: And it’s tempting to think that there is something like AM, or FM encoding. Basically some encoding where the neurons know at which frequency to tune in, to become part of a certain computation.
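The frequency-tuning idea can be sketched with a toy frequency-division multiplexing model, where two "programs" share one medium on different carrier frequencies; this is an illustrative analogy only, not a claim about actual neural coding:

```python
import math

def carrier(freq, n=1000, dt=0.001):
    """A pure sine carrier at the given frequency (Hz), sampled at 1 kHz."""
    return [math.sin(2 * math.pi * freq * k * dt) for k in range(n)]

# Two "programs" broadcast at once through one shared medium,
# each on its own carrier frequency, like radio channels.
a, b = 2.0, -1.5  # the two hidden signal amplitudes
mixed = [a * x + b * y for x, y in zip(carrier(10), carrier(25))]

def tune_in(signal, freq, n=1000):
    """Recover one channel by correlating with its carrier (demodulation)."""
    c = carrier(freq, n)
    return sum(s * x for s, x in zip(signal, c)) / (n / 2)

# A receiver that knows the right frequency picks out its own program:
assert abs(tune_in(mixed, 10) - a) < 1e-9
assert abs(tune_in(mixed, 25) - b) < 1e-9
```

Because the two carriers are orthogonal over the sampling window, correlating against one of them cancels the other channel exactly, which is the sense in which "tuning in" selects one computation out of many.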

Joscha: And maybe this is a viable model. So it’s a viable model for how the brain solves the problem of signal transmission, and synchrony of different processes over large distance in the neocortex.

Joscha: It’s probably one of the mechanisms that plays a role. But it’s not necessarily the best metaphor, or the best engineering principle for an architecture that we would implement in a digital computer.

Joscha: Because in a digital computer we can have random access, so that we can relate everything regardless of spatial proximity in the computer. Because we can have a bus with a very high throughput of data. And the individual memory cells in the computer can hold arbitrary values without any need for oscillation.

Jim: Yeah. And I always think that’s an important point, when you’re trying to map from the biological to the machine, mother nature solves problems in a very odd way. You know, a teeny step at a time through evolution over billions of years.

Jim: And it’s amazing the stuff works at all. And if we’re going to design one from an engineering perspective, in many cases we would make very different decisions, though we might get some functionally similar results.

Jim: Back to brain-wide phenomena. One of my favorite cognitive models, and he very honestly says, “I have no idea how the mechanism works,” is Bernard Baars’ global workspace theory.

Jim: Where he believes that the representation of the sense of consciousness, for instance the sensorium that we live in, the movie that we live inside of, is somehow broadcast to wide areas of the brain, and the various functional areas of the brain are then able to process that information. What are your thoughts about Baars, and his global workspace theory, even though he himself admits he has no neurological argument to support it?

Joscha: I think it’s partially born from intuitions, that are the result of introspection. And I think that introspection is not given enough credit as an important tool in neuroscience, and philosophy of mind, and psychology. There is no real methodology for introspection I think.

Joscha: It would be great if we could develop that further. And it does exist in the different meditation schools of course, as an important tool.

Joscha: I think that we don’t distribute consciousness across the brain into the different areas. It needs to be the other way around. The core feature of consciousness is that we remember what we paid attention to.

Joscha: If we don’t remember that we paid attention to it, then we are not conscious of it in hindsight, right? So it needs to be integrated into a common protocol, where it can be accessed later on.

Joscha: And this integration means that it’s somehow local. So it’s the localization of information that before existed in a distributed way in the brain.

Joscha: So it’s a protocol that pulls from different regions of the brain. And it’s interesting to contrast this, and to compare this with the notion of attention in machine learning that is currently gaining in prominence.

Jim: Yeah that is interesting. And in my own work that I do in the area of the scientific study of consciousness, attention has always been absolutely key in my own model.

Jim: And as you say, it’s quite interesting that in the newest neural net architectures they’re using something they call attention. It isn’t quite the same, but it’s at least analogous.

Joscha: There is something that is very close to the attention that we have in our own mental attentional learning. The difficulty is, how can you train a system that is not organized into neat layers?

Joscha: Our brain is not a system that’s organized into neat layers, such that we could train it as a whole using backpropagation with stochastic gradient descent.

Joscha: And even if we could, it would not be the right way to do it, because it’s not very efficient. Right? It takes an enormous amount of time to train a system. Many, many operations to get it to do the same thing that a human being can do.

Joscha: So arguably a human being is much, much quicker in converging on a model of interpretation of visual stimuli than a neural network is.

Jim: Yeah. In fact I’ve used that as a very important probe in my own work. Which is, the way the human brain does it must be rather different than gradient descent, or other similar methods that are used for learning in neural nets.

Jim: Because, and I may have mentioned this in the last show, one of the things that really got me interested in thinking about this problem of learning mechanisms was when I was playing a very advanced, and difficult war game.

Jim: I don’t know, this was about 2013, or ’14. And I realized that I was able to learn the game well enough to beat the pretty good AI really quickly. Like after playing seven games.

Jim: And I realized I was using transfer learning from the other times I’d played war games of different sorts. And I started adding up how many times I had actually played relatively sophisticated war games to a conclusion of a game in my life.

Jim: And the highest number I could come up with was 5,000. And probably the real number was something closer to 2,500. So that’s a remarkably tiny number of examples of analogous games that I could presumably mine for transfer learning, and then apply to this new game, and then learn to play it at a level good enough to beat the AI after seven games.

Jim: And we know that today’s artificial neural net architectures are not even close. Even the new good stuff, like AlphaZero, still takes hundreds of thousands, to millions of plays, to learn even a very stereotyped game, chess, or go, or something like that.

Jim: And these war games are in a vastly, vastly higher dimensional space than those games. So high you can’t even contemplate how high it is. I figured at least 10 to the 60th move possibilities per turn, just to give a rough sense of it.

Jim: So yeah, I think that the next frontier in thinking about learning, and cognitive systems, and what we can learn from humans, is that the rate at which we learn as animals (and not just humans of course; animals learn very rapidly as well) has to be somehow fundamentally different than what we’re seeing so far from the world of artificial neural nets.

Joscha: We can learn things super fast, if we can pay attention, right? The difficulty is, how can you learn how to pay attention in the right way?

Joscha: There is a famous video on YouTube where a coach trains somebody who doesn’t know how to play tennis, to play relatively decent tennis within the space of half an hour.

Joscha: And this is because he’s able to direct the attention of the subject, with very great acuity, and tells her exactly what to pay attention to. So she quickly updates her behavior in the parts that count.

Joscha: And if you don’t know what to pay attention to, you have to in some sense brute force the problem. By applying the error function very broadly in your system, and hoping that the errors accumulate eventually in those parts of the architecture that make the biggest difference. But it’s a very wasteful way of doing it.

Joscha: When I was in New Zealand in the ’90s, I worked at Ian Witten’s laboratory. It’s the same one in which Ben Goertzel studied for a time, and Shane Legg of DeepMind. Which is a happy coincidence. We never met there.

Joscha: But Ian Witten I think has a great, and underestimated influence on modern artificial intelligence. He’s the father of arithmetic coding. And he takes the data compression perspective on cognition.

Joscha: And while he never prominently took a stance in terms of cognitive science, his main motivation was to understand how minds work when he studied these issues.

Joscha: And so, when I was working for him, he put me in the lab, and asked me to find structure in language automatically. So what I did was first look at n-gram statistics. And I realized that with adjacent words in the English language, there are so many different words, that you cannot really do good statistics over, for instance, trigrams, or quadrigrams, or pentagrams. Mostly you are restricted to bigrams, to pairs of words.

Joscha: And this means that for instance you lose the predictive power that an article has for a noun, if there is an adjective between the article and the noun.

Joscha: And if you have multiple adjectives between the article and the noun, the earlier adjectives are not going to predict the noun anymore. So you throw away most of the structure. And our own minds don’t work like this. They do identify the structure over the entire sentence.

Joscha: So how can I discover the structure? How do I do the statistics? And I started using ordered pairs of words that didn’t need to be adjacent, and tried to go beyond that.
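A minimal sketch of the difference between adjacent-bigram statistics and ordered non-adjacent pairs; the toy sentence and window size are illustrative, not the setup of the original experiments:

```python
from collections import Counter
from itertools import combinations

def bigram_counts(tokens):
    """Adjacent-pair statistics: only directly neighboring words are related."""
    return Counter(zip(tokens, tokens[1:]))

def skip_pair_counts(tokens, window=5):
    """Ordered pairs of words that need not be adjacent, within a small window."""
    counts = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if j - i <= window:
            counts[(tokens[i], tokens[j])] += 1
    return counts

sent = "the small red cat sat".split()
adj = bigram_counts(sent)
skip = skip_pair_counts(sent)

# With plain bigrams, the intervening adjectives hide the article-noun link:
assert adj[("the", "cat")] == 0
# Ordered non-adjacent pairs recover that relation:
assert skip[("the", "cat")] == 1
```

The cost is visible even in the toy case: the number of pairs to count grows much faster than with bigrams, which is exactly the statistics explosion described above.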

Joscha: And it was very difficult, because you have such large statistics that you need to do. And in a way you need to do curriculum learning, as our own learning does.

Joscha: We give children very short and simple sentences, that don’t require a lot of pointers to maintain. So the child knows what to pay attention to. And then we go to the next one. And the child dismisses sentences that are too complicated for it to understand.

Joscha: And if you try to do this in your learning system, you typically don’t have these nice datasets, where a human being interacts with the system and teaches it language, as we teach our children, and makes sure that the problem is learnable.

Joscha: So instead we need to have a mixture, where the system is trying to have candidates for meaning from complicated sentences, and then builds up from there.

Joscha: And back then the memory that I had in this computer, and the cleverness of the algorithm that I had available were very, very limited. I was able to discover grammar eventually. But I was not able to discover style in the way GPT-3 can do this.

Joscha: And when I looked at the transformer algorithm that underlies GPT-3, it was fascinating to see to me that they basically found a solution to the same problems that I had to deal with in the ’90s, and couldn’t solve by myself.

Joscha: And this is to basically make statistics over all the parts in working memory, and find out which ones of those are related to each other.

Joscha: And it’s still too simplistic, because it’s a fixed working memory window. It’s not able to swap things out of its working memory, and put things in it. Instead it has 2048 adjacent tokens in the working memory. No more, no less, in the implementation that they use in GPT-3.

Joscha: And so for instance you can use this algorithm to interpret images, but only very, very small images. Those that don’t have more pixel elements than fit into this fixed working memory.

Joscha: And so this algorithm in its current form, is not able to comprehend large images, or video. It’s not able to relate the early parts of a book, to a late part in a book.

Joscha: It’s only able to keep two pages in memory at a time, and relate concepts within two pages to each other, using massive computational power.
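The fixed-window limitation can be illustrated with a small sketch; the 2048 figure is the GPT-3 context length mentioned above, and the token stream is just a stand-in for a long book:

```python
def visible_context(tokens, position, window=2048):
    """A fixed attention window: the model can only relate tokens inside it.

    Everything earlier than `window` tokens back is simply invisible,
    no matter how relevant it was."""
    start = max(0, position - window)
    return tokens[start:position]

book = [f"tok{i}" for i in range(10_000)]  # stand-in for a long book
ctx = visible_context(book, position=10_000)

assert len(ctx) == 2048
assert "tok0" not in ctx   # the opening of the book is out of reach
assert "tok9999" in ctx    # only the most recent couple of "pages" remain
```

This is why relating an early chapter to a late one is impossible for such a model without some external memory mechanism: the early tokens have already slid out of the window.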

Jim: Yes, I played a little bit with GPT-3. And it’s both remarkable, and frustrating, right? You go, “Wow,” at what it can produce in terms of emulation of styles, and plausible completions, et cetera. But as you say, it’s within a small domain. Frankly I see it not working well; if you try to get it to generate 2,000 characters of text, you’re pretty much out into garbage land pretty quickly.

Jim: But in a shorter frame, two or three sentences, it’s remarkable. And yet, it really doesn’t have any language understanding. It seems like it’s just the biggest collection of associative statistics so far.

Joscha: I think we have to give it a little bit more credit. So I wouldn’t say that it’s a real shot at AGI, at least not in its present form. But it is able to understand certain things, or at least consistently act as if it understands it.

Joscha: Which means I can give it a task. The way that I give it the task is weird, right? It’s all autocomplete. So I give it the beginning of a text about the completion of the task.

Joscha: And then it’s able to continue that text. So for instance I could say, “The following is an extraction of sentiments from paragraphs.” Then I give it two examples.

Joscha: And now I give it new paragraphs, and it’s going to extract the sentiment from these paragraphs. And if that is simple enough, the system is quite reliable.

Joscha: So I can ask it for instance for one of six key emotional categories in a piece of text. And whenever I prime the request by telling it that it’s currently continuing a text that is extracting these sentiments from paragraphs, it’s going to do that just fine.
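A sketch of what such an autocomplete-style prompt might look like; the exact wording, the example sentences, and the helper function are hypothetical, but the shape (task description, worked examples, then the new input with the answer left open) is the few-shot pattern being described:

```python
def few_shot_prompt(examples, new_paragraph):
    """Build an autocomplete-style prompt: a task description, worked
    examples, then the new input, leaving the answer for the model."""
    lines = ["The following is an extraction of sentiments from paragraphs.", ""]
    for paragraph, sentiment in examples:
        lines.append(f"Paragraph: {paragraph}")
        lines.append(f"Sentiment: {sentiment}")
        lines.append("")
    lines.append(f"Paragraph: {new_paragraph}")
    lines.append("Sentiment:")  # the model is asked to continue from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("I finally got the job!", "joy"),
     ("The train was late again.", "anger")],
    "I can't stop worrying about tomorrow.",
)

assert prompt.endswith("Sentiment:")
assert prompt.count("Paragraph:") == 3
```

The model never receives an explicit instruction; it only sees a text whose most plausible continuation happens to be the answer, which is the "weird" way of giving it a task that Joscha mentions.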

Joscha: And this in some sense is a semantic operation. It does understand at some level what it means to extract a sentiment from a paragraph. It’s only not related to a global model of the universe, the global sense of meaning that we have. Right?

Joscha: When you and me understand something, it means we have found that thing’s place in the universe. Of course the universe that we are talking about here is not the physical universe, which you and me probably don’t really understand.

Joscha: Physics doesn’t really understand it yet. Or at least those physicists who do understand it have not found a majority that lets all of us understand it in the same way.

Joscha: And so what we mean by the understanding of the universe, or the universe itself is our model of the universe that we have in our own mind.

Joscha: And this model of the universe means, okay there is this roughly three dimensional world, that has gravity in it, and four forces, and most of the interesting juicy stuff is electromagnetic, including light, and sound.

Joscha: And it’s all organized in aggregates of matter, that we can interact with in a certain way, and it implements us, and we are contained in it. And it’s implementing the following set of mechanisms more or less, right?

Joscha: And in this big interconnected model that connects everything with everything about the universe that we are part of, we find a relationship, a causal structure, that explains how that thing comes into being in a certain way.

Joscha: And GPT-3 is not doing that. It does not have an idea about the universe that it’s part of. Instead it’s just babbling. But it can create plausible universes to some degree. It’s not fully coherent. At some point the coherence falls apart, and much faster than it does in our own minds. Most people are not fully coherent, right?

Joscha: I think maybe no human being is fully coherent. But the incoherence of GPT-3 is much, much more blatant than ours. Maybe we can improve this by putting a coherence loss on its learning, instead of just a [inaudible 00:30:12] loss.

Jim: Maybe. But of course, what I think you’re talking about here is the famous symbol grounding problem. GPT-3 doesn’t have its symbols grounded at all.

Joscha: No, it has them grounded in language.

Jim: Yeah, I was going to say, they’re self referential back into language. It’s not grounded into-

Joscha: Yes. It’s only language. But it does have a proper grounding in language, for many of the operations that it performs. So when it’s able to do two digit arithmetic quite reliably, there is in some sense a grounding by treating the symbols that it manipulates as symbols that are subject to arithmetic operators, that are properly evaluated, right?

Joscha: So as a functionalist mathematician you should be satisfied. It’s not mapping this onto the same understanding of axiomatics as proper mathematicians do. But arguably not all mathematicians have the same understanding of mathematical axiomatics, and this doesn’t make them completely impotent as mathematicians.

Jim: Yeah. Though of course we know that GPT-3, while it does fine on two digits, starts to fail at three.

Joscha: Yeah. Then again it’s trying to predict likely texts where there are not that many human texts that have good arithmetic over long numbers, because we don’t write texts about this. This is not a human activity. And so, it would be inhuman to do that.

Jim: Yeah. It does not understand for instance the idea of the algorithm to do the arithmetic, right? One time as I think it was just for fun, I wrote the algorithm to do multiplication manipulating text strings, right? Just because I wanted to make it clear in my head how to do it. It took me like a day to write that, it wasn’t that hard. But GPT-3 doesn’t have anything like that. It has no ability to generate anything like that.

Joscha: If you ask a machine learning system to find a decently short state machine that explains all the arithmetic that we throw at it, then the system is quite likely to discover the operations that we want it to discover quite quickly.
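The short state machine Joscha has in mind does exist for addition, at least: schoolbook digit-by-digit addition is a two-state transducer, the state being whether there is a carry. A minimal hand-written sketch of that machine, for illustration rather than anything a learned model actually produced:

```python
def add_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal strings,
    using the two-state (carry 0 / carry 1) machine that
    schoolbook addition implements."""
    a, b = a[::-1], b[::-1]          # process least-significant digit first
    carry, digits = 0, []
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        s = da + db + carry
        digits.append(str(s % 10))   # emitted output digit
        carry = s // 10              # next state: 0 or 1
    if carry:
        digits.append("1")
    return "".join(reversed(digits))
```

The whole "program" is one digit table plus a single bit of state, which is why a search for a decently short machine explaining the data can plausibly find it fast.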

Joscha: There are deep networks which are able to do better integration and differentiation than Mathematica, and are able to solve things in a shorter amount of time, or solve things that Mathematica, a symbolic system that is hand-crafted using proven mathematics, cannot do. Right?

Joscha: So it’s able to learn things that mathematicians can do, but mathematicians don’t yet know how to do in the right way. And it can learn a few things that mathematicians maybe cannot do yet.

Joscha: And GPT-3 is just not trained for doing this. And the implication might be that Karl Friston is wrong. That free energy minimization, which comes down to surprise minimization translated into the terms of physics, is not the right principle to build a mind.

Joscha: It could be that you could still say that you can minimize surprise a little better if you are able to assign the right value to your metalearning algorithms. But you cannot make those proofs if you are a simple algorithm.
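For reference, the standard statement of the principle under discussion: in Friston's formulation, the variational free energy $F$ upper-bounds surprise (negative log model evidence), so a system that minimizes $F$ approximately minimizes surprise. With observations $o$, hidden states $s$, and an approximate posterior $q(s)$:

```latex
F \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o,s)\big]
  \;=\; \underbrace{-\ln p(o)}_{\text{surprise}}
  \;+\; \underbrace{D_{\mathrm{KL}}\big[q(s)\,\big\|\,p(s\mid o)\big]}_{\ge\,0}.
```

Since the KL term is non-negative, $F \ge -\ln p(o)$, and driving $F$ down drives surprise down.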

Joscha: And I think that’s the reason why simple organisms are not using a single principle to organize themselves, but have multiple needs, and these act as reflexes that satisfy many, many dimensions of needs of the system in the environment. And they are evolved. And we can probably supersede most of them. They are reflexes.

Jim: And though interesting, and I think this is important when thinking about how to get the AGI… It’s again one of my own little design principles. Is that when you look at things from a symbolic perspective at least, attempts to solve these high dimensional problems that even a very relatively simple organism has to solve, inevitably run into the combinatoric explosion of options problem.

Jim: One of the things about these simple minded algorithms, is they essentially collapse all that. And I’m relatively convinced that our attention algorithm is essentially a hack to get around the combinatoric explosion of options problem.

Jim: Let’s move onto another topic. We’re talking about ways of thinking about how mind emerges from matter. Another one that’s become popular over the last eight or ten years, is Tononi’s integrated information theory, and Christof Koch has also been a supporter of that.

Jim: You know, it’s a head scratcher. It’s kind of interesting at one level, but it’s relatively easy to create systems with high integrated information scores that are clearly not a mind. What do you think about integrated information theory with respect to mind emerging from matter?

Joscha: I think if you go to a workshop of the integrated information theorists, it’s a little bit like going to a climate denialist conference.

Joscha: The fascinating thing, if you look at the conference program of the people that don’t believe in global warming, is that they don’t agree with each other.

Joscha: There are some which will argue that there is no global warming. Others will say that global warming is not manmade. Some will just point to errors in the way that the mainstream of the scientific community talks about global warming and so on.

Joscha: And they all accept each other’s papers. Why is that, even though they are so fundamentally not on the same page? There is much more disagreement among them than the disagreement they have with the mainstream.

Joscha: And it’s because they are defined by their opposition to the mainstream. And the same thing happens with these workshops about the IIT theory, that is the integrated information theory.

Joscha: They disagree about how to compute phi for instance, this measure for integrated information. On what the status of phi is. On how relevant it is. On how to interpret the concept of integrated information in the physical universe, and so on.
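Phi itself is expensive to compute and, as Joscha notes, contested even among IIT proponents. To illustrate just the flavor of "integration," here is a much cruder stand-in (an illustrative proxy only, not Tononi's phi): the mutual information between two halves of a system's state, which is zero exactly when the halves vary independently.

```python
from collections import Counter
from math import log2

def mutual_information(samples):
    """Estimate I(X;Y) in bits from a list of (x, y) observations.
    Zero iff the two halves vary independently; larger values mean
    the halves carry information about each other."""
    n = len(samples)
    pxy = Counter(samples)
    px = Counter(x for x, _ in samples)
    py = Counter(y for _, y in samples)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Two halves that always mirror each other: one full bit of "integration".
coupled = [(0, 0), (1, 1)] * 50
# Two halves that vary independently: none.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
```

Jim's earlier objection applies directly: a pair of mirrored wires maxes out this score while clearly not being a mind, which is the same style of counterexample raised against systems engineered to have high phi.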

Joscha: They are mostly opposed to functionalism. And they try to find an alternative to functionalism. And I suspect that’s the main driver for the popularity within the IIT community.

Joscha: Within the physics community it’s probably the reason that Max Tegmark has decided to make this one of his planks in his boat, that tries to integrate over everything in the universe from the perspective of a physicist.

Joscha: And so in some sense the adoption in physics, that this is taken seriously, is probably a little bit political. I have great personal respect for Tononi.

Joscha: He basically is a neuroscientist who earns his money and respect as a sleep researcher. And he is also a philosopher, and an autonomous intellect.

Joscha: So it’s somebody who is not part of a particular school in philosophy, and as such would have a hard time being accepted in philosophy. He has a new solution to an age-old philosophical question.

Joscha: And if that is the case it’s usually bad news. Because all the good answers that you can find without a lab have been around for a very long time. You just need to find them and translate them into the modern language, or into your own concepts.

Joscha: And so if he has found a new solution, the question is, what is the solution for? What does it really express? And the suspicion that I have is that the core of his theory has not really been published. That was also the impression that I got from personal interaction.

Joscha: The core of his theory is not phi. It’s not this measure. This measure was introduced out of a need to produce a theory with quantifiable statements that can be mapped to experimental predictions. Because that’s somehow seen as the gold standard in the sciences, but not in philosophy.

Joscha: For him it’s more an attempt to find a theory that explains, or that talks about panpsychism, and reintroduces it. And it’s mostly because he doesn’t see how functionalism can solve the problem of consciousness.

Joscha: And he does latch onto this idea of distributed information in the brain. But the distributed information that happens in the brain is probably not the same thing as it happens for instance in quantum mechanics.

Joscha: And I don’t think it’s non-computable. And it’s a little bit ironic to try to make an information-theoretic theory of consciousness, with phi as an information-theoretic measure, that is not functionalist.

Joscha: Because functionalism and information theory are strongly intertwined. It’s an oxymoron to make an anti-functionalist theory using information theory, I think.

Jim: Yeah. It’d be helpful for our audience if you could describe what functionalism means. I could try it, but I think you could do a hell of a lot better job than I.

Joscha: So very simply put, and very cursorily: functionalism treats a phenomenon as a result of its implementation. So for instance, what is a bank? A bank is a thing where you can have an account, and you can have an interface to it, you can store money in it. You can extract the money from it. It is conformant to a certain legal interface. It’s giving you certain guarantees that are compatible with the legal and monetary system, and so on.

Joscha: And if an institution is fulfilling all these principles, then it doesn’t make sense to not call it a bank. Dennett uses this as a metaphor to explain his opposition to the concept of the philosophical zombie: a system that is identical to us in all its features, except for not having phenomenal experience.

Joscha: And a zombank would be a bank that is basically a thing where you can store money, and retrieve it again, and you can ask for your account level and so on, but it’s not a proper bank, because it lacks the essence of a bank. The true bankness. The thing that you cannot really touch, and that has no causal influence on its interaction with the outside world.

Joscha: And we would reject this notion of a zombank as nonsensical. It doesn’t make sense to distinguish between a bank and a zombank, because there is no essence of bank beyond its functional interface.

Joscha: And why would this be different for consciousness? Is there some kind of a hidden essence? And so in some sense functionalism is the rejection of the notion of a hidden essence, of something that does not have causal properties that we can observe.

Joscha: Everything can be explained in terms of its causal properties, and all causal properties ultimately can be explained as functions that are implementable. Which in my perspective means computable, realizable in a physical system.

Jim: Yeah. I must say when I read Dennett’s Consciousness Explained, while I didn’t necessarily buy his pandemonium theory, I did find his rejection of philosophical zombies reasonably convincing. What’s your thought on that?

Joscha: The pandemonium theory by the way goes back to Selfridge, and strongly influenced Minsky, who was Selfridge’s student, and it led to the Society of Mind.

Joscha: And I think it’s a very beautiful theory. It says that the mind is basically a bunch of agents that implement different behaviors. And you can entrain your mind with new demons.

Joscha: And demon is a good metaphor. In computer science it’s a program without a user interface, in a way. Without a user-facing front end. But it’s an automatic behavior that can be instantiated, you can run multiple of them, they can interact with each other, and produce combined behavior in your computer.
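The computer-science sense of the word is easy to make concrete. A minimal sketch of a daemon as just described, using a Python daemon thread (the doubling behavior is an arbitrary stand-in for "an automatic behavior"):

```python
import queue
import threading

def start_daemon(work: queue.Queue, results: list) -> threading.Thread:
    """Start a background behavior with no user-facing front end:
    it silently services whatever shows up on its input queue."""
    def loop():
        while True:
            item = work.get()
            if item is None:              # sentinel to shut the daemon down
                break
            results.append(item * 2)      # the daemon's automatic behavior
            work.task_done()
    t = threading.Thread(target=loop, daemon=True)  # no UI, dies with the program
    t.start()
    return t

work = queue.Queue()
results = []
start_daemon(work, results)
for i in range(3):
    work.put(i)
work.join()  # block until the daemon has handled every item
```

Several such threads can be started against the same queue and will divide the work among themselves, one concrete reading of demons producing combined behavior.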

Joscha: And in this pandemonium theory you have a stage where you have the active behaviors that are currently populating your working memory, and perform certain tasks.

Joscha: And there are others that are sitting in the audience, and can basically evaluate what’s happening on stage. And they can pull others on stage, or boo some off stage.

Joscha: And those that are on stage can also call up their allies, allied behaviors, and get them on stage to enact a certain scene.

Joscha: And so it’s, I think a very powerful metaphor to imagine how the self organization of behaviors in the brain might work. It’s not doing much more than this. It’s not yet an implementation.

Joscha: But it’s I think a good start to basically see a self organizing system that is adapting itself to a variety of unknown situations in the world, by composing a team of behaviors at any given time.

Joscha: So this is the power of the pandemonium theory. For Dennett’s theory of consciousness itself, I was always wondering why Dennett seemed to be so ineffectual among philosophers, and among philosophy students.

Joscha: A lot of philosophy students don’t like Dennett very much. And also a lot of philosophers don’t like Dennett very much, or analytical philosophy for that matter.

Joscha: And I suspect it’s because Dennett seems to miss the problem that people try to explain. He is not talking about that problem very much.

Joscha: It’s not because he cannot explain it, but because he doesn’t seem to think that phenomenal experience is that important at all. It doesn’t take center stage in his philosophy.

Joscha: And I think that could be because Dennett is such a nerd. Maybe Dennett doesn’t have that much phenomenal experience. Maybe Dennett is somebody who is extremely constituted on the conceptual side.

Joscha: And I can relate to that, because I am also very constituted on the conceptual side. It’s only that I started observing this difference in people who are constituted on the side of feelings, who are constituted on the side of what people in the sciences would call intuitions, maybe pre-scientific intuitions, that scientists are very wary of.

Joscha: I think that a scientist, or a true philosopher is born as an aberration. It happens when a child is so desperate, that it decides to permanently trust its ideas more than its feelings.

Joscha: And this is often the case when you are a nerd. You have the best of intentions. You have very pure feelings about how you should interact with the world. You have compassion for your environment maybe. But you might have poor empathy, because some of the wires in your brain are not wired properly during your development.

Joscha: And so you cannot guess the mental states of others. You don’t have a sense for the mental states of others. It’s difficult for you to figure out what you should be believing, that you should modify your beliefs to fit in and so on. And it’s maybe repulsive to you to modify your beliefs just to fit in, or to pretend that you modified them, because you lack the innate sense of how important it is to play ball with most humans.

Joscha: And so as a child you fail in your social interaction with non-nerds. And because nerds are few and far between, probably less than 6% of the population are on that spectrum, you learn that you cannot trust your feelings when you want to interact with other people. You fail if you do that, right?

Joscha: So what do you do? You need to build rational models for interacting. And I suspect that’s one of the reasons why so many people that are very, very good at rational analytical models are nerdy, and end up in the sciences, and are not very good at intuition, and at intuitive empathy, and at the resulting social interactions, right?

Joscha: It’s almost a trope that good scientists are poor in social interaction. And it’s a disaster I think that we are cherishing scientists as a lifestyle archetype, and as the goal of the educational system that every child should somehow become a scientist.

Joscha: No. A proper scientist is very useful to society, but is a defective human being, if they trust analytical tools, which are very brittle and hard to prove, more than they trust their feelings and intuitions, right?

Joscha: Science works like this. Science works by always finding analytical truths, and only believing in what you can prove. But this is not viable for life, because almost everything in the world is more complicated than you can deal with using human logic, right?

Joscha: So science is able to deal with the edge cases. But it’s not as good in dealing with the mainstream of problems that you have in life. Scientific theories don’t help with most of the everyday problems. They only help with the difficult edge cases, where your intuitions fail.

Joscha: So in some sense science exists to deal with our darkest emotions in a very literal way, or our darkest feelings. It’s those feelings that are extremely difficult to disambiguate. That’s why they’re dark.

Joscha: Our feelings don’t help us there, in these darker regions. They don’t tell us what to do. And this is where we need logical reasoning.

Joscha: And so I do think that scientists are important. They need to be protected, but they are the result of a particular mindset. And this is also true for people like Dennett.

Joscha: Dennett is not able to talk to people very successfully in the same way as Deepak Chopra does. Because Deepak Chopra might not be analytically completely clean and pure, but he’s able to resonate with people at a very deep level, where he can tell them what they really care about, what they want to have explained in their everyday life.

Joscha: So in some sense if we want to make computationalist philosophy attractive to the mainstream, we have to explain to them how functionalism is dealing with intuition, with phenomenal experience, with the sense for the greater whole, with our sense that we are not confined to a single self in a single organism, but that we stretch far beyond that. That we are somehow distributed in the universe.

Joscha: Experientially that’s true for normal people. It’s probably not true for most scientists. Most scientists are confined to their own intellect at some level, including myself.

Jim: Yes. And I mean I know a lot of scientists, and certainly in the scientific community the kind of folks you’re describing are over represented, in fact probably the greater ones are even more overrepresented. But it’s by no means 100%, probably not even 50%.

Jim: Probably the greatest scientist I personally have met and spent time with, was Murray Gell-Mann, and he’s an amazingly social character, right? He has social intuitions. He knows a lot of things that aren’t about science, and he can weave them into an extraordinarily interesting conversation.

Jim: Unfortunately he has now passed. He was a great guy out at our Santa Fe Institute. But he was certainly an amazing scientist who nonetheless was a quite complete human being. So I would make sure we don’t over…

Joscha: Oh absolutely.

Jim: While I agree statistically it’s true.

Joscha: Well these geniuses that are very good at all the fields, they do seem to exist. But they are very rare and far between. It’s basically people that are able to integrate so well over so many areas that they reach that stage of development in all the areas, right?

Joscha: I also would not say that scientists are in a moral, or experiential sense incomplete human beings. They are just different from your run of the mill homo sapiens, or many of them are.

Joscha: Even somebody like Noam Chomsky, who I very deeply respect, or Marvin Minsky, they are people that have very specific minds, right? They are very good at some things, and the other things that they don’t specialize in, basically they have indications of Asperger’s, of a mind that is hyper focused on some areas at the expense of others.

Joscha: And there is the suspicion that this hyper focus is the result of a compensation. That you get super good at writing books, and Chomsky is probably one of the greatest minds of his generation, and also in terms of writing.

Joscha: If you ask him something he’s going to respond with a complete chapter, including footnotes, and references. Most other people that I know only think in paragraphs.

Jim: And actually it’s rare; only good thinkers can think in whole paragraphs. I know some that do. So let’s get back though to the philosophical zombies, and Dennett. That’s an interesting perspective, that Dennett doesn’t see the problem.

Jim: And maybe the real payoff as you go one step beyond philosophical zombies, and go to David Chalmers’ hard problem, which Dennett also rejects as sort of being a non-problem.

Jim: And I think that’s relatively close to his rejection of philosophical zombies. What’s your position on Chalmers’ claim of the hard problem?

Joscha: I suspect that Chalmers part of the time sees ways out of the hard problem. But he’s still mostly on the side where he doesn’t. His current philosophy is often focusing on the idea that we don’t need to explain the hard problem itself. We need to explain why people think that there is a hard problem. Right?

Joscha: So we need to explain the psychological certainty. And we can come up with theories that explain their psychological certainty. So we don’t have to explain the phenomenon how a physical system can have conscious experience. We need to explain why we think that we have conscious experience.

Jim: That’s interesting.

Joscha: Right? And I think that this goes in the right direction. There is something that struck me when I was looking into parapsychology. And I in some sense grew up with a part of parapsychology. For instance, in the ’60s and ’70s, when LSD was around and still not illegal, a lot of people at the CIA experimented with LSD.

Joscha: And incidentally during the same time, there were a lot of experiments that were similar to, if you’ve seen the show Stranger Things. The MKUltra program, and related programs were looking also into clairvoyance, and into far-sensing, where people would focus their mind on different parts of the world, and would use out of body experiences to sense what’s going on there.

Joscha: And a number of papers and books came out of this, by people that were not completely loopy, and that had proper state funding. And when I read this stuff, I concluded this is probably incompatible with physics as we know it. It’s certainly not compatible with the standard model, because we need more than the four forces that are compatible with the standard model to explain what’s going on here.

Joscha: And the quantum non-locality explanations that come up here, are not explaining it. Because quantum non-locality does not allow the transmission of information that normally would require a photon to send. Right?

Joscha: It might be possible that there are different regions of the universe that share state, but you don’t know which regions of the universe these are. So you would need to have classical back channel to that region, to know about the entanglement between these different regions.

Joscha: And so it’s not obvious that there is some kind of quantum mechanical explanation for these psi phenomena. And as a result, either significant parts of physics that work extremely well in the lab are wrong, or these phenomena are not real.

Jim: Our mutual friend Ben Goertzel is a fan of these psi type things.

Joscha: Yes. He is also incidentally a big fan of psychedelics.

Jim: Yeah. I’ll confess to have done a little psychedelics in my day, though not in 40 years. And yeah, they are an interesting altered state.

Joscha: So there is a correlation between psychedelics and the ability to look into the future, without winning the lottery, right? There is a weird thing about psi phenomena. They don’t seem to change the physics of the universe, or the statistics of the universe very much.

Joscha: If we could evolve the ability to reliably far-sense, or look into the future, we should see a lot of animals that do this in predator-prey dynamics. We should have at least a small subset of the population that quite reliably wins the lottery without cheating.

Joscha: Yet the banks still win. How is that possible? It’s an argument that has been made for instance by Stanislaw Lem in, I think, Summa Technologiae, one of his purely philosophical books that make no concessions to the reader. It’s a very beautiful one. And there he argues against psi in this regard.

Joscha: I also had a very good friend during my time at Harvard, and she still is a good friend I hope. She had prophetic dreams every night. Lucid dreams. It’s very difficult to deal with this, because she was often able to look into the future during these dreams.

Joscha: And she started writing this down, and she noticed that when she was experiencing things that were causally relevant, that basically would be the equivalent of winning the lottery, that would change the future in any way, she was unable to write them down.

Joscha: And it’s like the men in black basically prevented her from doing that. And I think there’s only one good explanation for that. The explanation is that your memories are changing retroactively.

Joscha: So you don’t remember when your memory changed. You don’t remember when your construction of reality changed. She was not able to write it down, because she didn’t know it at the time that these lottery numbers would come.

Joscha: These lottery numbers come, and together with the knowledge about these lottery numbers, you instantiate the memory of having foreseen what these lottery numbers were. Right? This is the easiest explanation.

Jim: Yeah. And in which case you actually did not know the lottery numbers, right? You had the memory of knowing the lottery numbers, not actually knowing them.

Joscha: Yes. But there is a deeper phenomenon here. So if the spiritists, and the psi theorists, and parapsychologists are right, and I’m not completely ready to dismiss them, and Turing was also not doing so, right? His famous 1950 paper makes explicit and affirmative references to the high probability that telepathy is real.

Joscha: So he even asked the question, “Could an AI system be truly intelligent if it’s not telepathic? Or would this be a sufficient intelligence if it’s not telepathic?” There is a deeper implication that he doesn’t discuss.

Joscha: If telepathy is real, if there is a possibility to use unknown physics to send information across minds, that are not in any kind of known signaling relationship to each other, so they are adjacent. So you can observe the other one, and entangle yourself with their own vibrations in their mind, just using your visual sense, or other known senses, right?

Joscha: If that is true, if there is something like telepathy using an unknown physical causal mechanism, how can you guarantee that your consciousness and your mind is computed in your own brain? How can you guarantee that your brain is not merely an antenna, and that your consciousness is computed elsewhere, maybe outside of the universe. And you [inaudible 00:56:28] in this universe, right? In this conscious.

Jim: Yeah. That’s the Penrose hypothesis, right? That the microtubules are somehow quantum antennas that allow us to get signals from across the multiverse perhaps.

Joscha: I don’t really understand why Penrose is affiliating himself with that so much, right? It’s difficult, because his theory is different from Hameroff’s theory. I’ve never seen Penrose actively endorsing the microtubules in the same way as Hameroff does.

Joscha: In my view, and I really, really like Hameroff as a person, Hameroff is building a psychedelic sculpture garden with his theory. It’s a theory that is making more predictions every year, or more explanations every year, using the same mechanisms.

Joscha: And so it’s a theory that explains how anesthesia works, how psychedelics work, how consciousness works, and now also how evolution and emotion work. All using the pi resonant quantum underground.

Joscha: At one point I was at the Science of Consciousness conference, sitting in the audience, and he was putting stuff on the screen. And I felt that my basic understanding of physics was sufficient to understand where the physics ended.

Joscha: And I asked the people next to me, “Are you not concerned that you don’t understand what he’s talking about in this physics, about the pi resonant quantum underground?” And they said, “Don’t worry. We shut down as soon as he mentions that.”

Jim: Oh dear.

Joscha: It’s not the important part of the theory.

Jim: Yeah, truthfully I’ve not checked his stuff out. I will have to.

Joscha: It’s super interesting to read it. And it’s really beautiful. I like it. It’s also very poetic in a way, but I think it’s more art than science. Because it’s integrating observations, it’s projecting things, it’s connecting loose ends. But it’s not doing this in the same way as a scientist would properly do it, which explains its lack of resonance in the sciences, right?

Joscha: There is almost nobody outside of his circle, who is citing these papers, and working with this, and thinks that they are in any way real, and worth discussing.

Joscha: The reason, I think, why Penrose is affiliated with it is, you know, if you are excluding all the viable explanations except one, or all the probable explanations, then the improbable explanation has to be the reason.

Joscha: And Penrose thinks that computation cannot explain all of consciousness. And I think it’s because of this way he interprets Gödel’s proof.

Joscha: Gödel has in some sense proven that computation is insufficient to do all of mathematics. There are parts of mathematics that cannot be done in the computational paradigm. Assuming so leads to contradictions.

Joscha: And Penrose believes that human mathematicians can do these parts that computation cannot do. I think the resolution works the other way around.

Joscha: These other parts of mathematics were never real. Mathematics was ill defined. The truth theory of classical semantics is wrong. You can only claim that something is true, if you can actually compute that truth.

Joscha: You can only claim that something has a value if you can actually compute that value. It’s basically constructive mathematics. And constructive mathematics is roughly the same thing as computation.

Joscha: And what Gödel has shown in my view is that constructive mathematics is real, or can be real in the sense that it can be implemented, but classical mathematics that contains infinities cannot be implemented, cannot be real.

Joscha: Nothing in the physical universe relies on having known the last digit of pi. Right? There are mathematics that pretend that this works. And some of these mathematics are even used in physics, but they’re not real, they’re not computable. They cannot be implemented in any physical causal structure.

Joscha: So this was the implication. And this is an implication I think that Penrose has not seen. He believes that the ability of our minds to be conscious is related to the uncomputable part of mathematics, and the uncomputable parts of mathematics go beyond known physics, which in some sense is all computational. Right?

Joscha: Even quantum mechanics takes a bunch of numbers, and performs expensive computations on them, and then you get the next bunch of numbers. And this is how we explain the universe.

Joscha: And the only part that is not explained in this paradigm so far is quantum gravity. So the culprit must be quantum gravity, right? It’s the only thing that’s left from Penrose’s perspective.

Joscha: And the only one who tries to offer a theory that uses quantum gravity to explain consciousness in the brain is Penrose, using his microtubules. And I think this is how it comes together.

Jim: Interesting. Yeah, truthfully I read it when it came out. Read a couple of things on it, and I just put it aside and said, “Well I’ll know that it’s out there, but I’m not going to pay it any mind until somebody comes up with some useful experimental results.” And I’ll have to check out this Hameroff character. I don’t know about him.

Joscha: But let’s go slightly back to this big distinction between physics and the parapsychology, which leads us into a different world. I think that there are only two possibilities. Either we live in a mechanical universe. And this is the hypothesis of physics.

Joscha: It says that at some level there is a causally closed mechanical layer. And this doesn’t mean that the causally closed mechanical layer needs to look like what the universe looks like to us, right?

Joscha: The Einstein space, and our theories of quantum mechanics are probably high level descriptions, that don’t describe the causally closed lowest level, but it’s still mechanical. It’s still built in such a way that it can be expressed as a computer program.

Joscha: And the alternative to that is, that we live in a dream. What’s the difference between a dream, and a mechanical model? A dream has magical interactions. They are symbolic interactions. Which means you sacrifice the cat, and the comet appears. Or the comet appears, and it predicts your career.

Joscha: There is no known physical force, or plausible physical relationship that we could discover that would explain this kind of interaction. It just means that somebody is messing with us. There is a conspiracy, right?

Joscha: The same conspiracy that exists in Minecraft if you open up a console and say, “time set day,” and the sun suddenly rises in Minecraft. Where you basically supersede the basic low-level causal mechanics of Minecraft, using a higher level of causation that is outside of the basic game dynamics.

Joscha: And this would be a world in which psi is possible, psi in the esoteric parapsychological sense, where you can use telekinesis to overcome physics, or where you can use clairvoyance to overcome the limits of information transmission via subluminal photonic transfer of data between different regions of the universe. Right?

Joscha: So how is that possible? And I think the reason this theory, that we essentially must live in a dream, is so attractive is that we actually do live in a dream. You do live in a dream.

Joscha: It’s possible to have these experiences. It’s possible to experience telepathy, it’s possible to experience clairvoyance and so on. And it’s because we live in a dream that is generated by a mind on a different level of existence.

Joscha: And this different level of existence happens to be physics. And our culture is a little bit confused, because it assumes that the world that you and me see in everyday life is physics, is the physical world.

Joscha: But there are no colors in physics, and no sounds in physics. What we perceive is still the dream. And our brain that is out there in physics, does not really look pink and squishy in physics.

Joscha: It’s only a thing that, this is what it looks like in the dream, the dream that the brain is generating. And to get this right, that is the important thing.

Jim: Yeah. Well I think we all know, we’ve known for a long time, that we don’t experience actual reality. We have an interpretation of reality, and then we have meaning maps, right? We-

Joscha: Yes. So we don’t experience a simulation of reality, it’s a simulacrum of reality. It’s completely as if. It’s not isomorphic to the world out there in physics.

Jim: And it’s at different levels of abstraction right? I mean the human perceptual capability can’t tell us about atoms for instance, without instruments.

Jim: And the idea of colors even, as we know from anthropology, the different cultures divide the spectrum up in different ways. So some tribe in the Amazon might not describe the brain as pink and squishy. It might be blah, blah and squishy, which is a very different, narrower part of the spectrum than pink for instance, or probably much broader. I think actually we have more fine-grained color categories than many cultures.

Jim: So yeah, they’re meaning maps. And I think that’s where people get confused. I still remain a naïve realist at heart. I do believe there is an objective universe out there. Can’t prove it. As we well know, I can’t logically prove the universe wasn’t created fives seconds ago with all the objects in it, including our memories, and all ballistic objects in motion.

Jim: But if only for parsimonious reasons I assume there is a real universe out there. And as you say, we have some physical laws, but they're a very long way from the Planck scale, so there's probably a shit load we don't know, right?

Jim: And yet so far, they have shown themselves to be remarkably lawful. We haven't been able to detect, at least with a high fidelity signal, any psi, any real deviation from deep lawfulness.

Jim: And so, at least tentatively, I side with the physicists. Though I do encourage Ben to send me new papers on psi, which he does regularly. Because as you say, if we're wrong, or if I'm wrong, and psi is real, then we have to reevaluate: is the universe actually lawful? And maybe it's not just a dream. Maybe it is a simulation. But for the time being I reject that on grounds of parsimony, if no other reasons.

Joscha: Yeah. So the only reason not to accept this theory is that you don't think scientists are able to explain the things that need to be explained.

Joscha: Which is, why does reality appear real to us? From a machine learning perspective, it’s pretty clear that if a learning system does not in some sense implement the belief that the universe is learnable, then it’s not an effective learning system, right? You have to believe in a learnable universe to learn it. At least implicitly.

Joscha: And so the weird thing that we have to explain is I think not the qualities of qualia. The qualities of qualia are easier than most people think. Because this is just the geometric calculations that your perceptual systems are making. It’s basically the dimensions, the parametrizations of the geometric architecture that is computing the perceptual models.

Joscha: And in some sense this has been neglected for a long time, because scientists have focused on the linguistic, the analytical models too much.

Joscha: And only with the widespread takeoff of the machine learning and deep learning paradigms, I think, has it gotten more into the common consciousness of cognitive science that perception is more akin to deep learning systems than it is to linguistic systems, to symbolic systems, right?

Joscha: The paradox is you can of course implement the deep learning systems on top of symbolic systems. We actually have to. They’re completely implemented on top of symbolic systems. But they are symbolic systems that are very different from our symbolic reasoning.

Joscha: Our symbolic reasoning is arguably limited to a very small stack size, and to very few elements at a time. We cannot hold more than like five to seven elements in our focus of attention at a time, and relate them to each other.

Joscha: Unlike GPT-3, which is looking at 2048 tokens and relates them all to each other at the same time. And it's doing this in many, many dimensions, and with extremely high resolution, and reliability, and no lapse of attention.

Joscha: So this is a very different way of doing symbolic operations than our mind is doing. It’s doing this on the level of low level automata.
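The all-at-once relating of tokens that Joscha describes is the core attention operation of transformer models. A minimal sketch in plain NumPy, with the model dimension shrunk for illustration; this is the generic scaled dot-product attention formula, not GPT-3's actual implementation:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n_tokens, n_tokens) pairwise relations
    # Softmax over the whole window, numerically stabilized
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# A GPT-3-sized window relates 2048 tokens to each other in one step;
# human working memory holds roughly five to seven items.
n_tokens, d_model = 2048, 64   # d_model reduced here for illustration
rng = np.random.default_rng(0)
x = rng.standard_normal((n_tokens, d_model))
out = attention(x, x, x)
print(out.shape)   # (2048, 64)
```

The point of the sketch is only the shape of the computation: the `(n_tokens, n_tokens)` score matrix is what relates all 2048 positions to each other simultaneously.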

Joscha: But we have to explain to people how the property of realness comes about. And the property of realness itself is paradoxically not a feature of physical reality.

Joscha: Physical reality doesn’t feel like anything. There are no feelings in there. Physical realness can only be experienced as part of a model, because it’s itself a model property, right?

Joscha: It’s a label that the mind attaches to some of these parameter dimensions. And if you look at them, you distinguish the non-real imagination from the real world that you experience by this label. Because your mind says, “This is indeed predictive of your next batch of sensory patterns, as far as I can make it.”

Joscha: And this includes the internal sensations that you have about your own self, and your own thinking processes, and reasoning processes, and your experiential processes.

Joscha: Your experiences are real experiences, because they are predictive of your next experiential features. Right? They are models of what you experience. And the realness itself is a model of the fact that they are predictive.

Jim: Yeah. Though of course it’s true that for humans, again as I said before, the resolution of our measurements and perceptions are pretty gross. And we’re able to go way beyond that with our scientific instruments.

Jim: Yes we evolved as if we are in a lawful universe, because that's what works. Right? But it may well have been that we were so coarse graining the universe, due to our low resolution vision, the acuity of our feeling, the ability to decompose matter, that we could've easily been fooled by something going on deeper down, right?

Jim: At least unless our instruments are lying to us, we can probe, I don’t know, 20 orders of magnitude smaller than we can get at as unaided humans, and 20 orders of magnitude larger.

Jim: And yet lawfulness still seems to prevail in both the microcosm and the macrocosm, even though our formulations of the laws are almost certainly very substantially incomplete.

Jim: That seems to me much stronger evidence for reality, deep reality than merely our sense of reality. Though I do take your point that reality is not an attribute of the universe itself. But it’s rather an experience of a conscious agent living in the universe.

Jim: But I would say we should have more confidence in reality than merely our naïve animal consciousness, because of the fact that we’ve been able to extend our probes a long way, and both for the microcosm, and the macrocosm.

Joscha: I think it’s important to hold it somewhat tentatively. I think that to have an enlightened relationship to reality, it is necessary to realize that what you perceive is a representation. This includes yourself, and your relationship to the universe.

Joscha: That this is all in some sense a representation that you don’t perceive as a representation, but as an immediate reality. And you need to make it visible as a representation. You need to pay attention to that.

Joscha: You also need to pay attention to attention, in such a way that the attention itself becomes visible to you. That you can notice how your attentional system works, and how it’s constructing your reality, if you’re interested in that.

Joscha: And for a normal human being, the only reason to be interested in that, is because it doesn’t work. And since these attentional processes tend to work very well, most people don’t pay attention to them, and just take them as given, the processes of reality construction.

Joscha: And the people that are familiar with these processes are those that are naturally in altered states of minds, because they fell down the stairs headfirst at some point, or have [inaudible 01:12:10] issues, or because they are in an existential crisis for instance.

Joscha: And this existential crisis makes it necessary for them to understand their own relationship to meaning, and their own self construction.

Jim: Yeah. Let’s step back a little bit. We’ve been talking about mind emerging from matter. We’ve talked about a lot of the leading theories. We’ve talked about some of the out there theories.

Jim: From your perspective, what’s important for the next step in the science of explaining mind emerging from matter? Where should we be looking next? What’s the next 10 years look like?

Joscha: So it’s of course always very difficult to make predictions, especially about the future.

Jim: Yes, thank you Yogi.

Joscha: Yes. I suspect that one of the very interesting areas that we need to look at is attention based models. And the transformer is only the beginning. From my perspective there are at least three obvious things that are wrong with GPT-3.

Joscha: Interestingly the fact that GPT-3 is not an agent, is not one of them. That can be easily fixed, right? GPT-3 is obviously not an agent. It doesn’t have a context.

Joscha: So if you tell GPT-3, "Hey, you are GPT-3. What are you?" Then GPT-3 will produce a relatively random sentence, because GPT-3 has been trained on the statistics of language up until October 2019, and GPT-3 didn't exist in October 2019. Right? So unless OpenAI has built something into GPT-3 explicitly to teach it what it should say about GPT-3, it doesn't know what it is.

Joscha: And if you tell GPT-3, “I am talking to the AI GPT-3,” and so on. Then it will assume, in some sense implicitly for some meaning of assume, that it’s talking about some kind of science fiction context, or a technical context in which a human is communicating with an AI system.

Joscha: And then it might guess a number of things right. But it doesn’t know which ones are right or not. It also doesn’t know whether it relates to any kind of reality. Right? There is no sense of an underlying reality. There is no fixed context.

Joscha: But you can add this fixed context, by making additional commitments. And you need to make these additional commitments. I think that is an implication of loop theory, which is a recasting of Gödel's proof.

Joscha: A system is not able to break out of its own axiomatic system, and make statements about axiomatic systems above it. In order for a system to reason about itself, it needs to recreate itself within its own formalisms. And then make statements about this recreation of itself.

Joscha: So for formal systems that you have created, of course you can build them in such a way that you can recreate the formal system within the formal system. So the system can make proofs about itself.

Joscha: But strictly speaking you cannot make proofs above yourself. You can solve this problem of agency in GPT-3, I think, by building, in principle, a vision-to-speech module, or vision-to-text module, that interprets the camera images of a robot and sends them to GPT-3.

Joscha: And then GPT-3 tells a story about a robot in that world. And then a parser is reading these statements and translates them into motor actions of the robot. And we continuously play that game.
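The loop Joscha sketches here, camera to text, text to the language model, the model's continuation parsed back into motor commands, can be caricatured in a few lines. Every function below is a hypothetical stand-in, not a real API; the hard-coded strings mark where real perception, language, and parsing modules would go:

```python
def vision_to_text(camera_image):
    """Hypothetical perception module: describe the camera frame in language."""
    return "The robot sees a red cube on the table."

def language_model(prompt):
    """Stand-in for GPT-3: continue the story about the robot in its world."""
    return "The robot reaches out and picks up the red cube."

def parse_to_motor_action(sentence):
    """Hypothetical parser: map narrated actions to motor commands."""
    if "picks up" in sentence:
        return {"action": "grasp", "target": "red cube"}
    return {"action": "idle"}

def agent_step(camera_image, story_so_far):
    """One turn of the perceive -> narrate -> act loop that is played continuously."""
    observation = vision_to_text(camera_image)
    story_so_far += " " + observation
    continuation = language_model(story_so_far)
    command = parse_to_motor_action(continuation)
    return story_so_far + " " + continuation, command

story, command = agent_step(camera_image=None, story_so_far="A robot wakes up.")
print(command)   # {'action': 'grasp', 'target': 'red cube'}
```

The design point is that agency comes from closing the loop around the language model, not from changing the model itself.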

Joscha: And now arguably language is not the right level of resolution. We want to have something that is sub-conceptual. We want to deal with perceptual stuff.

Joscha: But GPT-3 is able to deal with that, right? There is Image GPT, which is able to learn the statistics of images, in the same way as a GAN would, or even arguably in a better way than a GAN would, within the limits of its small attentional window.

Joscha: So the main issue I think is the creation of a larger attentional window. At the moment GPT-3 has massive retrograde amnesia. It’s not able to remember anything from two pages ago, except the extractions that it made from that.

Joscha: It’s able to change its neural network as a result, even though it doesn’t do online learning. But during training, of course it takes something from what it has seen before. But it’s not able to reestablish the context in which it took these insights, right?

Joscha: When I read a book, I might in a later chapter recall an earlier chapter, and create a context, this is merging the current chapter with the previous chapter.

Joscha: And then rewrite everything that I learned from the previous chapter in the new light. Because I can reconstruct what I learned from the previous chapter, and where I got this knowledge from, and so on. Right?

Joscha: So for instance I have an idea that I misunderstood, or that was too simplistic, and I need to revise the idea. And I ask myself, “Where did I get this idea from? Why, and how I am revising this?” Right?

Joscha: And now I create a new working memory context in which I direct my attention. And this working memory construction is a thing that GPT-3 cannot do yet.

Joscha: So we need to extend attention in such a way that it’s able to change working memory contexts actively, and construct working memory contexts.

Joscha: And we need to change the level of representation, from language to a multimodal representation that is agnostic to what it represents, and addresses.

Jim: You mentioned something in passing, which I want to call out, which is rewriting. You know one of the things that we know humans do is, our memories are not only low fidelity, but they’re also quite subject to be rewritten, right?

Jim: And certain kinds of linguistic processing may be implementable as rewrite rules. And I think we both have had some exposure to the OpenCog system. And the OpenCog system, a lot of it’s based on the concept of both local and global rewriting.

Jim: And that seems very far from GPT-3. I mean it is what it is. It doesn’t have any essentially dynamic rewriting capability going on in it.

Joscha: Exactly. So this is the second thing that is wrong with GPT-3 in my view. The first thing was the working memory window, which is limited to 2048 adjacent tokens.

Joscha: And basically there are hard constraints in there, in which the working memory is used, that don't exist in our own mind. It's not necessarily that our working memory is larger. I suspect it's much smaller than that of GPT-3, but we are able to construct the contents of our working memory with many more degrees of freedom.

Joscha: So this is the first thing. The second one is online learning. GPT-3 is only doing offline learning. Which is good for an industrial production system, that is meant to behave in pretty much the same way.

Joscha: But if you want to build a system that is working like us, it needs to continuously learn. GPT-3 stopped learning in October 2019. So it doesn't know anything about COVID-19, or George Floyd. It lives in a different universe.

Joscha: And we need to have an agent that is constantly learning, and tracking reality in realtime around it. But this is something that also can be overcome right?

Joscha: It requires massive changes to the algorithms that are being used. You cannot use the same neural learning algorithms that are currently implemented in GPT-3. But it’s nothing that is completely out of this world. This can be done.

Joscha: And the last thing is relevance. GPT-3 does not care about relevance. The only sense of relevance that GPT-3 has comes from the fact that humans don't write everything into texts. They write things into texts that are relevant to them.

Joscha: So by minimizing the surprisal on all the available texts that have a decently good score on Reddit, you are probably learning something interesting. It was interesting enough for a human to write it down, right?
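Surprisal here is the standard information-theoretic quantity, the negative log probability a model assigns to what actually occurs; minimizing its average over curated text is the training signal being described. The tokens and probabilities below are invented purely for illustration:

```python
import math

def surprisal(p):
    """Surprisal of an event with model probability p, in bits."""
    return -math.log2(p)

# Invented predictive distribution over a few next-token candidates.
predicted = {"the": 0.5, "cat": 0.3, "quantum": 0.01}
for token, p in predicted.items():
    print(token, round(surprisal(p), 2))

# A token the model expects ("the") carries little surprisal;
# a token it finds unlikely ("quantum") carries a lot. Training
# pushes the average of this quantity down over the whole corpus.
```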

Joscha: So just by learning from this, this is way, way better than random stuff. But this is not sufficient for a system that is interacting with the world, and takes in rich sensory data on many modalities, at a higher bandwidth than it can process in realtime.

Joscha: So in this case you have to focus on those parts of the model that are most promising. And for this you need to have a motivational system.

Joscha: And I think that in practice you will have way better results from systems that are able to assign relevance to learning, and meta-learning, in ways that the current GPT-3 is not, right?

Joscha: So GPT-3 [inaudible 01:20:35] learning in ways that no human being can do, because it learns all this stuff which we find is irrelevant. When you read a textbook, you don’t care about the style so much. You care about the content.

Joscha: And GPT-3 does not think that the content is more important than the style. It only looks at style, and goes so deep into style that ultimately it often bottoms out in content.

Jim: Yeah. Say a little bit more about the affective part, and how that might be added into GPT-3.

Joscha: The affective part is, it’s basically something where the psi theory I think is still one of the best theories today. The psi theory posits that you can describe an agent like us, using homeostasis as a guiding principle.

Joscha: So there is a homeostatic balance that keeps the system stable in the face of an environment that disturbs it. Our mind is in some sense solving a control problem in many dimensions.

Joscha: And these control dimensions are given to us as needs, that when frustrated produce pain signals, and when satisfied produce pleasure signals.
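One such control dimension can be sketched as a toy class: a need with a setpoint, where deviation from the setpoint is the pain signal and restoring the balance is the pleasure signal. The names and numbers are illustrative assumptions, not the Psi theory's actual formalism:

```python
class Need:
    """A single homeostatic control dimension, in the spirit Joscha describes."""

    def __init__(self, name, setpoint=1.0):
        self.name = name
        self.setpoint = setpoint
        self.level = setpoint

    def urgency(self):
        """Pain grows with the deviation from the setpoint."""
        return abs(self.setpoint - self.level)

    def consume(self, amount):
        """The environment disturbs the balance (e.g. energy is spent)."""
        self.level -= amount

    def satisfy(self, amount):
        """Acting on the need restores the balance; the drop in urgency
        is the pleasure signal."""
        before = self.urgency()
        self.level = min(self.setpoint, self.level + amount)
        return before - self.urgency()   # pleasure = relief

food = Need("food")
food.consume(0.6)             # disturbance: urgency rises to 0.6
pleasure = food.satisfy(0.4)  # eating: urgency drops back to 0.2
print(round(pleasure, 2))     # 0.4
```

An agent like the one being described would run many such dimensions at once and pick actions by weighing their urgencies against each other.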

Joscha: And this, when we act on these needs, we act on purposes, on models of our needs. And we have strong biases on what kind of models of our needs we form.

Joscha: And this hierarchy of purposes that we form, that becomes coherent, is in some sense the structure of our soul. It's not a random set of behaviors just sitting next to each other, randomly arranged.

Joscha: It’s a system that strives for coherence. And the more coherent it is, the more our mind appears to be a singular solid thing, that has a definite structure.

Joscha: And in some sense you could say our true soul is the Platonic form, the ideal form of all these purposes arranged in the right hierarchy, including our transcendental purposes that go beyond the individual, and its present time, and space that it occupies.

Jim: Okay. I think we're about out of time. We didn't get to any of the low level things I wanted to talk about. But that's all right. I think our conversation was good, and rich, and went a long way. And I hope the audience appreciated it. I know I certainly did.

Jim: So Joscha, I’d love to have you on the show again some time, to talk about your thinking about MicroPsi 3, and things in your own workspace. But I really want to thank you for this amazingly broad ranged, and yet deep discussion.

Joscha: I thank you too. I really enjoyed talking to you again. And let’s do it again some time.

Production services and audio editing by Jared Janes Consulting, Music by Tom Muller at