Transcript of EP 278 – Peter Wang on AI, Copyright, and the Future of Intelligence

The following is a rough transcript which has not been revised by The Jim Rutt Show or Peter Wang. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guest is Peter Wang. Peter’s the Chief AI and Innovation Officer and co-founder of Anaconda, the Python and Python data science company. I use Anaconda regularly. Peter leads Anaconda’s AI incubator, which focuses on advancing core Python technologies and developing new frontiers in open source AI and machine learning, especially in the areas of edge computing, data privacy, and decentralized computing. Cool shit. Welcome, Peter.

Peter: Thank you. Thank you. Thanks for having me, Jim. Always a pleasure to chat with you about, just about anything, actually.

Jim: Peter and I chat very regularly about all kinds of crazy shit. It’s actually one of the highlights of my week when we get together for our little chat and try to figure out… In fact, on my calendar, it says, “Peter and Jim try to make sense of the world,” or something like that.

Peter: It’s a fairly open-ended topic.

Jim: Yeah, indeed. Back when I was a baby podcaster and didn’t know what the hell I was doing, on episode 16 Peter came on and we talked about the distributed internet. Peter’s been a huge supporter, and actually a catalyst, for the distributed internet; things like Bluesky he’s been involved with. And then back in Currents 092, we talked about the meaning crisis and consequentiality, whatever the fuck that is. But anyway, it was an extremely interesting conversation. So if you want to hear more from Peter, check out those episodes. As usual, links to those and everything we talk about will be on the episode page at jimruttshow.com.

Peter: Yeah, those were some fun conversations. It feels like a lifetime ago. I mean, they’re still recent topics, right. They’re very important and relevant.

Jim: Absolutely. Yeah. Talk about seemingly a lifetime ago. I think the framing for our conversation today: we’re going to talk about deep learning, large language models, generative AI, that whole cluster of things. And man, ever since November 2022, it just seems like we’re in some kind of wormhole to some future that we can’t foresee. November 2022 being when ChatGPT was released to the world, and instead of it being something only nerds knew about, normal people also started learning about this stuff. And it seems like we’re again on yet another step up on this exponential in the last few months. It’s like every time I turn around there’s a new product or a new paper or a whole new way of looking at things, and it’s fucking nuts. Is that your sense of things too?

Peter: Yeah, I called that moment the chattening, right, when ChatGPT was released. Because even though I’ve been tracking deep learning for a long time: when GANs came out, generative adversarial networks, and some of the early work showing multimodal sorts of learning, it was pretty stunning to see what a deep neural net could do. And before ChatGPT, and I don’t know how many of your listeners may know this, when GPT first came out, GPT-3, that was actually a big deal for a lot of nerds in the space. But ChatGPT is when it broke into, I would say, the public consciousness, and then it’s all anyone could talk about. And it was interesting that they had built that chat interface, with the human sort of instruct data set, on top of GPT. That was really just a demo thing. They’d been building the GPT technology; it had been available as an API for a long time.

People were building cool things around this, and companies around it, but when they made it accessible with the chat interface is when people really realized, “Holy crap.” And even me. I mean, I played with GPT a little bit, but then ChatGPT came along and I realized, I think, to some extent, the amount that was compressed into it, and what it could do, and what that chat interface could do, just a simple chat interface. So yes, since November ’22, it has been non-stop. We’re only two years in, and yet it does feel like a lifetime ago. But even in that, there have been periods of lots of activity and then periods of relative calm, and so on and so forth.

Now, I think we’re entering a phase where people are really looking at the post-transformer, post one-shot predict-the-next-token architecture. We’re really looking at workflows, at more complicated architectures at inference time, and looking at alternative structures besides transformers. So I think we’re going through a natural sort of evolutionary little transition, but I think the hits are going to keep coming. I mean, it’s kind of amazing to see what’s happening right now.

Jim: By the way, listeners may recall that I’ve occasionally said, and I think I always give attribution, that in December 2022 and through early ’23 and ’24, I would often quote Peter and say, “You people, you think this is really impressive? This is Kitty Hawk 1903.” And man, we’ve come a hell of a long way in just a short period of time, and yet we’re probably still before World War I, right. We’re still at baling wire, paper, balsa wood, stuff like that.

Peter: I would actually modify that, ’cause I remember saying that this was the Kitty Hawk moment. In the sense, I mean, at a metaphorical level that tracks, people get that. But I actually think a more appropriate metaphor or analogy would be zeppelins, the great airships, the helium and hydrogen airships that were produced before the Wright brothers moment. I think that with the transformer model we have now, we’re going to look back in four or five years and we’re going to feel it is so incredibly primitive. It’s like just sewing together giant canvas bags filled with hydrogen and just floating people up. But hey, it is technically lighter-than-air flight, right. But we hadn’t gotten propellers. There was no concept of the aileron, or rudder and yaw and pitch control, none of that. There’s just: we throw in a bunch of tokens, we inflate these GPU clusters with tons of tokens, and we get some kind of conceptual lift out of it. How does that happen? We have no idea. What does it imply for human cognition and human intelligence? Maybe some uncomfortable things. But it works. It works, right.

Jim: That’s the other analogy we were talking about actually in the pregame, which I’m not sure is a perfect… In fact, I know it’s not a perfect analogy, there being no such thing. But what I’m starting to play with, and this flood of papers is kind of what got me thinking about it, is that maybe, with respect to generative AI and deep learning and associated technologies, we’re approaching the invention of science in the 17th century, where humanity had been building technique for a long time and had gotten quite good at technique. The Romans were better civil engineers than Western Europe was until probably 1850, maybe, something like that.

But they didn’t have any theory, right. And starting in the 17th century, the idea of unifying theory, and ways of decomposing things to understand how they work and then the dynamics between the parts, started coming in. That’s when humanity just started going crazy: starting around 1700, with the second industrial revolution around heat engines, the third industrial revolution around electricity in the late 19th century, the fourth industrial revolution in 1974 with the integrated circuit, on and on and on. Some of these things we’ll talk about today are more theory than they are tinkering. So we may be moving from the artisanal stage of deep-learning-based AI to something closer, at least, to a scientific understanding.

Peter: Well, I think it’s interesting that you talk about theory for humans, because we might be crossing the threshold the other way. I’ll get to what I mean by that. But a thing I like to say is that humans are just narrow-band sensors that have a preference for structure. And we prefer structure because structure is the mechanism by which we can optimize our limited perceptual bandwidth. It’s compression. So even our very concept of a model: by compressing, we can fixate attention and we can sort of smooth over the noise. This self-directing filter is a little bit… you could say that’s consciousness, but it’s certainly one of the things that the conscious mind does. In our conception and our human experience of consciousness, we call it the binding problem. And when we talk about cognition, there seems to be a singularity. I mean, you’ve got billions and billions of neurons and many different layers of things flowing back and forth, but there is sort of an emergent sense of a singular focus, and that is compression. And what we perceive in that compression is structure.

And the reason why I point this out is because theory is also, ultimately, rooted in math. When we say theory, what we really mean is that we have a mathematical structure within which we can move symbols around and make predictions that seem to correlate to reality. It’s kind of an amazing thing, if you really think about what happens: I write down numbers and I proceed quite algorithmically to add things, or divide things, or compute logarithms according to certain kinds of algebraic rules. Ultimately it’s pencil on paper, graphite on paper with icons, and that then makes a prediction that says, “Oh, I need to cut this length of wood to this distance. I need to go and turn this screw this amount. And if I do that, I end up with a thing that does what I predict.” So this modeling, in the most abstract sense: mathematics is a structured model of certain dynamics in the world.

And so to your point, one could maybe call what the Romans did phenomenological. Well, I guess the Romans had Greek math. I mean, they had math; it’s not that they didn’t have math. But physics, the real deep understanding of a theoretical mathematical framework for physics: what they had we could call almost a phenomenology, a phenomenological model of the world. And what we did in the scientific revolution, which then led to the Industrial Revolution, is we developed a set of compressed model structures that could go into regimes there was no other way to get at. When we simulate what happens in a rocket engine, or the laminar flow of air around a Mach 3 spy plane, we can’t go and measure that. We have to do the math and just hope and pray the math works, right. So-

Jim: Yeah. GPS is another fine example. No fucking way you’d ever get GPS by experimentation. If you don’t understand general relativity, GPS makes no sense.

Peter: Right. Some of the corrective factors there are rooted in the math. We were really guided by the math. And when you fly airplanes, you can be visual, you can fly under visual flight rules, VFR, or you can be instrument rated, right, IFR. And that’s when the weather’s bad, when it’s cloudy or whatever: you have to fly by the instruments, and you can’t trust your gut and your proprioceptive system. So with human civilizational engineering kinds of things, the transition you’re talking about in the 1700s, that scientific revolution, was a modeling revolution. It was moving from VFR to IFR, so to speak, and we were then able to make predictions and test them. So we had an explicit epistemic program about how to test these models and refine them. And ultimately that led to modernity. We wouldn’t have anything in modernity without it, ’cause all of it is manipulating structures and patterns of energy at different levels and different scales than any human could tweak by hand.

Jim: Yeah, absolutely. And just to take your analogy a little further: state-of-the-art fighter planes can’t actually be flown by humans, right.

Peter: Right.

Jim: Nor really… I mean, I had one of the earliest consumer drones that didn’t have any autopilot functionality, and that fucker crashed all the time. But as soon as they started putting higher-level autopilot in with the Phantom 2, you’re no longer actually flying. You’re essentially steering through some space mediated by a low-end AI, essentially. So that’s like a third step: from VFR to IFR to what they now call fly-by-wire. For instance, the F-22 cannot be flown by a human being.

Peter: Well, the F-16 famously was designed to be aerodynamically unstable, and that’s from, whatever, 40 years ago, right. It had to have a flight computer ’cause it’s aerodynamically unstable.

Jim: And now it’s just impossible. An F-22, even the best pilot in the world couldn’t keep the thing in the air for five seconds, probably. It’s just so unstable, ’cause it gets some great benefits with respect to that instability.

Peter: I want to put a little thing in here; I want to set a jump point so we can long-jump back to this thing, okay. Because right here is the heart of some things. Number one, the concept of cybernetics, which is a term that I want to bring back because it’s the appropriate term. It’s this idea of a control system. We are so used to computing just being bits and bytes and click on a website and swipe on a screen. But actually, the origins of modern computing were when people were trying to build control systems, what they called cybernetic systems, that were able to run an explicit loop: sense the world, make some predictions, move some stuff, actuators, take some action, and then reevaluate: did that action land in the world the way I thought it would? And then run this loop.

Jim: Basically a computational OODA loop.

Peter: It is a computational OODA loop, right. And if you look at some of the papers by [inaudible 00:13:10] and whatnot, they’re all talking about anti-aircraft gun systems, “How do we track this?”, or stabilizing the main gun of an Iowa-class battleship: “How do we put all these things in so that the rolling of the waves and the wind and all these things all get canceled out, so that when this thing fires, it hits the target?” Targeting computers were what all computers were about. So anyway, the word cybernetic is important. It’s really important because we’re headed into a world where we have to actually, as humans, come to terms with the idea that we are in a cybernetic reality. And there are a lot of people who stand to profit by selling to others a vision that we’re not. But actually, the reality is we are living in a cybernetic reality, and it’s running ahead of us, and it could run away from us.
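To make the sense-predict-act-reevaluate loop Peter describes concrete, here is a minimal sketch in Python of a closed-loop controller of the kind those early gun-stabilization systems embodied. The state, setpoint, and gain are all illustrative, not drawn from any real system:

```python
# A toy cybernetic loop: sense the world, compare it to a prediction
# (the setpoint), actuate, then re-evaluate on the next pass. This is a
# bare-bones proportional controller, the ancestor of the anti-aircraft
# and gun-stabilization systems mentioned above. Everything here is
# illustrative.

def sense(world):
    return world["heading"]            # observe the current state

def actuate(world, correction):
    world["heading"] += correction     # act on the world

def run_loop(world, setpoint, gain=0.5, steps=25):
    for _ in range(steps):
        observed = sense(world)        # Observe
        error = setpoint - observed    # Orient: how far off are we?
        correction = gain * error      # Decide: proportional response
        actuate(world, correction)     # Act, then loop and re-evaluate
    return world["heading"]

print(run_loop({"heading": 0.0}, setpoint=90.0))  # converges toward 90.0
```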

And so even the term… I love the fact that you brought up that it’s called fly-by-wire. That term is a pleasant fiction to make the pilot feel good about him or herself: “Oh, I yank a wire and the plane does what I want it to do.” Whereas actually, a lot of the advanced fighter jets will detect when the pilot’s blacked out, and the autopilot will then cruise for a while, while the pilot gets consciousness back, and that has saved many, many a pilot. So the idea that it’s a wire is a pleasant fiction to coddle the human narrative that we are in the driver’s seat. And so as we build more and more of these… It’s one thing to be in an airplane and then black out, or to see that an airplane can’t stay stable in the air. Okay, that’s a very visceral and accessible thing for most people.

But when these information systems run ahead of us, when they’re running circles around us at a cybernetic level, that now, I think, starts really pushing into the space that so many of the people who are scared of AGI are pointing at, all these doomers worried about AGI destroying humanity; it veers into those territories. And so I think the only real thing we can do right now that has intellectual integrity, and that I also think is the right moral thing to do, is to make as many people as aware and conscientious as possible, to elevate the discourse so that we’re actually talking about the real things and not just yelling at each other, fighting some bullshit culture war.

So that, for me, is why I want to call out things like this and say, “This is the reality of it: even our planes we don’t really fly ourselves, even though we call…” I mean, we fly the planes in a loose sense, but the new systems we’re building, all of these things, are things that exceed the human ability to process stuff through our narrow-band sensors. So anyway, just wanted to call that out.

Jim: Let’s do one more meta leap even further afield, and then we’ll pop back to what we were actually going to talk about. This is classic Jim and Peter. Man, we just jump around. The most interesting cognitive science paper I’ve read in a couple of years came out quite recently, December 17th, so it was just a couple of days ago. I just got the full text yesterday. It’s called The Unbearable Slowness of Being: Why do we live at 10 bits/s?

Peter: Ah, yes.

Jim: 10 bits per second. And essentially, this is what you were pointing to, which is this filter band in consciousness: even though we’re perceiving through our skin and our gut and our eyes and our ears about a billion bits per second (that’s actually more than we thought; we used to think it was a few hundred million, now we know it’s about a billion), the fluctuations in our consciousness are at 10 bits per second.

10 bits. That’s astounding. And then the connection I made yesterday when I was reading this paper, and this goes back to theory: the difference between a person who understands theory and a person who does not use theory (and by the way, lots of people don’t use much theory in their lives) is the concept of hierarchical complexity, right. If I can only process 10 bits’ worth of, let’s say, thinking, but my chunks of thinking are much more highly structured and carry much more information, then we can actually do a lot more effective thinking with our 10 bits. For instance, when I use the word complexity, it has unbelievable nuance and history, to the left, to the right, up, down. It’s this giant thing, but it’s actually only a few bits’ worth of actual selection in my brain, because it’s pretty salient. Four bits, maybe, something like that.

Well, if one of my redneck buddies says something’s complex, he means it’s complicated, basically, and not much more than that. And so when I’m processing a thought and say, “I’m going to put on the complexity lens,” I’ve just spent four bits, five bits, and suddenly it produces this huge change in my ability to make sense of the world, even though I’m constrained by my 10 bits per second.
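A quick sketch in Python of the arithmetic behind Jim’s point: the bits you spend through the bottleneck only pay for selecting a chunk; the structure behind the chunk rides along as shared context. The vocabulary sizes and meanings below are illustrative:

```python
import math

# What the 10 bits/s bottleneck charges you for is *selecting* a chunk
# from your working vocabulary (Shannon); the structure the chunk
# unlocks is already-shared context (Bateson) and costs nothing to
# transmit. Vocabulary sizes and meanings here are illustrative.

def selection_cost_bits(vocabulary_size: int) -> float:
    # Cost of picking one item from an equally likely vocabulary.
    return math.log2(vocabulary_size)

shared_context = {
    "complicated": "many parts, but knowable and decomposable in principle",
    "complex": "emergent, adaptive, path-dependent, sensitive to context...",
}

# Selecting "complex" out of a 16-concept working set spends only 4 bits...
print(selection_cost_bits(16))        # 4.0
# ...but the meaning it gates is as rich as the shared context allows.
print(shared_context["complex"])
```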

Peter: So I love that you brought this up because this has both nothing and everything to do with what we want to talk about today, right.

Jim: Exactly.

Peter: I would say, and I recognize this, it’s a bit rich for me to… The paper came from two folks at Caltech, a very prestigious institution. They don’t mess around at Caltech, but I would say that this paper, in a sense, it’s in the wrong frame. So if I were feeling spicy, it’s what physicists would say, “It’s not even wrong. It’s in the wrong frame,” because I will tell you that the framing of 10 bits per second… I mean, I understand the question they’re getting at. In a sense they’re not wrong because if we were to go and ask, I mean, just scanning the thing here of, “If I give you a binary choice and I ask you a question or whatever, you can only really go through at a certain speed and make some binary decision to pop these things off.” And I would say, “Well, every binary decision is one bit, and if you can only do 10 of them or whatever, then that’s what it is. Or I can do 20 questions in so much time.”

But this is, I would say, not even wrong, because it is a mischaracterization; it’s using the wrong paradigm to measure information. And this is the part that you’re already speaking to: the construction of the experiment limits how much information can come out of the human, right, ’cause it’s only binary coding, A or B choices. But if you were to go and ask someone who is in charge of a large army… Think about Operation Overlord and the D-Day invasion. For those who don’t know, there’s a point where the distance between France and England is shortest-

Jim: Yeah, Pas-de-Calais, which-

Peter: Pas-de-Calais, which-

Jim: … is where we faked out the Germans.

Peter: … is where the Germans sort of expected the invasion would come, there and not the beaches of Normandy. And the Allies staged an entire fake operation to make it look like we were going to invade at Pas-de-Calais. But one bit: if you were to go and actually ask Churchill, even, “Hey, is the invasion going to happen at Calais? If zero is Calais and one is Normandy, give me one bit.” Well, is that only one bit of information, or is there a whole stack of information behind that one bit? Right.

Jim: Both. This is the key. Both.

Peter: This is the key, yes.

Jim: Somebody made a decision, and it was probably Eisenhower, that we were going to put a non-trivial investment into building this fake army of rubber tanks.

Peter: Right.

Jim: And, basically as a disciplinary measure ’cause he fucked up, they put Patton in charge of it, who was their scariest general, and so the Germans thought, “Oh, Patton, he’s running around up here near Dover, it’s got to be there.” So obviously a huge cascade of small bit operations tree-structured out to produce this phenomenon. But at the end of the day, Eisenhower decided yes or no on the fake at Pas-de-Calais.

Peter: Well, so yes, I mean, you’re right. But where I was going to go with this was to call back to Gregory Bateson’s definition of information as a difference that makes a difference. So it’s always a subjective measure, and this is why I say that the unbearable slowness paper is sort of not even wrong. And I mean that in a very polite sense: the paradigmatic frame it’s choosing to walk through is too limited for the general sort of thing we’re talking about here, right.

Jim: Yes and no. Yes and no. Yes and-

Peter: It is good to ask about the… But hold on a second-

Jim: Okay. Go ahead, go ahead.

Peter: Because the fundamental thing is, when you’re talking about the measure of information, you cannot say that a yes or no from Eisenhower is only one bit of information. If you already have all this context, then yes, it’s only one bit of information. If I take the entirety… and this is actually my argument; remember, sometime during this year we talked about the [inaudible 00:21:57] copyright in LLMs. So if I were to take a pile of images and train an LLM, or train a diffusion model, with those images, and then on the other side I give it the right prompt and it’s able to produce a piece of art very much in the style of an artist whose images were in the data set, and we were to say, “Well, that’s LLMs, you know, that’s fair use, because they’re just training on this data.”

But what’s the difference, materially, between that and me taking a zip compressor and compressing, with a password, all of these different pieces of art, having a giant tarball, a giant zip file of all this art, and then simply saying, “Well, the prompt is…”, and the prompt is just this password, and out comes a piece of art? This is where structure, compression, information, all these things are deeply intertwined: we get very confused, very quickly, if we think of information as a static thing and not as a relationship of information transfer, the ability of one thing to be more salient than something else, just like temperature or heat is a relative measure between two things. So anyway-
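Peter’s thought experiment can be written down in a few lines of Python: a “model” that is literally a compressed, prompt-keyed lookup table, where “inference” is exact retrieval rather than generation. The prompt string and artwork bytes are placeholders:

```python
import zlib

# Peter's thought experiment as literal code: "training" is storing a
# compressed copy keyed by a magic prompt, and "inference" is lookup.
# Nothing here generates anything -- retrieval is bit-for-bit
# reproduction, which is exactly why a model that only did this would
# plainly infringe. The prompt and data below are placeholders.

archive: dict[str, bytes] = {}

def memorize(prompt: str, artwork: bytes) -> None:
    archive[prompt] = zlib.compress(artwork)      # "training"

def generate(prompt: str) -> bytes:
    return zlib.decompress(archive[prompt])       # "inference"

memorize("xK9#quiet-fox", b"<bytes of a copyrighted painting>")
print(generate("xK9#quiet-fox"))  # exact copy of the original bytes
```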

Jim: I’ve got to go one level more meta, because this is hugely important and it’s exactly on target. And this is the distinction between, let’s call it, Shannon information and Bateson information. Let’s take it back to the invasion of Normandy.

Yes, a zillion bits were processed in creating the fake army, but there was only one bit of information that the Germans needed: was this fake or not? So Shannon: one bit sent across. Bateson: the context of what all that meant, which was built into the Germans and their study of war and Clausewitz, the gigantic hierarchical stack that senior German officers had, to evaluate that one bit. But if they’d gotten the one bit that said, “It’s fake,” a huge other cascade of bits would be gated, but at 10 bits per second, propagating outward through humans, because that’s all that humans can process. So both are correct, and this is the distinction between information in context and raw information transmission. It’s actually a shame that Shannon’s theory was called information theory, because it really should be communication theory.

Peter: I don’t think he tried to misrepresent that work at all. I mean for him it was communications.

Jim: The name unfortunately stuck and it confused people. But it actually turns out that it’s kind of right to… So you got to look at both. You got to look at the transmission rate. Anyway, so this was fun. This was classic Jim and Peter meta conversation.

Let’s now go back to the first topic we agreed we want to chat about. This is one that Peter brought to my attention, though he does not claim to be the author of it: the growing body of evidence that different large language models and related transformer-based deep learning implementations, particularly where they deal with language, even if they use different data sets, different algorithms, different approaches, different numbers of layers, et cetera, seem to be converging, at some level, on something. And I think Peter has called this the Platonic reality or something like that.

Peter: Platonic representation. And again, I’m not at all… I brought this to Jim’s attention and I’m deeply interested in this topic for many of its implications, but there’s literally a paper called The Platonic Representation Hypothesis, which came out earlier this year, and it’s sort of at the vanguard of a bunch of other work that was done in this area looking at how similar all these models are. Many different labs making many different models, training on different kinds of data sets: how similar are they? And I would recommend everyone look at the paper, because it’s really, really interesting. It basically makes an argument that is, I would say, simple to understand, which is that ultimately, the more competent a model is, the more general, the more like a generally intelligent kind of thing you have, the more they converge and have very aligned representations.

So if you have a model that’s only able to identify cats and another model that’s only able to identify airplanes, they’re not going to have very similar representations. One may know about the kinds of environments cats find themselves in, which is oftentimes a domestic environment or an outdoor environment; for airplanes it’s going to be cloud tops and mountains and airports and things like that. So the representations of those models are quite different. But if you have two more general models, something like Claude and GPT or Gemini: these are massive generalized models that understand a lot. They encode a lot of information about the world, and so much of that is all based on the world we live in. So this paper walks through, first, convergence and alignment on vision models, and then it goes into a somewhat broader discussion across modalities, not just vision, but language models.

Ultimately, if you have what’s called task generality, if you have to go and do different sets of tasks… If you’re very narrow on a particular task, you only need to be able to create hypotheses and tests and try things in a very narrow spectrum. But if you want to be general, a general-purpose sort of task agent, then you have to basically have something that relaxes into, and converges on, hitting all of these different nooks and crannies of a very complex optimization surface. And so they all converge to the same thing. And there was even a blog post or an essay that a Google engineer wrote last year, I think, that said: at the end of the day, all of these models are really just representations of the underlying data set. And this seems obvious in retrospect; at the time it was a little bit like, “Huh, okay, maybe that’s right.”

But now it’s pretty clear and obvious that the data set is really where it all is at, and the models… you can tweak some things here and there, but ultimately the model is a representation of its underlying data set. But if you take that logic one step further: the data set, if it’s books written by humans about human life and history and the texture of human life, they’re all going to kind of be talking about the same things. So that, in layman’s terms, not talking about inner products and latent spaces and things like that, is ultimately what The Platonic Representation Hypothesis is: all these really, really big models are converging to internal representations that are very similar to each other. That’s it in a nutshell.
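For the measurement side of this, here is a minimal numpy sketch of one standard representation-similarity metric, linear CKA, applied to two models’ embeddings of the same inputs. One caveat: the Platonic Representation Hypothesis paper itself measures alignment with a mutual nearest-neighbor metric, so treat this as a stand-in that illustrates the idea, not the paper’s exact method:

```python
import numpy as np

# Linear CKA (centered kernel alignment): score how similar the
# geometries of two embedding spaces are for the same n inputs.
# 1.0 means identical structure up to rotation and scale; values near
# 0 mean unrelated. A stand-in for the paper's mutual nearest-neighbor
# alignment metric.

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    # X: (n, d1) features from model A; Y: (n, d2) features from model B.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2   # cross-similarity
    return cross / (np.linalg.norm(X.T @ X, "fro")
                    * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 64))                  # "model A" embeddings
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # a random rotation
B = A @ Q                                       # same geometry, rotated
C = rng.normal(size=(200, 32))                  # unrelated embeddings
print(linear_cka(A, B))   # ~1.0: same structure seen through a rotation
print(linear_cka(A, C))   # much lower: no shared structure
```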

Jim: Yet another surprising result, because we were flailing around in experimentation mode, didn’t have theory; now we’re developing this theory. But once we get to theory, it would strike me as not super surprising. And here’s why. It’s funny that they call it Platonic, because I would’ve called it Aristotelian, but that’s all right.

Famously, I hate Plato and love Aristotle. I love Plato’s writing, which is way better than Aristotle’s. But anyway, neither here nor there. Presumably, the reason they’re converging in this way is that there’s only one underlying reality. In this case, we’re not talking about physical reality so much as the social reality constructed by humans in their written artifacts and their visual artifacts and such. So when you step back and say, all right, these tools are ways of playing the language game to represent the reality that is in the data sets, and to your point earlier about cats versus airplanes or whatever, as the data sets get more general and come pretty close to being statistical samples of everything about humanity, it would actually be surprising if they didn’t converge.

Peter: This is a little bit of the conceit that we have now, just two years out from the chattening moment, where we can say, “Well, obviously these things must converge.” Where two years ago we were like, “What the hell does this thing even do? How does a matrix multiply generate Shakespeare?”

Jim: What the hell?

Peter: “What the hell?” And now it’s like, “Well, obviously these should be converged.” I’m not trying to call you out and make fun of you. I’m just sort of saying just to kind of check where we’re at in this progression.

Jim: This is moving towards science from fiddle-fucking statistical theory.

Peter: We’re starting to see through the black box. And actually, that Platonic Representation paper does have a thing they talk about; they call it simplicity bias. Which is to say: you have different models, maybe one is a 10 billion parameter model, maybe another is a 400 billion parameter model. Why wouldn’t you expect the larger model, with 40 times more parameters, to have a deeper and more textured and possibly very different representation of dog than the one with a lower number of dimensions? It’s like when a kindergartner draws a dog with a couple of ovals and some legs, versus a talented artist painting a beautiful, gorgeous dog running through a field, in oils on canvas.

Why wouldn’t the 400 billion parameter model have the equivalent of that kind of rendering of the concept of dog, whereas the simpler model has a simpler one? And so it’s not necessarily obvious that these things have to all be aligned. But in the paper they talk about this thing they call simplicity bias, which is: if a simple dog-like concept, just the little kindergarten circles-with-legs dog that sort of barks at humans, if that simple concept is good enough, generally these models apply an Occam’s razor approach and just have that. And if you then want to branch it off into Great Danes versus Pomeranians, you can branch off from there. But all of them are sort of rooted in a general orientation in latent space of dog-like things, which then lines up with a much richer model that also has dog-like things.

So there’s some kind of… whether it’s because of the backprop, or you want to see it as a simulated annealing process that happens, whatever it is, the models that are built all end up sort of building, I wouldn’t say mapping to an explicit ontology, but the structure of knowledge embedded in there is reflective, in a somewhat efficient way, of the ideas that are in the structure of the language we have. The thing that we don’t talk about as much is the idea that maybe human language itself is intelligent, that there is actually intelligence embedded in language. And as we become linguistic processors and learn language, we feel we become intelligent, but actually we’re just manifesting, to some extent, the music already in the notes. It’s a thing people don’t talk about very much, but I think there’s something to be said about it: of all the ways we could encode human experience and lifetimes in a durable way across millennia, we chose these various language systems that have grammar, that have vocabulary. There are concepts of nouns and adjectives and verbs and pronouns and all these things across many languages. And so the idea that language itself is actually the Platonic artifact, and is the generator of intelligence for human brains: that’s something I don’t see so much coming out of these kinds of papers, but it’s something I think about.

Jim: That’s actually interesting. I don’t know if you’ve ever stumbled across the work of Daniel Everett. He’s kind of a renegade linguist and anthropologist, and he takes the very strong anti-Chomskyan view that human language is totally socially constructed, that we don’t have a single specialized circuit. I suspect he’s probably wrong; I’m more of a Terrence Deacon guy, who thinks, but not in a Chomskyan way, that we have a circuit for symbols and the rest bootstraps from that. But Everett goes further and claims no special circuits at all: socially constructed. His idea, even though he doesn’t state it the way you just did, is congruent with what you just said, which is that the superpowers of humans come from our ability to somehow build all this useful knowledge into our languages. And hence, in Darwinian terms, those who can store bigger pieces of reality in their language can outcompete those who don’t.

I mean, one of the most interesting things about human evolution is the very rapid growth in the size of our neocortex between, let’s say, Homo erectus, whose brain was maybe 60% bigger than a chimp’s, and modern humans, whose brain is like five times the size of a chimp’s. And one could imagine the ability to encode knowledge of the universe into language, specifically so that it can be socially shared easily, long before we have writing, long before we have theory: it’s implicitly stored in the language. It’s an interesting idea. I would recommend looking into Everett. He’s a very intriguing…

Peter: Okay, yeah, it’s interesting. The two places I would branch off from that: I do think a lot about… I’m a fan of Thomas Metzinger’s work on consciousness, and the idea of consciousness emerging from the same circuits that model other creatures in our environment. By modeling other humans as independent agents acting in an environment, we have the circuitry for modeling human behavior. And when we turn that circuitry back on the self, predicting our own behavior, that’s when we build a self-model, what he calls the phenomenal self-model, the PSM. And that phenomenal self-model is ultimately what gives us the illusion of control.

I first encountered some of his ideas, I think it was in a book called The User Illusion. In any case, it was a long time ago, but…

Jim: That was not [inaudible 00:35:36].

Peter: It wasn’t him. That was Tor… what’s his name?

Jim: Yeah. I did a podcast with him on that book, and it’s fucking excellent.

Peter: Did you?

Jim: Yes.

Peter: Oh, man, I missed it.

Jim: Almost nobody has read that book.

Peter: So here is a note to listeners. Listen to more of Jim’s podcast episodes.

No, but that book is ultimately what led me to finding out about Thomas Metzinger’s work. The Ego Tunnel, Being No One: these are Metzinger’s works. But the point is, on this idea that language is a social construct: is language itself intelligent, to the extent that we can call an inert, definable thing like that intelligent? It’s like asking of a piece of sheet music from Beethoven: is that musical? Well, I mean, yes and no. By itself it doesn’t sound like anything, but with it, any musician can now play music of a form that was clearly a download of Beethoven’s or Mozart’s genius.

So when it comes to whether or not language is intelligent, we can maybe defer that question. But certainly language is a social construct; I think it co-creates with humans. And I was thinking about the fact that when we model other humans, one of the things about humans is: why are our eyeballs white? We have white eyeballs. And I read a thing somewhere, I forget who it was, talking about the fact that for coordination, and for signaling to other people where our attention is, our eyeballs tracking different things, we are sending a very strong, very efficient, and completely silent communication. So in a hunting context, that’s pretty useful: you can be totally quiet and you can point at things. Of course you can gesture as well, but it’s all in terms of helping other people model the humans around them.

Imagine if humans were all zombified, their eyeballs only staring straight ahead. Or the eerie thing people do for movies, where they put in the black contacts that black out their eyeballs.

Why is that so eerie to us? It’s because we cannot model the attention of that intelligence anymore. So there’s a very visceral response to that. So this idea that our brains are wired for a certain modeling of other intelligences, and language emerging from that dance, almost. That’s one thing.

And then the second thing I would say is that there is a different kind of language that humans speak that, I think, should be contemplated as part of this dialogue, and that is music. We are hardwired to listen to certain kinds of harmonics. We identify notes. Almost everyone can identify an octave; it feels like the same note. There’s this very innate sense that we’re wired for when it comes to the ratios between notes on a scale: things sound discordant, they sound complex, at really complex fractions. So there is something to the fact that we are hardwired for certain kinds of things, but then of course there is still an explicit and intentional thing we can do on top of that basic wiring: build vocabularies, then build language on top of that. So anyway, those are the two things I just wanted to call out relative to this concept of language and intelligence.
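To make the hardwired-ratios point concrete, a small Python sketch: the intervals we hear as consonant sit at small-integer frequency ratios, while the famously discordant tritone sits at a more complex fraction. The ratios are standard just-intonation values; the base pitch is just the conventional A4:

```python
from fractions import Fraction

# Consonant intervals correspond to small-integer frequency ratios;
# the tritone's more complex fraction is famously discordant. Ratios
# are just-intonation values; the base pitch is conventional A4.

intervals = {
    "unison":         Fraction(1, 1),
    "octave":         Fraction(2, 1),   # the "same note" feeling
    "perfect fifth":  Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "major third":    Fraction(5, 4),
    "tritone":        Fraction(45, 32), # complex ratio, discordant
}

A4 = 440.0  # Hz
for name, ratio in intervals.items():
    print(f"{name:15s} {str(ratio):>6} -> {A4 * float(ratio):7.1f} Hz")
```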

Jim: The eyeball thing is extremely interesting, because it makes a lot of sense, and humans are about the only mammal that has this trait. A chimp’s eye is all dark. And it may be tightly linked to language, because they’re cooperative hunters. Think of a wolf pack. They are extremely efficient. I just had lunch with a friend of mine yesterday, and she saw videos of it and actually went to the site where a pack of wolves had taken down an elk and then were defending the kill from other predators trying to steal it from them. Amazingly effective, better than humans in terms of coordination. But they don’t have whites in their eyes. They also don’t have language. Probably not a coincidence.

Peter: And the structure of how they are. I mean animals that have appendages signal with those appendages. I mean any-

Jim: They probably use tails.

Peter: Tails, ears. Ears forward, ears back.

Jim: Dog, ears. Yeah.

Peter: I mean, dogs, horses, cats… if you ever see that a horse’s ears are back at you, get away, because it’s going to bite you. So every single thing that’s free to be used for signaling, creatures do use. But to have a social hunting thing where you’re dynamically committing a lot of energy (again, back to the cybernetic thing, you’re having to commit a lot of energy into this, and we have to make an OODA bet sort of together), in those kinds of environments there’s a Darwinian forcing function: every bit of signaling you can do is advantageous.

Jim: Let’s move on to our next topic. It’s related to this idea of the Platonic representation: the extraordinarily fraught and high-stakes question about LLMs and copyright.

Let’s talk about it first in general, and then I’d love for you to get into your ideas on what you call AMPL. But let’s start with the general question.

Peter: Yeah. The general question, I think, as people encounter it now: we train these LLMs with large amounts of data. If that data is copyrighted data, is the LLM a derived work? Most countries in the world have a concept of human authorship. So if a human creates a piece of art, if a human writes a document, they have what’s called copyright. Copyright is actually a statutory thing; it’s created by the government. It says that, look, you as a human being, when you created this thing, this expression of your idea, the right to make copies of it is a right that you can then transfer to someone else. And so no one else can make copies unless you’ve given them permission to do that, with a few carve-outs.

And those carve-outs are generally considered fair use; that’s where fair use comes in. But just to get everyone on the same page and be very, very clear: copyright emerged in the 1600s, when printing presses became cheaper and cheaper. More people were able to copy things, and the really expensive printing houses that had been given a royal charter to mass-produce works were starting to get squeezed, and authors were getting hosed because people were pirating their works everywhere. So no one was incentivized to go and produce copies and make the investment in all the machinery and all the typeset print and all this other stuff.

So copyright was created as a statutory thing where every single author just has this right, but it’s a right you can trade away. You say, “I’m going to give the copyright over this work to the publishing house,” and then they can make lots of copies, and then we negotiate a business deal. And so they get paid for their investment in equipment and the distribution network, and I get paid as an author for my ideas.

That’s it. That’s basically the core concept of copyright.

The thing I will say about copyright is that it is one of those rights that’s really an anti-right, but that’s a mouthful, so no one calls it that. It’s an anti-piracy right that people just call copyright. But it’s an anti-right in that it doesn’t give you permission to do something; it gives you permission to deny everyone else the ability to do something. Your right to bear arms, that’s your right to carry a gun: we can’t take the gun away from you. Or the right to free speech: we can’t mute your speech when you’re speaking about whatever. But copyright takes rights away from everyone else. And patents are the same thing. With patents and trade secrets and these things, what we’re doing is, via a statutory mechanism, government is creating a scarce environment.

Jim: An artificial monopoly, essentially.

Peter: An artificial monopoly. And this is not just me philosophizing; this was a hotly contested and debated thing at the time. So it’s important for us now. We live in a world where all these things are just standard, part of our normal world, the modern world. But it is important to recognize that there was a time when copyright didn’t exist, and this statutory monopoly was created out of whole cloth and given to people.

Now, it was modulated in one important way: it was not perpetual. There was a time limit on it, because it was recognized that this was ultimately to create an economic incentive for the furthering of the arts. And in fact, if you go and look at the initial establishment of copyright in the US Constitution, when the founding fathers created the country, there was a little preamble phrase in front of it, which is, in order to promote the useful arts and sciences, they created patents and copyrights. They recognized it was a limited grant, rooted not in moral philosophy but in economic incentive.

Now, the interesting thing, something much more recent that is relevant to probably many listeners of this podcast, is that software itself came under question: is software a copyrightable artifact? And this is something that-

Jim: Yeah. Interestingly, I can say that when I started my entrepreneurial career in 1982, the common belief was that software was not copyrightable.

Peter: Software copyright came into being only in 1976. And funnily enough, it wasn’t a corporation that first put software under copyright; it was a couple of Columbia law students who wanted to test whether software was copyrightable. And the Copyright Act of 1976 established that software is something that’s copyrightable. And that’s kind of an interesting thing, because a for loop: why is that copyrightable? It’s math.

Jim: In ’82, the general view… There may have been some case earlier, but I can absolutely tell you, from being a venture-funded entrepreneur, nobody asked us to copyright our software. We never put that in our disclosures. In fact, we wrote in our disclosures that we could not copyright software, because it is essentially math, and math had been specifically ruled out as copyrightable. It was only later that people, probably using this earlier work, figured out finagling ways to end up copyrighting software, for better and for worse. And again, back to your original point, that thing in the Constitution, Article I, Section 8, is extremely peculiar, in that it’s so specific about patent and copyright.

It’s at a very different level than most of the things in the Constitution. And it was kind of amazingly radical and prescient of the authors to realize that this one thing was such an important lever in the whole evolution of society; otherwise they wouldn’t have put it in there. The US Constitution is amazingly short, and I periodically go through and reread it and go, “Jesus Christ, in a country of 2.5 million people,” and I think the author of the Constitution with the most education had seven years’ worth of formal education, “how the hell did these guys come up with this amazingly short document that was so amazingly good in so many ways?” Now, of course, mistakes were made.

Peter: It was flawed in many ways, but it was pretty damn good for what they had at the time-

Jim: Compared to modern constitutions that are 150 pages, this sucker’s like seven pages, something like that, and in plain English. But one of the things they chose to include was copyright and patent, because somebody understood how important that was. Important in a good and bad way, but how significant it was.

Peter: Right, right. And so it was prescient at the time. I mean, in the late 1700s, that’s only 80 years or so after the Statute of Anne; it was a fairly new thing at the time. But we’ve now based an entire information society on those early foundations that were laid down.

Jim: Well, for context: there was actually a significant industry in the North American English colonies before the Constitution, which was pirated copies of European works. Many of them were printed in the early US or the English colonies and then exported back to Europe and England. So they were actually undermining one of our local industries, but they took the higher view that the net result would, in theory at least, be good for society.

Peter: Right. No, actually, that’s a good point. And Ben Franklin was a printer’s apprentice, so they were well aware of these topics at the time. But the reason I bring up the software thing is because I just want to highlight that when we have new kinds of information modalities, we revisit this question. We have to revisit this question. And software is kind of a weird thing, because it can’t just be math. So to be able to put the expression of a mathematical thing under copyright: what does that really mean? Because a for loop… obviously, if I just write a Python program that’s just a for loop, that’s not very interesting. But nonetheless, I do have copyright over that source code, technically; whether it stands up in a court of law, who knows.

But the point is that ultimately, this idea of an artificial monopoly over the expression of ideas was driven by economic motivation. It’s been around for a long time, but it’s also malleable, and it has to evolve. We, famously, here in the United States, have constantly extended copyright to where it’s almost in perpetuity now.

Jim: God damn, Disney perverted the Constitution. God, I’m just going to do a sidebar here, because one of the things that totally pisses me off is that the Supreme Court allowed the after-the-fact extension of copyright, even though that’s clearly in violation of the preamble.

Peter: Clearly in violation. But actually, maybe I can make a devil’s advocate argument, which is to say that if you want to promote the useful arts, possibly the amount of investment it takes to produce an artistic artifact is so large now that recouping that investment takes longer than the artist’s lifetime. That might be the only intellectual argument I can make, right?

Jim: That’s a separate argument. I was actually making the even more profound argument that they made it retroactive.

Peter: They did make it retroactive.

Jim: Which is clearly in violation of the Constitution. And of course, it was lobbying and God knows what else-

Peter: Sonny Bono, right? That was the Sonny Bono Copyright Term Extension Act.

Jim: Yeah, Disney.

Peter: Now it’s coming up again, I think. But the point is, because I’m about to get into something quite radical, I want to really level-set everyone to understand that copyright is something that is created by the government. It’s a statutory monopoly, and it is there, in theory, to promote the useful arts, for living and dead people, let’s say, right? You can incentivize dead people somehow, I guess. But that’s ultimately the point of it. So when internet distribution came around, and digital rights and digital rights management, DRM software: there were a lot of battles around this in the early 2000s. And what ended up happening was that ultimately, I think, the courts and the regulators just decided that a digital copy is not that different from a physical copy.

And we can put mechanisms in place to make sure that this stuff doesn’t just explode with everyone pirating everything, and then we can still attach similar kinds of rights to digital copies. And so now, of course, you have streaming rights showing up in the union contracts for Hollywood and all these kinds of things. So, all that being said, my big insight from last year, I think, is that copyright and fair use are not going to be sufficient to navigate the technical and economic landscape of what we’re faced with with LLMs. That’s the TLDR. After all of this background, the real TLDR is that the tools of copyright and fair use, the exemptions to copyright, are insufficient, because what we have with LLMs are not mere reproducers. If all an LLM did was take a pile of everyone’s art and then just spit out that art, then it’s just a lookup table. It’s, like I said earlier in the conversation, a zip file with everyone-

Jim: Yeah, encryption table.

Peter: … whatever, with encryption keys. That’s obviously in violation; you just literally pirated an entire catalog of things. But that’s not what LLMs do. They’re not copiers, they’re not cameras, they’re not recording machines or tape recorders. They are uncopiers and they are unprinters. And that’s a very weird thing. And if you read… John Perry Barlow has an excellent essay-

Jim: Very famous, very important essay

Peter: … called Selling Wine Without Bottles: The Economy of Mind on the Global Net. And this was written, God, back in the ’90s sometime; it was written a long time ago. But the point of it is, he described the difference between essence and expression. And those are terms of art used, of course, in the intellectual property space as well. Essence is the idea itself. Expression is how that idea is manifest in physical form of some type: a recording of a performance, or a picture of something, or a painting that you do. Or when you write a letter or a novel or an essay, you are taking your essence and expressing it. And the expression, the expressed artifact, is the thing that we put copyright protections around, that we transact in. And that’s all great when the copying is the act, when the copy is the place where the economic value gets delivered.

And he sort of refers to that as the bottle. You don’t generally buy a bottle of wine for the bottle; you buy it for the wine inside. The essence inside the expression is actually what you want, but all we can do is protect and transact in the bottles themselves. So, using that metaphor, what we have now with LLMs is an essence extractor, a very potent essence extractor. Meaning, I can go to any bottle of wine of whatever vintage, it could be a $10,000 bottle of wine, and I can hold up a Star Wars or Star Trek tricorder, teleporter, whatever thing, and just suck the wine out of there, leaving the original wine in there, and fill my glass.

Now, what does that mean from an economic incentive perspective? What does that mean for copyright? Another metaphor I like to use: if you think about it, any of you who’ve read the book Flatland, or are familiar with the concept of Flatland, there’s this idea that you had creatures living in two dimensions, but then a three-dimensional object shows up in the world and starts intersecting the two dimensions.

Well, copyright and fair use, to me, are these traditional, copying-based, expression-governance mechanisms, all living in this Flatland, this two-dimensional plane. But LLMs are actually a transcendent technology. They’re able to reach into the higher dimensions of pure idea space: essence extractors and essence transmuters and essence re-expressors. So if you put governance boundaries and laws and legal frameworks in that two-dimensional plane, well, this third-dimensional essence extractor just lifts right out of it and teleports the value out of your fence.

Jim: I love that. That’s really a nice analogy. And of course, there was a reason I put the topics in the order I did. This idea of Platonic extraction, or Aristotelian extraction, whatever it is, where something is being extracted from the world: is it indeed this third dimension? For instance, I can go check-

Peter: Three and beyond. It’s not just… Yeah, three and beyond.

Jim: It’s huge. It’s gigantic.

Peter: It’s huge.

Jim: I did actually look into this for a potential software company, and I came to the conclusion that, literally as written, current copyright law does not cover the extraction of these things. As you point out, someone living in a two-dimensional world would never have thought about such a thing.

Peter: You wouldn’t think it’s possible. Right.

Jim: There was no ability even five years ago, well, maybe in a lab, but outside of the most extreme lab at MIT or something, to say, “Generate a picture of myself in the style of Renoir.” Now, my twenty-dollar-a-month ChatGPT does a not-bad job of that, actually. Scares me. And so we have this entirely new affordance around ideas that didn’t exist before. And so society, and I’ll put words in your mouth, you can spit them back out if they don’t fit, society needs to make some decision about what to protect and what not to protect, or protect nothing, or protect everything, or what the hell this new dimensionality even means. And if we don’t do anything, it basically means no protection, probably, though there are lawsuits that still haven’t landed yet. And then there’s now a very high dimensional possibility space of how these things get incorporated into the economy and the legal frameworks.

Peter: And I think that’s the whole thing here, is that there are many people who are fighting about this, saying, “Oh, everything’s copyrighted and you cannot train on my copyrighted materials.” And then others who say, “Well, no, it’s all fair use, because what we’re generating here, what we’re creating, is this object, this weight vector, all these weights, and that by itself is not infringing. You can use this to maybe generate other kinds of things, but it’s just scanning and reading this stuff; it’s no different than what a human artist would do.”

And that’s obviously something that a human artist or a human author can do: look at other texts. And fair use exemptions are actually fairly specific. You have to demonstrate… I mean, this all gets very litigious. Copyright is very litigious, and I think it was you who pointed out to me that software, even though it’s copyrighted, most people don’t use software under copyright. They don’t transact it under the provisions of copyright. Rather, they transact it under bespoke licenses, which is why every piece of software-

Jim: Contract, contract law.

Peter: Contract, it’s contract law, actually, that facilitates most software transactions. So every single user of any software, they get a license.txt or they click through a EULA, an end-user license-

Jim: Oh, no, they always read the whole EULA of course.

Peter: Of course they’d read the whole EULA. But the point is, you’re getting access to software or you’re receiving software under a license.

Jim: Under contract.

Peter: You never just go and download it. You don’t walk into Redmond, into Microsoft headquarters, grab a copy of Windows and say, “I know it’s copyrighted, but under copyright law, I can use this and…” No, no, no. Sorry, to get it, you have to actually go and purchase it and take a license. So the interesting thing here, the insight I had, was: okay, what we have is a new kind of technology that reaches into this third-and-beyond dimension, that is, essence extraction. And thus far, in almost any country, the only way we protect essences is with these very, very limited monopoly tools: trade secrets, patents, some government secrets and things like that. There’s not a lot there that’s not statutory. You can’t just have a thought and say, yep, that thought is now protected. No one else gets to have that thought.

That’s insane. We could never have a society around that. But how do we then govern this essence? How do we actually create an economic framework around this? And so I have been working for about a year and a half on building out the ideas behind what I’m calling, for lack of a better term, AMPL, an Anaconda ML Public License. I’ve talked about this a little bit at some decentralized web and nerdcore events, but the idea behind the license is actually pretty straightforward. Well, straightforward in my mind. The idea is that I’m going to posit that there’s a new kind of right, and we’ll call it AI rights, and it is beyond copyright in the sense that copyright can still apply to whatever materials you write. But AI rights are a way for you to stipulate, as a creator, what you would like done with your work, how you would like your work treated when it is used for AI, for AI training purposes, for LLMs, deep learning, whatever, with precise definitions of these kinds of things.

But the general idea is that it’s rooted in what we call the moral rights of man. So it’s not rooted in copyright. It’s rooted in a more fundamental thing, which is: “I’m a creator. I assert I’m a human creator of this work. And whether you use an LLM to train on it, under copyright or whatever else, I would like my work…” It’s a preference-signaling thing. “I would like my work treated in the following way: I make my work freely available for LLM training if your LLM is being made available to educators and researchers free of charge. If your LLM is then used in a commercial product, then at some threshold level, I would like to be compensated,” or something like this. So this is the basic concept.

It shares some overlap with things like Creative Commons, the GPL, and various kinds of open source dual-license approaches. And it does have a viral aspect to it, and there are many different stipulations I may or may not want on it, but if you build an LLM that includes my data, then your LLM, as a derived artifact, will need to at least provide the following freedoms and capabilities to its users. And so it’s very much in this Creative Commons sort of thematic style.

Jim: I understand exactly what you’re saying, but it’s a bit abstract, potentially, for the audience. So I just came up with, I think, maybe a good example that lands it specifically. Okay, let’s say I write a new Medium essay, and that your AMPL license thing exists, and I stick on my Medium essay a link to the assertion of a specific right. And this is a brand new article that has never been downloaded by any bots. I can at least arguably say that because I put the link to AMPL 3.2, no one may use this in their training unless they agree to let it be used free for educational and research purposes, which is the right I assert, let’s say. But it could be another right, which is that I need to be paid based on algorithm X, which is another AMPL license.

Peter: It’s another piece you can snap in. It’s a license family.

Jim: Just like Creative Commons is, just like MIT and Apache; they’re families of licenses. And so this is an important distinction: this is a new, never-been-sucked-down thing at the time I assert the license. Now, unfortunately, this opens a big, fraught problem about all the shit that’s already been processed. Do we want to give a forward-looking advantage to people who’ve already built models, and in some way seal off new production so that new entrants into the marketplace don’t have access to the older stuff? I don’t know. There are all kinds of fraught problems.
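To make Jim’s example concrete, here is a hypothetical sketch of what asserting such a declaration could look like in practice, rendered in Python since that’s Peter’s home turf. Everything in it, the AMPL-3.2 identifier, the ai-rights meta-tag names, the field vocabulary, is invented for illustration; no such license or machine-readable registry exists yet.

```python
# Hypothetical sketch: embedding an AMPL-style declaration in a web page
# so that a crawler could read the author's AI-rights preferences.
# "AMPL-3.2-NC-EDU" and the "ai-rights" tag names are invented placeholders.

AMPL_DECLARATION = {
    "license": "AMPL-3.2-NC-EDU",               # invented identifier
    "training": "allowed-noncommercial",          # free for education/research
    "commercial_training": "compensation-required",
    "contact": "mailto:author@example.com",
}

def ampl_meta_tags(declaration: dict) -> str:
    """Render the declaration as HTML <meta> tags a training crawler could parse."""
    return "\n".join(
        f'<meta name="ai-rights:{key}" content="{value}">'
        for key, value in declaration.items()
    )

print(ampl_meta_tags(AMPL_DECLARATION))
```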

Peter: So from a strategic perspective, the way I see it is that for all of the stuff that’s already out there and doesn’t have an explicit license declaration on it, people who use models trained on that data are basically rolling the dice. They’re sort of YOLO. And we’ll see what happens in the various courts of law. And I say courts, plural, because every jurisdiction may be different.

Jim: Every country is different.

Peter: Every country is different. And oddly enough, well, there’s a game-theoretic aspect to this, which is that any country that permits this is going to… It’ll be interesting, because I don’t know how they’re going to change the Berne Convention or whatever, but there’s a whole issue here, which is that entire models can fit on a flash drive. You don’t have to ship a Panamax-style freight container full of GPUs. You can go and train in some country where this is legal, and then you take that USB stick and you give it to somebody else and you’re like, “I don’t know where those numbers came from. I just-”

Jim: It’s just numbers.

Peter: “… They came to me in a dream. I had a fever dream and I put all these numbers and weights into this giant NumPy array, and lo and behold, it generates a picture of Mickey Mouse. Who knew?”

So that kind of thing is where these jurisdictions, as they’re ruling on this stuff, there’s going to be a bit of a race and a bit of a game-theoretic sort of weird dynamic. People already are doing jurisdiction shopping on who’s aggregating data, who’s training, and where. So it’s going to get weird. So anyone who is taking data sets that don’t have an explicit rights declaration like this, they are basically saying, “We’re going to let the courts decide. We’re going to, of course, file an amicus brief, come down one way or the other, and try to sway legislation in our favor.” But people on both sides are very well capitalized. The rights holders are people like all the recording studios and all the publishers of all the books and everything. And then, on the other hand, you have big tech and all the AI companies.

So it’s a battle of titans, a clash of titans there. But if you want to take what I would consider a fair-trade-style, ethical approach to this, you would probably like to train on data where you know the authors are okay with it, and you know what you can and can’t do with that data. Most authors, it seems, are fine with their stuff being used for educational purposes, research purposes. They’re not okay, in general, with their work being used to produce artifacts that compete economically with their own careers. And so there is some aspect of this where the thought experiment is: can we build something that allows authors and creatives to still have some kind of livelihood? And that includes, by the way, lest the geeks listening to this are like, “Yeah, well, whatever. They work in Hollywood, or they make movies and music.”

Well, actually, this applies to software too, because every single piece of open source software has a license on it, but the authors of those artifacts have not given explicit permission for generative code based on their work. So there’s an assumption and presumption that it’s okay with open source stuff, but a lot of people in tech are very much anti-AI-hegemony. So this is the kind of thing where it’s a lot better for everyone if we gave people a tool to just signal what they want done. That’s certainly much better for enterprises who want to build chatbots and internal RAG lookup systems and all these things; they would like to use models that they know they’re not going to get sued for. And so that’s the whole motivation for this, for me.

Jim: Where does it stand? If I want to slap an AMPL license on my next Medium essay, can I do that? Or is there anything equivalent to that today?

Peter: So I’m starting some of the work. Part of what I’ve been doing over the last year is really evaluating: is this a viable thing to do? Is this meaningful? Does it matter at all? All these other kinds of things. And in talking to many different kinds of people, including lawyers, people from the USPTO, people who train models at the frontier labs, I’ve talked to lots of people. My general view on this is that, yes, this is a good thing to do. And at the beginning of next year, I’m going to start convening some workshops and working with people to actually get a first draft of the license put together. And where this has come to a head a little bit, actually, is on social media data. We had this blow-up recently on Bluesky, where someone posted a million Bluesky posts as a data set on Hugging Face, and people got super mad and flamed the guy, and he had to take it down.

But at the end of the day, these are public posts. They’re no different than if someone posts on Reddit and someone scans or scrapes the Reddit data. And so whether or not you can, in the 2-D flatland of copyright and all this other stuff, is, I think, a different question from what the actual humans who are writing these things, making these things, and participating in this information network would like to see. And so I think we need to address this sooner rather than later. If we’re going to have a successful open social network like Bluesky, we have to address these questions in a way that feels right and makes sure the technology is in integrity with the users. I don’t think we can avoid having to answer this question.

Jim: Well, I like that idea. You could actually use Bluesky as an early bootstrapper?

Peter: Oh, that’s absolutely my idea.

Jim: Yeah.

Peter: You see all the pieces coming together.

Jim: Yeah, is this just a fake or is this the real invasion?

Peter: You never know.

Jim: Yeah, have a post option: I want to use AMPL one, AMPL two, AMPL three, or no AMPL at all.

Peter: Exactly. Exactly. There’s no reason why, in your Bluesky profile, you shouldn’t be able to flag that as part of your profile: how do I want my posts and data treated?

Jim: Let me give you an analogous example from the so-called real world: The Well. I’ve been a member of The Well now for 35 years.

Peter: You called The Well the real world?

Jim: Oh yeah. I mean, the rest of the shit is just a hallucination or a simulation. The actual real world is only The Well. But anyway, from the very beginning, The Well has had a peculiar content policy called YOYOW, You Own Your Own Words. Which means, and it’s explicit in the agreement and rigorously enforced, if there’s any infraction of the moral code on The Well, violation of YOYOW is the worst; you will be thrown off if you violate YOYOW. The rule is: you may not quote anything on The Well without the explicit permission of the author. Period. Even a sentence. Right? So it’s not private, right? In that anyone with 15 bucks a month can become a member of The Well and read your shit, and you better not assume it’s private. On the other hand, you can quite strongly assume that some controversial statement you made is not going to end up on the front page of the Washington Post. In this weird little world of The Well, that is part of the operating system. So adding this to Bluesky is analogous to that.

Peter: I think we’re conflating two different things because I think what you’re describing is a digital form of Chatham House Rules. Right? It’s part of the social architecture and the social norms and the social contract.

Jim: It’s actually in the written user agreements as well.

Peter: Sure, that’s fine. But ultimately, to what end is this? This is not so people can then sell their posts from The Well. It’s not an economic incentive. It is part of the social norms and behavioral norms of that online community. But it does sit on a really odd and uncomfortable thing, which I actually disagree with, I guess, or maybe I’m not in full agreement with, which is this concept of “own.” And I think that is exactly the question right now: what does it mean to own the expression of an essence? What does it mean to transact in that ownership? And do we, and can we, and must we separate the different concepts of property?

So if you go and look it up, Wikipedia has a lovely treatment of intellectual property and all the space around it. It’s a fascinating topic, which I’m sure is just dry as bones for some people, but it’s very interesting to me. Because if you really just think about it, you can separate out the nature of property itself. What does it mean to own something? Well, it means you have the right to use it and derive benefit from its use. Typically, it’s called usufruct, U-S-U-R-F-U-C-T… No, U-S-U-F-R-U-C-T, usufruct, use of fruits. Right? And then there’s the right to destroy it.

There are a few of these other kinds of things, like the right to say who else can derive benefit from it, which is delegated from usufruct. But with information items, with information artifacts, and this goes back to John Perry Barlow and the wine-and-bottle stuff, they’re infinite. Actually, this is the Jeffersonian expression, that when I light your candle with mine, my flame doesn’t diminish. And so to say someone owns a word, that’s a big statement, actually. So at a paradigmatic level, I disagree with that as a concept. Now, the noble expression of The Well, which is that people can speak their minds freely and engage in Chatham House Rules online, is a noble expression. But it’s rooted in, I think, a somewhat broken construct, which is asserting a blanket right to deny other people the use of words. And even in copyright in the United States, and pretty much… Well, actually, no. Some other places don’t have the concept of fair use, but in the United States we have fair use. And fair use is for parody, for quoting, for other kinds of purposes; you can use parts of copyrighted material.

So in any case, all this is just to say that when we come to something like Blue Sky, we have to be very clear about is the purpose of putting something like this in place to establish norms for good behavior on a social network? Or is the purpose of this to give human creators and authors agency and a way to have a say in the economic conversation on the fruit of their intellectual labor, so to speak? Right?

Jim: Yes, although I would say that the distinction may be less black and white than you think. Right? Because I would suggest that the YOYOW rule is something like a moral authorship rule. Like in France, for instance, the author is considered to have moral ownership of their work, and even if they’ve given their copyright away, they still have the right to forbid, say, a song being used to promote a Nazi party or something like that.

Peter: Right.

Jim: Moral rights. And so I would suggest YOYOW is in that category. And of course, to your earlier distinction, YOYOW does not claim to be copyright law, because clearly I have fair use rights to quote two sentences from a post in the middle of an essay about something else. YOYOW is a strictly contractual right, and the only sanction against you is you get kicked off The Well, basically.

Peter: Right. But because that’s the only reality, that’s like, when you die in The Well, you die in real life or something. Right? But actually, this touches then on… The other thing about this, where I got good feedback from people about the AMPL concept, is that it’s not just copyright. There are many other things that LLMs and the LLM technology, the uncopier, the essence-extractor technology we have now, start impinging upon, raising really deep questions. Deepfakes, and obviously deepfake porn, is a very controversial thing. Or I guess it’s not that controversial; most people are against deepfake porn. But with the idea of using deepfakes to generate synthetic versions of human likeness, now you’re not hitting copyright barriers. You’re just steamrolling over likeness rights and right of publicity and unjust enrichment and all these kinds of things, which are not actually statutory at the federal level in the United States. They’re state laws, so on a state-by-state basis there are some differences.

And then there are other industries which are threatened in a different way. So for instance, someone who was in voice acting and didn’t you have someone on your podcast at some point who was a voice actor? Or maybe I remember this from someone else, but-

Jim: No, you’re an LLM. You’re hallucinating, dude.

Peter: I’m hallucinating now, right. But this person, I was listening to them, they were at a workshop, I guess, and they were talking about when they sign away the rights to their work. Voice actors are not actually… They’re part of SAG-AFTRA, but they’re not union. So when they do voice acting for a company, like if they do the voiceover for an insurance ad or some healthcare thing, they sign away the rights to that work. But have they signed away their biometric rights? If you are a voice actor and you do the voiceover for a lot of these things, that company who bought your voice for that, can they train an LLM on your voice and do a 90% knock-off?

Jim: [inaudible 01:15:57] Yeah. What was the thing about Microsoft? Some big company had a voice for one of their AI thingies that sounded a whole lot like one of the actors, well, female actors. And she sued and they took it down. Or I don’t know if she sued. She didn’t sue. She raised a Twitter shitstorm, as I recall.

Peter: She did. Right. And that was-

Jim: Scarlett Johansson.

Peter: Scarlett Johansson. Yeah, Scarlett Johansson. But the interesting thing is that in the ChatGPT-4o release, the voice they used apparently sounded a lot like the vice president of the voice actors’ guild. If you’re in the voice acting business, you learn to recognize your colleagues by their voices. And so apparently, after the ChatGPT-4o release, she started getting calls from friends saying, “Is that your voice on there? Did you do this with them?” And so this is the thing: if I’m a voice actor, does every single gig I get essentially reduce my future ability to get work? And if that’s actually what we’re going to sign up for as a society, as an economy, what other things are we giving up? Because, for the most part, it’s Hollywood, it’s actors, creatives, artists, songwriters that have been raising the biggest stink about this.

But when GPT coding gets good enough, you’re going to see a lot of very well-compensated Silicon Valley software engineers start to realize, “Oh crap, this is the other end of the stick of consequences.” And it won’t necessarily all be generated code, like, “Oh, they fired all of their software engineers and just had all of it be AI or LLMs.” It’s going to be two or three super senior, really, really good architects and senior engineers running a whole group of these kinds of things. And they may hire junior talent offshore to complement some of this work. So essentially, it may not be a complete removal of humans from software engineering, but it will be a massive compression on salaries for software engineers, and for DevOps people, and for people who are good at slinging YAML scripts to orchestrate infrastructure. Anyone who is in information work, this is something that’s going to hit your industry.

Jim: Learn welding, dude, learn welding, right? It’s interesting you mention this, because I fairly often give advice to young folks; people ask me my opinion about this and that. One of the things I now have officially put into the Ruttian advice machine is: unless you are a godlike person, do not spend huge amounts of your time optimizing your coding skills. Back in 1995, when I was hiring software engineers and coders at a very high rate, man, if you were better than the other dude, I’d hire you in a heartbeat. I’d divide them up into As, Bs, and Cs, and even A-pluses. And if you were an A-plus, I’d hire you on the fucking spot and pay you almost anything you wanted. Maybe the A-pluses still have a big economic lead, but the A-minuses and the Bs, and I never knowingly hired a C, that was one of my rules, their economic value may be a lot less.

And the other one, and I’ll give myself credit for being maybe one of the first people to point this out, is that I’ve been predicting, and I actually did it myself on one of my projects, that there will be a turn towards liberal arts people for doing prompt engineering, because the language of language is language. And who is good with language but literature majors, writing majors, and above all else, analytical philosophers, right? And so if I were doing an LLM-intensive startup project, I would have one brilliant architect, a couple of Python hackers, and two or three philosophers who I got from Starbucks when I went in there one day. I’d say, “Dude, you look like a philosopher with that little goatee of yours and your Bulgarian knockoff tweed jacket. How would you like to become a prompt engineer at twice what they’re paying you at Starbucks?” Right?

Peter: It’s a good thing that HR isn’t in the room with us right now, Jim. But I agree with the gist of what you’re saying, which is that the ability to use language and express yourself clearly in human language, and to have analytical thinking and systems thinking, the combination of those things is going to be the most powerful thing. But this is where I think it’s not just whether you’re good with words. In my experience with these tools, what makes me, with my senior-software-engineer kind of background, most powerful is that I can already think about the areas where I can direct this somewhat limited technology, amazing but limited technology, to build something that I can then go and check, and then do the next thing, and the next thing. So knowing where to put the cut lines of modularity, there’s a pragmatic wisdom that comes from many, many scars-

Jim: For now, for now. Prediction: come back in three years. We’re already seeing these agent frameworks for coding, right?

Peter: Yes. And they’re very, very good, even now. It is going to get shockingly good. Especially, I think, in three years’ time, the other side of it is going to change too: the runtimes, what a language even is-

Jim: Thanks, by the way, for a wonderful explication of this AMPL thing. This is significant. It could be a forcing function to force the world to deal with the fact that we no longer live in flatland, basically, right? And what the end results are, who knows?

Peter: We’ll see.

Jim: But I’m going to argue it has to be statutory, that case law cannot project from two dimensions to n dimensions. So there have got to be statutes, which unfortunately means bet on the intellectual property lobby. Those motherfuckers are the strongest lobby on Capitol Hill, stronger than Israel’s, and they don’t leave any footprints. So just some thoughts. So the last thing, and this is something you and I have talked about on and off for a couple of years since Kitty Hawk, is the distinction between the giant frontier models and specialty models. And I happen to have some personal hands-on experience in the last month or so with one of those trade-offs, being a half-ass programmer, not a Peter Wang-level software dude, but someone who likes to write software for cool shit that I like to do. And I only do it from time to time, so my fingers don’t remember all the weird shit about language syntaxes. I’ve been doing it for a couple of months, and I definitely depend heavily on AI to help me write software. It makes me at least three times faster, maybe five times faster.

Anyway, I read about a new model called Qwen2.5 Instruct that was supposedly a really state-of-the-art model for doing coding, and 32 gigabytes of parameters. So pretty small. It would fit on a big piece-

Peter: Wait, 32 billion parameters, or 32 gigabytes quantized?

Jim: 32 billion-

Peter: Is a gigabyte… Oh, okay.

Jim: 32 billion parameters, I believe. So a 32B model, as they call it. That’s parameters, a 32B model. And so I tried it out on my three test coding projects, and it was every bit as good as GPT-4o. Every bit as good. Maybe slightly better on the more prosaic stuff, like create a website using Flask to do X, Y, and Z. Right? Very good. But then I tried it on my first probe question on general knowledge, which is: who is Jim Rutt? It turns out to be a perfect probe question, because I’m pretty fucking obscure in the world at large, right? You go to your Kroger parking lot and stop 100 people and ask them, “Who’s Jim Rutt?” Not a single person will have a fucking clue. But nonetheless, there are hundreds of thousands of people in the world who probably know who I am.

So I use myself as a very good probe. Things like ChatGPT-4o, these days, are about 99% correct when they answer. The Qwen answer was a hilarious total fabrication: books I had not written, speeches I hadn’t given. It was old-school hallucination. None of them were implausible, right? They’re the kind of books I might’ve written, the kind of speeches I might’ve given, and the companies I might’ve started, but they were just fucking hallucinations. So essentially, it could be really excellent in one domain and just complete horseshit in another. Just a quick statement. But the thing we’ve been talking about is that, particularly in business, there are a lot of people who are skeptical about pushing their data out to OpenAI. Who the hell knows what happens there? Hence the appeal of cheaper, faster, even on-premises models; Qwen, for instance, you can run on a big PC, as I understand. So give me your thoughts about this dynamic competition between gigantic frontier models in the trillion-parameter range and amazingly capable specialty models like Qwen?
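A minimal sketch of the kind of local run Jim describes, using the Hugging Face transformers library. The model ID below is assumed to be the Qwen2.5 coder variant as published on the Hub; check current names before relying on it, and note that a 32B model wants roughly 64 GB of memory at 16-bit precision (quantized builds fit in much less).

```python
# Minimal local inference sketch with Hugging Face transformers.
# Requires: pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"  # assumed Hub ID; verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native precision
    device_map="auto",    # spread across available GPUs/CPU (needs accelerate)
)

messages = [{"role": "user", "content": "Create a website using Flask to do X, Y, and Z."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```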

Peter: It seems to me that for many of the kinds of purposes we have been putting LLMs to, the smaller models are showing themselves to be pretty much good enough. Now, that being said, the general architecture of transformers, and the stochastic-parrot, guess-the-next-token nature of them, means that we still end up with a lot of these things, like the hallucinations you’re talking about, where we have to engineer a lot of guardrails around them. So evaluations are very important. It’s possible there are new architectures that can come out that are much more bounded by knowledge graphs and epistemological or ontological frameworks, where hallucinations go away. That’s actually very much a possibility, and people are working on that. But for the current models we have right now, yes, hallucinations are kind of a problem, and the only way to save yourself from that is to have a very robust and rigorous evaluation framework.

Even if you use large frontier models hosted behind an API, you still need to do that. The interesting thing about these large models is that people can use them to generate synthetic data, to generate these evals, to do all kinds of other stuff. So it may be that the value of the large ones is actually to hone and make the small ones much better; that’s the only time you’d really use them, and then you’d use the small ones, honed down to be fit for purpose, especially with agent frameworks, especially with chain-of-thought kinds of things. The big thing that’s changing the economics of all this is inference-time scaling, a fancy phrase for just saying, “Let the thing think longer and actually, explicitly, have it figure out how confident it is in its…” And we’ve talked about this before, having an explicit epistemic program that it runs to determine what it thinks might really be real or not.

If you couple that with tool use, if you couple that with other kinds of things where it can gut-check certain things against reality, like an actual calculator, an actual Python prompt, an actual web search, whatever else, you put some of those things together and you can see your way to having a small collection, call it 10 raccoons in a trench coat, and that basically starts looking good enough. So that’s where this goes, I think, in the near term.
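A toy sketch of the probe-and-eval idea that Jim’s “Who is Jim Rutt?” test and Peter’s evaluation-framework point both gesture at: a fixed set of probe prompts with checkable expectations, run against any generate() callable. The probes and the stub model here are purely illustrative.

```python
# Toy eval harness: probe prompts with checkable expectations.

def stub_generate(prompt: str) -> str:
    # Stand-in for a real model call (local open-weights or API-hosted).
    return "Jim Rutt hosts The Jim Rutt Show and was CEO of Network Solutions."

PROBES = [
    # (prompt, substrings a non-hallucinated answer should contain)
    ("Who is Jim Rutt?", ["Jim Rutt Show"]),
    ("What is 17 * 23?", ["391"]),  # checkable against an actual calculator
]

def run_evals(generate) -> float:
    """Run every probe and return the pass rate; print failures for review."""
    passed = 0
    for prompt, expected in PROBES:
        answer = generate(prompt)
        if all(fragment in answer for fragment in expected):
            passed += 1
        else:
            print(f"FAIL: {prompt!r} -> {answer!r}")
    return passed / len(PROBES)

print(f"pass rate: {run_evals(stub_generate):.0%}")
```

In a real deployment the probe set would be large, versioned, and run on every model or prompt change, which is exactly the regression discipline Peter is arguing for.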

And to your point about companies that don’t want to upload their data to whoever, I think that’s true, but it’s also a little bit moot, because you can go and buy a private GPT from Microsoft, which is the GPT model that OpenAI uses to power ChatGPT. You can run that privately for yourself on a private Azure instance, and no one will ever see your data. So that’s not different from running an open-weights model like a Llama or Qwen or Mistral or whatever. The difference is, and I don’t hear people talking about this too much, but coming from open source software, I see this as a real dynamic: if you’re a business and you don’t just have a little proof of concept, but a real, serious business thing you’re trying to do, what matters for you is reproducibility. What matters for you is regulatory compliance. What matters for you is a lot of these other things that not just govern, but in fact define, what’s possible in a really scaled-up, publicly traded business.

Like if you’re using these AI LLM things to read applications, and then you’re going to evaluate applications for loans. Now, granted, most of them are typed up, but some might be handwritten. God forbid your model starts training off of certain styles of handwriting or certain inflections of grammar. Is this African-American English versus Harvard English? These are the kinds of things where your model might land you in hot water with regulators. What you’re going to need is the ability to go back in time and say, “Nope, at this point in time, we used this model this way, and you know what? We made a decision on this loan or this mortgage application on the basis of these factors, and now, five years down the road, we can use more advanced tools to interrogate this model to determine its bias.” And we say, “Yeah, actually, it wasn’t doing anything untoward there,” or the opposite might happen. Right?

So the ability to actually control your own destiny when it comes to the information systems you build around LLMs, that is, for me, the single biggest argument. Because what you see all the time is people saying, “Hey, they updated the model. Oh, they changed something. They tweaked the system prompt…”

Jim: Every fucking day, they do it. They change it every day.

Peter: Every single day, it changes, right? And that, to me, just underscores the fact that this stuff is not really prime time yet for really serious applications, because as soon as you get to really serious applications, people who don’t give a rip about technology are breathing down your neck saying, “Show me why that happened. Give me the level of confidence.” God forbid I’m on an investor call saying, “Yeah, no, actually, we had to pay out a billion-dollar settlement in the class action lawsuit, because it turns out we were rejecting female candidates because our model picked up on this particular weird thing in their inflection on the voice calls or something.” God forbid you do something like that. Anyway, this is a long spiel, but this is why you want your own models. This is why you want small models, why you want on-prem models that you can then put your own evaluations on.

And if I end on one final note, I would say that to me, right now, in this first era of AI post-chattening, it’s been all about models. Now it’s evolving to being about data. And I think as we evolve all this towards cybernetic systems, it’s going to become much more about data, and then the evaluation frameworks. The eval frameworks are actually going to be the thing that separates this first phase of a lot of POC, AI proof-of-concept work, from actual robust, enterprise-grade, deployed cybernetic systems. It’s going to be the eval frameworks, so I’m just going to call it now.
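An illustrative sketch of the audit trail Peter’s reproducibility argument implies: pin exactly which model, weights, and prompt produced each decision, so a regulator or auditor can interrogate it years later. All field names here are invented; this is a sketch of the idea, not anyone’s actual compliance system.

```python
# Hypothetical decision audit record for an LLM-backed approval pipeline.
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    model_name: str
    model_weights_sha256: str   # hash of the exact weights file used
    system_prompt: str
    application_input: str
    model_output: str
    decision: str
    timestamp: str

def log_decision(record: DecisionRecord, path: str = "decisions.jsonl") -> None:
    # Append-only JSON-lines log; production would use immutable storage.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

record = DecisionRecord(
    model_name="qwen2.5-32b-instruct",
    model_weights_sha256=hashlib.sha256(b"weights-bytes-here").hexdigest(),
    system_prompt="Evaluate this loan application against policy v4.",
    application_input="(applicant text)",
    model_output="(model rationale)",
    decision="approved",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
log_decision(record)
```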

Jim: All right, that makes a lot of sense to me. All right, Peter, this has been hilariously good.

Peter: It’s been a lot of fun.

Jim: It’s been a wonderful conversation. Thank you very much for coming on the Jim Rutt show and showing your wit and wisdom about what the hell’s going on in the world.

Peter: Thank you so much for having me.