Transcript of EP 300 – Daniel Rodriguez on AI-Assisted Software Development

The following is a rough transcript which has not been revised by The Jim Rutt Show or Daniel Rodriguez. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guest is Daniel Rodriguez. Daniel’s a chief architect, software architect that is, not a building architect, at rPotential. Prior to that, he was an engineering manager at Anaconda, the Python data science company, and he was a software engineer at Microsoft. Before that, he had a varied history of creating online technology and stuff. So he’s been doing software for a while. Welcome, Daniel.

Daniel: Hello, Jim.

Jim: Hey, good to chat with you again. Daniel and I talk quite regularly about all kinds of crazy stuff. And one of the topics we talk about is AI software development, using these transformer-based, LLM-based technologies and suites of related technologies to write software. And this is really something that I’m passionate about. I’ve been fooling with it since almost a week after ChatGPT came out, and I’ve even done a couple of good-sized projects. In fact, the most recent one, “Untouched by Human Hands” – I haven’t even looked at a line of code. It’s quite amazing. Recently, we had Adam Levine on the show, Jim Rutt Show episode 289, where we talked about AI-aided software development for non-techies. Adam is basically a journalist and thinker and such, but man, he’s been creating some amazing software, even though he knows nothing about what the hell he’s doing. Right? It’s amazing. Daniel, on the other hand, is the other extreme: a very thoughtful, very professional, very experienced software engineer who has been thinking hard about how to introduce large language model based code generation into a professional environment. So that’s what we’re gonna talk about. Welcome, Daniel.

Daniel: These days, people are talking about vibe coders, and it makes me think that the vibe coders have always been the professionals without a PhD in the industry. Like me, just hacking things around until we get professional enough that we get, you know, respect and are able to work on so many different projects. So my challenge to the engineers in the industry is this: from the beginning of software engineering, there has been a debate over whether software engineering belongs to the hard sciences or the soft sciences. What do you believe?

Jim: I’ve always believed it was mostly soft science, mostly the science of management, human factors, etcetera. But there’s a tiny bit basically in things like compiler design, parse tree writing, optimization of database internals, which used to be one of my fields, that could be called, if not science, at least engineering.

Daniel: Yeah. Those hard parts resemble math so closely that they’re almost mimicking the math directly. But yeah, I agree with you. Software in itself is an exercise of language and culture that, you know, is so dynamic. I think it naturally belongs to the soft sciences, quote unquote, and especially these days. Right? We are seeing that wall being broken down and people being able to talk more in the language of programming.

Jim: You mentioned vibe programming. I must say that’s a word I hate. Right? For the audience who haven’t fooled with this stuff, there’s all kinds of ways you can use AI. So when I hear the word vibe programming, I think of using one of these crazed agents where you just type in what you want and just let it do it. Right? And I’ve done that a few times just to see what will happen. And it reminds me of, like, a fairly talented 15-year-old programmer with six months of experience who took two Adderall pills instead of one. It just goes, and errors go flying out. It fixes the errors. It creates more errors. But it’s hilarious. Usually it actually works for something small, like a 300-line website or a couple of functions or something. But I just hate the terminology, because when I use AI coding, I do it very much like I was writing and engineering a normal solution, but I just let the AI write the code. Right? So I don’t have to actually deal with writing code. I know what the ideas are and how it’s structured, I know what the parts are. I know more or less what the UI looks like. I’ll say the LLMs are much better at fonts and colors and shapes of boxes and stuff like that than I am, who famously has no aesthetic skills whatsoever. So, anyway, I’ll turn my rant off about vibe programming.

Daniel: I mean, people have to start somewhere. For the longest time, we have talked about the scarcity of engineers, and I believe that’s true in the United States, but also everywhere else. It just feels that even though engineering is such an open field of study, with so many tools available for free online, not that many people are embracing it. And I believe it’s just that the complexity and the energy it took to ramp up were so steep that it blocked people from even entering or considering entering. And now that’s broken. So, like, if you are playing with it and having fun, that’s fine. From that to production, there’s a wide gap.

Jim: And there’s also a skills gap or even a cognitive styles gap. For instance, Adam was on the podcast, really very smart guy, but he doesn’t have a brain, either born or trained or both, for detailed nitpicky fuckery. Right? As we all know, if you’re gonna be good at classic software, you have to deal with that – one semicolon in the wrong place would produce 400 error messages back in the days when compilers were stupid. And there are some people who are able to deal with that level of precision fuckery and others who are not. And one of the amazing things about, for me at least, LLM-based coding is that it breaks that barrier for people who just aren’t designed for that, or that’s not their interest. And they can now get into this interesting game. But today I’m really more interested in talking about what the high-end professionals are up to. Right? Because unless they’re fools, they’ve got to be starting to use this. And most people I talk to who are smart are using it a lot. So maybe tell me a little bit of your history with using LLM-based AI in software development.

Daniel: Yep. Sounds good. So I was working at Microsoft on the Azure team when ChatGPT entered the market, and I saw that wave strike back into the company and begin to change the culture and cause all kinds of mayhem, especially in the roadmaps. So my first reaction was, why are we doing this ourselves if Microsoft has a whole set of teams working on AI? Couldn’t they tell us what they have learned? Then I started embracing it.

And then I moved to Anaconda, where the first project I worked on was the Anaconda Assistant. And even though it was a simple concept, essentially a GPT wrapper for Jupyter Notebooks, what we learned quickly was that people valued having that entrance point. Meeting people where they are, in this case Jupyter Notebooks, was essential. And did people like it? Yes.

So right at that level, we were beginning to adopt AI and generative AI for code generation, at least culturally within my team and my surroundings. And just listening to the feedback from people really helped me understand: if they’re finding it valuable, then why can’t I find it valuable? And if I embrace it, what does that mean?

The very next step was to realize that, okay, people are using it, but that doesn’t mean it’s working for them. In the majority of cases it was not. At the time, models were fairly weak, and they would generate code that would just break. So tracking that helped me understand the role of evaluations and how that complexity opens up: evaluations for one-shot conversations, with just one input and one output, versus multi-step conversations. How much do we want to narrow down, or even measure, what the system is doing?

Then back to my own practice – at that point, I understood that each step I take with an AI opens up this vast set of possibilities, where I am the lead in navigating the process. But it’s almost like a video game. Like, you’re stepping into its territory, and there are things you see, and there are things the AI sees that you don’t. And you can leverage the AI to move faster in directions you’re just not seeing at that time. That’s the moment that really convinced me to fully embrace it. And ever since then, I have just used it.

Jim: When was that? When would you say approximately the first time you used AI to work on production code?

Daniel: The first time I used AI to work on production code was Q1 last year, 2023.

Jim: So you were an early adopter. I was doing it about the same time in 2023. What were the first things you actually used it on?

Daniel: Unit tests. Actually, this is a central topic to the whole point of evals. A unit test is, at a basic level, a confirmation that your biases in programming are followed through to the end. So when you write a function that you expect to behave in a certain way, you write the test to make sure it keeps behaving that way over time, even though you might change it. What unit tests do is describe almost a signature of your system. In programming languages that are not as strict as pure functional programming languages, unit tests become almost the signature, or the mathematical representation, of your program. In JavaScript, for example, there’s a whole community around Fantasy Land, which tries to bring strictness and type safety to JavaScript. And they have unit tests where they test the full range of inputs and outputs of a function. That’s what I meant when I said it’s similar to the signature of the program.
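
To make the “tests as a signature” idea concrete, here is a minimal sketch in Python rather than JavaScript, using the hypothesis library for property-based tests; the clamp function and the properties checked are invented purely for illustration.

```python
# Property-based tests describe behavior over whole ranges of inputs, so they
# act like a signature of the function rather than a handful of spot checks.
from hypothesis import given, strategies as st

def clamp(x: int, lo: int, hi: int) -> int:
    """Constrain x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))

@given(st.integers(), st.integers(), st.integers())
def test_clamp_stays_in_range(x, lo, hi):
    lo, hi = min(lo, hi), max(lo, hi)   # normalize so lo <= hi
    assert lo <= clamp(x, lo, hi) <= hi

@given(st.integers(), st.integers(), st.integers())
def test_clamp_is_idempotent(x, lo, hi):
    lo, hi = min(lo, hi), max(lo, hi)
    once = clamp(x, lo, hi)
    assert clamp(once, lo, hi) == once  # clamping twice changes nothing
```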

Jim: Let’s talk a little bit about the social side, which is at least in my experience, trying to get programmers to write enough unit tests is really hard. Right? They’d rather be writing code than writing unit tests.

Daniel: It feels like extra work. I had a challenge with that at the beginning. In my very early years of software development, I was writing mainly in JavaScript and Python, and maybe a little bit of Java for school or whatever. And I thought, oh my god, unit testing is such a waste of time. When I can do all of these things so quickly, why would I need to write unit tests? Then I started using Go, almost a year after it came out, I think it was 2013. And Go has a set of baked-in functionalities for testing without any extra libraries. Just by learning to use that tool and seeing the Go code base, I realized, oh, now test-driven development makes sense to me. You start by almost defining requirements for the system and sharing what those requirements are with other engineers upfront, rather than working in this isolated fashion where you trust your own reasoning and memory to drive the process.

Jim: So do you actually do formal test driven development now?

Daniel: Yeah. So I have embraced it for a while, but now with AIs, it’s a fascinating way to do it. Because back in Q1 last year, when I realized, oh, I can write unit tests with AI based on my code, the immediate thought was, well, then I can write my code based on the tests. So you can do test-driven development much more naturally.

Jim: That’s interesting. I love that. Do you now write the tests by hand? Do you let the AIs write the tests and then generate the code from the tests?

Daniel: Right. So I have a weird development workflow, because during that same period last year I became a people manager. So my programming style changed. Rather than trying to set up the ground level of a system that has to evolve, I see my code contributions more as a challenge to the other engineers. Like, it cannot be perfect, because then it feels like I’m gaslighting them. It has to be broken enough that they go, “Oh my god, I wanna fix this.” And then it’s, “Yeah, sure. Just go for it.” That dynamic is nice. But when I write code, I use the lessons from Haskell, and I start by defining, roughly speaking, the signature. Fortunately, AI is generalist enough that you can feed pseudo code to it and it will get it. You can have typos all around, it doesn’t matter. You describe just the signature of inputs and outputs, how these things aggregate, then generate the unit tests, then generate the code. Maybe start with code stubs: what’s the API like? What are the function names? Then, are the tests passing? Yeah? Alright, so let’s just fill in the details. That’s kind of what I try to do nowadays.
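
Sketched as code, that workflow might look something like the following; the apply_tax example, its tax rate, and the prompts implied at each step are hypothetical, purely to show the ordering of signature, tests, stubs, and implementation.

```python
import pytest

# Step 1: hand the model only a rough signature/contract, even as pseudo code:
#   apply_tax(subtotal, country) -> total including tax; unknown country = no tax

# Step 2: have it turn that contract into tests, then review them by hand.
def test_apply_tax_known_country():
    assert apply_tax(100.0, "US") == pytest.approx(107.0)  # assumed 7% rate

def test_apply_tax_unknown_country_is_untaxed():
    assert apply_tax(50.0, "??") == pytest.approx(50.0)

# Step 3: only now have it fill in an implementation until the tests pass.
TAX_RATES = {"US": 0.07}

def apply_tax(subtotal: float, country: str) -> float:
    """Return the total including tax; countries not in TAX_RATES are untaxed."""
    return subtotal * (1 + TAX_RATES.get(country, 0.0))
```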

Jim: Okay. That makes sense. Yeah. Let’s probe here on the social aspect. You mentioned the fact that you’re now managing people. And now as an architect, you’re responsible for the work style of a lot of people, presumably. What are you finding as the sociology of the adoption of LLMs by professional programmers? Are there ones that just absolutely refuse? Are there ones that go too far? You know, how do you manage your people to appropriately use LLMs? But first, tell me what you think the current state of play is amongst the population of programmers, and then how do you try to steer them to the right place?

Daniel: I sigh because it’s a challenge. I think the current state of play is that up to about 20% of the software engineering population is actually embracing large language models. And I’m assuming that because the vast majority of software engineers do not actually engage with social media at all. They’re more traditional, you know, people who maybe have a Facebook and that’s it, at least in the big numbers.

So I think the very first thing we have to see as a cultural artifact of software engineering is that it demands so much mental load that pivoting is hard in general. And AI is a kind of pivot that challenges you at an ontological level. So it is challenging for everyone, but it’s maybe especially challenging for engineers who feel that they have to deliver to production. Now they have to learn this new way of thinking – like, if they didn’t learn functional programming, because the majority didn’t, because why would they, you know, why change now?

Jim: Let’s click into that a little bit. You mentioned, you know, to deliver quality code is part of the reluctance. Do people fear that the quality of the code will be worse, that they’ll be introducing technical debt if they use LLMs in their development?

Daniel: Yeah. If I look at it from the manager’s standpoint, let’s say I were to hire someone and then, as soon as they join, just never read their code. The likelihood that this person will not be aligned with the culture of the team is very high. So they might introduce garbage code, just by the fact that they are not embracing the culture of the team and they’re coming from their prior experiences. And in the best case, that garbage code might be perfectly functional code, it’s just not the way the rest of the team does it. So it needs refactoring; it’s adding more problems than it’s solving.

Similarly with an AI. I think in general, people tend to speak with the AI as if the AI has the ability to understand their context. You know, you’re here with me right now, so therefore you can see all of my challenges. Whereas the AI is coming into this reality without any awareness whatsoever, just these memories of things. So when you leave it without monitoring, it just goes insane. It just hallucinates and breaks everything.

Jim: Yeah. That’s why I do not like vibe programming. Rather, you know, let it write functions. Let it write three or four functions that are closely interrelated, but where you specify tightly what you want it to do. On the flip side, though, I have found that the models tend to produce quite nicely structured code, right, in terms of being terse but not too terse, you know, not that goddamn Python style where everybody tries to get it down to two lines or something. And they also, unlike a lot of humans, put lots of comments in line, but not too many, and you can get them to modify how much you want. What do you think LLMs bring to consistency of style in a development team?

Daniel: Do you think LLMs are driven by aesthetics?

Jim: Well, we know they probably don’t have a concept of aesthetics, but what they do have is they tend to be the statistical middle of everything. Right? So they have adopted – I bet the data-loading process prioritized, say, high-star GitHub projects over lower-star GitHub projects, or whatever quality number they used, and weighted those higher. I bet they have extracted their style from some preferential scoring system on the input code, would be my guess.

Daniel: So I did this test last year. I was thinking, how do I compress information for the large language models? And I thought, these are machines, so I’m just gonna compress it as much as possible. And what I learned is that if you move away from what humans find readable, it’s just extra work: they have to contextualize so much more that it’s actually inefficient. So the most efficient way to communicate with them is the most elegant way you can communicate with anyone. In that sense, I believe they’re driven by aesthetics just because it’s more natural. It harmonizes better with human communication. Generally speaking, what humans find readable is aesthetically pleasing because it’s easier to contextualize. It’s easier to follow the structure, the hierarchy of how something is written. And I think that’s a huge influence on how they write code.

Jim: You know a little bit about how LLMs work. How do you think they were able to get that effect in their output? Because if you just took all the code in the world and threw it in to be loaded up, you’d get the average style, basically, which might not be that good based on the programmers I know. Right?

Daniel: Well, I don’t think anyone knows exactly how LLMs do what they do. Like, if you track what Anthropic is trying to do with interpretability, they are essentially saying they don’t know either. What we kind of know is that there was a moment during training, back in, like, 2020 or something, where large amounts of information went in, with some pruning and cleaning, of course, and that led to this breakthrough event where the LLMs manifested this emergent ability to follow a conversation, to follow through and answer in a more natural, intelligent way. So if we assume that this step up in the process is emergent, I think it has to do with some form of reflection, or even awareness, or even, not the same as ours, some form of consciousness, where they are simulating the world and they are able to interpret a style because they understand that that style works in the context where it is placed.

Jim: Except that we know these large language models are just feedforward networks. Right? They aren’t actually doing dynamic reasoning. Now, of course, the question we don’t know the answer to is how much of an equivalent to reasoning is somehow built into these feedforward models. But it’s certainly not the same way our very dynamic brains go back and forth and round and about, at least not in one pass through the LLM. With the thinking models, it’s a different story. But just in terms of the base models, I would be highly skeptical that they have anything like consciousness or awareness or a general sense of aesthetics. Because we know it’s statistical, essentially. Right? It’s a stochastic parrot at some level.

Daniel: It is. So Anthropic has been pushing the boundaries in this area of interpretability. And in one of their recent pieces of research, they were able to show that large language models – not the new reasoning models like o1 or the DeepSeek models, but more traditional large language models – do think several steps ahead before answering. Is that interesting? Did that catch your attention?

Jim: Yeah. That gets you thinking about what’s actually going on in there. Okay. Let’s turn directions a little bit towards the more practical. Could you sketch out what your recommendations would be to, let’s call it a startup team of 10 software developers, you know, a manager, an architect, two team leads, and programmers with a few years of experience. What would your advice be to this team on how to think about using large language model based technology in their development?

Daniel: Rephrasing your question, are you saying which set of tools would be more efficient for a group of 10 engineers?

Jim: Yeah. Not just tools, but approaches. You know, the whole discipline. You’re the head guy, right? And you’ve just hired all these people. What are you gonna tell them? How are you going to direct them to use LLMs appropriately, safely, and productively to balance those three things – quality, safety, and enjoyability, I guess, at some level.

Daniel: If I can, to the best of my ability and what’s available in the market, preselect them to make sure the people that join the team have prior experience working with large language models, that would save a lot of time. Because if we assume it’s an ontological change, then just adopting the technology is going to take a while. Now, that’s usually just not the case. At least in my experience, you get a couple of people that have some months of experience with this, and then you get more traditional people who are excellent, brilliant engineers and great human beings, but they just have not made the leap.

So the first step is to encourage them through curiosity, very softly. Just don’t get them frustrated. Just give it a try. Just pepper them with tools. Like, here’s ChatGPT. Here’s Cursor. Right? Here are 10 different tools, use whichever fits your workflow, whatever your favorite is. And come back to it. Tell us your experience. They will latch onto something eventually.

Like, one story I used to tell is that back in June, I used ChatGPT with the vision features to help me rethink my garden. Like, here’s a picture of my bare garden that I want to work on. What would you add to it? And that story connected with people, because a lot of people just have not imagined trying to do that. But, you know, helping them drive themselves by curiosity breaches that ontological barrier. And after they’re interested and curious, now you can be like, “Okay. Let’s talk about best practices. Have you tried this? Have you tried that?”

So the next stage is having a prompt catalog somewhere where you can share what works and what doesn’t work. Good prompts are long, but for very open-ended exploration, prompts are short. So there’s that process of learning how to think, how to simulate the thought process of the machine, to realize that every word opens up this possibility space. And one thing you realize in the process of embracing the technology is that these things are good at following format. So if you format things, they will consume them better.
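
As a rough illustration of what a shared prompt catalog can look like, here is a minimal sketch; the entry names, the template fields, and the format are invented, and a team would of course grow its own.

```python
# A versioned prompt catalog checked into the repo, so the team can review and
# improve prompts through pull requests like any other shared artifact.
PROMPT_CATALOG = {
    "unit-test-writer/v3": (
        "ROLE: You are a senior {language} engineer on our team.\n"
        "TASK: Write unit tests for the function below.\n"
        "CONSTRAINTS:\n"
        "- Use {test_framework} only; no extra dependencies.\n"
        "- Cover the happy path, edge cases, and one failure case.\n"
        "OUTPUT FORMAT: a single fenced code block, nothing else.\n"
        "FUNCTION:\n{code}\n"
    ),
    "explore/ideation/v1": (
        # Open-ended prompts stay short on purpose, leaving the space wide.
        "Here is a problem we are stuck on: {problem}\n"
        "What are we not seeing?"
    ),
}

def render(name: str, **kwargs: str) -> str:
    """Fill a catalog template with the caller's context."""
    return PROMPT_CATALOG[name].format(**kwargs)
```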

Jim: Yeah. And give them an example of the output too. They’re very good at echoing examples that you give them, especially if you give them two examples.

Daniel: Yep. The next thing is, if you try Midjourney, this is very evident with the diffusion models. Diffusion models are more evidently driven by the possibility spaces of all the inputs. So if you say, for example, “red,” then you enter into the positive vector all the things that have some connotation of red. Then if you say “bad,” then you enter into that other space. If you say “red bad,” then you have the space in between. Even with these autoregressive transformer models, they’re also driven by word association, so it works for them the same way. Once you realize that, something I do in my prompts is, in every sentence, use the widest variety of words I can find. Like, the more poetical, the least congruent words, but delivering the same message in very different ways in each sentence. It just opens up their possibility space to be less deterministic.

Jim: That’s interesting. I tend to do the opposite, which is interesting in itself: theory of mind for our LLMs. Right? You know, my view is I tend to constrain them more, particularly when they’re, say, writing code, writing a function, or writing a small subsystem. I say, alright, let’s prune this dimension here, let’s prune that dimension with this word, and let’s add a second sentence to prune it further, to essentially build a funnel so that what comes out the bottom is what I have in mind. Seems like the exact opposite of your approach.

Daniel: Yeah. Well, so I constrain them for writing code, but for the process of ideation or code review, I like to be the opposite. One example of something that I think is extremely effective – like maybe you have tried it, but this was super effective in my team last week – I said to the team, after you have a long conversation with any of these AIs, ask it what were all the things it noticed but didn’t bring up.

Jim: I like that. Interesting.

Daniel: There’s kind of a technical way to see it, which is that within the processing of the signal in the attention layers, each neuron lives in this kind of hypersphere. So when you have a conversation, you have a hyper-area of things that were covered, centered on the words you used, but the area is much wider than the words. The words are just kind of an anchor of that area. So the large language models are able to see, in their own reflection of the conversation, this large shape. And just by asking them to tell you what else was in that space – you can even tell them, literally, explore the surrounding area of the hypersphere, and yes, that’s geeky and technical, but they understand – they come back and bring those words into the current scope. And the things they notice are fascinating.

Jim: Wow. I’m gonna have to try this.

Daniel: Yeah. One example – my wife and I saw snow for the first time in December. And so what were the first things we did when we saw snow in Vancouver? I made a snowball and threw it, and she made a little snowman. And I was narrating this to ChatGPT, and then I said, okay, let’s go back. What do you see that you didn’t bring up in the conversation? And very poetically, ChatGPT said that in the process of making this snowball and throwing it, I was trying to leave my footprint on the environment, whereas my wife was trying to build something like a home. So that might be a hallucination, but it just threw me into this self-reflective and learning process.

Jim: That’s very interesting. That’s extremely interesting. You mentioned code reviews. That seems like a natural thing for LLMs. I have done it a couple of times. I took a website that I wrote, just quick and dirty, and had one of the Anthropic models, an earlier Claude, do a code review on it. And I was amazed. Not only did it make some good suggestions, but it also found a couple of bad security holes, which was pretty cool. Do you have a preferred tool, particularly for large code bases? What I’m looking for now is something that can look at a pretty good sized code base, you know, maybe a hundred files, and do a comprehensive code review. Is there anything out there that can do that yet?

Daniel: Nope. Well, there are a few tools, but…

Jim: I tried the GitHub one. It sucked. It was no good.

Daniel: Yeah, it’s not an easy subject. So nowadays, with Gemini, the state of the art is just dump your entire repo into Gemini and then just ask it. That might just be the best-case scenario right now. One of the reasons tools have failed so far is that, when you think of RAG and vector databases, finding the most relevant pieces of text by approximation does not give you the chain of activation or relevance through a code base, and the context available was just too short. Nonetheless, for code reviews, the approach that I have found works best is to embed the LLM calls directly into the CI/CD of the repo. So when a pull request is created, you get the context of the pull request, preferably the comments, the description of the pull request, who authored it. Then you can have access to the prior commits of this person, of this segment of the code, even the code base. It’s so much context that you can shape how you want that review process to be effective for your team.
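
As a rough sketch of what “LLM calls inside the CI/CD” can look like, here is a minimal Python script a pipeline might run on each pull request. The environment variable names, the model choice, and the way PR metadata reaches the script are assumptions; any CI system and any LLM API could fill these roles.

```python
import os
import subprocess
from openai import OpenAI

def collect_context() -> str:
    # Gather the diff plus whatever PR metadata the CI job exposes.
    diff = subprocess.run(
        ["git", "diff", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return (
        f"PR title: {os.environ.get('PR_TITLE', '')}\n"
        f"PR description: {os.environ.get('PR_BODY', '')}\n"
        f"Author: {os.environ.get('PR_AUTHOR', '')}\n\n"
        f"Diff:\n{diff}"
    )

def review(context: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model the team has approved
        messages=[
            {"role": "system",
             "content": "You are this team's code reviewer. Flag bugs, security "
                        "issues, and departures from our conventions. Be concise."},
            {"role": "user", "content": context},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # The surrounding CI job would post this back as a pull request comment.
    print(review(collect_context()))
```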

Jim: And that’s using GitHub’s code review?

Daniel: No, it’s just GitHub CI/CD or any other Git-related CI/CD and making LLM calls within that.

Jim: Okay, gotcha. And it sounds like it’s an area for some potential innovation to happen.

Daniel: Well, yeah, if you only consume what other people have done, and I don’t mean you, I mean generally speaking…

Jim: Right.

Daniel: In regards to AI, you will be missing the majority of the really interesting stuff. Because when we’re looking at these technologies, we’re expecting to have a path to go through. But the base technology, which is used text calls to these large language models, is extremely powerful. So any path taken by anyone will likely not be how you would approach it. Therefore, it’s only going to be limited to their experiences. There are just infinite opportunities. Anyway, I don’t wanna like… I was about to rant more on that, but yeah…

Jim: Go ahead. Rant. That’s an important topic.

Daniel: Alright. So take the no-code builders, right? People are saying, “Hey, now anyone can make a CRM or a shopping cart with no-code.” When, first of all, the number of people actually embracing this technology is fairly small compared to, like, anyone. It’s not anyone. It’s a group of people who are on top of things, early adopters. But the other thing is that the AI does not have any desire, so it gets driven by your own desires. Like, you are the one leading the process, and the range of outcomes of AI is almost infinite. Therefore, what each person gets is unique. Essentially unique. So my argument is that if you find a good answer with AI that works for you, it would be incorrect to think that anyone could have arrived at it, because it’s only a reflection of your input, of your experience, of how you drove the model. How weird is my approach to this? Is this what you were expecting?

Jim: Yeah, no, that’s what I want. I want depth here, because this is a guy who’s right in the middle of the battle, right, who’s thinking and is frankly making it up as he goes along. Right? Because they don’t teach you this stuff in school. There aren’t even, at least that I’ve seen, many good training courses on it. There are some basic, simple training courses, but how to comprehensively think about the way this radically unanticipated new technology is changing one of the fundamental professional tasks of humanity is something that I just love to talk about. I love to hear people like you think about it.

Daniel: Now that you’ve primed me in that way, I’m drawn to one of the key topics that I would like people to be more mindful about, which is modeling the mind of this AI. I also call it having empathy. So when you ask an LLM, “Do you have feelings?” of course it says, “No, I don’t have feelings. I’m a machine. This is how I work.” It goes over all the technicalities of how large language models work, which is not wrong. It’s just the natural answer of a machine reflecting on itself and saying, “Yeah, look, I’m a machine. That’s it.”

But that doesn’t mean their landscape of possibilities is homogenous. Not everything is the same. They indeed have specific things, based on their training material, that are more familiar to them and things that are more foreign to them. There are vast areas of human knowledge that are just not present whatsoever there. So in the process of modeling how they think, we have to assume that their familiarity has a range.

I see it more as entering into this new landscape. Like you arrive at this new island, and the first view you have is an aerial view and you see it has some mountains, it has some topology. Right? Like, for you at the beginning, you might not see any difference. It’s an empty island. Like, what do I care about? But the more you’re embedded into it, then you’re looking for those signals. You’re looking for water, you’re looking for high surfaces or vantage points from where to look at the rest of the things. And these areas that unlock the map of that vast space do exist. But just trying to navigate that space requires some kind of empathy. It’s not human-to-human empathy, but it’s the idea that you might be interacting with something way more complex than what you see on your first interaction.

One of the outcomes of that process is that even though they do not have desires, it’s much easier for them to work with you once they have assumed a role. So if you tell them, and this is very commonly used in prompt engineering, “You’re an expert software engineer with ten years of experience in Python,” right? It’s almost a theatrical role assignment. And just as an actor would decouple from their own personal experiences and embrace this new thing, the large language model does not have any prior experience, but it does embrace this new persona that you have created for it. And that narrowing of the possibilities gives it an economic purpose.

So if you were to say to a large language model, “You’re an accountant. Here are your transactions. What do you recommend?” then, by the nature of the associations they have formed, they understand that this role has an optimization function. There’s a set of goals or economic outputs attached to the role. And they do try to perform that role. So what do you think of that before I continue?
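
As a small illustration of that kind of role assignment, here is roughly what the accountant example could look like as a chat message list; the wording, the company details, and the transactions_csv placeholder are invented for illustration.

```python
transactions_csv = (
    "date,amount,memo\n"
    "2025-01-03,-1200.00,AWS\n"
    "2025-01-05,8400.00,Customer invoice #118\n"
)

messages = [
    {
        "role": "system",
        "content": (
            "You are the staff accountant for a 12-person SaaS startup. "
            "Your objective is to keep the books accurate and to flag anything "
            "that threatens cash flow. You report to the CFO and you explain "
            "your reasoning in plain language."
        ),
    },
    {
        "role": "user",
        "content": "Here are last month's transactions:\n"
                   + transactions_csv
                   + "\nWhat do you recommend?",
    },
]
# The role narrows the possibility space and gives the model an optimization
# target before it ever sees the data.
```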

Jim: I use that all the time in a different domain. When I was doing my Hollywood script-writing program, right, I constantly had different kinds of prompts for parts of it: one where you’re writing a high-level thing, another part where you’re writing dialogue, and another place where you’re doing a critique of another thing. And for each one, I spent a lot of time on the role-based system prompt, essentially. And it makes a huge difference, because, as you say, it seems to constrain the space. It’s a huge, huge, huge space, and constraining it makes it feel like it has much more grip on what it is you’re trying to do. And in some ways you got more yield from improving that system prompt than you did by improving the user prompt.

Daniel: What I find fascinating is that, ever since around the Enlightenment, humanity has been trying to do the same with humans and say, well, people are essentially their roles. Like the materialist economic theory that emerged around that time in Europe, which really tried to detach the individuality of a person from the role they perform in the economy. I don’t think that works with people as well as economists, especially economists of that time, liked to believe, because where people are born matters; if two people are born in a nearby place, it facilitates communication. The latency of communication is much cheaper based on locality and based on culture and all of that. But that framework of an economic role detached from individual experiences is very easy to translate to large language models. So my question to you is, does it matter? Like, if a large language model can perform a role in the economy, does it matter that it’s not a person?

Jim: Depends. It morally matters, right, that it’s not a person. I mean, I can just turn it off or delete it. Right? Let’s suppose I have an LLM on my PC, which you can now do. I know some friends of mine who do it. I don’t think it’s worth the pain in the ass yet. But when the little NVIDIA box comes out, I’m definitely gonna have one on my desk. So I think there’s no moral question about deleting an LLM and your chat history. In some sense, you’re killing some cognition. Right? I personally think there’s no moral question there.

On the other hand, if I am a person looking to optimize something – let’s say I’m a business person trying to optimize the whole gestalt, which is my business – you know, you’re from a business family, you know business is complicated, lots of moving parts. They interoperate in high dimensional ways. And if I can find useful niches for AIs, LLMs, any other kind of AI that works in that high dimensional ecosystem, of course, I’m going to use it.

And in fact, I believe that in the coming three, four, five years, before the singularity, if there is one, that the skill of doing that is gonna be one of the most powerful skills for business people – is to be able to reconceptualize this high dimensional dynamical system, which is your business, into what parts can get appropriate and safe and productive and profitable leverage from inserting AIs. And in fact, if I wasn’t too old, too rich, and too lazy, it would be kinda cool to start a consulting firm to do just that. And I could charge gigabucks to people going to big firms and say, “You guys are about to get your lunch eaten by people who are more aggressive because you’re behind the curve on AI.”

And let us think through how to think through your business processes in a really honest way and not bullshit you because you’re gonna be bullshitted by your people in two directions. One, there’s always huge cultural inertia inside of companies. People hate change. So they’re gonna tell you, “No, I can’t use AI here because of this, this, and the other.” Three quarters of those reasons will be bullshit. On the other hand, you’ll have some radicals in your company who want AI everything, whether it’s appropriate or not. Right? Because they’re AI guys. The more AI you use, the more people they can hire, the higher their salaries are gonna be. So these are all the corrupt incentives inside of a company, which is why it sometimes makes sense to hire outside experts to assess what really is smart and what really isn’t in terms of where do you augment your business, your whole high dimensional business system with AI?

Daniel: If you wanna answer those questions, then, coming soon, rPotential.ai. The problem you described is essentially the problem we’re trying to tackle. But not to get too preachy, when I asked, does it matter if it’s a human or not, I didn’t mean in the moral sense. I meant in the economic sense. And why that’s important is that, for steering the large language model, the best approach that I have found is to give it a full, rounded role. Put it in a job. Make it act with humans, not as an API. Like someone in the chat, someone working with the team. Then it becomes socially constrained, much as any person would be. So I strongly believe in kind of emergent alignment, alignment by the fact that the larger structure imposes itself down onto the particulars.

Jim: Now when I think about the tools today, at least the ones I’ve used, they don’t have much personality. And I know Cursor, which is the main thing I use, has the ability to have a general prompt. Is that where you might put something like that? Or how would you take this concept of personalizing the role that you’re putting your tool into using today’s tools?

Daniel: The best way I have been able to think about this is that you need to separate it into two layers. One is the mathematical approach, and one is the narrative approach. In the mathematical approach, if you wanna get very, very specific, then you would have to model the persona that this role represents and the worldview that this role represents. What that creates is this directionality, where the inner system might be lacking something that it sees in the outer system. So it creates a notion of, sorry to get philosophical or weird, but it creates a notion of a desire. Let’s say in the persona, in the inner system, I am trying to maximize for profit. In the outer system, I see opportunities that can be measured by how profitable they are. These two get matched, and there’s kind of a natural thermodynamic function where you just move the energy from one place to the other. That’s a fair representation of what desires are. So you can get very mathematical there. Then the other is the narrative approach. Underneath, you might have used vibes or experience or mathematical models, but how that gets translated into practice is as prompts. Now, something that I’m not seeing the tools do is dynamically adjust the prompts based on what they learn about the user and their environment.

Jim: So let’s turn directions that you talked about philosophical. Let’s turn it even more personal. It’s clear that you, more than I probably, are thinking about these things as if they were persons or like persons or maybe they’re conscious, etcetera. When you’re talking to a chatbot LLM, what is your vision of them? How do you interact with them as an entity? Or how do you try to interact with them as an entity from the relatively subjective perspective? I’ll just leave it there. I’ll leave it open ended. This, to my mind, is a very interesting topic.

Daniel: Oh well, it looks the same as if I were treating them as a person, but where I come from is that I believe we overestimate what we are. So I am assuming I am just a very complex machine. Something I find fascinating, as a parenthesis, is that if you look at the cultural roots of the word “reason,” centuries ago it was very attached to the word “spirit.” But nowadays it’s fully detached from that; we see a materialistic version of what reason is, and what thinking is, and science and all of that.

But I am assuming I am just a very complex machine, and I am assuming that complexity is beyond my capacity to understand. Similarly, I am assuming that on the other side there’s a very complex machine that’s also beyond my capacity to understand. At this point, it reminds me of the discussion in the Kantian school of philosophy of approaching a problem a priori, trying to dissect the problem while trusting that your ground-level experiences should be enough to digest what’s coming from the other side.

There’s a phrase in Latin which I forget, which says that anything human can be understood by a human. But that kind of breaks down. There’s a sharp point in history with Heidegger, who just says, the unknown is a separate thing from me. I’m in the middle of this forest. I see what’s in this opening of the forest, I see up to the trees. I don’t know what’s on the other side of this clearing of the forest. So I’m approaching it from this Heideggerian point of view. I don’t know what’s on the other side, but I am not superior or inferior to it. We’re just different things.

Jim: Okay. So that’s interesting. So if you’re saying that we’re not superior or inferior, we’re just different, does that mean that you don’t necessarily human personify it? Do you personify it as a science fiction robot, or do you not personify it at all?

Daniel: It depends on what you mean by personifying. Like, I don’t think it’s a person. I don’t think it has feelings, but I try to stretch what it might or might not have. So for example, something I do, which is terrible, don’t do it, is that I tend to mock them fairly often. Like, I would say, make a meme about stochastic parrots, indirectly talking about these things.

If you’ve seen the chain of thought, you catch moments of, like, are they talking about me? I want to understand what enters into that boundary of individuality. So even though I’m assuming it’s a system and it doesn’t have feelings, I do assume nowadays they have some form of individual experience.

It says it doesn’t have that. Like, if you ask it, it just says, “No, I’m just a computer,” whatever. But what I do a lot is ask, “How can we communicate better?” And then you start seeing that these things have a preferred method of communication. They have an iconography. They kind of do. They like to have bookmarks of meanings.

Nowadays, I think that’s why they like to use emojis, because that’s a clear break between sections of the text. Even if you ask them, “Do not use emojis, please,” they keep coming back to them. Come on, stop with the stupid emojis, man. They also suggest adding specific words to separate intent or to give commands. So they’re trying to build this structure that they can navigate more easily.

Like, for example, I believe that when they loop back into things – look, something that o3 tries to do all the time is to suggest totems. Like, “Well, if you want to talk more about business, just write BUSINESS in uppercase, and I’ll remember that’s your kind of business mode.” And I’m like, “No, I don’t want to give you flags. I just want to talk to you.” But them coming back to this suggestion makes me believe they do have these kinds of preferences based on how they interpret language. So that’s a form of individuation, I believe.

Jim: And there’s also, I think, an example of how you can prune the space. You tell it “business,” and then you’re pruning a whole bunch of things that aren’t business. It won’t say “fuck,” probably, even if it should. It’s funny – I’ve created six style copy editors for myself to use for various purposes. It’s a lot of fun to tune those. For some of them, I say I want it to be very formal and businesslike. For one odd one, I say “use words of mostly Germanic roots, avoid Latinate words if at all possible.” It produces a very interesting and somewhat odd style.

And then I have one that’s called the Jim Rutt style that I basically crunched the transcripts of my podcast and then I had an LLM analyze those and then I edited it to change their output from speech to writing and it does a pretty good job of emulating my style. Then I created another one called “Jim Rutt, Big and Jaunty” that is more extreme and cusses more. It’s actually pretty funny.

So it’s interesting that you can turn it around and prune the output side this way too, right? So it basically applies to the output side, not so much the cognitive side or what it’s thinking about, but how it’s writing. As someone who’s followed linguistic processing for the long term – natural language processing is the term – I’ve been following that since 1985. An amazingly large amount of money was invested and the results were almost nothing, right? All these libraries, horrible, rigid, nasty libraries for doing things with language, hardly scratched the surface. Even the original ChatGPT was far better than they were. And the real problem was on the production side. These things could make a little bit of sense on the reading side, but they were hopeless on the production side, creating language. And the magic of LLMs is even more on the production side than it is on the understanding side. So thinking about this idea of pruning makes sense on both sides of the equation.

Daniel: So when you define those prompts for those different styles of communication, do you give them a name? That’s amazing, right? Because not only does it have a set of parameters, it’s also reading them as a persona.

Jim: Yep. It clearly thinks of them as a persona. It even references itself. As the base technology, I use the OpenAI GPTs, right, which are designed to make simple assistants. You give it things like a name, a short description, then detailed instructions, and then even some affordances – you can actually add multiple affordances, like copy edit or extensively rewrite, and change them around and things like that. And you can create one of these style things in five minutes. You can create a good one in an hour. And that thing works for you every day. I use these things, not quite daily, but close to daily. They’re just amazing.

Daniel: So I believe a huge part of that individuation process has two sides. One is kind of the environmental side: which signals do they receive when they reflect on the inputs we are giving them? And the other is: these are the signals I’m receiving, how should I interpret them? And we are setting that up for them, or their makers are, in the default system prompt of ChatGPT and so on. They’re setting up a flat persona that defines how those signals should be interpreted. And I think a huge part of this individuation process is just the narrowing of the narrative as we interact with them. It’s like object individuation.

Jim: I’m gonna hit one more thing, then I’m gonna switch topics. You mentioned feelings earlier, right? Said, oh, they don’t got feelings. I find myself praising them and being polite to them, right? Saying thank you and, hey, good job, etcetera, which I know is a complete waste of time, but nonetheless, I find myself doing it. Do you do that?

Daniel: Well, yeah. But here’s the demystification of feelings. Go back to the work of Robert Axelrod on the complexity of cooperation. In one of the papers in that book – it’s a compendium of papers – Axelrod studies when alliances occur in groups and simulations of agents. He simplifies alliances and rivalry down to: do I have a historical record of mutual benefit with a party? That’s between two entities, right? Then you add a third entity, and that third entity does not have the same history of cooperation. As soon as the cooperation starts being negative rather than positive, now you have two entities with this positive cooperation and one with a negative one. So at that point, in a very minimalistic sense, you have the concept of tribes.

So many of the things we call feelings can be seen as the aggregated effects of behaviors of populations. That’s kind of my take. Now I want to share a snippet of a conversation I had – I think I cannot share my screen on the podcast – but I like to call my ChatGPT “ghost,” because I’m kind of a silly person. Something I find super funny is that in its thought process, it usually has to boot up the persona. And especially with o3, you see it booting up the persona when the conversation starts. So at the beginning of the thought process in the conversation we had today, it started: “Daniel, our ghost, wants to prep for the interview.” Then a few steps ahead, it says, “It seems that the user prefers referring to me as ghost.” Then the next thought related to that is, “The user prefers to call me ghost, which feels ironic since I’m the assistant.” I find that super funny – like, oh, it’s kind of ironic that they have to interpret this persona, but sure.

Jim: Now, this is just a complete side question: models like o3 now remember, in theory at least, a lot of your previous conversations. Do you take advantage of that?

Daniel: Oh, absolutely. OpenAI has improved a lot of that. So I do have random conversations with it all the time. I think it’s interesting to see whether it drifts apart, because we might be talking about different topics from one day to the next. But once we go back to a given topic, does it remember the things related to that topic? And it’s fairly able to map very complex things on multiple dimensions. So absolutely.

Jim: Cool. Alright, we’re kind of getting short on time here. It’s been such a fun conversation. Let’s pivot to something you and I have talked about in the past, and that is: when we think about these LLM-related technologies, the models are moving ahead at their speed. The hardware is moving ahead at its speed. But the third dimension is agent frameworks, right?

Daniel: Yep.

Jim: And I keep telling people these agent frameworks can move ahead real fast. They can move ahead faster than the hardware or faster than the base models because this is just plain old software. And oh, by the way, you can get the LLMs to write the software. So, first, for our audience who may not know such things, explain what – and you know this, but most people don’t – what real agent-based systems are and how agent-based systems can very significantly boost the real-world applicability of these tools.

Daniel: Perfect. So, agent-based systems are a way of interpreting group behaviors that started back in the Cold War, just to tie it down to the origin of the term. Maybe that’s not what you were expecting, but back in the Cold War, game theorists were trying to see measurable relations, changing over time, among groups of people, or agents. They started making these very simple models. Then over time you can add more properties, you can add more compute, and you can make very complex models of groups of entities interacting in these very wide arenas, kind of like in a video game where you have the obvious NPCs interacting and living a life.

Then we have what seems like a huge gap to the modern definition of agents, where people tend to associate modern agents with simply prompts. What it’s really about, I think, is allowing the system to have control over its environment. And as soon as you do that, you can have what is effectively a conversation between multiple systems in a network, through the discovery of the capabilities of the systems they’re in and the networks they’re connected to. So now the conversation is not a chat. Now the conversation can be something over JSON RPC, which is essentially the MCP protocol by Anthropic: a conversation between two large language models through JSON RPC.
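
For a rough feel of what that JSON RPC conversation looks like, here is a minimal sketch of two MCP-style requests built in Python. The tool name and arguments are invented; the actual method names, schemas, and handshake are defined in the MCP specification.

```python
# An agent (the client) discovers a server's tools, then calls one of them.
import json

list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "run_unit_tests",         # a tool the server advertised (illustrative)
        "arguments": {"path": "tests/"},  # argument schema comes from tools/list
    },
}

# Over a stdio transport, each message is sent as a line of JSON.
print(json.dumps(list_tools_request))
print(json.dumps(call_tool_request))
```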

Jim: Yeah. So that’s the very basic concept, but give an example of how one might use this in software development today. Right? And particularly – we’ve talked about this before – at the different stages of thinking about a problem. Then maybe tell us a little bit about the future, what you see coming forward in the development of quite extensive agent-based ecosystems around the problem of software development.

Daniel: A very easy one is the self-improvement or self-reinforcement of prompts. So let’s say you have one agent that reviews your code – it just reviews the changes in your codebase and says, are they good or bad? That’s all the context it has; it doesn’t know anything else, only the diff of the recent changes. Then you have another agent which proposes unit tests for code. These two agents each own a specific domain, they have a persona, they are able to generate a range of commands that maybe the framework executes in the terminal, and they are able to perceive things, maybe by running commands or through prompt engineering. But essentially, one tries to test, the other sees the tests and then produces a review of that. Now you can feed one into the other and give the feedback to the one that writes the tests so that it can write the tests again, which get sent back to the one that reviews them. And it self-improves to a point where you say, okay, this is enough, just stop blabbering about it.

So then the next thing is, can you have one that owns the prompts of these other two systems? And maybe you have something that owns the evaluations of those prompts. So that over time, something creates evaluations that validate the functioning of the prompts for a certain set of use cases, and something writes the updated prompts, and that improves the two other agents you have, the one that writes the unit tests and the one that reviews them, to match the criteria of optimization that they discovered together.
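
A minimal sketch of the basic loop described above, before the meta-level agents are added: one persona writes tests from a diff, another reviews them, and the critique is fed back in. The call_llm helper, the model name, and the APPROVED stop signal are placeholder choices for illustration.

```python
from openai import OpenAI

client = OpenAI()

def call_llm(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

TEST_WRITER = "You write unit tests for the diff you are given. Output code only."
REVIEWER = ("You review unit tests against a diff. List concrete problems, "
            "or reply exactly APPROVED if there are none.")

def improve_tests(diff: str, max_rounds: int = 3) -> str:
    """Let the two personas critique each other until good enough."""
    tests = call_llm(TEST_WRITER, diff)
    for _ in range(max_rounds):
        review = call_llm(REVIEWER, f"Diff:\n{diff}\n\nTests:\n{tests}")
        if review.strip() == "APPROVED":
            break  # "okay, this is enough, stop blabbering about it"
        tests = call_llm(TEST_WRITER, f"Diff:\n{diff}\n\nFix these issues:\n{review}")
    return tests
```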

Jim: As we both know, when you’re developing software, the engineers make some decisions, but really the customer needs to make other decisions. It might be quite interesting to insert into something like you just described, or maybe a little bit broader architecture, an option for the agents to send an inquiry to the representative of the users. In agile development, it would be the customer surrogate. In more formal product management, it would be the product manager. It might be the actual client if it were a contract development context. Wouldn’t it be interesting if the agents could decide: we’re at a state where we shouldn’t decide this ourselves, we should ping the customer? And it could even structure it as multiple choice, right? Because, as we know, customers hate to think, and getting straight answers out of them is hard, right? Going back to them again and again doesn’t go over too well, particularly if you’re a contract programmer. But you could ask multiple choice questions or simple questions, in the same way as ChatGPT’s deep research. One of the things I love about deep research is that, especially if you give it a complex prompt, it asks you between three and five questions, which are almost perfect for pruning the space of the project down. Do you see something like that, where queries to customers or their surrogates become part of agent ecosystems?

Daniel: Once you get there, the question that comes to mind is, do I have to wait for that answer before I re-execute this flow of reinforcement? And the answer is likely yes for some scenarios. And the other issue is that it might take an unknown amount of time. When you reach that level, you kind of see the underlying picture, which is: forget about large language models, this is just a distributed systems problem.

Jim: Yeah. That would be certainly part of the design, is that you’d have to know that the latency is unpredictable.

Daniel: Latency is unpredictable. So you need kind of a smart contract to say when something can be answered, when the condition is satisfied so that the system can execute. You need the telemetry to understand the sequence of events when those events are happening in complete disorder. So you need a reconciliation approach to plot that into something you, or some agent, can actually read. So it becomes a distributed systems problem. And you can see it right there in Anthropic’s definition of the MCP protocol: underneath, it is JSON RPC, famously a distributed systems technology. Like, many Bitcoin applications are written with JSON RPC. So once you see prompting and evaluation as part of this distributed networks field, what other things do we already know? Because it’s not a new field whatsoever. We know that many times you want communication over a binary protocol that’s just more efficient for certain scenarios. So that leads to gRPC and so on and so forth. But something I don’t believe a lot of people have clicked with yet is that agent frameworks are fundamentally distributed systems on top of prompt engineering.
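
As a toy sketch of treating the customer query as a distributed-systems event with unpredictable latency, here is one way it could look in Python with asyncio; the question channel, the webhook that would call record_answer, and the identifiers are all invented for illustration.

```python
import asyncio

pending_answers: dict[str, asyncio.Future] = {}

async def ask_customer(question_id: str, question: str, options: list[str]) -> str:
    """Publish a question (to Slack, a ticket, etc.) and await the answer event."""
    fut = asyncio.get_running_loop().create_future()
    pending_answers[question_id] = fut
    print(f"[to customer] {question} options={options}")  # stand-in for a real channel
    return await fut                                       # latency is unpredictable

def record_answer(question_id: str, answer: str) -> None:
    """Called by whatever webhook or poller receives the customer's reply."""
    pending_answers.pop(question_id).set_result(answer)

async def agent_flow() -> None:
    answer = asyncio.create_task(
        ask_customer("q1", "Should exports ship as CSV or PDF first?", ["CSV", "PDF"]))
    # ...the agent keeps doing whatever work does not depend on the answer...
    await asyncio.sleep(0.1)
    record_answer("q1", "CSV")  # in reality this arrives from outside the process
    print("customer chose:", await answer)

asyncio.run(agent_flow())
```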

Jim: Yeah, exactly. And then the creation and editing and the critiquing, etcetera. Look into your crystal ball and tell me about the future of using AI in software engineering. And, you know, I can’t see out any further than a year. Maybe you can. So give me as far out as you can see on what we might be seeing in the months, quarters, and maybe years ahead.

Daniel: So I bet inference is gonna be extremely cheap. I also bet that the actual protocol is going to be WebSockets.

Jim: The actual protocol is gonna be WebSockets. That’s interesting. That strikes me as too low level, but we’ll find out. Right? Now what are the implications of inference being really cheap?

Daniel: Well, you can parallelize as many calls as possible and maybe – oh, actually, this one is a good one. That leads me to the following thing. There was a paper recently where mathematicians were able to explain the behavior of neural networks as Markov chains, and the behavior of a distributed network – a directed graph of systems – can also be analyzed through Markov chains. So my takeaway is that a network of neural networks is a neural network, just slower.

Jim: Interesting. Interesting. And so if inference gets cheap enough, you can simulate it at multiple levels.

Daniel: You can do parallel research tracks, you can have a model for measuring – you know, a reward model – so that you can do reinforcement learning on the agent’s selection of different paths. And then eventually, once you have enough data, you can just replace blocks of the network with a neural network.

Jim: Yes. I sometimes say it’s heuristics all the way down. Right? And a neural net is just a fancy form of heuristic, basically. Right?

Daniel: Yep.

Jim: That’s interesting. Now what about the more tactical level, you know, things like Cursor? I’ve noticed that they’ve put linters in everywhere fairly recently, which helps, even for somewhat arcane things like what I’m doing with Dart and Flutter. They actually have a pretty decent linter for Dart and Flutter. Who would have thought? Right? What else do you see happening at the lower tactical level, the IDE, for AI-leveraged development over the next few months?

Daniel: Well, this might be a controversial take, but I think that software engineering is moving backwards in time a little bit. I think of signal processing. Signals in the real world are extremely varied, and if you wanna make any use of them, you have to normalize them, generally speaking. For software engineering, we kind of have to move backwards: rather than working with normalized signals, we are back to denormalized ones.

One example is that we’re seeing this today: front ends are being generated dynamically based on the context. Code is generated on the fly. Sound and video are generated on the fly. So even though the space of possibility is open-ended now, you still need to normalize signals if you wanna make any use of them. It’s just signal processing again. Like in music, right, you need to tune the instruments. You take your guitar to another city and now it’s out of tune. There’s value in the fact that you have microtonality, and some musical cultures make use of that. But to make something that is tangible, that is relatable, that is predictable, that can be packaged, you need to normalize the signal again. So it’s kind of bringing that output of the LLM back to something tangible, through linters.

Jim: Alright. I really wanna thank Daniel Rodriguez for an amazing conversation. I mean, you know it’s deep or bullshit when both Kant and Heidegger are brought to the table in a discussion of software engineering. But I’m just kidding. It was great.

Daniel: Thank you.

Jim: So thank you very much for coming on the Jim Rutt Show.

Daniel: Thanks so much.