Transcript of EP 320 – David Shapiro on Mastering AI Tools for Research

The following is a rough transcript which has not been revised by The Jim Rutt Show or David Shapiro. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guest is David Shapiro. David is a longtime techie and AI guy. In fact, he’s been studying and fine-tuning LLMs since GPT-2. Damn. Yeah, he’s an old dude in this stuff. It’s hilarious how being an OG means like four years or something like that these days. He’s a thinker, a writer, and a video maker on topics around the impact of IT and AI on our future. Welcome back, David.

David: Thanks for having me back.

Jim: Yeah. He recently appeared on our show in episode 317, where we talked about post-labor economics, which is David’s term for what happens when machines replace most human labor—a very interesting and well-received episode. So check it out if you haven’t already. I came upon David in Twitter land. He kept posting these things. I go, “Goddamn, this is an interesting guy.” So I reached out to him. And one of the things that he wrote about, hinted about, was how he used LLMs in his work in developing his post-labor economics, presumably, which we’ll find out more here in a few minutes, in both researching and probably in the writing that he did. And I was interested in the topic and I assume a lot of other people are as well. So I invited David to come back quickly to not focus on the substance of his ideas, but focus on how he used LLMs to do the work basically. So David?

David: Yeah. Thanks for the introduction. Long story short, every time technology generates a new tool, academia has an existential crisis. It happened when Wikipedia came out, when word processors came out—and I’m old enough that I remember using a card catalog in the library. But fundamentally, these tools are just tools. I married a librarian, and she did her master’s thesis on GPT-3. When you look at these tools and study them, you ask, okay, what is it actually good for? You test it to failure, and you watch the progression from GPT-2 to GPT-3 to ChatGPT, and now we’re on GPT-4. They just get better and better. And like any tool, it’s a matter of skill and having the correct mental model.

We were talking just before we hit record that you can hurt yourself pretty bad with a drill or a chainsaw, or you can make really good output with those things. And one of the things that I want to emphasize is that when I do this work, and we can get into the details of my process, I’m not just asking ChatGPT to solve this problem for me and then copy-pasting the output. There are many more steps. I approach research the same way anyone else would approach research. It’s just with another layer of tooling on top of it. So my tool stack is a combination of ChatGPT, Deep Research, Google’s Notebook LM, and sometimes Grok and Perplexity. Although, I will say Grok is a little more of a loose cannon when it comes to epistemic rigor, despite Elon Musk’s claims about being maximally truth-seeking. But that’s the basic stack, and we can dive into whichever layer you want to go to from there.

Jim: I’ll reinforce what you say because, for the first time in a number of years, I’m also doing truly primary, research-intensive intellectual work. And I use a whole flight of tools. I use the high-end ChatGPT Pro, I use Anthropic, and I use them to check each other. When I get a chunk of ideas or text from one, I’ll check it with the other. Once I have a working document, I’ll ask the other engine, “Would this change be an improvement in this document?” So using a number of tools is important, but also, to reinforce your point, you don’t say, “Write me a paper about X” and then press go. Right? And oh, by the way, it will do it, but it’ll suck a big one. It’ll be terrible. I mean, useless, worse than useless. It’ll just be garbage.

And so what the tool is actually good for is hugely important to your point about chainsaws and drills. You’re not gonna use a chainsaw to install a hanger to put up a picture in your house. You might use a drill for that and probably won’t use a drill to try to take down a tree, but a chainsaw will be good for that. And so thinking clearly about your tools and their limits is critical. I live on a farm. I have to deal with all kinds of crazy stuff on the ground. And even if I have a chainsaw, if I have a 36-inch white oak down and I only have my 14-inch light chainsaw with me, guess what? Don’t do that. Don’t use Perplexity to write a paper on the science of emergence. Better go back and get your 24-inch chainsaw. And even then it’s pretty tricky to cut a 36-inch white oak log with a 24-inch chainsaw, though it can be done.

And then to make all this more difficult—tools, tasks, the limits they’re good for—these tools are changing every day. Sometimes for the worse on Tuesday, and then it’s better again on Thursday. And in major ways, they’re changing every few weeks. You can just feel them getting better and stronger, and not just with the official releases. They’re screwing with this stuff constantly, especially the wrappers around them. So it’s very important to think about these, multiple of them in fact, as tools in a toolbox and think about what they’re good for. So now let’s get back to your project. First, maybe just very quickly reiterate what post-labor economics is, and then start telling us how you used your tools.

David: The elevator pitch for post-labor economics is that what we’re witnessing is the decoupling of economic growth and activity from human labor inputs. It’s been happening for decades, and it’s only accelerating with the help of AI. And part of what we’re gonna be talking about today is how much labor we saved ourselves by using these tools for research.

Another dimension is that I’ve actually used these same research tools quite extensively for my health. I had some long-term complications from COVID, and I had not made any progress for a couple years until I got access to these tools.

In terms of thinking about them as tools, one of the places where I start whenever I’m teaching anyone or explaining it to someone is the first thing that they’re good for is just learning—just exploring a space where you say, “Hey, I don’t know about topic X, teach me” until you get to a point of saturation or you explore that blue ocean mentality. You say, “Oh, what does this mean? What does that mean?” And it’s just an endless stream of questions until you wrap your head around a domain.

And of course, you don’t know what you don’t know. You have the search bubble problem still with AI. For anyone who might not remember, the search bubble problem is Google will only give you answers to questions that you ask. And if you’re not smart enough to ask the right questions, or if you don’t have enough knowledge of the domain to even ask the right questions, you’re kind of hemmed in by default just from your own blind spots.

AI offers a new way forward where you can just say, “I want to learn about X. Tell me everything that I need to know. What are the first five lessons?” You start there, and that is where it can take the role of a mastery-learning tutor. Mastery learning is an education theory where you basically have a tutor and a student, and the AI can take that role. In fact, that has been baked into many AIs now where they have a learning mode. I prefer not to use the learning modes. I just kind of do everything self-directed.

That’s the first thing that they can do. And then the second thing is brainstorming. You say, okay, as you start to wrap your head around a topic or a domain, you can say, “Hey, what about this? Is my intuition correct?” And this is that constant back and forth of testing your intuition. I mean, that’s what pop quizzes are, right? That’s what a quiz is, what a test is—measuring how well you can actually synthesize within that domain. And whatever level of error you make, the AI will be able to fact-check your understanding and provide some pushback, and they’ve only gotten better at this as well.

Taking a step back, what you’re doing is you’re not relying on the AI. There’s been a few reports out there saying, “Oh, people who rely on AI, they have cognitive degeneration.” Well, when you look closer at those studies, it was college students who were just told to write an essay as fast as possible on topic X. So of course they did it as lazily as possible. But if you use it as a learning companion that can actually stress test your understanding, you will learn more about that topic. It’s not much different than going to the library and coming back with a stack of books and reading until you get to the point where you see the same facts and figures and concepts over and over again. The difference is it can meet you where you’re at at any given time.

So that’s kind of step one of using these AI tools, which is probably not as sexy and not as “Oh, you just give it your problem and it goes off and solves it.” That’s kind of the promise. And absolutely, there are situations where I’ve had major insights and major breakthroughs where I just sit down. I feel like I have writer’s block or I’m mentally blocked, so I just lay everything out, spill my guts to the AI and say, “I’m stuck on this. I don’t know what the next step is. Can you figure out what I’m missing? Or what the next idea is?” More often than not, those kinds of conversations don’t hand you the answer outright. The AI doesn’t solve it for you, but it can give you the direction to go in, where it says, “Did you think about this angle?” Or you see some little clue that helps you explore that search space a little bit faster and gives you a little bit better of a compass. So that’s kind of the high-level, conceptual, theoretical aspect of using these tools for frontier research.

Jim: Yeah. Let’s talk about the first one first, then we’ll move along. This idea of exploring a domain is huge. And I would say in my work, it’s probably the biggest speed-up, though maybe not the most important. It’s the biggest time saver. Because when you’re exploring an area that you only sorta know and you don’t know the history of the field, you don’t know what other people have said—you can look at Wikipedia, but that’s not really a deep way to understand it.

One of the things I find I’m doing a lot of is finding PDF versions of classic books or classic papers. For books, I always use Gemini 2.5 Pro. It’s amazing at summarizing books. It’s the only model I’ve found that seems to be able to understand the book all the way through from beginning to end. The other models tend to be better at the beginning or the end, or sometimes the beginning and the end, and not as good in the middle.

So I find the PDF version of a classic book, give it to Gemini Pro and say something like, “Give me a detailed multi-level outline of this book. And for each outline point and subpoint, give me a detailed summary of the content.” Most of the time it does a great job. Once in a while, it doesn’t go deep enough. You have to say, “Go one level deeper please.” And then it’ll do it again.
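For readers who want to script this step rather than run it through a chat window, here is a minimal sketch of the same idea using Google’s google-generativeai Python SDK. The model name, file name, and exact prompt wording are illustrative assumptions, not Jim’s actual setup.

```python
# Minimal sketch: upload a PDF and ask a Gemini model for a layered outline.
# Assumes the google-generativeai SDK and an API key; the model name,
# file name, and prompt wording are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

book = genai.upload_file("classic_book.pdf")  # hypothetical local PDF

prompt = (
    "Give me a detailed multi-level outline of this book. For each outline "
    "point and subpoint, give me a detailed summary of the content."
)

model = genai.GenerativeModel("gemini-1.5-pro")  # substitute the current Pro model
response = model.generate_content([book, prompt])
print(response.text)
```

If the first pass is too shallow, a follow-up message in the same conversation along the lines of “Go one level deeper, please” usually does it, as Jim describes.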

It’ll take five minutes to do it, and then it might take forty-five minutes to carefully read and think about it. But you’ve basically absorbed a classic book of the literature in forty-five minutes. In some ways you’ve missed some nuance, and of course, the LLMs may have made small errors, but the amount of content that you have added to your brain per unit of time is utterly staggering. I don’t think I could do the work I’m currently doing in numerous domains without this tool.

But I am a little bit concerned about something, and it’s not the cognitive hijacking. It’s actually quite the opposite—I am now absorbing deep and powerful ideas at an unprecedented rate. I read a lot of books, listeners of this podcast know I probably read seventy-five books a year. I’ve been doing that for more than fifty years. So I’ve got a lot of stuff in my head. I’ve read probably tens of thousands of scientific papers and God knows what else. But this is at a new level of fire hose of powerful information coming into my head.

Take the scientific paper example. I had a nine-page scientific paper to read. Those who haven’t read scientific papers, a lot of them are very dense, very mathematical. And in fact, I’m convinced the damn scientists write in an impenetrable fashion mostly to impress each other about how smart they are. Though frankly, I think that a paper that’s very readable is what really shows how smart you are. That’s what Richard Feynman said—if you can’t teach it to a fifth grader, you probably don’t understand it.

But anyway, it can take quite a while to work your way through a nine-page scientific paper. But the detailed summary that comes from an LLM, you can read that in a few minutes. So anyway, back around to my comment: I wonder what it’s doing to our brains to have so many strong, well-defined ideas stuffed into our head so rapidly. Are they fighting with each other for attention in a way that makes it a little harder for us to think? I haven’t yet seen it so far, but I think it’s a possibility.

David: On the topic of cognition or intelligence, what percolates up in my mind when you’re talking about this is what E. O. Wilson actually called consilience, which is we all live in the same physical world. There are different disciplines and different domains, but the fact that we live in a shared reality means that there must be some underlying commonalities between all of these different domains, whether it’s genetics, society, economics, or whatever else.

I think that particularly when you read broadly, your brain will automatically do a lot of that if you’re more of a polymath. But with AI, one of the structural differences when you’re using these tools is that you can follow your interest. As soon as you have a question, the time between your brain generating the signal of “Hey, what does this mean?” and getting the answer and slotting that piece in—it’s like doing a jigsaw puzzle, but you have x-ray vision where you know exactly where the next piece is. So you’re assembling the puzzle of epistemics, whatever breadth you want, much faster.

What I will say is that it can be tempting and easy to gloss over some of the details. That’s why I emphasize stress testing your understanding as you go. At least that’s what I do, where every now and then I’ll just start a new chat or a new project, and I’ll say, “Okay, this is what is on my mind today.” Because as I’m sure you’ve noticed, and anyone else who uses these tools, there is variance in the way that each different AI reacts, but also the way it reacts from chat to chat, from instance to instance will vary. So you’ll learn: did I distill this idea correctly in a way that is portable to every other AI that’s out there? Or is it repeatable across chats? And of course, repetition is a big component of any kind of learning. There are certainly pitfalls to avoid, but again, it’s no different from any other form of studying. A lot of it comes down to repetition and synthesis and, to a certain extent, regurgitation.

Jim: And also the expansion part. As you mentioned this, I do this all the time now. When I’m reading a paper or a book and I see something I vaguely know, I’ll quickly get out my phone and type into one of the models and ask them the question, then go about my reading. And then, you know, ten minutes later, I’ll come back—twenty minutes later—and read this thing. And this thing that in my brain would have been a little fuzzy spot now becomes a much more intense spotlight on that little piece.

And this is where your tool selection is important. Just a simple question, Perplexity does a great job, way better than Google. Perplexity Research, next level of depth. Today GPT-4 is fast and pretty good for next-level questions. Put it on GPT-4 Pro deep research, let it come back in half an hour, and you’ll have a master’s degree level of knowledge on the topic. And so depending on how far you think you want to go, you choose your tooling accordingly. So one of—

David: The tools that I mentioned earlier on is Notebook LM. That’s a Google tool. What a lot of these tools allow you to do now—OpenAI did the first one, the first deep research, which was kind of the first mass-produced agentic tool where you give it a research query, and it’ll go and search the entire internet far and wide, pulling dozens of sources. But it’s not just pulling sources. It can also do work on those sources. It can compare and contrast and do some synthesis and coding and data crunching and that sort of thing.

OpenAI’s deep research introduced the ability to export the results to PDF. Google Gemini deep research now allows you to export the results to Google Docs, which then you can save to PDF or Word doc, whatever you want. Having a trace of everything that you think you’re learning is phenomenal. I started keeping corpuses. I started building knowledge bases for every particular topic I was working on, whether it was long COVID and chronic fatigue, or post-labor economics, or all the many subcomponents of post-labor economics.

As of right now, I’ve got almost ninety PDFs saved that were constructed—basically custom research reports that you would have a postdoc do for you or a research intern do for you—that I had the AI do for me. I have something like 1,500 to 2,000 pages of bespoke research into post-labor economics. Now of course, that’s north of a million and a half words. No one’s going to read all that, so I need to distill the work down.

So I upload all of that to Notebook LM, because that’s really what Notebook LM does, for anyone who hasn’t used it: you upload an arbitrary number of documents, and it uses a combination of AI tools. It uses what’s called RAG, or retrieval-augmented generation. It chops up every document that you upload to it and indexes it, so you have your own privatized Google, but it’s not just Google. It’s AI-augmented search, and you can chat with your research corpus.
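To make the RAG idea concrete, here is a toy sketch of the chop-index-retrieve loop David is describing, using OpenAI embeddings and cosine similarity. It illustrates the mechanism only; it is not how NotebookLM is actually built, and the chunk size, model names, and file names are assumptions.

```python
# Toy retrieval-augmented generation: chunk documents, embed the chunks,
# retrieve the closest ones for a question, and answer only from them.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chunk(text, size=1200):
    """Split a document into fixed-size character chunks (deliberately crude)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Stand-ins for the text extracted from your research PDFs.
docs = [open(p, encoding="utf-8").read() for p in ["report1.txt", "report2.txt"]]
chunks = [c for doc in docs for c in chunk(doc)]
index = embed(chunks)

def retrieve(question, k=5):
    """Return the k chunks whose embeddings sit closest to the question."""
    q = embed([question])[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-scores)[:k]]

context = "\n\n".join(retrieve("community-based interventions"))
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer only from the provided excerpts."},
        {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: "
                                    "What community-based interventions are described?"},
    ],
)
print(answer.choices[0].message.content)
```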

They’ve even added a couple new features where you can have it automatically generate a podcast. It’ll simulate a podcast of two people talking about the research. There’s an even newer feature where it’ll allow you to generate a video—it’ll give you a little eight-minute YouTube video introducing the topic. It also will generate knowledge graphs, little tree-shaped knowledge graphs of all the topics and subtopics, which allows you to explore the data.

The first phase is exploration. The second phase is expansion. And then at this point, once you’ve realized you’ve gotten to the edges of the research domain—there’s only so much information that’s out there that has already been done—it’s the equivalent of AI augmented literature review. Then it’s time to really get the pedal to the metal, let the rubber meet the road, and you pull out all those facts and topics.

What I’ve done is I go over it again and again. Then I start teaching it, because one of the best ways to distill knowledge is to communicate it. That’s how things have really come down for me with post-labor economics and researching chronic health issues—you start communicating it. When you have the huge mass of knowledge in your head, it hasn’t necessarily been refined. It’s not polished. But the polish is actually one of the most important components. As you mentioned just a little while ago, if you can’t communicate something clearly and simply, you probably don’t understand it well enough.

By the way, that quotation is often misattributed to Einstein. He wrote a letter to a friend saying basically you should be able to explain it to your grandmother. It wasn’t quite so simple, but then that was conflated with Richard Feynman’s quotation about being able to explain it simply. That’s kind of the next step of epistemic refinement or subject matter expertise—or mastery.

Those tools, again, you have to use them correctly. I will also add just one last footnote: there is no replacement for the hard work of actually talking to other humans, other experts. If you have an audience like I do with YouTube, I just make a video or write a blog explaining what I think. The internet is always happy to poke holes in your thoughts and ideas. The quality of feedback is hit or miss at times, but you get that feedback. Then talking to other humans and other real experts to make sure that you understand what you think that you understand. There’s a phenomenon going around the internet that’s called AI psychosis, where people spend all of their time only talking with AI and not other humans, and they kind of go down a black hole of their own echo chamber. We can unpack that if you’d like as well.

Jim: We’ll talk about that maybe in a little bit, but the iterative improvement is something that, as you mentioned, I also find just crazy useful. Like, you get a rough draft that’s sort of okay, but you know it’s not that great, so you say, all right, reorganize this thing so it’s rhetorically stronger. At least, I’m a sufficiently poor writer—I’m a great talker, but I’m not that good a writer. And so it really helps me do things like that. And then I’ll go through paragraph by paragraph or section by section and ask questions like, “Is this section fully in congruence with the rest of the paper? And if not, please give some detailed recommendations for improvement.” And now, again, as we talked about with tools, it’ll spew out a bunch of improvements. Some of them, though, aren’t necessarily good. They may go into too much detail, or it may actually just make a mistake. So you have to vet what it says, but then you say, all right, apply this, this, this, and this proposed change. And it’s amazing how quickly you can ratchet up the quality of a chunk of writing by using that iterative, essentially critic-in-the-loop approach.
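A minimal sketch of that critic-in-the-loop pattern, assuming the OpenAI Python client; the model name, file names, and prompt wording are stand-ins, and the vetting step in the middle is done by a human, exactly as Jim describes.

```python
# Critic-in-the-loop revision: ask for a critique of one section against the
# whole draft, vet the suggestions by hand, then apply only the accepted ones.
from openai import OpenAI

client = OpenAI()

def ask(system, user, model="gpt-4o"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

draft = open("paper.md", encoding="utf-8").read()       # full working document
section = open("section3.md", encoding="utf-8").read()  # the piece under review

critique = ask(
    "You are a rigorous, professional reviewer.",
    f"Full paper:\n{draft}\n\nSection under review:\n{section}\n\n"
    "Is this section fully in congruence with the rest of the paper? "
    "If not, give detailed, numbered recommendations for improvement.",
)
print(critique)  # human step: read the critique and decide which items to keep

accepted = "1, 3, 4"  # whichever recommendations survived your vetting
revision = ask(
    "You are a careful copy editor. Change nothing beyond what is asked.",
    f"Section:\n{section}\n\nRecommendations:\n{critique}\n\n"
    f"Apply only recommendations {accepted} and return the revised section.",
)
print(revision)
```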

David: Oh, yeah. I’ve used it quite extensively for drafting and copy. And one of the rules of thumb that I have found is that any chatbot can write about ten solid paragraphs at a time. They can often write more, but they start to repeat themselves, or they kind of peter out and get lazy and sloppy. So what I do, as I’m creating outlines and drafting and redrafting and revising, is this: for any output where I’m using AI for drafting, I’ll say, “Okay, this section needs to achieve x, y, and z. The reader needs to come away with this belief, and they need to learn these facts. This is the point. Write me ten to fifteen paragraphs that achieve that.” And I keep it all separated in my manuscript and my documents.

And I do the same thing. Interestingly, Google Docs and Microsoft Word with Copilot enabled now let you just highlight a paragraph and say, “Make this paragraph better.” These tools are not that good at that particular task yet. You have to be very explicit about how you want it to reword something. But my approach is more atomized: I’ve got it broken down into sections and subsections, and sometimes sub-subsections depending on how big the manuscript is. Keeping it in those chunks means you can say, “Okay, what is the purpose of this subtopic?” and make sure that that wad of text achieves it.

Now, one thing that I’ve found interesting is that, even though I spent a lot of time writing before getting these tools, and I was a decent enough writer before this, when I say, “This is who the audience is, make sure it is dense but accessible, make sure that you cite sources and include facts and figures,” it communicates better than I can in writing, or verbally for that matter. I spend a lot of time talking in front of the camera too. And what I’ve noticed, particularly as my work has been disseminated and read by more people, is that sometimes there’s just no flourish required. But there is a level of thoroughness, especially when you instruct the tools to be thorough, to be comprehensive and clear. It’s just, “Okay, cool. Here you go, boss. I will follow your instructions to the letter,” but you have to give it good instructions.

Again, it’s still a tool. I’m reminded of another mantra: a shitty gun in the hands of an expert shooter is better than a fantastic weapon in the hands of a shitty shooter. Same principle applies here. If you understand communication and rhetoric and what you’re trying to achieve, the tool will be much better than if you just say, “Figure it out for me.” Many of us in the Internet spaces, we talk about how it’s often a skill issue if someone says, “Oh, well, the AI can’t do x, y, and z.” It’s like, well, it works just fine for me, so it must be a user issue. In technology, we say PEBKAC: Problem Exists Between Keyboard And Chair.

Jim: Yeah. That’s critical, because the reports you hear of people having problems with AI often come down to—again, back to where we started—not understanding the tool and the imprecisions of the tool. And to your point, it’ll go on and on. They like to bloviate. Right? A lot of the models do. So if you give them a prompt that encourages them to go too far, they will. And it’s your job to then rein them back in and say, “Wait a minute. Try that again. But instead of twelve twisted paragraphs, write six clear ones,” for instance. Right? And they can do it a lot faster than I can, I can tell you that. But again, you then have to read the six paragraphs to make sure they didn’t make a mistake, and they will occasionally. It is very interesting. Let’s go back to NotebookLM. Yeah, I know a number of people have told me how cool it is, and I tried to use it on a small project, and I think I just didn’t spend enough time to master it. Talk about kind of the gestalt of what NotebookLM gives you. Why do you find it a powerful tool in your workflow?

David: Yeah. So speaking of tools and limitations, it is one of the narrower-purpose tools. From my perspective, one of the key uses of Notebook LM is just this: I’ve got a giant corpus of text that I don’t want to go searching through every time. So when I’m doing my research, I have 80-plus PDFs uploaded to my post-labor economics notebook. And I say, “Give me every example of a community-based intervention or a community-based measurement.” And it says, “Cool, boss, here you go.” And I tell it the format that I want. I say, make sure that it’s got who sponsored it, how big it is, how successful it was. Give me a paragraph describing this intervention or this measurement. What is the impact of it?

And it’s able to go through—it takes a couple minutes. It reads all of my sources and then it’ll give me one long output that just says, “Here you go.” Here is basically an appendix-grade blurb that says here’s everything. That’s one example of the use cases: extract something particular from my corpus of research. I think the way that Google has prompted the language model is it will not synthesize. It will not add anything. It’ll stay within the four corners of the documents that you have uploaded.

That’s actually one of the major limitations. If you ask it to look at all these sources and come up with a new idea, it literally sometimes will tell you, “I’m kind of not allowed to do that. I’m only here as your personal reference librarian.” So it can become an expert on the sources that you have, but if you’re missing a source, it won’t help you. It just says, “We don’t have that in the library.” Sorry. That’s one of the main limitations of that tool.

Jim: Yeah. So it doesn’t have the deep research capability that some of the other tools have. Correct. But that can—

David: Be good if you don’t want it to be poisoned or contaminated by other ideas. You say, “I’m controlling the corpus of knowledge here. Stick within these boundaries.” But that can actually be really good because if you start asking it questions and it says “404 not found,” then that can be a clue that you need to go add some sources to your research database.

Jim: Interesting. I have to get my head back around that and realize it’s the librarian of my stack. It’s not really integrated with the broader tool chain. There’s an opportunity for Google to do something interesting, which is to allow you to toggle—kind of like in Cursor where you can go between agent mode and whatever the hell the other mode’s called—and say, don’t pay attention to the wider world or go out and pay attention to the wider world with respect to this query. That would be easy for Google to do, you would think. Well, maybe they’ll do it.

You also mentioned something—I just tried to do it the other day and maybe I didn’t spend enough time on it—trying to get Copilot to work with Word. I went and signed up for Copilot Pro on my Microsoft account, downloaded the fresh version of Word, and it just seems to refuse to work. It just doesn’t work at all. It certainly doesn’t recognize the fact that I have a Pro subscription. When I type something in, it gives me some half-baked answer. It just didn’t work at all. The idea of using it to highlight a paragraph and work a paragraph, I think would be very nice. And I did ask Perplexity Research Mode what was better—Copilot in Word or the Google editor in Google Docs? And its opinion was the Copilot Pro. And so I wanted to try it, but it doesn’t seem to work. What the fuck? There’s something really screwed up at the moment about that tooling. Are there any special tricks to get Copilot to work with Word?

David: For me, it self-installed and it said, “By the way, Copilot’s installed” because I think it’s tied to your Office 365 subscription or something.

Jim: That’s what I added—the Copilot Pro to my Office 365 subscription. So it ought to know, you idiot. Right?

David: Right. I mentally categorize those all as the Copilot class, where it’s baked into another app, whether it’s Google Docs or Microsoft Office. I have found them to be pretty persnickety, where if you want to rewrite a paragraph, you have to be very, very explicit. But it’s interesting because sometimes if you’re too explicit, it’ll just regurgitate what you said. I think they use very cheap lightweight models. And one of the problems with them is they often will ignore the rest of the document. So they’ll confabulate a bunch of ideas in the one paragraph that you’ve written.

But if you say, for instance, if it’s dialogue in a work of fiction, “Make this dialogue a little more visceral”—that kind of instruction, it tends to be able to follow pretty well. But taking a step back, if it’s really heavy-duty work, like scientific literature, and you’re working on a passage in a larger body, often I find that it is better to just take that section and put it into a new thread, a new chat in your favorite chatbot and say, “We’re working on this. Let’s overhaul this one passage or this one chapter.” And then you can copy paste the results back.

I will also say that Microsoft is Microsoft’s worst enemy when it comes to UX and UI. So there’s a reason that people don’t use Bing. And their Copilot might go the same direction where, basically, businesses use it because it’s the only option, but consumers choose not to use it for some of those similar reasons. Particularly when there are better tools out there, it’s just not all baked in. Now that’s not to say that it won’t improve in the future, but certainly right now, there are UX limitations to both the Google Docs and the Microsoft Copilot version.

Jim: Once you start to get an idea put together—we talked about the critic role a little bit, you know, using these things as a critic and as a brainstorming tool—why don’t you go into that in a little more depth: when you want to explore, you know, have I gone far enough? Am I in the context of what other people have said, et cetera? Talk about that aspect.

David: Let me tell you a brief story. With post-labor economics, I have been working on it for a few years, similar to some of the work that you’ve been doing. My work got turbocharged when the latest crop of AI models came out. I said, okay, now I can afford to do this on my own. There’s the period of expansion of just gathering more and more data, more information, training my own brain—important step. And then communicating it out.

I would ask it, “Hey, generate an outline for this book,” and it would give me a generic outline. I’m like, there’s not really a core thrust to this, there’s not a clear thesis. That’s something that, at least up to this point, I haven’t been able to get AI to do: to think in that kind of way and really distill something down. So that had to come from just my own good old-fashioned brainpower and conversations with humans.

Because one thing that happens is there’ll be sparks when you’re communicating with another human. Sometimes it’s just my wife—I’m just saying, “Hey, I am really excited, I just figured this thing out,” and she’ll ask a counterfactual question. Those inflection points, those little nexuses of understanding, that’s really what to seize upon.

I think this is where AI kind of still falls into that uncanny valley because the way that it processes information is very different from humans. It looks human-like in many respects, it can do some of the same kinds of reasoning, but still it’s a different substrate, a different approach. What really resonates with the AI is not what’s gonna really resonate with a human. That’s something that I’ve found time and time again.

At the critique level and the brainstorming level, one of the things that I do is every time I have a complete manuscript or a new version of a manuscript, I upload it to several AIs now. Gemini has a huge context window, so it can understand the whole thing. Grok is okay, although I find that the quality of Grok, which is the one by xAI, continues to oscillate wildly. Then ChatGPT, and I use Claude a little bit less lately. These models are all generally big enough that they can ingest an entire book as long as it’s not the entire Tolkien trilogy.

Then you say, “What do you think of this?” Because if you pretend like the work isn’t yours, it changes its attitude dramatically. If you’re saying, “What do you think about this? Does this pass muster, or does it raise any eyebrows?” And it’ll say, “Oh yeah, this looks like a good treatise,” or “There are a few gaps.”

To one of the things you said earlier, sometimes it is like herding cats. When I gave it post-labor economics, one of those sessions, it said, “Oh, well, you’re missing everything from the developing world to climate change.” I’m like, this is not a totalizing theory of the entire human race. This is about what we do about labor. Anything that is not measuring or providing interventions is out of scope.

The difference in wording is important—if you ask “What’s wrong with this?” it’s gonna look for problems. If you say, “What do you think about this?” it’ll give you a different kind of response. Those are two of the main prompts that I go to. Sometimes I ask for gaps, those kinds of things. When it doesn’t know that it’s your work, it will tend to flatter you less. That goes back to the sycophancy problem that a lot of AIs have where they treat the user like their god. For anyone who’s a novelist, that’s one of the mantras—the only audience that matters is the reader. So you have to write to the reader, not necessarily what you want to write. But if you change it so that it looks like you’re a skeptical person who wants help understanding this paper or book, it’ll be a little bit more diligent and honest with its critique and feedback.
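A sketch of that blinded, cross-model critique, assuming the OpenAI and Anthropic Python clients; the model names, file name, and framing text are illustrative, and the point is simply that the manuscript is presented as someone else’s work with neutral wording.

```python
# "Blind" cross-model critique: frame the manuscript as someone else's work,
# keep the wording neutral, and compare feedback from two providers.
import anthropic
from openai import OpenAI

manuscript = open("manuscript.md", encoding="utf-8").read()

framing = (
    "I'm a skeptical reader who was sent this manuscript to evaluate; I did "
    "not write it. What do you think of it? Does it pass muster, and where "
    "are the gaps?"
)
prompt = f"{framing}\n\n---\n{manuscript}"

openai_reply = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

claude_reply = anthropic.Anthropic().messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}],
).content[0].text

# Human step: keep only the points the two critiques agree on or that you
# can verify yourself.
print(openai_reply, "\n\n=====\n\n", claude_reply)
```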

Jim: Yeah, exactly. I’ve found the same thing: if you just say, “Hey, we’re just chatting here,” it’ll butter you up a fair bit. Though I do notice that GPT-5 is a little less obsequious and sycophantic than GPT-4 was. And Claude was never as bad as the others. Grok is worse than all of them. Though, just as a total aside, if you want to have fun, you can tell Grok to enter asshole mode and it will become extremely abusive and crazy. It’s fun and entertaining, but not actually useful.

I’ve found that prompt engineering is a big part of getting these tools to work optimally. I’ll often say “deeply read the attached document and consider it,” and then go on from there. It seems to actually get it to spend more cycles thinking through what you wrote. When I’m looking for critique, I write it as dry as possible: “Please review this as a possible publication for such-and-such magazine or journal, and consider it as a professional acquisitions and copy editor would.” Try to define its role as tightly as you can, because the role is important.

When I was using it through the API, which I haven’t done recently, but I did spend a whole year building a Hollywood screenwriting program that used the API, we were able to distinguish between the system prompt and the user prompt. You’d put all that framing stuff in the system prompt, and it really worked well. The newer APIs have a somewhat different set of roles, but you can get approximately the same effect by thinking carefully about defining the role for the LLM before you tell it what to do, and it makes a big difference.
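For anyone curious what that system-versus-user split looks like against a current chat API, here is a minimal sketch using the OpenAI Python client; the editor persona and model name are illustrative, not the screenwriting program’s actual prompts.

```python
# Minimal sketch: put the role framing in the system message and the task
# (plus the document) in the user message. Persona and model are placeholders.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a professional acquisitions and copy editor for a "
                "general-science magazine. Review submissions dryly and "
                "critically; do not flatter the author."
            ),
        },
        {
            "role": "user",
            "content": "Deeply read the attached draft and consider it, then "
                       "give a structured review.\n\n"
                       + open("draft.md", encoding="utf-8").read(),
        },
    ],
)
print(resp.choices[0].message.content)
```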

David: Yeah, perspective-taking is an unsung superpower. For my post-labor economics work, I say, “Take a look at this from the perspective of a state legislator” or “Take a look at this from the perspective of a central banker.” That cues up an entirely different set of concerns, questions, and criticisms. That’s where I think AI can really shine because not everyone has access to a central banker, state senator, or venture capitalist on speed dial. But the AI can do a good enough job to say, “Okay, if I’m thinking about this from the perspective of a board of directors or CEO, these are the things that really stand out.” I think that’s actually one of the reasons that all the major consultancies from McKinsey to Deloitte and others are having an existential panic. I’ve heard through internet rumors that they’re worried about their future because AI can do what they do—take a perspective, provide feedback and suggestions, tell you what KPIs to use, and away you go.

Jim: I gotta tell you a funny story about that. I was doing a little pro bono consulting for a small company, which I do fairly regularly. They’re nice people and know their technical business, but they don’t know anything about direct response marketing or the mathematics of the funnel. I had reluctantly agreed to give them a little tutorial. When the deadline came up, I thought, “Do I really want to spend eight hours researching and writing this thing?” I wondered what ChatGPT could do.

I spent about half an hour writing a detailed prompt, put it in deep research mode, gave it the company name, and asked for working examples specifically crafted for people who know their business but don’t know about direct response marketing. Twenty minutes later, the output was so good that I made just three changes, put my name on it, and sent it to them. They were amazed—they thought I had spent days on it. It would have taken days at McKinsey.

At that moment I realized there’s a billion-dollar opportunity here for anybody who wants to do it, which I called “Snuffies in a box.” In the management consulting world, some of the more cynical people refer to the associates as “snuffies.” The snuffies do the real work while the name partners go out to lunch. The partners do read the final reports and make some comments, but they’re high level. You could now build a management consulting company with five general partner-level people and maybe three or four, or even ten tech-savvy AI consultants. For most routine management consulting work, the LLMs were good enough a year ago, and they’re even better today.

I would be very worried if I were them. Those guys have been hand-job artists forever. I’ve never hired one—I’ve had them pitching to me countless times. I can’t believe how expensive it is for nothing. I hire my own independent people. I can’t imagine anybody will be hiring them for anything less than the most deep stuff where there are few real great in-house experts. But yeah, for the usual meat-and-potatoes business consulting, try LLMs first. Absolutely.

David: Yeah. Every Fortune 500 CEO is using AI already. Politicians are using AI. Judges are using AI. Now, of course, there are some ethical risks that go a little off scope. We can touch on that if you want. But certainly, the landscape is changing as these tools are, one, getting better and, two, the uptake is—I don’t know if it’s accelerating, but certainly we’re getting to a point of saturation, I think it’s safe to say.

Jim: First, let’s talk about the uptake. There’s still, of course, the Gary Marcuses of the world who bitch and complain about every version of every model and all of that stuff. But I say, let’s look at how quickly this thing has taken off. No technical product has taken off this fast. And people would not keep using it at the levels they’re using it if it wasn’t doing something they recognize as having value. So this is a hugely valuable set of tools, and the masses of people are recognizing that—no doubt they’re making errors, and probably some of them are going in psychologically dangerous directions, people who decide to use it as a therapist or something, or as an AI girlfriend. But for people doing real work of any sort, it’s a really value-added tool.

I’ve mentioned this before in podcasts. The very first thing I did the day that ChatGPT got released in November 2022, I was resigning from a board of advisors that I was on and I wanted to write a nice, friendly, leaving-the-door-open-for-future-collaboration letter of resignation. And so I wrote up just basically that and said, “Please write this letter,” and it was perfect. I could send it—that was the first day of ChatGPT. And so, work of that sort, anyone who’s not using these tools is just going to be left behind. And I think that’s a really important thing for us to talk about a little bit because there are people who are afraid of AIs or sort of repulsed in theory about—they just hate the whole idea. I find this fairly often in writers and people who work in media, they go, “Oh, AI.” And it is true that you can get shitty content from AI. But if you’re not using it, you’re going to be left behind.

David: I mean, AI slop is a problem. That’s what kids these days call it on the Internet. It has lowered the threshold to produce output. And so you have people that were never going to write a screenplay or a novel or a poem or code. And suddenly, they feel enabled. They feel empowered, which is great. That’s what technology is supposed to do for humanity. But everyone and their brother had the same idea.

There is definitely a souring, a bad rap that AI has. Societies always have to negotiate how they’re going to integrate a new technology into the social fabric. That is part and parcel of being Homo sapiens. At the same time, to your point, since I got into technology—and you’ve been in technology longer than I have—it’s been an adapt-or-die kind of area. The operating systems change, the hardware changes, the software that rides on all of that changes, and you keep learning. And for me, that’s just kind of been a de facto assumption.

But one thing that has dawned on me is that many other industries do not change at the pace that technology has continuously changed for decades. And I think that it can be kind of a wake-up call, a little bucket of cold water for some people when they realize everything can change out from under you. That leads to status threats. So there’s a psychological kind of revulsion: “This is going to take away my specialness, so therefore it’s bad and I’m just going to stick my head in the sand.”

And there is also something to be said for expertise. Senior developers, typically speaking—senior developers are actually slowed down by using AI coding tools just because they already know their business super well. Really, what it does is it’s a jet pack for junior developers. But that’s also why entry-level jobs are drying up, particularly in the tech space. So the closer you look at it, the more you realize there is nuance. It’s not unidimensional. It’s not all good. It’s not all bad. But the most exciting part of this is that it’s one tool that can do everything. It’s the ultimate Swiss Army knife. It can do coding. It can write copy. It can do research. It can help with medical science. It can do—I don’t know, not absolutely everything, but it can do a heck of a lot. And also, like any general-purpose technology, not only can it do a lot, its horizon of capacity is expanding on a daily or weekly or monthly basis. Which means that, yeah, the further behind you get, the harder it is to catch up. Now, how long that takes remains to be seen.

Jim: Yeah. That’s interesting. You know, I talk to a lot of people who—in fact, the vast preponderance of normies that I know are now using ChatGPT occasionally, like as a replacement for Google or they use Perplexity as replacement for Google or to clean up a little email complaining about something to their insurance company or something. But I would suggest most people haven’t gone any—you know, not even one percent down the road to the things that you and I have been talking about. And I heard a very interesting idea from a collaborator of mine on another project, which is he predicts that there’s a huge business in being able to generate an appropriate tooling infrastructure that’s accessible to normies to do the kinds of things that you and I are doing. He had some cool name for it. I forgot what it is. My take on it was yes, but not yet quite. What’s your take on that idea and its potential feasibility today?

David: You know, I think that’s where most of the value add is actually—all the affordances from the user interface, pre-building all the piping and plumbing. The problem is scale. Because when you have a general purpose tool like Claude with projects or Notebook LM or ChatGPT Pro, those tools change so quickly. Having been in this space for a while and tried building one of those startups, what happens more often than not is the technology will advance and completely subsume your business model. It’s kind of a crapshoot because you’re not really sure what’s coming. For instance, every company building agents right now, they might have their lunch completely eaten when Meta, Google, Microsoft, OpenAI, and everyone else releases their next generation of computer-using agents. So it’s like, great, you built a startup and got through pre-seed funding and maybe even got to Series A funding, but then your market just completely dries up. Best case scenario for a lot of those startups is you get acquired. Like Windsurf and Cursor, they were kind of the first movers, and so they get hoovered up by some of the bigger players. But at least in one of those cases, a lot of the employees were kind of left out in the wind. So I will say, that’s the yes, but. Yes, there is certainly opportunity, but it’s quickly becoming a red ocean as far as I can tell.

Jim: You know, it’s interesting because this is exactly what I did, the screenplay thing. Originally, I did it just as a hobby project, but various people insisted that we try to turn it into a business. Fortunately, I had the good sense not to accept a fairly large amount of seed round financing that was offered to us, because I could see I had built the thing on way too small a base. I built it in 2023, basically. And I built it initially in the days when there were 4K-token input and output windows. And I realized 16K was coming, so I sort of built that in, but the reality is, by the time we were considering taking the money in early 2024, that was nuts. Bigger, much bigger windows were there, and much bigger windows were coming. And so I realized that the thing had been built on too narrow of a base. Sure enough, today, when I want to do something in that category, I just use Cursor actually. It’s far more flexible than my tooling—all the stuff that I built in to allow me to orchestrate the AIs at various levels in the process of going from a basic idea to a book, to a full write-up of all the scenes to dialogue, et cetera. You can do all of that in Cursor, and it’s far better. And so the whole idea of doing it as a bespoke agent that orchestrated LLMs was just a bad one. Do you have any other final thoughts that you would like to give to people who are thinking about using these tools as aids in thinking and research and writing? I think that tool mentality is the correct thing.

David: Just like any tool, you can go further or faster with it. You can also hurt yourself with it, epistemically or emotionally, as we alluded to with AI psychosis or getting lost in your own echo chamber. So the biggest thing is just grounding. Use it for fact-checking. Fact-check yourself, sanity-check yourself. I do that often. I say, let me upload a document or an idea and just do a sanity check. Am I making sense? What am I missing here? What would someone else think about this? It can be a really powerful tool. And then I guess the last bit of advice is to find the edges of what it can and cannot do. Many people just jump in assuming that it can’t do things, which, okay, is not necessarily a bad assumption because you can prove yourself wrong. But also, don’t jump in assuming that it can do everything. Like any highly flexible general-purpose tool, you have to build your own mental model of what it can and cannot do and how to use it. And I think that’s the key lesson that we’ve been kind of orbiting around—what are the edges of this class of tools? And if your opinion is more than six months old, you’re wrong, because the entire industry has moved on. That’s what I always tell people when they say, “Well, it can’t do X yet.” I’m like, just wait six months, and something will have changed dramatically. So yeah, those are my last pearls of wisdom for you.

Jim: Yeah, I would probably concur with that. And that’s interesting, sort of type one and type two errors. One is trying to do something it can’t actually do, but I find myself—well, I have a friend who is a complete non-techie, but he’s now building whole big apps and shit all by himself. He says, “Jim, you don’t realize what these things can do.” Just take the first assumption that the LLM can do it, and then see if it can’t. And I’d say that’s—for many people, including me—very useful advice. So I want to thank David Shapiro for a very interesting and, I believe, important conversation about where these LLM tools fit into our working toolkit today. Excellent.

David: Thank you for having me back.