The following is a rough transcript which has not been revised by The Jim Rutt Show or Timothy Clancy. Please check with us before using any quotations from this transcript. Thank you.
Jim: Today’s guest is Timothy Clancy. Timothy is an assistant research scientist at START at the University of Maryland. START is a kind of cool name—its actual name is Study of Terrorism and Responses to Terrorism. That sounds like some hot shit. Timothy specializes in studying wicked mess problems, including violence and instability as complex systems. For over thirty years, Timothy has helped stakeholders in all manner of organizations understand their wicked mess problems and work towards resolving them. Welcome back, Timothy.
Timothy: Thanks for having me again.
Jim: Yeah, this will be great. And if your work is involved in unfucking wicked messes, you will never lack for work.
Timothy: Blessed are the problem solvers.
Jim: As I mentioned, that was a welcome back for Timothy because he's been on twice before. Back in Episode 257, we talked about Russia's mid-game very soon after the invasion of Ukraine. And I awarded Timothy the prize for best prognostication. Of course he cheated by having five scenarios, but nonetheless, his were better than those of the other people who, in those days, thought Ukrainians were gonna get their ass whipped quick. And Timothy said, no, not necessarily. So he got the Ukraine War Cup from The Jim Rutt Show. We also had him back on EP 248, where we talked about the Israel-Hamas war. So if you want to hear more of his wisdom, check those episodes out.
Today we're going to talk about AI and whether AI is headed for a new cold war, and more broadly, what the security implications are, I would say, for the US and for the world with respect to AI. I sighed when I said AI – people have heard me say I hate it when people use AI as a synonym for LLMs, but it's become so ubiquitous that it will probably be the headline. So we're talking here about transformer-based deep neural network types of AI, at least at the moment. Some of these things will apply to other forms of AI, because I'm with LeCun and Jeff Hinton that deep learning and deep neural nets are not the end of AI and are probably not enough to get us to AGI. But for now, we're talking about LLMs and the related technologies. So with that, give us your very top level framing: what are the implications of these AIs within the strategic context?
Timothy: I appreciate you calling out that LLMs are sort of what people know. And part of what's happening is the evolution from simple chat-based generative AI using LLMs – where you put in a simple prompt and you get out a simple answer, and there are some details that go into that – to reasoning AI. AI that is effectively chat-based GPT, but doing multiple iterations and remembering its context, and it could be doing a thousand parallel runs, pausing, and trying to reason through a problem.
And where this matters for national security – there are already other forms of AI, which you're familiar with. I've seen really good models for how ships in the Navy calculate ballistic missile strikes and back-and-forth packages, how you deal with mass casualties, things that I would say there are fairly standard models of. And by standard model, I mean we know how physics works. We have a standard model of physics. So we know what the arc of a missile is gonna be. We know the tangible elements. We can just compute that really fast.
But where AI is going with these reasoning models – and I wanna be very clear because we’ll probably agree, it’s not obvious it’s gonna get there – is in really complex national security issues. So topics of social complexity, topics of systemic complexity. We’re talking about how do decision makers think, act, and respond, not just the missile launch itself, but what goes into the individuals making that decision to launch a missile or respond when it is launched, what goes into societies that are under invasion, how long will their will to fight survive, will it sustain – and like you mentioned, Ukraine, they held out and they’re still going – or will it collapse and there’ll be a capitulation point.
These are topics that are a lot trickier to deal with because there is no standard model. There’s no standard model, as you know, of a complex system. There’s no standard model like we would look at physics. And so what’s gonna happen, and we can talk about this cold war breaking out, is there’s really this race now. And it started kind of softly with the first reasoning model of ’03, but really took off when DeepSeek came out with its R3.
And it showed that basically – whether or not, and I'm just gonna put a coda on this so we can come back and deal with it – whether or not AI ultimately gets to the ability to do this, right now neither the US nor China can afford to fall behind, because of the consequences. And I use this analogy: the first person to get the bomb and the second person to steal the bomb set the pace for the twentieth century. I heard that from an expert I've been interviewing recently in the research.
With this AI potential out there, you can't afford not to get involved. And this has some really big implications, not just for national security, but for chip production and power and how we use these technologies, maybe integrate them with modeling and simulation, and what can and can't be done. But I really believe – and I have an article out on this on the blog and a more formal one that's about to go into peer review – that this is the start of a new cold war, because unlike the nuclear bomb, where we had it and it worked, this is now the belief in the potential. This is like 1938, '39, where people are speculating. And once you realize what the potential might be, you can't afford not to get on this cold war arms race, this Red Queen race where you're just running as fast as you can to try and keep up, because the consequences might be dire if you don't.
Jim: Yeah. In fact, Vladimir Putin – we've all heard of that guy – said a couple of years ago, maybe four years ago, I think before the Ukraine war, that whoever achieves AGI first will control the world. We can argue about that all day, but even a guy like Putin, who probably barely knows how to operate a calculator, understands the strategic possibilities around very strong AI, or AGI, which is essentially human-level AI, or ASI, artificial superintelligence, what comes after AGI.
Timothy: Absolutely agree. And no disrespect to Putin, although I actually kind of disrespect him a lot. Russia's not a player here. They're lucky to have a calculator right now if they haven't already shipped it off and used it as a component part. The two big players here on the geopolitical stage are gonna be China and the United States. And I think the reason this has come up in the last month is, you know, when we had the bomb in the forties, we thought we were the only ones. We had this kind of buffer. You have a sense of, well, we can take our time. We can maybe look at how things develop, we can maybe take a pause.
But when the Soviet Union tested its bomb, we were like, uh-oh, they've caught up. That's a whole different conversation. And what we thought, I believe, going into the end of last year, beginning of this year, is that we were just getting into reasoning models, and these Chinese firms – we can talk about a few of them – had a long history, it turns out. Very good designers, very good data center management. They'd done this kind of work in other areas, but we thought, well, they're behind the curve of where we are in AI.
And effectively High-Flyer, which is the firm behind DeepSeek, came out with the DeepSeek R3 model, and it matched the O3 model in reasoning. And now, again, I wanna be very clear. Reasoning AI models are not, to your point, superhuman. They're not even necessarily human yet. They're very interesting. They've got a lot of potential, but they're along that evolution now where you do have a generative AI model using those transformers and those neural nets to do reasoning tasks, and the next one after that might be autonomous agents. And the gap that was supposed to exist between U.S. and Chinese firms from a technical standpoint is gone. We learned that, and we can go into detail here if you want. The export controls on the GPU chips that were thought to keep China in that gap – they work in one way, but not in another. So now, all of a sudden, just like what happened in the forties, there are two countries, two geopolitical forces – and there are lots of allies wrapped up in this, or at least allies as of today when we're recording this – but it's really between the U.S. and China. And it has a flash point in Taiwan, which is not just a traditional conflict point but a key point in the technology of AI itself.
Jim: Yeah. We'll get down into all that. Before we jump into it, though, I do wanna point out to the audience – many of whom have heard the phrase "multipolar trap" – where players are caught in a Red Queen, race-to-the-bottom, arms race dynamic and neither can pull out. This is unfortunately a fundamental attribute of competitive dynamic systems, and this is a classic example of that. It may well be that if China and the United States could, like in the prisoner's dilemma, decide to cooperate, we'd both end up being better off. We apply all our R&D to peaceful uses and our societies prosper. But if one side is applying it to potentially military considerations, the other side is forced to by this arms race dynamic. And that's what really locks this in such that neither side can pull away. Smaller players like Russia, Iran, Europe to some degree – they need to adapt to that, and we'll talk about that a little bit later. They may be able to, because the environment is not nearly as obviously polarized into a small number of players as the nuclear standoff was in the late forties, early fifties.
Timothy: Yeah, it's interesting. So my master's was in system dynamics and insurgency dynamics, and my PhD is system dynamics applied to the violence and instability of nonstate actors. But these dynamics, right, these complex systems – the Red Queen race, the arms race – that is the archetype we're seeing here. And for people who aren't familiar with the traps and all that, you've got two negative feedback loops: as one side gets a little bit better, it creates an adaptation on the other side, which can't afford to lose that relative advantage. So it forces the other party to catch up, and then the perception of catching up, or even getting a little bit ahead, forces that first party to adapt. And now they're continually evolving and adapting, and it's a Red Queen race, just like the Red Queen from Through the Looking-Glass: you have to run ever faster and faster just to stay in place. And this can become hugely expensive, because, as we're gonna talk about, what's driving this right now is power and chips, and that's where the strategic dynamic comes into play – who has access to power and who has access to chips. That's gonna be the tension, I think, at least within the generative AI space. Maybe some fancy person or institute will come up with a new method, like you said, a new path to ASI. We haven't seen the end of the innovation. This is still an emerging technology. But as long as this current LLM generative AI scaling hypothesis approach is in effect, we're on this Red Queen race that's gonna drive this cold war.
Jim: Yeah. And it's not a simple dynamic either. The recent R-1 thing is a good example of that. Here, the United States was sitting pretty thinking, we've choked off the big chips from China, right, through our chip embargo. And maybe the Chinese got around it – there's been a suspiciously large amount of chip sales going to Singapore the last few years – but let's assume they didn't. Well, if what they told us is true – people are working to replicate it, so we'll know soon if it was true, but they think they've already done it – they figured out a vastly less computationally expensive way to essentially parasitize existing models and create a model that's just slightly behind the state of the art. It's thought that they probably parasitized GPT-4. And instead of hundreds of millions of dollars, they claim they spent $5 million on training the model, which completely changes the dynamics. North Korea can afford $5 million to build a near-frontier model. Bosnia could afford $5 million. So it's as if, instead of building a nuclear weapon being a many-year, many-billions-of-dollars problem, building a nuclear weapon were something a guy that owns three gas stations could afford.
Timothy: This is where I think it gets very interesting, because it's the difference between train and implement. So I've got an article up on my blog called "The DeepSeek Moment" where I go into this in detail. If you think about these AI systems, traditionally all we've worried about is the training. And that's where the scaling hypothesis comes in – more chips, more data, and ultimately more power, will give you better results. And so the export restrictions were supposed to prevent, or at least slow, this model training phase.
But model training is only part of what's gonna generate the impact. Right? You mentioned the nuclear bomb. If you've got one nuclear bomb and nothing but a truck to deliver it – no second nuclear bomb, no submarines to deliver it – you're not implementing. And so training versus implementing is where this begins to diverge, and it's what's changed in the power landscape with these reasoning models. And you did catch me – it's the R1 that's the reasoning model. It's built off their V3, which was their chat model, which parasitized, or used distillation from, the OpenAI model as a base. You're right. That was a good call.
The split between model training and implementing is the difference between designing a new airplane, like the Wright brothers did, and scaling it to have impact – getting all the airports, the passenger lines, the cargo shipment companies. Now in national security, maybe the model design is coming up with that new fifth- or sixth-generation fighter. But implementing is the ability to make a fleet of fighters. You integrate it into your air superiority. You have all the maintenance lines. And this is where that export control restriction really plays a role.
Now we have to hand it to DeepSeek – the company behind it is actually named High-Flyer. They were a high-frequency trading firm that had a lot of good data science and chip and data center experience, and they were able to leverage that. They obviously used distillation of the OpenAI model as a starting point. They also did some cool things down at the level of the assembly code. They went down into the chip, and they optimized these chips in ways that people who had more powerful chips didn't think they needed to.
When you put in a query – when you ping an inference request to an existing model – well, let's talk about training first. Scaling hypothesis: these chips need to be close together. They need to talk frequently. You need a consolidated data center with consolidated power. This is the big data center, the big power, the tens or hundreds of millions of dollars you talked about to train this model. That's training.
To implement, the context of an inference is only the amount of tokens it takes to put the prompt in. It does a little humming and gives the response back. In ChatGPT, it's not holding that. The input and output may be a four-to-one ratio, but it's not holding a lot in its context. Now, as context and token length have increased, you can put in not just a paragraph but a chapter. I think Gemini can go up to a book – it's like a million tokens.
Cloud computing can now be used in a distributed way because you're serving all these little inference requests from users. The problem is that as you shift from chat functions to reasoning functions, that context begins scaling quadratically. If you put a prompt into a reasoning model and it's gonna do 10 iterations of reasoning on that prompt, it not only has that first context, but every subsequent context that it has to hold in its memory until it gives you the response. And they're not doing 10 iterative steps – they might be doing hundreds or thousands.
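To make that scaling point concrete, here is a rough back-of-the-envelope sketch in Python; the token counts and step counts are invented for illustration and are not any vendor's actual numbers.

```python
# Rough illustration of why reasoning inference gets expensive: each reasoning
# step's output stays in the context, so the total context the model must hold
# across a run grows roughly with the square of the number of steps.
# All numbers are made up for illustration.
PROMPT_TOKENS = 1_000
TOKENS_PER_STEP = 500

def total_context_held(steps: int) -> int:
    """Sum of the context the model holds at each reasoning step."""
    total = 0
    context = PROMPT_TOKENS
    for _ in range(steps):
        context += TOKENS_PER_STEP   # each step's output is kept in context
        total += context             # and the whole context is held again
    return total

chat = total_context_held(1)            # a single chat-style response
reasoning = total_context_held(1_000)   # a long chain of reasoning steps
print(f"reasoning holds ~{reasoning / chat:,.0f}x the context of a single turn")
```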
Delivering an inference for ChatGPT costs pennies. I can subscribe and get a thousand inferences for next to nothing. A reasoning generative AI might be $4 to $5 per inference. That's orders of magnitude higher. And as we get to autonomous agents – which can take multiple streams of reasoning, interact with all the sensors in the world getting data, and do things in the real world; that's the next horizon coming after reasoning – it's suspected that an inference to an autonomous agent might cost thousands to tens of thousands of dollars.
This is why you see some of these ads out there advertising PhD-level assistance for $20,000 a month. The reason is that in the past, inference was cheap. No one really cared. But now inference is getting just as expensive in chips and power as training. And DeepSeek – they had enough chips to train, and they signed everyone up, but they didn't have enough to deliver. Their ability to serve those inferences dropped off. The capability crashed. They claimed there was a cyber attack, but you know?
So what the export restrictions did – we can't stop China, or North Korea like you said, from developing a high-end AI, but we can limit how far they scale it. And when you're talking national security, it's the scale that matters. If you only have one of something, you can't implement it. If you can't implement your reasoning or autonomous AI in your fighters to adjust trim, or in your analysts to better outthink us, it's gonna be very difficult to get a sustainable competitive advantage. That's where that implement side comes in, and that's why power matters, because now you have a huge power and chip requirement both on the train side and on the implement side. That's what's gonna drive a lot of this cold war, because you can't just do the training and then serve it to thousands of distributed people anymore. You're gonna need bigger and bigger data centers, more power-hungry data centers, more chips, and that's where this contest comes into real focus.
Jim: Yeah. That's a good overview. And I will remind people – we've talked about this before, but not everybody spends all day listening to Jim Rutt Show podcasts. I don't know why not; it'd be a better use of your time.
The idea of the Jevons paradox is very important. This is the idea that as the cost per unit of a basic infrastructural component of an economy goes down, you'd initially think that, even though people use more and more of it, the total dollars spent would be less and less because it's so much cheaper. Two examples are electricity and gasoline. Both of them have come down in unit price by factors of a few hundred since the 1880s, while the total dollars spent on them have gone up exponentially.
That seeming paradox is easily resolved in many situations. As the infrastructural costs become lower, lots more things become feasible. For instance, when electricity was first deployed at scale, it was so expensive that a typical working person could afford one light bulb hanging in their living room. The idea of using it for vacuum cleaners – or, heaven help you, if you tried to cook with it – would bankrupt you. But as the price came down, there were more and more applications for electricity. And as we know, electricity basically swept the world in about a hundred years, and gasoline did the same. Gasoline was quite an exotic product in 1880, and now it's cheaper than bottled water, which always amazes me. The guy that figured out putting tap water in a plastic bottle and selling it for more than gasoline – he's the unknown hero of American marketing.
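A toy calculation, with invented numbers, shows the arithmetic behind the Jevons point: if usage grows faster than unit price falls, total spending rises even as each unit gets dramatically cheaper.

```python
# Toy Jevons-paradox arithmetic with invented numbers: unit price collapses
# 100x, but usage grows so much faster that total spending still explodes.
price_per_kwh = [1.00, 0.10, 0.01]      # unit price falling 100x
kwh_consumed = [1e6, 1e9, 1e13]         # usage exploding as new uses appear
for p, q in zip(price_per_kwh, kwh_consumed):
    print(f"price ${p:.2f}/kWh, usage {q:.0e} kWh, total spend ${p * q:,.0f}")
# total spend goes $1,000,000 -> $100,000,000 -> $100,000,000,000
```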
But anyway, to this point, the Jevons paradox is probably driving the things you're describing, which are things that we weren't even thinking about two years ago. ChatGPT – holy shit, ChatGPT. This thing is the bomb. I started using it for programming. Started using it a little bit for writing – the first thing I used it for was to draft a resignation letter from a board of directors I was tired of being on. But next thing you know, we're using it to write movie screenplays. Next thing you know, we're using it to do simulations, and on and on it goes.
And we have no idea how far this goes. And the prices have come down – they have plummeted, because, like anything else, chip prices keep falling; you know, famously, Moore's Law, cutting in half every eighteen months. GPU prices have been falling faster than that because they're actually quite simple – a GPU is a remarkably simple device. The hard part, as you pointed out, is getting the high-speed memory-to-memory interconnects for the training problem, but that doesn't really apply to the inference problem.
So this is a very interesting game. And at the same time – I predicted this, and my good friend and advisor Peter Wang also predicted this – the algorithms have gotten a lot better on what you can stuff into a given model size. For instance, Google yesterday released a model called Gemma. The reports coming back are that it's ridiculously good for tiny models. Even the 1-billion-parameter model – which you can run on your phone easily. The 27-billion model, which you can easily run on a decent PC or Mac, is supposedly just shy of R1. And this is quite amazing.
And what they've done is come up with a new way to think about training the model. Instead of training it on raw data, they distilled models they already had – kind of like DeepSeek did, but in a more powerful way – since they already have the big Gemini models. They could distill the essence of those models down into a much smaller framework and capture 95% of the power of the big models in amazingly tiny models. And those things are going to become extremely popular for on-premises running, where people are doing the kinds of things you describe, where you might run a hundred thousand inferences on one problem. You just basically have a bank of gaming PCs with GPUs in them and crank out the results for a remarkably low price.
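For readers who want to see the mechanism, here is a minimal sketch of knowledge distillation, the general technique Jim is describing: a small student network is trained to match a large teacher's output distribution. The toy models, data, and temperature here are placeholders, not anything about Gemini or Gemma internals.

```python
import torch
import torch.nn.functional as F

# Toy teacher (big) and student (small) networks; real distillation would use
# actual language models, but the loss structure is the same.
teacher = torch.nn.Sequential(torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1000))
student = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1000))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature that softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 128)                # stand-in for a batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)         # the "essence" of the big model
    student_logits = student(x)
    # The student learns to match the teacher's softened output distribution.
    loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```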
Timothy: And I have a stat here for your Jevons paradox, because I knew it would come up. I've been working on a working paper – I've been doing this research for about a year and a half at the university, and I recently did some subcontracting work on a different article. So with that Jevons paradox, right, the scaling efficiency – people think about the cost per chip, the cost of electricity, networking; that's all dropping. But when we talk about a frontier model – a frontier model just means one of these big models at the edge of the technology, right? What's at the frontier? If you look at frontier model costs, the costs of those frontier models have quintupled every year between 2020 and 2024.
So even as the cost per unit is going down, there are so many more units, so many more GPUs, so much more energy running through these models to hit that frontier mark. And that's where the cold war comes in, because I don't think the frontier models today – I wouldn't trust them with national security. We don't even have self-driving cars yet. But you're talking about things that are maybe running sixth-generation fighters, being used for national security. They've got to be much higher performing than they are now: reduced hallucinations, the ability to deal with complexity, the ability to deal with these topics that have no standard models.
It's probably gonna be a frontier model several iterations out that begins to get into that space, but we can see where the potential is, because you can kind of follow the trend lines and see where some of these problems might begin to resolve. But each one of those frontier models is probably gonna quintuple in cost requirements every year. We might see the first billion-dollar model this year, the first ten-billion-dollar model in 2027. There might be a hundred-billion-dollar model, and that's just on the training side. To implement it, like you said, you may have those banks of computers running those GPU clusters, but now one reasoning inference may take up a whole slew of those. So even as the per-unit costs come down, the absolute requirements go up, and this creates this challenge of power and compute, especially for the frontier stuff in national security that probably won't be in effect for a couple of years. But like I said, you can't afford to be behind the curve, especially with China and the US, given the conflict over Taiwan.
Jim: Now, maybe interesting – I'm not 100% sure that frontier models are relevant to things like flying fighters or doing swarms of drones. In fact, we had on the podcast a while back Sergei Kupryenko from Swarmer, a leading Ukrainian company that developed and has now deployed drone control software that uses deep learning. And what they have done is what you would predict a small Ukrainian company would do. They built a small, highly specialized deep learning model, did a shitload of simulation, a fair bit of field testing, then deployed it and learned in quick loops. And so their model is actually quite small, because it's for a specific purpose. Where the frontier models' breakthroughs are going to be is in large conceptual problems.
Timothy: Generalizable problems. Exactly.
Jim: Accelerating scientific research, figuring out how to fuck over a society with memetic warfare, you know, things like that. That's where frontier models are gonna go. I don't believe they're gonna be of the essence in weapon systems. Those are gonna be specialty models, as they've always been. I mean, they didn't put IBM mainframes in tanks, right? They put very specialized, ultra-expensive, small but high-powered computers in tanks, even back in the Cold War days. So I'm not buying that the frontier models are about competition in tactical situations.
Timothy: It’s an easy example. It’s probably not a fair one, but the real power of these is in the complex systems – the areas where there aren’t standard models now, social complexity, societal resistance to adversarial tampering. You talked about memetic warfare. This is a passion of mine going back to my original PhD topic, which was memetic evolution in competitive environments, but no one could understand what I was talking about. So I switched it to violence and instability.
These concepts in these complex systems are a lot trickier than the physical systems, and one of the deep problems is that physics has symmetry and invariance. Gravity works the same over there as it does here, until you get to the really extreme large or the really extreme small. You're working with the same thing. But these complex systems don't have that symmetry. The way you talk about societal tampering, the way we perceive time and time delays, the way we process grievance – these are deeply culturally contextual. And even within a specific culture, they can vary a lot. And there are some deep uncertainties on this future horizon that are gonna challenge the frontier and keep driving toward that reasoning and autonomous agent.
Now the autonomous agent, I think, here is less about what controls the F-35 or something like that and more like an autonomous superhuman analyst that can plot how do I best use all my assets or how do I best use and plan across all the permutations and contingencies. If your audience is familiar with the OODA loop – I’m sure it’s come up on your podcast – getting inside the OODA loop at a strategic level of your opponent in a decisive way, that is where the real power of a lot of this comes in.
Jim: And if I were doing that kind of work today, I would still be using the LLMs we have today, but as an adviser. For instance, if people haven't tried it yet, try these deep research modes. Someone told me you've got to be doing it, so I sprung for the fancy $200-a-month ChatGPT, which was the first to have deep research. And I've probably now run 25 deep research reports. For 20 of them, I'd say the report itself was worth the $200. Just the one report. It's fucking amazing.
And if I were trying to get inside the OODA loop of an adversary, I'd have a team of high-level thinkers thinking about what the big problems are. I would then have junior people basically writing one-page briefs for these deep research engines. And then I'd have the deep research engines generate amazingly in-depth reports, and have the juniors interpret whether they're interesting and pass the learnings up. You could essentially get at least a 10-to-1 gain in your analyst community that way today.
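As a rough sketch of the workflow Jim describes, the shape of the pipeline might look like the following; run_deep_research and score_relevance are hypothetical stand-ins for whatever deep research tool and analyst triage step an organization actually uses, not real APIs.

```python
from dataclasses import dataclass

@dataclass
class Brief:
    question: str
    context: str

def run_deep_research(brief: Brief) -> str:
    """Hypothetical stand-in for a call to a deep-research-capable model."""
    return f"[long report answering: {brief.question}]"

def score_relevance(report: str) -> float:
    """Hypothetical stand-in for a junior analyst's triage judgment (0..1)."""
    return 0.8

def analyst_pipeline(briefs: list[Brief], threshold: float = 0.7) -> list[str]:
    keepers = []
    for brief in briefs:
        report = run_deep_research(brief)          # deep report per one-page brief
        if score_relevance(report) >= threshold:   # juniors filter what goes upstairs
            keepers.append(report)
    return keepers

reports = analyst_pipeline([Brief("How might the adversary's logistics adapt?", "...")])
```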
Timothy: And some of the research I did on dialectic simulations with the Turing Institute is looking at what you can use today. One of the biggest uses is: how can you cut down on the overhead you're imposing on your analysts and free them up for strategic thinking and the fancy stuff? Can it write an expense report? Can it do a status report? Can it do some of these things we put on analysts that take their time away? And to a point you just made, can it challenge conventional thinking? Can it get to that outside-the-box "we didn't think about that"? Maybe it's not plausible, but it sparks the idea. These are things you can already do today.
If you think about it in three phases – phase one is what benefits you can grab right now, and there are some limitations to getting to phase two around the data, especially when it comes to conflict and national security. Unlike a lot of areas AI's been in, a lot of the data for conflict is very limited – it's either classified or incredibly sparse even in the classified environments.
But assuming you can get that data, then you can get into an environment where – these days in IT centers, nobody's physically inspecting the servers and looking for a problem. They get automated alerts that say, "Hey, something's going wrong. Pay attention." Well, imagine an AI system that can scan all your available data and prompt alerts: "Hey analysts, look into these contingencies, look into these conditions, look into these factors." That's gonna be something that directs the human analyst to pay attention without having to sort through mountains of stuff. That's a phase two capability.
Phase three is when we get to these standard models. If we can get to a standard model of social complexity, now you can use those standard models. Let’s go back to AI – how do these things often learn? Trial and error. The best learning almost always comes from trial and error, but it needs something to check against. With math, you have a math solver. With Go, you have a Go board. You can know the points. There’s some standard model that allows it to check, but there’s no standard model of conflict. There’s no standard model of social complexity.
But if we can develop that standard model using AI and modeling and simulation – if we have a standard model, you can begin to create synthetic data. That's where some of these things like weather forecasting really benefit. We have a really good standard model of how the weather works. So even if we don't have all the data we might want, we can generate data from models and then use those models to fill in the gaps. Right now, we don't have a standard model of conflict, so I can't go back to the 1870 siege of Paris and fill in the data gaps of what happened in the city under siege to add it to the training dataset.
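A minimal sketch of the synthetic-data idea: when a trusted standard model exists, you can simulate it to fill gaps in sparse observations. The logistic growth model below is an invented toy stand-in, not a real weather or conflict model.

```python
import numpy as np

def standard_model(days: int, growth: float = 0.3, capacity: float = 1000.0) -> np.ndarray:
    """Toy 'standard model': simple logistic growth standing in for real dynamics."""
    x = np.zeros(days)
    x[0] = 10.0
    for t in range(1, days):
        x[t] = x[t - 1] + growth * x[t - 1] * (1 - x[t - 1] / capacity)
    return x

truth = standard_model(60)
observed_days = [0, 14, 35, 59]                   # the only sparse real observations we have
rng = np.random.default_rng(0)
synthetic = truth + rng.normal(0, 10, size=truth.shape)  # model-generated "observations"
# The sparse real points anchor and validate the model; the synthetic series
# fills the gaps and can be added to a training dataset.
```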
You look at a lot of these conflict databases, they might have the country or the year or a few data points. They don’t have thousands of data points like financial data or hospital data or these other things that we can train on. So the difference in the phasing is how it’s gonna evolve in these capabilities. There’s things that can be done right now, there’s things that can be done in sort of a phase two that add power to it as we’re developing that standard model, and then people worry about the Skynet single model or the integrated platform. That’s gotta probably come after phase three. I don’t think we can get there without a standard model. And as you know in complex systems, it’s not altogether certain there can be a standard model. That’s gonna be the real challenging question – can we eventually make a standard model? If we can, we can probably get to some of these integrated Skynet-style super thinking platforms that are really superhuman capability in conflict and social complexity. But without a standard model, I think it’s gonna be very tough.
Jim: Well, I would plant the flag pretty strongly that in complex systems space, there is no standard model. And I would appeal to the famous No Free Lunch theorem by David Wolpert, which basically says that there is no universal search algorithm. He proved it mathematically: you basically have to match your search algorithm or your modeling methodology to the problem at hand and take advantage of the regularities you know about in your domain. So I would suggest that we know there's not a standard model for social dynamics. The question would be, to what degree can we make easily tunable components and self-learning systems that essentially bootstrap from data to be good enough?
Timothy: Exactly. Few-shot models.
Jim: Yeah. And by the way, when you were saying this, an idea popped into my head. I hope this idea doesn't destabilize the world. But to your point about becoming a military historian as a serious hobby – I know a lot about a lot of battles; I've read hundreds of books on very detailed crazy shit. There are all these books out there, right? Millions of them about every battle, like the little Battle of McDowell up the road, which was the first of Stonewall Jackson's Valley campaign. A couple thousand guys on both sides. There's a monograph, very detailed, written about it. Why not just throw the text into even today's LLM and tell it what you want to extract to feed your precise model of the Battle of McDowell – or, you know, a bigger, far better documented battle, the Siege of Paris in 1870? And you could produce a thousand times as much data as the analysts have put into those little databases. We have one of the top conflict modelers or data people in the world at the Santa Fe Institute, Libby Wood. And the amount of data she has for a conflict, it's like 50 data points or something. But if you were to just crunch the books and say, pull out all the data, here's a rough ontology of possible data points – or even better, here's a thousand books, invent your own standard ontology that has a 50% hit rate across all these books, suck the data out of the books, and load it up in JSON. Yeah, there'd be a few errors in it, but it'd be enough to build a really good model really quickly.
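A sketch of what that book-mining step could look like; llm_complete is a placeholder for whatever model API you would actually call (here it just returns canned JSON so the sketch runs), and the ontology fields are invented for illustration.

```python
import json

ONTOLOGY = ["battle_name", "date", "side_a_strength", "side_b_strength",
            "casualties_a", "casualties_b", "terrain", "outcome"]

def llm_complete(prompt: str) -> str:
    """Placeholder for an LLM call; returns canned JSON so the sketch runs."""
    return '{"battle_name": "Battle of McDowell", "date": "1862-05-08", "outcome": "Confederate victory"}'

def extract_battle_record(monograph_text: str) -> dict:
    prompt = ("Extract these fields from the text as JSON ("
              + ", ".join(ONTOLOGY)
              + "); use null when a field is not stated.\n\n"
              + monograph_text)
    record = json.loads(llm_complete(prompt))
    # Keep only ontology fields so errors stay contained and auditable.
    return {field: record.get(field) for field in ONTOLOGY}

print(extract_battle_record("...monograph text about the Battle of McDowell..."))
```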
Timothy: That's where we might disagree, because with a lot of these history books – and you can tell this as you're reading, if you're a fan of reading – data science in military history really only comes about after World War Two. And then some people go back retrospectively and add the detail. With a lot of what we know of warfare, if you're looking for 50,000 fields of entry to put in your thing, you're gonna rapidly run out before you get back to histories that are effectively narratives and myths. Right? The Battle of Tours, the Hammer – yeah, we have myths and narratives on those. Even in World War One, the data – which is probably some of the best documented before World War Two – is very inconsistently sampled compared to what might be in other ones. That monograph may be focusing on one area.
Now, it's possible that you throw all these books in and exclude the rest of the net, because all these books, I imagine, are already in the large language models; but because they're in there with everything else, you get exactly what I was saying. And that's probably the key. We've been talking so far about generative AI, or at least I have – large generalizable models that can do all sorts of things – and I think there's gonna be an evolution in the trade-off. You talked about other kinds, where the trade-off might be a more specialized, fit-for-purpose model with a smaller dataset, trained more specifically, and you're making that trade-off. Like any software application, you don't want one piece of software to do everything, because it's gonna rapidly become the big ball of mud. So you basically trade off and say, look – this is more fit for purpose, this is more specialized, therefore less hallucination, higher accuracy, higher fidelity. That's the trade-off between generalizability and accuracy or precision in context.
But I have a feeling that – and this would be a great test if someone wants to run this, I can’t fund it – but throw all those history books in there, I think we’d still struggle to use that to predict conflict or understand how conflict will perform like we could with something that is incredibly instrumented, like hospital beds or something like that, where we have really good, rich data, thousands of fields per hospital, thousands of hospitals tracking. This came up in some of the research we’re doing at the Turing Institute – we’ve only had really good ISR data for ten, fifteen years maybe.
Jim: What’s ISR?
Timothy: ISR is intelligence, surveillance, and reconnaissance – all the satellite and sensor data that the military is gathering. Even if you could get a hold of that, we don't have a lot of longitudinal data, and all of it's gonna be biased toward the conflicts of the time. So if you took the last 20 years of satellite data, it would give you really good insights into how to run an insurgency conflict, perhaps, but it might not have as much data on conventional conflict – only since 2022. But if you went back sixty or seventy years, you'd include the Korean War; that's the longitudinal problem. A lot of our richest data sources – emissions, signals, things that you can't fudge or factor – have only come about in the last couple of years, and we just don't have the longitudinal depth to capture a history of variety that gives you all the potential manifestations. I remember people thinking they were shocked when Putin invaded, because there wasn't supposed to be another conventional land war in Europe, and yet…
Jim: There we are. Yeah.
Timothy: Yeah. And that's the danger of national security. There's a dynamic here, but it's extremely high stakes, low probability, and you can't get it wrong when it happens, because the consequences are enormous – very unlike a hospital bed problem or something like that.
Jim: Yeah. Well, you always get it wrong. That’s the thing, of course. But you have to adapt faster than the other guy.
Timothy: Right? That OODA loop. Yeah. The adaptation cycle.
Jim: Yeah. The United States in 1939 – our army was smaller than that of Romania. I think we had the nineteenth or twentieth largest army in the world. And six years later, we had 19,000,000 people under arms and had been the leader, along with the Russians – who actually did more of the work than we did – in crushing some of the worst tyrannical militaries in the history of the world. Getting it right all the time is, I think, a fool's errand, but having faster OODA loops and being able to learn faster is the real win, especially in a long-term strategic war.
I think it surprised the shit out of everybody that the Ukraine war has lasted three years and shows no signs of ending except by arm twisting and bribery. Most of the wars the US has fought recently were over in days, you know – four days; Israel, six days, ten days. It may well be that we need to be thinking back to the World War I, World War II scale of endurance and the industrial capacity necessary to sustain it.
The other interesting thing is that optimizing the industrial leaps that would be necessary could also be a giant strategic advantage – and frankly, one that scares me, because the Chinese are likely to be a whole lot better at it than we are. Suppose we do end up in a World War I or World War II style conflict with China that lasts for five years and is based on tonnage of artillery shells and the number of smart missiles created and drones, etcetera. The Chinese are way better at that kind of stuff than we are.
Timothy: And that's probably right. When it comes to national security, there are so many potential uses of this generative AI that are non-obvious. Let's say they never do anything in an obvious military-platform sort of way, but they simply use their smarter AI to plan that industrial base or advance those economic leaps that will open up new opportunities for GDP, economic growth, higher technology, faster optimization of the supply lines.
These things become very important when we look at the Taiwan conflict. When we think about who has the natural advantage of AI production, the US hands down has the chip production because of its relationship with Taiwan. It’s got all the high-tech firms. It’s got the lead on chips. But chips are only one part of that, and the other part is power.
If you look at China, China's already got five-gigawatt aluminum mills dotting the countryside. Adding power is not a problem for them. They are able to do that quite easily. But in the United States, it's not always obvious how you're going to add power quickly. I'm looking at a chart here that takes a forecast from an article saying the US is forecast to add 30 gigawatts of net new power by 2030. The global forecast of AI power needs is 130 gigawatts. Now, obviously, global is global – it's not all gonna be in the US – but think about a hundred-gigawatt gap. Even if every ounce of new power in the US went to these AI data centers, there's a hundred gigawatts looking for a home to reside in.
Where's that gonna go? That's where the edge we have on chips may fall to China with their ability to add power. The lengths to which these AI companies are going to get power are sometimes insane to the point of comedy. It takes too long to build new power, and you've got to transmit it. If it's not already in place, you're gonna miss that frontier model cycle and potentially fall behind. So they're just scrambling to get any power they can. They're recommissioning nuclear facilities, they're using natural gas facilities and just pumping it into the data center, and that can only go on for so long.
So it's gonna be this interesting tension between China's ability to manage around the chip restrictions – like you said, there are a lot of questionable shipments coming from Singapore and South China across airlines; how are those smuggled chips getting in? – and the US companies' access to power. And what does that mean for the overall economy for the rest of us? Power has been fairly cheap. Power will probably never be the most expensive part of AI, but you need to have it. So even if it's not the most expensive part compared to the chips, do you have enough to run these systems?
Jim: When I hear about things like we're gonna have to transition to a 2,000-watt society or something – wait a minute, guys. You're lacking innovative thinking. A hundred gigawatts is surprisingly not that big a deal at current pricing. Assuming one-third usage, you could get a hundred gigawatts of power for $300 billion. So a small percentage – 30-some percent of the defense budget for one year, spread over two years – could easily afford it. So there's a brute force way to generate that much power in the United States alone. Just put all the data centers in Arizona, New Mexico, and Nevada with big fields of solar all around them. Whether you need batteries or not, I don't know. Maybe you just run them when the sun shines.
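For what it's worth, the arithmetic behind Jim's figure works out if you assume utility-scale solar at roughly a dollar per installed watt; that cost number is an assumption for illustration, not a quoted price.

```python
# Back-of-the-envelope check on the $300 billion figure.
target_average_gw = 100            # average delivered power wanted
capacity_factor = 1 / 3            # "assuming one-third usage"
nameplate_gw = target_average_gw / capacity_factor   # ~300 GW of panels needed
cost_per_watt = 1.0                # assumed installed $/W for utility-scale solar
total_cost = nameplate_gw * 1e9 * cost_per_watt
print(f"{nameplate_gw:.0f} GW nameplate, roughly ${total_cost / 1e9:,.0f} billion")
# -> 300 GW nameplate, roughly $300 billion
```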
Timothy: I think you need batteries to keep these things running. You've got so much GPU power that if you have those fluctuations, you're going to burn out the data center.
Jim: You need short-term batteries. The question is do you run them twenty-four hours a day or just run them on the sunshine?
Timothy: Yeah.
Jim: And it all depends on the economics. So it's doable. The question is, do we have the will? And this is the case in so many things. I mean, the US could still – it's so embarrassing how slow our ramp-up of 155-millimeter artillery shells has been for the Ukraine conflict. You know, the 155-millimeter artillery shell is a 1910 technology, right? We should be able to produce unlimited numbers of those from a standing start within a year, but we're still struggling to supply the Ukrainians with enough, because we lack the will, the ability to execute, in our society today.
Timothy: I mean, it’s fair. Whether you call it will or regulation, adding that power plant capacity or buying that capacity, it’s a trade-off of what is the Defense Department not gonna fund. Is it gonna add that hundred billion to the top of an already existing budget, or is it gonna not get its next destroyer, submarine, carrier, whatever it is? And these are the trade-off decisions that are like, AI is not worth a super carrier right now. I’m sorry. I’m a little biased to the existing systems. Right? I’d prefer to have another super carrier group out there.
But at some point, what is the strategic advantage of AI? And of course, in national security, you've gotta blend all these things together. You can't just pick one, maximize that, and hope you picked right. But I think part of this, when it comes to the power – one of the things that might be innovative, and you talked about innovative ways to do this, is models and simulations, which have kinda died off or gotten less attention in this AI moment. Models and simulations are incredibly power efficient compared to the AI, and they do very well at representing that hypothetical standard model of the world.
If you think about it, right, people say, well, the AI learned Go and there were no human priors because it was self-play. Well, there was a human prior: it was the game of Go. Right? There's a board, there are rules, there are certain ways you score points. That's the standard model of Go. Models and simulations – whether you're doing system dynamics, agent-based, or discrete event – can create these standard models of the world. To your question: can we make a model that is good enough that it now becomes the solver, the objective checker? The AI is doing things, and it's testing against the simulation to see, is this correct? Is this off? And it can learn from that.
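A toy version of that loop, with a made-up simulator and a random-search "learner" standing in for the AI, just to show the shape of simulation-as-objective-checker:

```python
import random

def simulate(policy: list[float]) -> float:
    """Toy 'standard model': reward peaks when the policy hits a hidden target."""
    target = [0.2, 0.7, 0.5]
    return -sum((p - t) ** 2 for p, t in zip(policy, target))

best = [random.random() for _ in range(3)]
best_score = simulate(best)
for _ in range(2000):
    # The "AI" proposes a variation; the simulation is the objective checker.
    candidate = [min(1.0, max(0.0, b + random.gauss(0, 0.05))) for b in best]
    score = simulate(candidate)
    if score > best_score:
        best, best_score = candidate, score
print(best, best_score)
```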
There are a whole lot of ways that might be innovative to combine modeling and simulation and AI. Now, of course, it's a little trickier, because with modeling and simulation you can't just apply the scaling hypothesis. You can't just add chips and data and then magic happens. You've gotta build the things. But there may be some interplay, and a lot of what we're talking about in the article is: what are the problem topics we're trying to solve, and which ones are better solved by modeling and simulation that might be integrated into an AI solution, versus just trying to brute force it with the scaling hypothesis? Because I think you're right. It is a will question, and we might get there, but it might be a whole heck of a lot easier if it's less of a will question and more of an innovation question.
Jim: Yeah. Clearly, it's just like getting to carbon neutral. We can get to carbon neutral. The question is, do we do it by brute force that costs, you know, 5% of GDP, or do we do it taking advantage of the innovation curves and maybe do it for 2% of GDP? Various things to respond to here. One, of course, is that models are absolutely essential for one of the biggest classes of AI, which is only somewhat related to the deep learning models, and that's the self-driving car problem, right? Those companies have something like 10,000 to 1 simulation miles driven by their engines as opposed to real miles. And so we would not be even close to where we are today with self-driving cars without huge amounts of model in the loop.
Timothy: And you'll see in the paper that we even mention autonomous tasking agents using a simulation model. It may be a novel task, like go out and procure a supply chain's worth of equipment for me, but it's using some model to test different procurement strategies before it starts placing orders. So it can test just like we would in the real world, but it can test a thousand variations where we might only, you know, pilot a few. So definitely, this use of autonomous agent tasking with a model is a big part of where the opportunities are.
Jim: And I’ll say the other thing is, two companies I was involved with – I was chairman of one, I was on the board of directors of the other – both used models in the loop with AI back in the double aughts to design analog computer chips. Turns out there’s a standard model for analog computing, was it called SPICE? I think it was. And we built these vast data centers and ran hundreds, thousands of copies of SPICE in parallel and then used those as input into genetic programming models to attempt to optimize and design and do other analysis on analog chipsets. And it worked. We built both those companies successfully and sold them to bigger players. Ever since I’ve been looking for other businesses where simulators in the loop combined with AI give you a big advantage.
Timothy: So let me give you another one, and maybe this is worth putting down and funding sometime. This is the paper we call NATSEC MAISC: modeling and simulation, artificial intelligence on systemic complexity – MAISC, M-A-I-S-C. Think about the context problem of existing AI, the scaling context problem. As you get larger context, you're doing more reasoning; it's having to hold all this memory; it's very inefficient. Perhaps the model and simulation can be the context repository. The AI itself doesn't need to hold all that context in these very valuable, very high-powered chips that would be better used for training or implementing inferences. Perhaps a model and simulation in the loop can hold that context and then be queried in an energy efficient way, so that it's not having to do all these permutations on the most expensive resource. Now, again, this is a hypothetical idea, but it could address that context problem.
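Here is one way that hypothetical pattern could be wired up, sketched with invented names: the simulation holds the evolving world state cheaply, and the language model only ever sees small query results rather than the whole run history.

```python
class SimulationState:
    """Toy stand-in for a system dynamics model acting as a context repository."""
    def __init__(self):
        self.state = {"region_a_sentiment": 0.42, "line_of_control_km": 870.0}

    def step(self, actions: dict) -> None:
        # Advance the simulation one tick; the real dynamics would live here.
        self.state["region_a_sentiment"] += actions.get("stabilization_effort", 0.0) * 0.01

    def query(self, key: str) -> float:
        return self.state[key]            # cheap, targeted context lookup

sim = SimulationState()
sim.step({"stabilization_effort": 2.0})
# The prompt carries only what was queried from the simulation, not the whole
# run history, which keeps the expensive model's context small.
prompt = (f"Region A sentiment is {sim.query('region_a_sentiment'):.2f}. "
          "Propose one stabilization action and its expected effect.")
print(prompt)   # this string would be handed to whatever LLM is in the loop
```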
Jim: That's a very interesting idea. I'm gonna burn some cycles on that. Think about where you use a model that has what we call terminals, or functions, in it that interact with the LLMs, and they're working cooperatively and sending data in both directions. And the models, as you know – this is an important point, a subtle point – deep learning models are operating at an extremely fine-grained level: teeny, teeny little switches that are basically yes/no switches, and the big models have approximately a trillion of them. In simulations, the coarse graining is way higher – a model that's dealing with a million items is a big fucking model.
Timothy: It's huge. I can give your listeners an example that's very relevant today. I have a simulation model currently that we originally created for ISIS in Syria and Iraq. We've actually used it to simulate what a hypothetical ceasefire between Russia and Ukraine would look like for post-conflict stabilization. For post-conflict stabilization, you've gotta model the country. You've gotta model where the line of control is. You've gotta model all the ethnicities, stuff like that. That model – you talk about coarse graining – is about 3,000 equations, so nowhere near 1,000,000 equations. It's got some subscripts and things so we can use the same structure for different ethnicities of population, but it is nowhere near as fine-grained as an AI model. But it can hold a tremendous amount of conflict context – ethnic sentiments, changes in sentiments, things like that – that might offload some of that, and I can run it on my laptop. This is the trade-off. Right? I can run that on my laptop in a couple of seconds. It doesn't take anything significant. So to that coarse-graining point: can you hold context in something that is more coarse-grained but permutable, to get the iterations you need, and then query it as necessary from the AI, which is doing more of the reasoning? It's always been interesting to me – the Go game is essentially a model of a world. Well, if you have a simulation model with inputs where you're now not putting stones on a board but giving commands and things like that, then the question is, how good is the fidelity of your simulation model? What's its similarity to the real world? And then your AI is basically acting like that super analyst, asking, can we come up with novel things? I think humans in the loop always need to be there for decision making and governance, but you can generate a lot of ideas. Like you said, you've got your 25 deep research reports, and if one of them gives a good paper, it's paid for itself. That's the concept here.
Jim: Yeah. And this is actually quite interesting, because one of the things these LLMs are pretty good at is dimensional reduction. Right? They are super high dimensional. They have everything that's ever been on the Internet in there, and, you know, the standard vector is 768 numbers, so it's a 768-dimensional space. And that's already arbitrarily insane – in reality, it's higher than that – but your models, to be tractable, have to be much lower dimensional than that. So think about this: suppose instead of using the LLM only for reasoning, you mostly use it for dimensional reduction. Right? Then I think of the impedance matching between the LLM and the model at a specified level of coarse graining. So you're firing queries into the LLM and asking for things back at the appropriate level of rough coarse graining to fill in a spot in this 3,000-parameter or million-parameter model. And that could be powerful.
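A minimal sketch of that impedance-matching idea: high-dimensional embeddings get projected down to the handful of coarse-grained variables a simulation can actually accept. The 768-dimensional vectors here are random placeholders for real text embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
report_embeddings = rng.normal(size=(500, 768))   # placeholder LLM embeddings

pca = PCA(n_components=5)                  # match the simulation's coarse graining
coarse_features = pca.fit_transform(report_embeddings)
# Five numbers per report is a scale a 3,000-equation simulation can plausibly
# take as inputs or parameters, instead of the raw 768-dimensional vectors.
print(coarse_features.shape)               # (500, 5)
```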
Timothy: Oh, there are times with modeling and simulation where there's no published parameter. This gets into synthetic data creation. Creating synthetic data – we didn't know, for example, how long it took ISIS to train fighters. Well, we knew about how many were coming in, and we knew about when they showed up on the battlefield. It's an easy problem with a model to reverse engineer a range for how long it takes to train them. You can do some of this fill-in-the-gaps where there is no data in conflict – synthetic data. But again, as you know with modeling and simulation, you need the validation, the verification, the confidence building – that's a heavy investment to make sure you have really good… This is why some of those models, we think, are more phase two, because we don't even know if we have enough existing theories that could be represented in models, or whether the right ones are among them, to get us the right models and simulations. There's a lot of work that needs to be done if we're gonna get to that suite of models. Not one model represents a complex system, but a suite of models to represent types of systems. We still don't even know if the theories that are dominant in the twentieth and twenty-first century are the right theories. So there's a lot of work that needs to be done on the modeling and simulation side, but I think it's a promising opportunity to integrate these, because of this power and compute problem.
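A toy version of that reverse-engineering step, with invented series: grid-search the training delay that makes simulated battlefield arrivals best match the observed ones.

```python
import numpy as np

weeks = 40
inflow = np.full(weeks, 100.0)               # invented recruit inflow per week
true_delay = 6                               # unknown in reality; used here to fake data
observed = np.roll(inflow, true_delay)
observed[:true_delay] = 0                    # invented "battlefield arrivals" series

def simulate_arrivals(delay: int) -> np.ndarray:
    out = np.roll(inflow, delay)
    out[:delay] = 0
    return out

# Pick the delay whose simulated arrivals best match the observations.
errors = {d: float(np.sum((simulate_arrivals(d) - observed) ** 2)) for d in range(1, 13)}
best_delay = min(errors, key=errors.get)
print(best_delay)                            # recovers 6 on this toy data
```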
Jim: Yeah, it's interesting. And I was just thinking about the classes of problems. One of the big problems in the strategy space is the very sparse dataset. You know, if you're predicting the weather, you get a shot every hour, right? And you can score your model – how did it do at seven days, how did it do at thirty days – and self-driving cars are the same. Okay, how did my prediction do for three seconds? How did it do for five seconds? Wars – they don't happen very often. So we have extraordinarily sparse data, which is not actually a good fit for LLMs, as it turns out.
Timothy: That's kind of what I was talking about earlier. When we talked about the battlefield books and things like that, the sparsity of data – I mean, say hypothetically you need 50,000 things. Do we even have 50,000 wars we can study? Do we have 50,000 that are more than just the name, the country, and the year? Probably not. Do we have enough pieces of data for the dimensionality you're talking about? Right?
Jim: Probably not.
Timothy: There's not enough data. The other thing, though, with weather is that you have that standard model of weather. So when your AI makes a prediction, you can run your standard weather model and ask, is that prediction probable? It now gives it a way to talk back and forth and learn through its own trial and error, just like a math solver. Right? You've got a fancy calculator here and your AI is trying to learn the math. Well, it can check and get a very objective right or wrong, and that's a great way to learn. You don't need a lot of feedback. You just need to know it's right or it's wrong, adjust your reward weights and all that, and it just goes from there.
Jim: All you need is a gradient. Exactly. You could have the sleaziest, shittiest model, but as long as you have a gradient, with modern technologies it can find its way. This is very interesting – it’s actually making my head hurt a little. For doing military strategy, this makes models indispensable, right? If you’re trying to do high-level strategic stuff, the opportunities to have data are so infrequent, and they’re also so heterogeneous – to your point earlier: okay, we’ve probably got pretty good metering on counterinsurgency, or on an A-level power against a C-level power. We kicked Iraq’s ass twice, BFD.
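A toy illustration of that point, with invented dynamics: the "standard model" acts as an objective checker, and a crude trial-and-error loop learns from nothing but its pass/fail signal:

```python
import random

def checker(prediction, ground_truth_model, state):
    """Objective verifier: run the 'standard model' and score the prediction.
    Returns 1.0 for within-tolerance, 0.0 otherwise -- all the learner gets."""
    return 1.0 if abs(prediction - ground_truth_model(state)) < 0.1 else 0.0

def train(policy_weight, ground_truth_model, steps=10_000, lr=0.05):
    """Crude trial-and-error hill climb: perturb the weight, keep changes that
    raise the average reward from the checker. No labeled dataset needed."""
    def avg_reward(w):
        states = [random.uniform(0, 1) for _ in range(50)]
        return sum(checker(w * s, ground_truth_model, s) for s in states) / 50
    best = avg_reward(policy_weight)
    for _ in range(steps):
        candidate = policy_weight + random.gauss(0, lr)
        r = avg_reward(candidate)
        if r >= best:
            policy_weight, best = candidate, r
    return policy_weight

# e.g. the "standard model" is y = 0.7 * x; the learner closes in on the 0.7
# purely from right/wrong feedback:
# w = train(policy_weight=0.0, ground_truth_model=lambda s: 0.7 * s)
```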
Timothy: Well, even with counterinsurgency, we only got it for two countries. Like, what if you’re fighting in South America? Okay, maybe with the FARC you can pull a couple of data points. But what if you’re trying to fight – I mean, that’s where it gets really tricky. There’s lots of insurgencies in Africa. Do you have the data on them? How good’s your data on the Sahel? And this is why we call it NATSEC MASK. I go back to that acronym: Modeling and Simulation and Artificial Intelligence on Topics of Systemic Complexity for National Security. We think it’s gonna be a blending that gets to that eventual solution set you need to have. And you’re always gonna have humans involved in it in some way, because humans are exceptionally efficient at learning. We talk about how difficult it’s been to get to the self-driving car. I can take a high schooler and give them sixteen or twenty hours of instruction – that’s the common example I think Yann LeCun gives – and they can drive reasonably well. Now, obviously, with self-driving there are higher standards of rigor, but the concept is the same. We also have tremendous dark knowledge – the knowledge we learn, the mental model data we have in us, that’s not published anywhere. That’s not out on the Internet. There’s not even a good theory of how to quantify it and access it, let alone make it accessible to learn from. So how are you gonna capture this mental model data of how people think and react and put that in a training dataset? It might not always be AI. It might be easier to have a model and simulation of how an individual or group thinks or acts – again, a validated, verified, high-confidence model – and then integrate them together.
Jim: We’ll come back to the issue of validation, because that’s been a problem ever since the earliest agent-based models in social science. But before that, I have to dunk on you once.
Timothy: Feel free.
Jim: Where you said, I’d rather have a carrier battle group. I go, talk about a useless motherfucker. I would rather have neither the battle group nor the money spent on it than have the battle group, because that’s probably just metal headed for the bottom of the water.
Timothy: Maybe. I mean, it played a decisive role in the recent Iran – it wasn’t really Israel-Hamas at that point, it was Iran. Yeah, it’s good for-
Jim: It’s good for that skirmishy shit, but not against a peer or near-peer power. Those things are all going to the bottom.
Timothy: I wanna – I think that’s fair. Absolutely fair. I have gone on the record before that I think drones are more valuable than main battle tanks, so it would be disingenuous of me to dismiss your point. But in the current state of AI, I’d rather have the battle group. Still, I think that’s fair – you could say the money invested in something else might be a fair trade-off. I’ll take that dunk. I’m always good to get dunked on when I need to be.
Jim: I always like to remind people that all that money spent on battleships in the thirties – they all went to the bottom once they confronted aircraft carriers.
Timothy: Yeah, exactly. And that’s the thing – I mean, I don’t mean to go back to Ukraine and Russia, but that is a drone war now. I mean, there are other elements in play, but they started producing… when we talked, they were probably producing a few thousand drones a month. They’re producing millions now.
Jim: When I talked to Sergei, at that point, which was in July 2024, little Ukraine was producing a million drones a year.
Timothy: That is a style of warfare that we have no idea about. I mean, again, not to get deep into it, but man, like you said – managing that, getting all the sensor data, consuming it – that’s where I think a lot of the AI comes in, and they’ve obviously got companies working on it. But how does that go up to upper command? How does that give them a common operating picture across the battlefront? What else can you be using for that? That is where combinations of drone swarms and MASINT-type capabilities – not the whole picture, but it’s gonna be very, very different from how we fought wars in the twentieth century, the wars we’re familiar with.
Jim: Yeah, that’s gonna be very interesting. So let’s go on to where we just started before I took a little divergence there, which is the issue of validation of models. I’ve been fooling with social science agent-based modeling since 2001. Right? And you can get interesting results. But how do you know they have any validity? It’s actually a very difficult problem. In my case, I mostly did stuff on financial market simulators – at least there we have a fair bit of data – but in the strategic sense, you have very little ability to actually score your model against anything objective.
Timothy: Well, especially in a novel case. So let me take a step back to something I work on all the time. I come from the system dynamics tradition. For those who don’t know what Jim and I are talking about, there are three legs of the modeling stool. There’s agent-based modeling, which is modeling individuals with rules and letting behavior emerge from the bottom up. There’s system dynamics, where you put in system structure to understand the behavior. And then there’s discrete event simulation, which we tend to include, but I’m not really sure what it’s used for in the strategic sense – don’t mean to dunk on them. But those are the three legs.
I come from system dynamics. We talk about confidence building. We’ve even given up on the term validation. We’re saying you can’t validate because we’re modeling complex systems. And to your point, there is no standard model of a complex system. You can’t test it empirically. You can’t go back and run the same thing.
Taking a step back, I work on public mass killings – individuals radicalized to do school shootings. So not the same as a strategic thing. It’s happened hundreds of times; we have some data. But even there, we try to test our models: with the starting conditions of the Columbine terror contagion, can we replicate how the other Columbine-style school shootings played out? We’re looking at confidence to say, can we recreate history?
But even that is only one level of confidence. Can you recreate history? It’s fairly basic, but you can’t stop there, because history is full of contingencies – this is the classic game of historians: what if something else had happened? Your model has to be able to represent these historical counterfactuals. For example, at Columbine, instead of going in and shooting up the cafeteria, they wanted to blow it up with an IED. The IED failed to go off. So in our model, we tested what would have happened had that IED gone off.
Now you’re making assumptions and you’re running tests. So part of the confidence building is: does the counterfactual result bear scrutiny from known experts? In the past, those experts have been human experts; going forward, they might be AI experts. So that’s another level of confidence building.
A third level is the future counterfactual – not historical, but future. So we took the same model we use for Ukraine-Russia and applied it to the Myanmar civil war. We’re trying to forecast what you would need for a humanitarian operation in 2029, and you have to assume either the civil war continues, or that at some arbitrary date – say 2026 – the ethnic majority, the Bamar, who are split between pro-junta and anti-junta, reconcile. The future counterfactual is that they reconcile.
So now you’re doing a future counterfactual of something that hasn’t happened at all – not history with a few choices changed, but something that could play out totally differently. And these layers are all just building confidence. You can never say for sure, because you’re never gonna be able to run the actual Myanmar civil war twice. You’re not gonna be able to do it. So you’re always speculating at some point into the future.
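A minimal harness for those three confidence tiers, assuming only a generic run_model callable (hypothetical) that maps a scenario to a trajectory:

```python
from typing import Callable, Sequence

def confidence_suite(
    run_model: Callable[[dict], Sequence[float]],  # scenario -> trajectory
    baseline_scenario: dict,
    observed_history: Sequence[float],
    historical_counterfactuals: Sequence[dict],
    future_counterfactuals: Sequence[dict],
) -> dict:
    """Three tiers of confidence building rather than 'validation':
    1. replication: does the baseline run track recorded history?
    2. historical counterfactuals (e.g. the IED does go off): results go to
       subject-matter experts for scrutiny;
    3. future counterfactuals (e.g. factions reconcile in 2026): checked for
       internal consistency, since no reference data can exist."""
    base = run_model(baseline_scenario)
    replication_error = sum(
        (m - o) ** 2 for m, o in zip(base, observed_history)
    ) / max(len(observed_history), 1)
    return {
        "replication_error": replication_error,
        "historical_counterfactual_runs": [run_model(c) for c in historical_counterfactuals],
        "future_counterfactual_runs": [run_model(c) for c in future_counterfactuals],
    }
```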
And when it comes up to these strategic things, we simply don’t have the data. And I think this gets back to why some people have said the best these models can do is challenge conventional thinking – give you a second pair of eyes, so to speak, or a million second pairs of eyes that are all churning away, and one of them says, here’s a really outlandish scenario, and someone goes, “but that’s plausible.”
What does it take? Now this is where the value comes in – not prediction, but forecast. Remember, I won the game by giving five forecasts on Ukraine. Right? I blanketed the field. I did the thing. Well, I could have done a million forecasts, and you would have looked at me like, what are you doing? I picked five, and they covered a range of permutations. And if you look back at that, each one of them had certain contingencies I called out. Do Russia’s logistics improve or get worse? Fairly simple ones.
But that’s how you begin. Rather than trying to predict, you forecast – and here’s what those forecasts should give you, and this is where reasoning models matter more than plain ChatGPT. You put into ChatGPT “what’s the outcome of the Ukraine-Russia war?” and it’ll give you an answer, but you have no idea how it got there, and you know none of the intermediate steps. A good model and simulation, or a good reasoning AI, can answer “tell me the progress of the Ukraine-Russia war from this point and how it will end.” And you care less about the end than about the progress, because the progress you can check along the way.
Now you’ve got mile markers for confidence. It’s almost like you have a simulation of the world running on one side and the actual world on the other, and you’re testing the divergence and gaps between them to say, which forecast path are we most on right now? And that begins winnowing out the other forecasts – excluding them, dropping them off as things happen. And it builds confidence that, of the forecasts you have left, you’re still on track. Big caveat – you can never predict the future, in my opinion, in a complex system. It’s kind of a fool’s errand to try, especially at the strategic level. But it gives you that insight, and perhaps that advantage, that asymmetric benefit over your competitor.
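A sketch of that mile-marker pruning, with made-up indicator values: each named forecast is scored against the observations so far and dropped once it diverges past a tolerance:

```python
import numpy as np

def prune_forecasts(forecasts: dict, observed: np.ndarray, tolerance: float) -> dict:
    """Compare each forecast trajectory against the observed 'mile markers' so
    far and drop the ones that have diverged beyond tolerance. What remains is
    the set of paths you still have confidence you're on."""
    t = len(observed)
    surviving = {}
    for name, path in forecasts.items():
        divergence = np.mean(np.abs(np.asarray(path[:t]) - observed))
        if divergence <= tolerance:
            surviving[name] = path
    return surviving

# Illustrative use with invented monthly indicators (e.g. territory held):
# forecasts = {"stalemate": [50, 50, 49, 49], "collapse": [50, 40, 25, 10]}
# prune_forecasts(forecasts, observed=np.array([50, 49]), tolerance=2.0)
# -> only "stalemate" survives
```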
Jim: Yeah. Let me give you the Rutt perspective on predicting the future. As Yogi Berra says, it’s fucking hard – making predictions is hard, particularly about the future. And of course, everybody says, “Oh, you’re a complex systems guy. Let’s predict the future.” And I go, well, the first thing I know about being a complex systems guy is you can’t predict the future. But what you can get, if your models are decent – and even if they’re not, you can at least get a rough sketch – is the statistical distribution of trajectories. Right? You set up a meta-model and a data analysis and say, alright, 1% of the time it could be way the fuck out here, 1% of the time way out there, but here’s where the center of the projections is – knowing, of course, that even that center is somewhat bullshitty. But these ensembles of trajectories are way better than picking a trajectory and saying that’s what’s gonna happen.
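A minimal ensemble sketch of what Jim is describing, with purely illustrative dynamics (NumPy assumed): many runs under sampled uncertainties, reported as percentile bands rather than a single prediction:

```python
import numpy as np

def trajectory(rng, steps=36):
    """One toy run: a state variable driven by a drift and shocks whose sizes
    are themselves uncertain (sampled per run). Purely illustrative dynamics."""
    drift = rng.normal(0.0, 0.5)
    vol = rng.uniform(0.5, 2.0)
    x, path = 0.0, []
    for _ in range(steps):
        x += drift + rng.normal(0.0, vol)
        path.append(x)
    return path

def ensemble_bands(n_runs=10_000, steps=36, seed=0):
    """Run the ensemble and report the 1st/50th/99th percentile envelope at
    each step -- a distribution of trajectories, not a single prediction."""
    rng = np.random.default_rng(seed)
    runs = np.array([trajectory(rng, steps) for _ in range(n_runs)])
    return {p: np.percentile(runs, p, axis=0) for p in (1, 50, 99)}
```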
Timothy: I use this example: we’ve talked about strategy – let’s say there’s a strategic conflict over Taiwan. It’s gonna happen once. This is kinda like a presidential election, when FiveThirtyEight comes out and says, here’s our projection: out of a hundred model runs or a thousand model runs, this person wins 53 out of a hundred and this person wins 47 out of a hundred. And I’m like, yeah, but we only run the election once. Right? So as long as it’s within one of those hundred, you’ve hit your mark.
And that’s the challenge of these. I mean, the statistical distributions are good for likelihood and probability, but the real outcome could be any one of those trajectories. And that’s where it gets tricky. As you said, hospital beds, financial markets – those things are happening over and over every day. Weather, to your point. Right? You get a much better sense. But strategic events – the invasion of Taiwan, God forbid it happens, but if it does, it’s gonna be a one-off. Right? There’s not gonna be a second or third. We can’t go to the sci-fi alternate world and see what happens if it was done slightly differently.
We only get one shot, and we really only get a couple – you talk about timing windows, and this is a whole different discussion. What’s your timing window in strategy? I did an exercise of the various timing windows. If you’re talking nuclear, your timing window is the shortest of all of them. You have fifteen to thirty minutes, and then if you lose on nuclear, the game’s over. Satellites, you have a couple hours. But if you lose satellite, especially the U.S. position, game’s over. Cybersecurity, you may have hours, but you lose your cyber infrastructure, game over. But conventional invasions of countries in Africa, you have weeks, and if you lose it, you can recover it.
These are the timing windows and the severity of decision making – how much you can afford to lose. The invasion of Taiwan is fortunately not as severe as a nuclear war: you don’t have to decide in thirty minutes, where if you’ve lost, you’ve lost forever. But it’s a very short window of time that could have very dramatic consequences in how that war plays out. So another variation on your range of predictions, or range of forecasts, is planning layers of defense. Right? These defensive measures are good against 57% of the scenarios, so let’s ratio it: spend 60% of our budget on 60% of our scenarios, 30% on another set, and so on. It builds in that layering so you’re not all focused on one potential threat vector. Because let’s face it, if the French had done this, they would’ve had a little more of their forces in the forest where the Germans came through.
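A back-of-the-envelope version of that ratioing, with notional coverage weights rather than any real scenario analysis:

```python
def allocate_budget(total_budget: float, coverage: dict) -> dict:
    """Spread spending across defensive layers in proportion to how much of the
    generated scenario space each layer covers (weights here are notional)."""
    weight_sum = sum(coverage.values())
    return {layer: total_budget * share / weight_sum
            for layer, share in coverage.items()}

# e.g. a layer good against 57% of generated scenarios gets ~57% of the money:
# allocate_budget(100.0, {"anti-ship/air denial": 0.57,
#                         "cyber resilience": 0.30,
#                         "outlier scenarios": 0.13})
```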
Jim: Goddamn right. Because, you know, this is basically the cliché of military history – the army always fights the last war. Right? They thought World War Two was gonna be like World War One – wrong – et cetera.
Timothy: If I can just jump on that – fighting the last war is why you need models and simulations. Because AI, by the very nature of its training, until we come up with new ways to train it, is gonna be anchored to what’s already in history, and biased toward English, toward the recent, and toward what gets talked about on Reddit. Obviously, if you can get a smaller, more focused model, you can get better fidelity – we talked about that. But AIs are inherently biased toward things that have already happened, and that’s a real danger.
Jim: That’s a very good point. I’m gonna nerd out just slightly here. We were talking about frequency distributions, and that’s called the frequentist view of probability. There’s also the Bayesian view of probability. Unfortunately, unless I’m missing something, the sparsity of data impacts the Bayesian view too – not quite as much, but probably a lot in this scenario.
Timothy: I don’t think statistics – so, I come again from a system dynamics perspective. I tend not to look to statistics for my answer; I look to calculus. And so for me, the question is not how do I do a probability distribution or a Bayesian prior or things like that. I’m like: give me the starting conditions and let me generate the scenarios, and do it in a way that has what we call operational causality. Right? Statistics are very useful, but unless you put in the effort, they often lack the intermediate causality of how things get from A to B. When I use calculus and system dynamics, the outcome is great, but I want to see those intermediary steps. And when we say operational causality, it’s not formal scientific causality, which people can debate – it’s how the world works. Right? If a missile fires, and then a very similar type of missile lands 200 miles away at about the same time, I’m gonna say it’s the same missile. I’m gonna say there’s an operational causality.
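A minimal system-dynamics-style sketch of that "show me the intermediate steps" point, under invented dynamics: a simple Euler integration that returns the whole trajectory from A to B, not just an endpoint:

```python
def integrate(stocks: dict, flows, dt=0.25, horizon=40):
    """Minimal system-dynamics style integration (Euler). The point is that you
    get the whole trajectory -- every intermediate step from A to B -- rather
    than only a statistical endpoint. `flows(stocks)` returns net rates."""
    trajectory = [dict(stocks)]
    for _ in range(int(horizon / dt)):
        rates = flows(stocks)
        stocks = {k: v + dt * rates.get(k, 0.0) for k, v in stocks.items()}
        trajectory.append(dict(stocks))
    return trajectory

# Toy example: a supply stock drained by attrition proportional to fighting.
# path = integrate({"supplies": 100.0, "fighting": 1.0},
#                  lambda s: {"supplies": -2.0 * s["fighting"]})
```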
Jim: Yeah. You don’t wanna get into an argument with a philosopher about-
Timothy: Literally, I had a draft article that the first line is “On the basis of national security, we assume the premise causality exists.” Just to get that out of the way.
Jim: I love that.
Timothy: In national security, we’re just gonna assume the premise causality exists. All you folks with the arrow of time and all that, you can have that debate, but I’m not getting into it.
Jim: That was wise. That was wise. So, let’s exit on – put on your system dynamics hat, get out your differential equations, and give us a framing for how you might think about modeling the implications of AI with respect to national security. What are the major dimensions and what are the major knobs on your models?
Timothy: Well, I think we’ve definitely talked about two. And when I do system dynamics modeling with differential equations – I’m really bad at calculus, so I tend to think in trajectories. I think in gaps between trajectories and the velocity across those gaps. So imagine one trajectory line, which is the raw compute power we’re able to produce in chips, and the other is the trajectory of the total power needed for both training and implementation. Right? If we can keep the chip line significantly above the power line, the US has the advantage – as long as TSMC is still in Taiwan and intact, that’s our advantage. But if that power line gets up above the compute line, the advantage goes to China, where they’ve got enough chips of sufficient power – maybe not the absolute best of the best, but they can put massive amounts of power behind them.
And then these two trajectories – that relative difference, that gap – feeds a second-order effect, which is the relative advantage in AI. And by relative advantage of AI, I’m talking here in a national security sense, about the specific strategic question, and I’m gonna bound it very narrowly: who wins over Taiwan? What we have to do – and this is why the Cold War mental model exists and went into this article – is remain close enough to deter China. That doesn’t mean parity. Right? Think about conflict – there’s rarely anyone who’s equal. In fact, it’s almost always three to one, or two to one, or some unfair ratio. The question is: are you close enough to deter?
And this deterrence has the function of allowing other partners in the region to arm up and step up. It’s happening in South Korea, Japan, Australia – there are already motions underway. But the thing that matters is that second- or third-order trajectory line, the relative power comparison between China and the US, staying where China’s not willing to take the risk. And let’s throw on a third trajectory, some outlier effect: the potential for collaboration. You brought it up before. Right? The way out of the arms race is Nixon goes to China – you somehow de-escalate this. I’m not a diplomat; it’s not obvious to me how you’re gonna do that over Taiwan, but let’s throw that in there.
So there are ways off this. But as long as we’re in this arms race, what matters is that relative comparison of power, and that we’re either well above it – which I think we are now – or sufficiently close that China is deterred from taking action. And their perception of it is gonna be heuristic. Right? That gap – or maybe it’s gonna be AI assessing it, who knows – but as soon as they perceive the gap has closed enough that they feel they can take the risk, that’s when the Red Queen race ends, all the chips go on the table, and you see who is better. And, unfortunately, once that starts, there’s a certain amount of consequences and suffering you can’t get away from. Once Putin crosses into Ukraine with 75,000 troops and thousands of armored vehicles, there’s a certain level of guaranteed suffering you cannot get away from. And that’s the real risk, I think, in the long term.
Jim: I’m gonna add one more term. I don’t know how you model it, but I think that per our conversation today, it actually seems indispensable, which is you also need the innovation line. Right?
Timothy: Yeah. That’s implicit in the chip line, but you could break out an innovation line, tie it to power and chips, and you’d have a feedback loop where at a certain point the AI feeds back into the innovation itself. Yeah.
Jim: Go ahead. I would say innovation in AI models, right? So for instance, the R1 is way less consumptive, right? So that’s an orthogonal dimension. You have chips, you have power, and you also have how efficient your fundamental AI models are and they’re getting more efficient, a lot more efficient. I’m really interested in playing with this Gemma-3 thing. This sounds like it just blows the bottom out of LLaMA and R1 and everybody else in terms of bang for the buck. It’s kinda crazy.
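A toy numerical version of the trajectory-gap framing, including the innovation line Jim just added; every growth rate, starting level, and threshold below is invented purely to show the feedback structure:

```python
def arms_race(years=10, chip_growth=0.20, power_growth=0.30,
              efficiency_gain=0.05, deterrence_margin=0.25):
    """Toy trajectory-gap model of the framing above; every number is made up.
    'compute' is the effective chip line, compounded by model-efficiency
    innovation (the innovation line). 'power' is the rival's raw power
    build-out. Deterrence holds while the relative gap stays above
    `deterrence_margin`; once it closes, you're in the risk window."""
    compute, power = 1.0, 0.6
    path = []
    for year in range(1, years + 1):
        compute *= (1 + chip_growth) * (1 + efficiency_gain)
        power *= (1 + power_growth)
        gap = (compute - power) / compute
        status = "deterred" if gap > deterrence_margin else "risk window"
        path.append((year, round(gap, 3), status))
    return path

# With these invented rates the gap erodes and the "risk window" opens around
# year 8 -- the point of the sketch is the feedback structure, not the numbers.
```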
Timothy: It’d be nice to see Google back in the game.
Jim: Yeah. And of course they’re putting it on open source because they’re out to fuck the other guys over. Open weights?
Timothy: Even open weights, you can host them.
Jim: I don’t know if the weights are gonna be open or not. I think so.
Timothy: I think so.
Jim: Think so. But don’t take that one to the bank, people – by the time this thing comes out, we’ll know. Just ask Perplexity; Perplexity is my go-to tool for that stuff. And then, being an amateur armchair strategist, I’ve just got to put out my Taiwan strategy, which is very simple. Analyzing the situation – you don’t need those stinking computers, just ask Jim – it’s a hedgehog strategy. Put 75% of your money into smart anti-ship missiles, short-range ground-to-air missiles, and the surveillance necessary to make sure you’re ready should they come. It’s really hard to come across 80 miles of open water with thousands and tens of thousands of smart missiles diving on your ships constantly. And should you try it with parachutes, that’s even easier to stop with short-range ground-to-air missiles. So I think you don’t need no stinking computers. You don’t need differential equations.
Timothy: Can I give you the asymmetric counter to that? Because my specialty is asymmetries. Right? I think you’re absolutely right – it’s very hard to come across 70 kilometers of water. So China’s best move is not to defeat our forces stationed in Taiwan or the hedgehog defenses. It’s to influence, let’s just say, hypothetical nation-state legislatures not to put the forces there, not to come in and back it up. That’s how you defeat the conventional approach – they’re not trying to beat us on the battlefield. They’re trying to beat us in the legislature, the budget, the procurement, the focus, whatever.
Jim: Public opinion.
Timothy: I think it’s a great point, though. Yeah. Exactly.
Jim: Yeah. Then of course there’s the third one, which the old boy without his differential equations ain’t quite smart enough to game out: what would happen if the Chinese were to try an economic blockade of Taiwan? Though again, if I take off my country boy hat and put a little bit of modeling on it, they have a serious problem called the Straits of Malacca.
Timothy: Exactly. We shut that down much easier than they shut Taiwan down. So there you go.
Jim: And they’re totally fucked. 72% of their oil is imported from overseas, most of it through the Straits of Malacca.
Timothy: Maybe there’s a scenario where this is how it plays out. They try and do an economic blockade. We go to the Strait of Malacca, and all of a sudden, we’re back at the hedgehog.
Jim: Yeah. Yeah. It’s interesting. Hopefully – I mean, Xi is not a stupid man. He is obviously a very smart man. And to your point, he has to get to some level of confidence, say 95%, before he moves. If we don’t act like complete asshole idiots, we should be able to keep him below that threshold. Unfortunately, with the current clown in the White House, he could fuck it up just because he is such an asshole. But if we had anyone with any sense at all, I think we can keep Xi’s perceived chance of success below 90%, which would result in a long-term kind of frozen situation.
Timothy: And that opens the door for other collaboration, diplomacy, other partners stepping up. That’s the thing – you hold the line long enough to keep the door open.
Jim: Yeah, that was the famous “Long Telegram,” right? Which basically said just freeze the Soviet Union and its internal contradictions will eventually cause it to collapse. And he was right – George Kennan.
Alrighty, Timothy, as always, a wide-ranging, wild conversation about the cutting edge of what’s happening now.
Timothy: Thank you for having me.
Jim: It was fun. We’re gonna have you back sometime this summer. Talk about something else. I don’t remember what, but it’ll be fun.
Timothy: Sounds good. Take care.