Jim talks with Nate Soares about the ideas in his and Eliezer Yudkowsky’s book If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All. They discuss the book’s claim that mitigating existential AI risk should be a top global priority, the idea that LLMs are grown, the opacity of deep learning networks, the Golden Gate activation vector, whether our understanding of deep learning networks might improve enough to prevent catastrophe, goodness as a narrow target, the alignment problem, the problem of pointing minds, whether LLMs are just stochastic parrots, why predicting a corpus often requires more mental machinery than creating a corpus, depth & generalization of skills, wanting as an effective strategy, goal orientation, limitations of training goal pursuit, transient limitations of current AI, protein folding and AlphaFold, the riskiness of automating alignment research, the correlation between capability and more coherent drives, why the authors anchored their argument on transformers & LLMs, the inversion of Moravec’s paradox, the geopolitical multipolar trap, making world leaders aware of the issues, a treaty to ban the race to superintelligence, the specific terms of the proposed treaty, a comparison with banning uranium enrichment, why Jim tentatively thinks this proposal is a mistake, a priesthood of the power supply, whether attention is a zero-sum game, and much more.
- Episode Transcript
- “Psyop or Insanity or …? Peter Thiel, the Antichrist, and Our Collapsing Epistemic Commons,” by Jim Rutt
- “On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback,” by Marcus Williams et al.
- “Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin,” by Enrique Queipo-de-Llano et al.
- JRS EP 217 – Ben Goertzel on a New Framework for AGI
- “A Tentative Draft of a Treaty, With Annotations”
Nate Soares is the President of the Machine Intelligence Research Institute. He has worked in the field of AI alignment for over a decade, following earlier roles at Microsoft and Google. Soares is the author of a large body of technical and semi-technical writing on AI alignment, including foundational work on value learning, decision theory, and power-seeking incentives in smarter-than-human AIs.