
2026-04-14

Demis Hassabis: Why AGI is Bigger than the Industrial Revolution & Where Are The Bottlenecks in AI

Summary

In this wide-ranging conversation, Demis Hassabis, co-founder and CEO of Google DeepMind, explores the current state of Artificial Intelligence, the trajectory toward Artificial General Intelligence (AGI), and the profound implications these technologies hold for humanity. Hassabis, widely regarded as one of the most significant scientific minds of the modern era, provides a detailed roadmap for the next decade of AI development.

The Definition and Timeline of AGI

Hassabis defines AGI as a system capable of exhibiting all the cognitive capabilities of the human mind, which he calls the only known existence proof that general intelligence is possible. He says his timeline has not changed since DeepMind's founding in 2010, when co-founder Shane Legg extrapolated compute and algorithmic progress and predicted AGI roughly 20 years out. Hassabis now puts a very good chance on AGI arriving within the next five years, which he says keeps them pretty much on track. He notes that the returns from "scaling laws" (the observation that models with more compute and parameters become more capable) are smaller than the near-doubling in performance seen between early generations, but they have not plateaued and remain substantial. Compute remains the primary bottleneck, serving not just as a resource for scaling but as a "workbench" for testing new algorithmic ideas at a realistic scale.

Technical Frontiers and "Jagged Intelligence"

Despite rapid progress in video models and interactive world models (such as DeepMind’s Genie), Hassabis identifies several critical missing components in current AI:

Continual Learning: Current systems do not learn after their training phase; Hassabis suggests the need for "consolidation" mechanisms similar to human sleep.
Memory Architectures: Moving beyond "brute force" long context windows to more elegant memory systems.
Long-term Planning: Developing hierarchical planning capabilities that span years.
Consistency: Overcoming "jagged intelligence," where a model excels at a task when a question is posed one way but fails at elementary things when the same question is posed slightly differently.

The Scientific and Medical Revolution

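Hassabis views AGI primarily as the ultimate tool for science and medicine and hopes it will usher in a "golden age" of scientific discovery. Following AlphaFold, DeepMind spun out Isomorphic Labs to tackle the rest of the drug discovery process, from designing compounds to checking toxicity; he expects that drug design engine to be ready within 5 to 10 years. Responding to the interviewer, whose mother has multiple sclerosis, he outlines a second step for the slow clinical-trial stage: AI could help by simulating parts of human metabolism and by stratifying patients according to their genomic makeup. Once a dozen or so AI-designed drugs have passed through the full process, regulators could back-test the models' predictions and perhaps skip steps such as animal testing or climb the dosage ladder faster. He also speaks of eventually "curing cancer" through a general-purpose drug design platform.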

Economic Impact: The 10x Industrial Revolution

Hassabis quantifies the coming of AGI as "10 times the Industrial Revolution at 10 times the speed," unfolding over a decade rather than a century. He expects significant labor market disruption but notes that, historically, new technologies have created new jobs that are higher-quality and higher-paying. To spread the gains more widely, he suggests that pension funds or national sovereign wealth funds could buy into the leading AI companies, and that societies will need ways to redistribute productivity gains that may be narrowly concentrated. He also argues that AI will more than pay for its own energy costs in the medium to long run, both by optimizing existing infrastructure (he estimates national grids could become 30–40% more efficient) and by helping bring about breakthroughs in fusion energy, batteries, and materials such as superconductors.

Global Safety and Regulation

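Addressing "existential risk" and the potential for bad actors to repurpose what he calls dual-purpose technologies, Hassabis advocates for an international regulatory body similar to the International Atomic Energy Agency, which national AI safety institutes could feed into. He emphasizes the need for technical benchmarks that test for "undesirable properties" such as deception, along with safeguards such as not allowing systems to output tokens that humans cannot read. Ideally, he says, as systems become more autonomous and agentic, they would be independently audited and certified with a "kite mark" of quality and safety that consumers and companies can safely build on.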

The European Tech Ecosystem

Hassabis remains committed to London, citing the UK's rich scientific heritage (from Newton to Turing) and the high density of world-class talent at universities such as Oxford, Cambridge, Imperial, and UCL. He argues that being "away from the maelstrom" of Silicon Valley allows for deeper, more original thinking. However, he identifies a lack of late-stage growth capital (the billion-dollar rounds needed to take on incumbents), together with Europe's fragmented, smaller markets, as the main barriers preventing Europe from producing trillion-dollar companies.

Philosophical Legacy

Ultimately, Hassabis hopes to be remembered for advancing the frontiers of knowledge and curing diseases. Beyond the technical and economic challenges, he expresses a growing concern for the philosophical questions of the AGI era: the nature of consciousness, the definition of human purpose, and the meaning of life in a world where intelligence is no longer a human monopoly.

Transcript

Demis Hassabis: I would say about 90% of the breakthroughs that underpin the modern AI industry were done either by Google Brain or Google Research or DeepMind. So, one of our groups... The returns are kind of still very substantial, although they're a bit less than they were obviously at the start of all of this scaling.

Interviewer: We have amazing guests on the show, but very few honestly will be considered in the same realm as Newton, Turing, Einstein. Our guest today is one of the greatest minds on the planet and I consider myself incredibly lucky to have had the chance to sit down with him.

Demis Hassabis: Those labs that have the capability to invent new algorithmic ideas are going to start having a bigger advantage over the next few years as the last set of ideas are having all the juice wrung out of them.

Interviewer: This is a truly special one and one that I'll remember for a very long time.

Demis Hassabis: I think we could probably get 30–40% more efficiency out of our national grids.

Interviewer: Enjoy the episode, and I so appreciate the time we had with a very special human being.

Demis Hassabis: I sometimes quantify the coming of AGI as 10 times the Industrial Revolution at 10 times the speed.

Interviewer: Thrilled to welcome Demis Hassabis of DeepMind. Ready to go.

Interviewer: Demis, I'm so excited to be doing this. Thank you so much for joining me today.

Demis Hassabis: Great to be here.

Interviewer: Now, there are many places that we could have started, but I was actually watching the documentary that you did, which was fantastic, and I wanted to start on AGI. Definitions vary widely. You've been very thoughtful about what it means to you. And so I wanted to start: can you explain to me how you think about it today so we have that as a kind of grounding?

Demis Hassabis: Yeah. Well, we've always been very consistent in how we define AGI as basically a system that exhibits all the cognitive capabilities the human mind has. And that's important because the brain is the only existence proof we have that we know of—maybe in the universe—that general intelligence is possible. So that for me is the bar for what AGI should be.

Interviewer: It's the worst question: how close are we? Everyone says different things, and it's very difficult when you have very prominent figures saying it could be as early as 2026 or 2027.

Demis Hassabis: Yeah, I mean, I think look, I've got a probability distribution around the timings, but I would say there's a very good chance of it being within the next five years. So that's not long at all.

Interviewer: Is that closer than you thought? Has that changed over time?

Demis Hassabis: Not really. I mean actually, it's funny—my co-founder Shane Legg, who's Chief Scientist here, when we started out DeepMind back in 2010, he used to write blog posts sort of predicting when AGI would happen. And bearing in mind in 2010 when we started, almost nobody was working in AI and everyone thought it was a dead end. But they're still there on the internet for people to check. And we used to do this extrapolation of compute and algorithmic progress. And basically, we predicted around 20 years it would take from when we started out, and I think we're pretty much on track.

Interviewer: What are the biggest bottlenecks when you look today? You know, in the documentary you said you just never have enough compute. What are the biggest bottlenecks when you look at where we are today?

Demis Hassabis: I think compute is the big one. Not just for the obvious reason of scaling up your ideas and your systems: under the "scaling laws," as they're called, you keep on building bigger and bigger architectures with more and more parameters, and as you do that, you get more intelligent systems. But the other thing you need a lot of compute for is doing experiments. The cloud is our workbench, basically. So if you have a new algorithmic idea and you want to test it, you've got to test it at a reasonable scale, otherwise it won't hold when you actually put it into the main system. So you need quite a lot of compute if you have a lot of researchers with lots of new ideas.

Interviewer: You mentioned the term "scaling laws." A lot of people suggest that we're hitting the limits of scaling laws and starting to see a plateauing effect. Do you think that's true?

Demis Hassabis: No, I don't think so. I think it's a bit more nuanced than that. So of course, when the leading companies all started building these large language models, you were getting enormous jumps with each generation of new system. You know, maybe they were almost doubling in performance. At some point that had to slow down. So it's not continuing to be exponential, but that doesn't mean there aren't still great returns from scaling the existing systems up further. And we and the other frontier labs are getting a lot of great returns on that kind of compute expansion. So, I would say the returns are still very substantial, although they're a bit less than they were obviously at the start of all of this scaling.

Interviewer: Where are we behind where you thought we would be?

Demis Hassabis: I think actually in most areas we are ahead of where I thought we would be. If you think about things like the video models or even now with our newest systems like Genie—they're interactive world models—which I think is kind of incredible if you sort of step back and think about it. I think if you'd shown me that 5 or 10 years ago, I would have been pretty amazed. So I think in most domains we are ahead of where the field thought.

There's still some big things missing though, like continual learning. These systems don't learn after you finish training them, after you put them out into the world. They're not very good at learning further things.

Interviewer: I'm sorry to ask blunt and basic questions. Why do we not have continuous learning today?

Demis Hassabis: Well, people haven't quite figured out yet—and all the leading labs are working on this—how to integrate new learning into the existing systems that you spent months training. Of course, the brain does this very elegantly, right? Probably through things like sleep and reinforcement learning. You just kind of get "consolidation," as it’s called in the brain, where your memories during the day are replayed and then some of that information is elegantly incorporated into your existing knowledge base. Perhaps we need something like that to incorporate new information along with the existing information base.

Interviewer: You mentioned video models, you mentioned kind of media and image. It seems that DeepMind has progressed very quickly and caught up with or overtaken other providers. I basically tweeted what I used and how it's changed over time, and DeepMind now is my number one for research for new shows. It wasn't that way before. What has led to an acceleration and progression at DeepMind that maybe wasn't there two to three years ago?

Demis Hassabis: Yeah. Well, we made some organizational changes. I think we've always had the deepest and broadest research bench at Google and at DeepMind. I mean, if you look at the last decade plus, I would say about 90% of the breakthroughs that underpin the modern AI industry were done either by Google Brain or Google Research or DeepMind. If you think of things like AlphaGo and reinforcement learning and of course Transformers—these are all the key breakthroughs. So I would back us to make those breakthroughs in the future if there are any missing ones.

I think we've basically helped put together all the talent from around the company sort of pushing in one direction. And then we talked earlier just about compute resources—it was also about combining all of our resources together so we could build the biggest models rather than having two or three versions around the company. So I think a lot of it was assembling together all the ingredients we already had and then kind of pushing with relentless focus and pace—acting almost like a startup, really—to get back to the frontier and be ahead in many areas.

Interviewer: You say if anyone's going to do the breakthrough it could and should be us. When you think about that, is continuous learning the next breakthrough that you're most excited by?

Demis Hassabis: I think there's quite a few things that are missing. There's continual learning. I think there's a lot of mileage in looking at different memory systems. At the moment we have these long context windows which are kind of a bit brute force. You just put everything in them. I think there's a lot of interesting architectures to be invented there.

And then there's stuff like long-term planning, hierarchical planning. These systems are not very good at planning at long time horizons, many years into the future, which we with our minds can do. So there's quite a lot of problems I think that are still left to overcome. Maybe one of the biggest is consistency. I sometimes call these systems "jagged intelligences" because they're really amazing at certain things when you pose the question in a certain way, but if you pose a question in a slightly different way they can actually still fail at quite elementary things. So a general intelligence shouldn't be that sort of jagged.

Interviewer: When you reposition files and you set up agents to perform in certain ways and then the files fall over, or the configuration completely falls over...

Demis Hassabis: Exactly. 100%.

Interviewer: That's a disaster.

Demis Hassabis: Yeah. Well, I mean, the general intelligence—if you think about how our minds work—it shouldn't have those kinds of holes in it.

Interviewer: We talked about a plateauing of scaling laws. Everyone talks about a commoditization of models in terms of capabilities. Do you think we see that, or do you think we see one or two continuously accelerate ahead of the others?

Demis Hassabis: Yeah, I feel like maybe the three or four leading labs now, of which we're one, I think the gap is starting to pull away because a lot of these tools also of course help you build the next generation. So things like coding tools, math tools... and it's getting harder and harder I would say to eke out the same gains from just the same ideas. So I think those labs that have the capability to invent new algorithmic ideas are going to start having a bigger advantage over the next few years as the last set of ideas are sort of having all the juice being wrung out of them.

Interviewer: I mean, you know, you were very open with a lot of your research for years and we see many very good quality open models. How do you think about the future of open? I have many portfolio companies that kind of use frontier models to set a benchmark and then they use open models to get as close as possible but with more cost effectiveness. What does that future look like?

Demis Hassabis: Yeah, I think it's probably similar to what we're seeing today. I mean we're big supporters of open science and open models and we've done many, many things obviously from the original Transformers to AlphaFold—these are all things we sort of gave out into the world to help the research community, and we plan to continue to do that especially in applied domains, scientific domains, applying AI to science which is obviously my passion.

But I think increasingly what you're going to see is the open source models probably one step back from the absolute frontier. It usually takes about six months for the open source community to sort of reimplement and figure out what those ideas are. But we are also pushing hard on a suite of open source models called Gemma which we're determined to make best-in-class for their sizes. Specifically for small developers or academics or the beginnings of a startup, I think they're perfect for that and also for edge computing too. So we're very interested in open source models for certain types of applications.

Interviewer: How do you think about a world post-LLMs? You have different people with different views. You have Yann LeCun with very different views.

Demis Hassabis: For me, I don't think it's... I kind of disagree with Yann on a few things. I think there might be a 50/50 chance there's some things maybe missing that we still need to make breakthroughs in—perhaps they're world models or these kinds of approaches. But my betting is pretty strong: we've seen how successful these foundation models have been. They can do incredibly impressive things. I don't think that's going to go away. We're still seeing gains from the returns from the scaling laws. So I think the only question really is when you think about a future AGI system: is an LLM foundation model going to be the key component only, or is it the total system? I just think it's a question of is there anything else needed. I don't think it's going to get replaced; I think it's going to get built on top of these foundation models just like the way we do with our world models.

Interviewer: When we think about that future five years out as you said, potentially with AGI, what does that world look like? Many people have different concerns. If we just start generally, what does that world look like to you?

Demis Hassabis: I think on the positive side—and the things obviously I've spent my whole career and life building towards AGI—is I think it will be the ultimate tool for science and medicine. So in terms of advancing scientific discovery, finding cures to diseases, I think we need that kind of technology. And so I'm hoping in five years plus time we'll be sort of entering a new golden era, a golden age of scientific discovery.

Interviewer: So, my mother's got multiple sclerosis. So it's the thing that I'm always most excited about. The thing I worry about is actually kind of drug discovery—the process of getting it through all the trials and knowing that it takes a decade before my mother will actually get any benefits from it. How do we solve that?

Demis Hassabis: I think we'll get to that point soon. First of all, what we're doing is, after we did the AlphaFold project to do protein folding, then we spun out a company called Isomorphic Labs, which is doing extremely well. And that is supposed to focus on solving the rest of the drug discovery process, which is a lot of chemistry, designing the compounds, checking it's not toxic and all the different properties you need for drugs to be safe. I think we'll have that whole drug design engine ready in the next 5 to 10 years.

Then you're right: the next problem is the clinical trials still take many, many years. But I think AI can help there in terms of maybe simulating parts of the human metabolism. Also stratifying patients to make sure that certain patients get exactly the right type of drug that's suitable for their genomic makeup. And so I think AI can help there too. But I think the real revolution will come when a few, maybe a dozen or so AI drugs get through the whole process and then the government and the regulatory bodies see that and they have enough data to sort of back-test the predictions of those models. Then maybe what we can do in the future—where maybe another 10 years after that—is where we can really just trust the predictions that the models are making and actually then maybe skip out some steps. Perhaps animal testing is not needed anymore. Maybe we can go up the dosage ladder quicker because you can rely on these models. So I think we've got to do it in two steps: solve the drug design problem first and then look at the regulatory length of time it takes.

Interviewer: Speaking of regulatory, AI safety is a big topic and a big concern. I think it was... again I watched it last night over dinner which was a great watch which is obviously the documentary... and I think it was Stephen Hawking who said, "We must get it right because we might not get another chance." Do you think that's right?

Demis Hassabis: Yeah, I do think that's right. I think that is the stakes that we have to deal with. And you know, there's two things I worry about. One is the misuse of these systems by bad actors, and they can be repurposed. These are dual-purpose technologies. They can be used for incredible good in science and health as we've just discussed, but they can also be repurposed for harmful ends by a bad actor. So that's one issue.

Second issue is a technical one: making sure these systems, as they get more powerful (not today's systems, but maybe in a year or two's time when they become more agentic, more autonomous as we get towards AGI), can be kept on the guardrails that we want. And I think regulation, the right kind of regulation, could help here in terms of making sure there's at least sort of minimum standards from all of the leading providers, but it ideally needs to be some kind of international standard.

Interviewer: What is the right kind of regulation? And again, I'm kind of quoting yourself back from this documentary. You're like, "I think we need more global coordination," which worries me because we're getting worse at it.

Demis Hassabis: Yes, for sure. I mean, it's sort of crazy the timing that we're in, right? With this most consequential maybe technology the world's ever seen at the same time as a very fragmented sort of international system. It's not ideal, but I think we're going to have to try and do the best we can to at least come up with a sort of set of minimum standards, some benchmarks that test for undesirable properties. For example, deception. Nobody wants to be building systems that are capable of deception because then they could be getting around other safeguards. And then I imagine, if things go well, some kind of certification process that basically—it's almost like a kite mark of quality—that this model has certain safeguards and certain guarantees, and so therefore consumers and companies can safely sort of build on top of it. I think that is how it should go ideally. But it does have to be international because of course these systems are cross-border and they're cross-territory.

Interviewer: Who is that ultimate verification system? You obviously started with Theme Park. Brilliant. Don't put the burgers down too close to the roller coaster. But you know, obviously as a media company, I go through any media platform saying I don't know what's real or fake. I'm always having to ask what's real or fake. Who is that arbiter of verification?

Demis Hassabis: Well, I think there—ultimately it's got to be government, I think. But the kind of technical bodies that would be able to do the technical work would be like maybe the AI safety institutes. There's a very good one in the UK that was set up under Prime Minister Sunak and I think is doing great work, and there's one in the US. Maybe some of the leading countries that have the best research should also have an equivalent body that is staffed with high-quality researchers too, that can actually evaluate and audit these kinds of systems against certain benchmarks and independently check whether they are meeting the right standards.

Interviewer: If I could give you like a magic wand that was only applicable to AI safety, what would be your implementation idea or program that you would put in place?

Demis Hassabis: Yeah, I think we need some kind of international body, maybe similar to the Atomic Energy Agency, something like that, that perhaps the AI safety institutes sort of feed into. And the research community has to also be involved in this: what are the right set of benchmarks to check? What types of traits? What types of capabilities? Maybe there are other safeguards too like... it wouldn't be desirable to have AI systems output tokens that are not human-readable. So, in some kind of machine language that we couldn't understand. I think that would introduce a new vulnerability. So there's quite a few sort of things like that which I think most of the leading labs would agree are probably not best to do. And then these bodies would test against those things. I think that would give the public confidence and academia could be involved as well, as well as civil society, that these systems which are going to get incredibly powerful have been independently checked and audited.

Interviewer: That's it. Your magic wand's done now. That was the one.

Demis Hassabis: Maybe I used it on the wrong thing!

Interviewer: Time will tell.

Demis Hassabis: Yes. Exactly.

Interviewer: You said there about science being one of the most exciting areas in five years' time. I have to ask it because it's one of the biggest concerns: the labor displacement problem. I just had Marc Andreessen on the show actually and he said that I was a Marxist for bringing it up. Marc's wonderful so I'm not blaming him, but he was like it's completely rubbish. I don't agree with it at all; we've always overcome it. How do you think about the labor displacement problem when you look at how truly capable these systems are and what that does to labor markets?

Demis Hassabis: Well, certainly in the past with every new revolutionary technology there's been a lot of job disruption. So that's for sure, and I think that's definitely going to happen. So a lot of old jobs go away or are not viable anymore, but then actually the history of it is that a whole set of new jobs arrive that maybe one can't even imagine before, and those are high-quality and higher-paying. So that's the normal course.

Of course, you have to be very careful to say "this time is different," and I guess that's what people like Marc are claiming—it's the same as the last sort of 10 massive breakthroughs like the internet, mobile, and so on. I do think this is going to be bigger than all of those previous technological breakthroughs. I mean, I sometimes quantify AGI—the coming of AGI—as like 10 times the Industrial Revolution at 10 times the speed. So unfolding over a decade instead of a century. If you read a lot about the Industrial Revolution—there's a lot of great books about it—it caused a huge amount of upheaval as well as a lot of advances. I mean, we wouldn't have modern medicine today. Child mortality was at 40% pre-Industrial Revolution. So you wouldn't want it not to have happened, but ideally this time around we mitigate some of the downsides a bit better than we did during the Industrial Revolution.

Interviewer: I often listen to amazing voices like yours and I get very excited by how fast it's coming. And then I try and stop myself from being too useful and think I should be more wise... and I'm told that you know we always overestimate what can be done in a year and underestimate what can be done in ten. Is that the truth here?

Demis Hassabis: No, I think that's still the truth. I mean, maybe both timescales of short-term and long-term are nearer than other technologies. But I do think literally today, as of today and in the next year, things are a bit overhyped in AI. I mean, there couldn't be any more hype in some ways. But on the other hand, interestingly, I still think it's very underappreciated how revolutionary this is going to be in the timescale of about 10 years. So we could call that long term. There's still that dichotomy even today with AI.

Interviewer: With the concern around labor markets, there's also a concern around income inequality and the concentration of wealth to few players. How do you see that shaping out with the comment on the Industrial Revolution?

Demis Hassabis: Well, I think there's different ways that could play out. You know, maybe pension funds should be buying into all the big AI companies and making sure that everyone has a piece of that. Or sovereign funds—maybe every country should have a sovereign wealth fund that does that. That would be the sort of investment way of doing it. I think also there needs to be thought about: if there is this massive productivity gain but it's sort of narrow where that occurs, how do we redistribute that so that everyone benefits from these huge gains?

I can see all sorts of ways that could be done including providing infrastructure and other things with that additional productivity gain. I mean there could be unbelievable things happening in the 5 to 10 year timescale including like a breakthrough in some kind of renewable free energy. You know, maybe we solve fusion. We're working on that, right, with our partners at Commonwealth Fusion. I think AI is going to usher in... maybe we have amazing new superconductors, better batteries, material science. There's all sorts of ways I could see that completely changing the nature of the economy.

Interviewer: How do we solve the energy crisis that comes with an AI revolution? What it means in terms of energy requirements is unprecedented. I know it's an incredibly hard question, but how do we solve that unprecedented need for new energy?

Demis Hassabis: Well, I think actually AI will in the medium to long run more than pay for itself in terms of energy costs. So, we work on all these projects of optimizing existing infrastructure like optimizing the grid. I think we could probably get 30–40% more efficiency out of our national grids. And then there's like modeling the climate and weather—we have the best kind of weather modeling systems in the world. So that helps us work out where the effects are really happening to mitigate that.

And then finally, the most exciting maybe is like these new breakthrough technologies like fusion, new batteries, superconductors that I think AI will be essential for helping us reach. Then I think we'll be in a completely new energy situation than we've ever been as humanity. And then that will of course help with things like the climate and environment and eventually also help us get into space much more cheaply because if you have an incredible energy source like fusion, then you have effectively unlimited rocket fuel because you can just distill/catalyze seawater.

Interviewer: I'm not going to ask you to solve space, don't worry. My question was on being in the UK. You're in London. I'm in London. I'm very proud to be in the UK. You have been, I'm sure, pushed or prodded at every turn to move to the US. Why have you stayed?

Demis Hassabis: Well, I should ask you that question, too! But when we started DeepMind, I saw in London, and the UK in general and Europe to some degree, a place with incredible talent. We've always had three or four of the top 10 universities in the world with Cambridge, Oxford, Imperial and UCL. So we're producing, really, the envy of the world: these amazing graduates and PhD students. We have incredible scientists here. We've got a rich heritage of that all the way from Turing and Hawking and Darwin, Newton. So we have this incredible history of scientific breakthroughs and having great thinkers.

I felt we had all the ingredients and the talent and great engineers here, but it just hadn't been galvanized into an ambitious deep-tech startup idea. And I felt it was possible and I felt that there was actually less competition here for that sort of talent and we could even draw in the best talent from the top European universities—and that's what it was like in the early days of DeepMind. So I think it was a huge structural advantage for us.

And then the final thing is maybe being a bit away from the Valley. There is some disadvantage in that you're not plugged into the network and the gossip and the latest trends and vibes and all these things. We're a little bit out of it here, but I think it's very conducive to thinking deeply about things, being more original about how you think. And I think that's great for things like deep tech where you don't want to be distracted by the latest fad. You want to... you know it's going to be a 20-year mission, which is what we knew at the beginning of DeepMind. So I think being a little bit away from that maelstrom is quite good.

Interviewer: Palmer Luckey often talks about being 400 miles away from the Valley. It's core to his kind of innovative thinking. Terrible question: will Europe have a trillion-dollar company? You know, you see the Americans always bash us for our lack of large companies. I ping Daniel Ek and be like, "Come on, dude," but we don't have a trillion-dollar company.

Demis Hassabis: Not yet. I mean, Daniel may well get there with one of his companies. Spotify, Helsing—I think those are two good options. I think there's no reason why we can't have that. I'm going to try and do that with Isomorphic, which is headquartered here and I think has the potential to be that. But I think that's one of the disadvantages of Europe—obviously we're a combination of smaller markets. So that's one thing we have to kind of overcome. Maybe this "EU Inc" thing could be a good innovation.

Interviewer: I'm pulling out the magic wand again. This time applied to European technology. What would you do to implement a growth mindset and an ability to build that trillion-dollar company that we don't have today?

Demis Hassabis: I think in the UK—and this may apply to other European countries too—I think unlocking what pension funds can invest in. For the growth stage, I think we're brilliant at doing the startup idea and getting it to a certain level like we did with DeepMind. But then if you really want to cross that sort of chasm into the trillion-dollar global player, then where are the billion-dollar rounds going to come from where you can really take on the existing incumbents? I think that certainly was missing 10 years ago when I was doing fundraising for DeepMind, and I think it's still kind of missing today—just that level of ambition and the amount the capital markets can support.

Interviewer: I read about some of your early rounds raising in Silicon Valley from families. Okay, we're going to do a quickfire round. Meeting Elon for the first time—how was that?

Demis Hassabis: Oh yeah, it was amazing. It was at a Founders Fund meeting because we were both... SpaceX and DeepMind were part of the same portfolio, a kind of amazing portfolio that Peter Thiel had at Founders Fund. I think we were both invited to my first portfolio conference, I think it must have been back in 2011 or 2012, very early days. So we were the small little upcoming thing and I had a small speaking slot, and then Elon was the big thing in that portfolio. So he had the keynote, but then we met afterwards. I think it was... Elon says it was like we were passing each other in the bathroom or something! And we said hi and we both hit it off immediately as people that were almost too ambitious in their thinking, perhaps, and love sci-fi. I really wanted to visit his rocket factory, so I was trying to get an invitation to SpaceX in LA and he invited me at the end of that meeting.

Interviewer: Healthcare revolution or disease eradication that you're most excited about? Again, for me it's specifically with multiple sclerosis.

Demis Hassabis: Yeah. Well, look, I want to literally cure cancer. I know people say that's the cliché, but actually what we're building at Isomorphic is general purpose. So we're trying to build a drug design platform that will be applicable to any therapeutic area. So ideally it will help with everything from neurodegeneration, cardiovascular, immunology, to cancer. Those are the ones we're focusing on first, but eventually it should be applicable to every disease area.

Interviewer: What are you thinking about that you're not reading about or seeing anyone talk about?

Demis Hassabis: I think a lot of people are worrying about the economic questions around AGI that we talked about earlier, but I worry a lot about the philosophical questions around it. Let's assume we get the technical part right, let's assume we get the economics part of it right—both of those are hard. Then there's a philosophical question of: what is meaning? What is purpose? We'll find out maybe what consciousness is... what does it mean to be human? I think that's what's coming down the road and I think we need some great new philosophers to help us navigate that.

Interviewer: Hard final question. There are many different ways you could describe what you do. What would you most like to be remembered for? What do you want your legacy to be?

Demis Hassabis: I would like my legacy to be remembered for advancing science and building technologies that bring incredible benefits into the world, like curing terrible diseases.

Interviewer: Demis, thank you so much for putting up with my meandering conversation. You've been fantastic. I really appreciate it.

Demis Hassabis: Thank you very much.

🌐 youtube.com, ai, deepmind, hassabis, video

2026-03-16

2997Δ22m Academic

Yann LeCun | Self-Supervised Learning, JEPA, World Models, and the future of AI

www.youtube.com/watch?v=yUmDRxV0krg

Summary

In this insightful presentation at the Harvard Center of Mathematical Sciences and Applications (CMSA), Yann LeCun, Chief AI Scientist at Meta and Turing Award laureate, outlines a roadmap for the next generation of Artificial Intelligence. He argues that current Large Language Model (LLM) architectures are fundamentally limited and proposes a shift toward "World Models" and Joint Embedding Predictive Architectures (JEPA).

The Limitations of Current AI Architectures

LeCun begins by highlighting the stark contrast between human/animal learning and current machine learning. Despite the success of LLMs, he identifies several critical flaws:

Data Inefficiency: LLMs require trillions of tokens—equivalent to hundreds of thousands of years of reading—to reach their current level. In contrast, a four-year-old child has processed a similar amount of data (roughly $10^{14}$ bytes) through visual observation, yet possesses a far superior understanding of the physical world.
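Autoregressive Failures: Current models predict the next token in a sequence and feed each prediction back in as input. LeCun argues this process is inherently divergent: once a token falls outside the set of acceptable answers there is no way back, so the probability that an answer stays correct decays exponentially with its length, leading to "hallucinations" and a lack of logical consistency.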
Lack of Physical Grounding: LLMs lack a "mental model" of reality. They cannot reason about gravity, inertia, or the outcomes of physical actions, which are concepts human infants grasp within the first months of life.
Fixed Computation: Standard neural networks use the same amount of computation for every token, whereas complex problems should require more "thinking time"—a distinction between "System 1" (instinctive) and "System 2" (deliberative) cognition.

The World Model and JEPA

The core of LeCun’s proposal is the Joint Embedding Predictive Architecture (JEPA). He argues against generative models that attempt to predict every pixel in a video, noting that most details (like the movement of leaves on a tree) are irrelevant and unpredictable.

Representation over Generation: Instead of reconstructing pixels, JEPA predicts the representation of the next state in an abstract space. This allows the system to ignore unpredictable noise while capturing essential structures.
Hierarchical Abstraction: Just as science uses different levels of abstraction (from quantum mechanics to cells to ecosystems), AI must learn a hierarchy of representations. This is essential for Hierarchical Planning—the ability to break a long-term goal (e.g., traveling from New York to Paris) into a series of sub-goals and specific muscle movements.

Energy-Based Models and Optimization

LeCun advocates for moving beyond simple feed-forward propagation toward Inference by Optimization. He describes an "Energy-Based Model" (EBM) where an energy function measures the incompatibility between an input and a potential output.

Inference as Search: Under this framework, the system does not simply "blurt out" an answer; it searches for an output that minimizes energy (maximizes compatibility with the world model and task objectives).
Preventing Collapse: A major technical challenge in non-generative models is "collapse," where the system learns a trivial constant representation. LeCun discusses regularized methods (like VICReg and Dino) that prevent collapse by maximizing the information content in the representation space.

Practical Applications and Results

The talk highlights recent breakthroughs from Meta’s Fundamental AI Research (FAIR) lab:

Dino-v2: A self-supervised vision model that matches or surpasses supervised systems in image understanding using far less labeled data.
V-JEPA: A video-based model that learns intuitive physics and common sense by observing unlabelled video. It can detect "impossible" events, such as an object disappearing, by noting spikes in prediction error.
Robotic Planning: Demonstrations show how these world models allow robots to plan complex tasks (like navigation or object manipulation) "zero-shot," without specific reinforcement learning for every new task.

Future Directions: A Shift in Paradigm

LeCun concludes with several strategic recommendations for the AI research community:

Abandon Generative Models: Focus on JEPA for non-discrete signals like video and sensory data.
Use Regularized Methods: Move away from contrastive learning (which requires too many negative samples) toward methods that regularize representation volume.
Minimize Reinforcement Learning: RL is highly inefficient; instead, utilize world models to plan actions through optimization.
Objective-Driven AI: Build systems where behavior is dictated by hard-coded guardrails and task-specific cost functions, ensuring safety and controllability.

Transcript

Introduction

Mike Freedman: Welcome everyone. Can you hear me? I'm Mike Freedman, representing the Center of Mathematical Sciences and Applications at Harvard, and it's my great pleasure to be introducing Yann LeCun, Chief Scientist at Meta. We're running a conference at CMSA on the geometry of machine learning, and this is actually a lecture within that conference, but it's outside the CMSA building because we knew too many people would show up to hear Yann. So we were able to move it to the Science Center where it's appropriate.

As soon as we got Yann to agree to give this talk, all the other speakers accepted immediately. So thank you, Yann. It's the easiest conference to organize. Yann is one of these scientists that it would anesthetize the audience if I tried to go through his awards, and also I would need a script. So I'll just mention that he won the Turing Award with Bengio and Hinton a few years ago. I think of him interchangeably with the idea of convolutional neural nets. I'm a geometer, as a mathematician—you know, topologist and geometer—and I think that's something we share: a confidence in the geometric imagination. I know it's something that Yann has always tried to figure out how to weave into artificial intelligence, and it's a vein of exploration that I've greatly admired. So, I think we're all very much looking forward to this talk. So am I. And without further ado, let me turn the stage over to Yann.

Yann LeCun: Thank you so much. Well, I have a terrible confession to make, which is that I'm not a mathematician. I'm not really a computer scientist either. I never actually studied computer science. So I'm not exactly sure what I am, but I'm going to talk about machine learning. I was told this was a bit of a more general audience than the one at the workshop, so I made this a bit more of a wide-audience talk—still technical, but a little lightweight on the theory, that's for sure.

The Current State of AI and the Need for Better Learning

I want to talk about the future of AI and how we can make significant progress towards more intelligent machines beyond what they are currently capable of doing. And I tell you right now, there is a lot of work to do. We're nowhere near matching human intelligence or even animal intelligence with the type of techniques that we have access to at the moment.

So one big question we can ask ourselves is: do we actually need AI systems with human-level intelligence? And the answer is probably yes, because the future in which each of us walks around with AI assistants helping us in our daily lives at all times—perhaps in wearable devices like smart glasses like the ones I'm wearing at the moment—is coming. We'll be their boss. It's kind of like we'd be running around with a team of virtual people helping us at all times. And of course, for this, we need AI systems that have intelligence that is in some way similar to humans, because that's the kind of entity that we are most familiar with interacting with.

But the technology is nowhere near where it needs to be at the moment for that. The main issue is that current AI architectures and machine learning techniques suck compared to what we can observe in humans and animals. The type of efficiency in learning that we see in animals and humans is just astonishing, and we're not matching this at the moment in many instances.

Early on in machine learning, the main technique was supervised learning, and then there was a big fashion around reinforcement learning for a while. Now it's used a lot, of course, to fine-tune large models, but in themselves, those two techniques are really insufficient. The type of learning that we observe in humans and animals is very different. It's neither supervised nor reinforced for that matter. It's more like self-supervised learning, something that has really revolutionized AI and machine learning over the last few years. The underlying principles are very similar to supervised learning, but there is no clear difference between input and output.
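The point that there is "no clear difference between input and output" can be made concrete with masked prediction, one common self-supervised recipe for discrete sequences: part of the sequence is hidden, and the hidden part becomes the target. The sketch below is illustrative only; the function name, the `<mask>` token, and the masking rates are assumptions, not details from the talk.

```python
import random


def make_masked_example(tokens, mask_rate=0.15, mask_token="<mask>", seed=0):
    """Build an (input, targets) pair from unlabeled data by hiding some tokens.

    The targets come from the data itself, so no human labels are needed.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok           # what the model must recover...
            corrupted[i] = mask_token  # ...from the tokens left visible
    return corrupted, targets


sentence = "the cat sat on the mat".split()
model_input, model_targets = make_masked_example(sentence, mask_rate=0.3)
print(model_input, model_targets)
```

A model trained on many such pairs learns the structure of the sequences without anyone labeling them, which is the property that made this style of training work so well for text.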

This works astonishingly well for training a system to understand the structure of sequences of discrete symbols such as language, code, and mathematics. But the problem is that it only works for sequences of discrete symbols. It doesn't really work for natural signals yet. Self-supervised learning is starting to work there, but the techniques are very different, and that'll be the main topic of this talk.

There are other limitations with current AI architectures. The type of inference that they perform is basically feed-forward propagation through a fixed number of layers. That's computationally limited. There are many functions you cannot represent efficiently by just stacking a fixed number of layers. Also, current architectures use autoregressive prediction. They use their own predictions as input to make further predictions, and that leads to divergence or "hallucination," as people call it.

The World Model Concept

Humans and animals have mental models of the world. Their behavior is driven by objectives, tasks, and goals. They can reason and plan complex action sequences—all things that chatbots and LLMs are essentially incapable of, or at least not at the level we'd like. We need systems that understand the physical world, have persistent memory, can plan complex actions, can reason (spending more time on difficult problems), and are controllable and safe.

Let's start with this idea of a "World Model." We have mental models of reality that allow us to predict what's going to happen, particularly as a consequence of our actions. This allows us to plan. This chart indicates at what age infants learn basic concepts, like object permanence—knowing objects don't just disappear—and category recognition. By nine months, infants learn basic intuitive physics like gravity, inertia, and conservation of momentum. If you show a six-month-old a cart pushed off a platform that appears to float, they won't pay much attention. A ten-month-old will be extremely surprised, because by then they've learned that objects are supposed to fall.

How do we get machines to learn like babies? We haven't solved that problem. We don't have domestic robots. We don't have level-five self-driving cars. We have systems that can pass the bar exam or solve math problems, but we don't have robots that can do what a cat can do or what a ten-year-old can do the first time they are told to clear a table. A 17-year-old can learn to drive in 20 hours without causing accidents, while we have millions of hours of training data and still don't have fully autonomous cars without specialized sensors and mapping. This is Moravec’s paradox: things that are intellectually challenging for humans (chess, integrals) are algorithmically simple, while things that are easy for humans (dexterity, common sense) are incredibly difficult for AI.

The Data Efficiency Gap

A typical large language model is trained on something like 30 trillion tokens (Llama 3). That's about $10^{14}$ bytes. It would take a human half a million years to read that. Compare this to a four-year-old child. A child has seen about 16,000 hours of "video" through their eyes. The optic nerve carries about 2 megabytes per second. Over 16,000 hours, that’s also about $10^{14}$ bytes.
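The two $10^{14}$-byte estimates can be checked with a few lines of arithmetic. The token count, viewing hours, and optic-nerve bandwidth are the figures quoted in the talk; the 3 bytes per token is our assumption, since the transcript does not state a conversion factor.

```python
# Back-of-the-envelope check of the data-volume comparison above.
# Assumption (not from the transcript): about 3 bytes of text per token.
LLM_TOKENS = 30e12             # ~30 trillion training tokens
BYTES_PER_TOKEN = 3            # assumed average
llm_bytes = LLM_TOKENS * BYTES_PER_TOKEN

# 16,000 hours is roughly 4 years x 365 days x 11 waking hours per day.
VISION_HOURS = 16_000
OPTIC_NERVE_BYTES_PER_S = 2e6  # ~2 MB/s, as stated in the talk
child_bytes = VISION_HOURS * 3600 * OPTIC_NERVE_BYTES_PER_S

print(f"LLM text:     {llm_bytes:.1e} bytes")
print(f"Child vision: {child_bytes:.1e} bytes")
print(f"ratio child/LLM: {child_bytes / llm_bytes:.2f}")
```

Both totals land near $10^{14}$ bytes; the bytes-per-token figure is the main assumption on the LLM side.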

A four-year-old has seen as much data as the biggest LLMs have read. Visual data is redundant, and that's exactly what you want for self-supervised learning. You need redundancy to learn structure. This tells us two things: first, we're never going to get to human-level AI by just training on text. It’s just not going to happen. Second, we need serious progress if we want useful robots. Current humanoid robots are impressive in videos, but they aren't smart enough to be useful except in narrow, carefully trained tasks.

Inference by Optimization

I mentioned the limitations of feed-forward propagation. A more powerful way to perform inference is through optimization. Instead of a net just propagating through layers to produce an output, imagine a system that extracts a representation and then has another machine with a single scalar output—an "Energy"—that measures the degree of incompatibility between the input and a proposed output.

If I put an image of an elephant and the label "elephant," I want the energy to be zero. If I put the label "table," I want the energy to be high. Inference, then, is a search: you search for an output that minimizes the energy. This is classical in AI for path planning, logic inference, and SAT problems. This allows for "zero-shot" problem-solving. It's a good model for "System 2" thinking—deliberate, slow reasoning.
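A minimal sketch of "inference as search", assuming a toy scalar energy in place of a learned model (the energy function and all names below are hypothetical). The same energy supports both kinds of search described above: picking the lowest-energy output from a finite set, such as class labels, or descending the energy over a continuous output.

```python
# Toy illustration of inference by energy minimization.
# Hypothetical energy: zero when y is fully compatible with x (here: y == x**2),
# growing as y moves away from that.


def energy(x, y):
    return (y - x ** 2) ** 2


def infer_discrete(x, candidates):
    # Inference as search over a finite set of outputs (e.g. class labels).
    return min(candidates, key=lambda y: energy(x, y))


def infer_continuous(x, y0=0.0, lr=0.1, steps=200):
    # Inference as search over a continuous output: gradient descent on y with x fixed.
    # More steps means more computation spent on the same question.
    y = y0
    for _ in range(steps):
        grad = 2 * (y - x ** 2)  # dE/dy for the toy energy above
        y -= lr * grad
    return y


print(infer_discrete(3, [1, 4, 9, 16]))  # 9, the zero-energy candidate
print(infer_continuous(3.0))             # approaches 9.0
```

Unlike a single feed-forward pass, the amount of computation here is a knob (`steps`), which is the sense in which this style of inference resembles deliberate, System 2 reasoning.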

LLMs, by contrast, spend a fixed amount of computation per token. To make them "think" more, you have to trick them into producing more tokens (Chain of Thought). Also, autoregressive generation is a divergent process. The set of all possible sequences is a tree. Once a token takes you outside the sub-tree of correct answers, there is no way back. The probability of a sequence being correct decreases exponentially with length. This is why LLMs hallucinate. We don't produce answers by blurting one word after another; we have an abstract thought and then turn it into text.
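The exponential decay follows from a very simple model. The 1% per-token error rate and the assumption that errors are independent across tokens are ours, chosen for illustration; independence is the simplification that makes the decay exponential.

```python
# Simplest form of the divergence argument, under an independence assumption:
# each token has probability e of leaving the set of acceptable continuations,
# and once outside there is no way back, so P(correct) = (1 - e) ** n.


def p_sequence_correct(per_token_error, length):
    return (1.0 - per_token_error) ** length


for n in (10, 100, 1000):
    print(f"n={n:5d}  P(correct)={p_sequence_correct(0.01, n):.3g}")
```

Even a small per-token error rate makes long answers unlikely to stay correct under this model.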

Joint Embedding Predictive Architecture (JEPA)

One idea is to train a generative model to predict what happens next in a video. However, predicting at the pixel level is an impossible task. If you train a neural net to make a single prediction of a video, the best it can do is predict a blurry average of all possible futures. To handle natural video, you'd need to parameterize a distribution over high-dimensional continuous space, which is mathematically intractable.
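Why a single prediction turns into a blur: the toy example below has two equally likely futures for a four-pixel strip and scores predictions with mean squared error. The talk does not name a specific loss; squared error is our assumption because it is the simplest case where the effect is exact.

```python
# Two equally likely futures for a 4-pixel strip: the object ends up left or right.
FUTURES = [
    [1.0, 1.0, 0.0, 0.0],  # object moves left
    [0.0, 0.0, 1.0, 1.0],  # object moves right
]


def expected_mse(pred, futures):
    # Mean squared error of one prediction, averaged over the equally likely futures.
    per_future = [sum((p - f) ** 2 for p, f in zip(pred, fut)) / len(pred) for fut in futures]
    return sum(per_future) / len(per_future)


blurry = [sum(col) / len(FUTURES) for col in zip(*FUTURES)]  # pixel-wise average
sharp = FUTURES[0]  # commit to one plausible future

print(blurry, expected_mse(blurry, FUTURES))  # [0.5, 0.5, 0.5, 0.5] 0.25
print(sharp, expected_mse(sharp, FUTURES))    # [1.0, 1.0, 0.0, 0.0] 0.5
```

The gray average, which matches neither real future, scores better than a sharp, plausible frame, so a model trained this way is pushed toward blur.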

The proposal is: don't predict at the pixel level; predict at the representation level. This is the JEPA (Joint Embedding Predictive Architecture). Instead of predicting all the pixels, we predict a representation of the pixels. We run the video through an encoder and train a predictor in that representation space. This abstract representation can eliminate details that are not predictable, making the task simpler.
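A toy contrast between predicting in input space and predicting in representation space. Each hypothetical "frame" has a predictable part (an object's position) and an unpredictable part (random flicker, like leaves in the wind). The encoder and predictor are hand-written stand-ins; in a real JEPA both are learned, which is what makes collapse a risk.

```python
import random

# Each frame is (position, flicker): the position moves predictably,
# the flicker is random noise that no model could forecast.


def encoder(frame):
    position, _flicker = frame
    return [position]  # the representation keeps only the predictable part


def predictor(rep):
    return [rep[0] + 1.0]  # the object moves one unit per frame


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


rng = random.Random(0)
frames = [(float(t), rng.uniform(-1.0, 1.0)) for t in range(6)]
transitions = list(zip(frames, frames[1:]))

# JEPA-style loss: compare predicted and actual representations of the next frame.
jepa_loss = sum(sq_dist(predictor(encoder(a)), encoder(b)) for a, b in transitions)

# Input-space loss: predict the raw next frame, guessing the flicker's mean (0.0).
pixel_loss = sum(sq_dist([a[0] + 1.0, 0.0], b) for a, b in transitions)

print(jepa_loss, pixel_loss)
```

The representation-space loss reaches zero because the unpredictable detail has been abstracted away, while the input-space loss is stuck paying for noise it can never predict.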

This is how we apprehend the world. Science is the quest for representations that allow us to make predictions while ignoring details. To predict Jupiter's trajectory, you don't need to know the details of its surface; you only need six numbers: three positions and three velocities. Everything in this room could be described by quantum field theory, but that's impossible to compute. So we invent abstractions: atoms, molecules, cells, organisms, societies. Every level of science is defined by the abstraction level we choose to make predictions.
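The Jupiter example can be made concrete. With only three positions and three velocities plus Newtonian gravity, the trajectory can be propagated with no description of the planet's surface at all. This is a toy sketch: it assumes a circular orbit around a fixed Sun, ignores the other planets, and uses a simple semi-implicit Euler integrator rather than a high-accuracy one.

```python
import math

GM_SUN = 1.32712440018e20  # m^3 s^-2, gravitational parameter of the Sun
R_JUPITER = 7.785e11       # m, approximate mean distance of Jupiter from the Sun


def step(state, dt):
    """Advance (x, y, z, vx, vy, vz) by dt seconds with semi-implicit Euler."""
    x, y, z, vx, vy, vz = state
    r3 = (x * x + y * y + z * z) ** 1.5
    ax, ay, az = (-GM_SUN * c / r3 for c in (x, y, z))
    vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt  # velocity first...
    return (x + vx * dt, y + vy * dt, z + vz * dt, vx, vy, vz)  # ...then position


# Six numbers: three positions and three velocities (circular orbit, for simplicity).
v_circ = math.sqrt(GM_SUN / R_JUPITER)
state = (R_JUPITER, 0.0, 0.0, 0.0, v_circ, 0.0)

DAY = 86_400.0
for _ in range(365):
    state = step(state, DAY)

radius = math.sqrt(sum(c * c for c in state[:3]))
print(f"radius after one year: {radius / R_JUPITER:.4f} x initial")
```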

Hierarchical Planning and Cognitive Architectures

If we have a world model, how do we use it? The agent observes the world, combines perception with memory, and feeds it to the world model. The model takes an imagined sequence of actions and predicts the resulting states. These predicted states are fed to a "task objective" and "guardrails." The robot searches for an action sequence that satisfies those objectives. It cannot escape the guardrails because, by construction, it only takes actions that minimize the cost function.
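The planning loop above can be sketched with a one-dimensional toy robot. Everything here is hypothetical: the "world model" is a hand-written dynamics function, and the search is brute-force enumeration of action sequences rather than the gradient-based optimization LeCun has in mind. It still shows the key property: a trajectory that violates a guardrail gets infinite cost, so it can never be selected.

```python
import itertools


def world_model(state, action):
    # Hypothetical stand-in for a learned predictor: the robot moves by `action`.
    return state + action


def task_cost(states, goal):
    return abs(states[-1] - goal)  # distance of the final state from the goal


def guardrail_cost(states, forbidden):
    # Any imagined trajectory that enters a forbidden state is ruled out.
    return float("inf") if any(s in forbidden for s in states) else 0.0


def plan(start, goal, forbidden, horizon=4, actions=(-1, 0, 1)):
    best, best_cost = None, float("inf")
    for seq in itertools.product(actions, repeat=horizon):
        states, s = [], start
        for a in seq:
            s = world_model(s, a)  # imagine the consequence of each action
            states.append(s)
        cost = task_cost(states, goal) + guardrail_cost(states, forbidden)
        if cost < best_cost:
            best, best_cost = seq, cost
    return best, best_cost


print(plan(0, 3, forbidden={-1}))  # reaches the goal
print(plan(0, 2, forbidden={1}))   # goal unreachable without crossing 1, so it stays put
```

In the second call the only routes to the goal pass through a forbidden state, so the planner settles for the best safe outcome instead of violating the guardrail.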

Ultimately, we need hierarchical world models. If I want to go from New York to Paris, I don't plan millisecond-by-millisecond muscle controls. I plan at a high level: go to the airport, catch a plane. Each high-level action becomes a sub-goal at a lower level (get a taxi, go to the elevator, stand up from the chair). There is a point where I don't need to plan; I just act (System 1). How we learn these appropriate levels of abstraction and plan hierarchically is completely unsolved. It is a wide-open problem for the next generation of researchers.
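Once a hierarchy of sub-goals is given, expanding it is straightforward; the open problem described above is learning the levels, not executing them. The hierarchy below is hand-written and hypothetical (the "go down to the street" level is our invention), loosely following the New York to Paris example.

```python
# Hypothetical, hand-written hierarchy; in LeCun's framing these levels would be learned.
SUBGOALS = {
    "go to Paris": ["go to the airport", "catch a plane"],
    "go to the airport": ["go down to the street", "get a taxi"],
    "go down to the street": ["stand up from the chair", "go to the elevator"],
}


def expand(goal):
    # Replace each goal by its sub-goals until reaching steps with no further
    # decomposition: the level where one just acts (System 1).
    if goal not in SUBGOALS:
        return [goal]
    steps = []
    for sub in SUBGOALS[goal]:
        steps.extend(expand(sub))
    return steps


print(expand("go to Paris"))
```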

Self-Supervised Learning and Preventing Collapse

To train these models, we need a way to ensure the energy is low for observed data and high for unobserved data. If you only minimize the energy of training samples, the system might "collapse"—learning a flat energy surface where everything has zero energy.
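Collapse is easy to see in a joint-embedding setup where the energy is the distance between the embeddings of two inputs. A constant encoder drives the energy of every training pair to zero, but it also gives zero energy to pairs that should be incompatible. The encoders and data points below are toy assumptions.

```python
# Hypothetical joint-embedding energy: squared distance between the embeddings of x and y.
def energy(encode, x, y):
    ex, ey = encode(x), encode(y)
    return sum((a - b) ** 2 for a, b in zip(ex, ey))


def collapsed_encoder(_inputs):
    return [0.0, 0.0]  # ignores its input entirely


def identity_encoder(inputs):
    return list(inputs)


compatible = ([1.0, 2.0], [1.1, 2.1])  # a training pair: energy should be low
unrelated = ([1.0, 2.0], [5.0, -3.0])  # not observed together: energy should be high

for name, enc in [("collapsed", collapsed_encoder), ("identity", identity_encoder)]:
    print(name, energy(enc, *compatible), energy(enc, *unrelated))
```

Minimizing energy on training pairs alone cannot tell these two encoders apart on the pairs they were trained on, which is why an extra mechanism is needed.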

There are two main ways to prevent this:

Contrastive Methods: You generate "negative" samples and push their energy up. This is hard to scale in high-dimensional spaces.
Regularized Methods: You use a term that minimizes the volume of space that can have low energy. When you push down on the training samples, the rest must go up.
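One regularized method named in the talk is VICReg (Variance-Invariance-Covariance Regularization). Below is a minimal pure-Python sketch of a VICReg-style loss on small batches of embeddings: an invariance term pulls the two views together, a variance term penalizes dimensions whose spread shrinks below a target, and a covariance term discourages redundant dimensions. The weights `lam`, `mu`, `nu` and the target `gamma` are illustrative assumptions, not tuned values.

```python
import math


def _centered(z):
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    return n, d, [[row[j] - means[j] for j in range(d)] for row in z]


def variance_term(z, gamma=1.0, eps=1e-4):
    n, d, c = _centered(z)
    # Hinge on each dimension's standard deviation: collapsed dimensions are penalized.
    stds = [math.sqrt(sum(r[j] ** 2 for r in c) / (n - 1) + eps) for j in range(d)]
    return sum(max(0.0, gamma - s) for s in stds) / d


def covariance_term(z):
    n, d, c = _centered(z)
    cov = [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)] for i in range(d)]
    # Off-diagonal covariance means two dimensions carry the same information.
    return sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0):
    n, d = len(z_a), len(z_a[0])
    invariance = sum((a - b) ** 2 for ra, rb in zip(z_a, z_b) for a, b in zip(ra, rb)) / (n * d)
    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return lam * invariance + mu * variance + nu * covariance


collapsed = [[1.0, 1.0]] * 4
spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
print(vicreg_loss(collapsed, collapsed), vicreg_loss(spread, spread))
```

The collapsed batch has perfect invariance but pays a large variance penalty, while the spread-out batch pays nothing: the regularizer, not negative samples, is what keeps the representation from collapsing.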

I’ve become a fan of regularized methods. One example is the "Dino" model. It uses two encoders where one is a running average of the other (distillation). Somehow, this doesn't collapse, even though we don't fully understand why yet. Dino-v2 is a major success; it shows that self-supervised learning now matches or surpasses supervised learning in image understanding using less labeled data.
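The "running average" in Dino means the teacher encoder's weights are an exponential moving average (EMA) of the student's weights. The sketch shows only that averaging step over a flat list of parameters; the momentum value is an illustrative assumption, and the rest of the Dino training recipe is not shown.

```python
def ema_update(teacher, student, momentum=0.996):
    """Move each teacher parameter a small step toward the matching student parameter."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


teacher = [0.0, 0.0]
student = [1.0, -2.0]  # held fixed here; in real training it changes every step
for _ in range(2000):
    teacher = ema_update(teacher, student)
print(teacher)  # close to the student's values
```

Because the teacher changes slowly and receives no gradients of its own, it provides stable targets for the student.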

We can use these representations for world models. We've shown experiments where a robot uses a Dino encoder and a predictor to plan trajectories to move chips on a table or navigate to a trash can. These systems work "zero-shot" because they have a good world model.

Another recent model, V-JEPA, trains on video. It learns a representation by predicting masked parts of a video in representation space. It learns a level of "common sense" or intuitive physics. If shown a video where a ball disappears, the prediction error shoots up because the model knows that is impossible.
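Detecting an "impossible" event amounts to noticing a spike in prediction error. The sketch below flags frames whose error exceeds the running mean by several standard deviations. The threshold rule and the error values are synthetic assumptions made up for illustration; they are not V-JEPA outputs or its actual detection method.

```python
import statistics


def flag_surprises(errors, k=3.0, warmup=5):
    """Return indices whose prediction error exceeds mean + k * stdev of earlier frames."""
    flagged = []
    for i in range(warmup, len(errors)):
        history = errors[:i]
        mu = statistics.mean(history)
        sd = statistics.pstdev(history)
        if errors[i] > mu + k * sd:
            flagged.append(i)
    return flagged


# Synthetic per-frame prediction errors; frame 6 is where the ball "vanishes".
errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.95, 0.10]
print(flag_surprises(errors))  # [6]
```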

Summary of Recommendations

To get AI to the next level—human or even cat level—I recommend the following:

Abandon generative models in favor of Joint Embedding Predictive Architectures. Don't predict in input space; predict in representation space.
Use the energy-based framework to understand these systems. Probabilistic modeling is often unnecessary and leads to intractability.
Abandon contrastive methods in favor of regularized methods like VICReg or Dino.
Minimize the use of reinforcement learning. It is extremely inefficient. Use it only as a last resort.

These recommendations go against the most popular concepts in machine learning today. It doesn't make me very popular in some circles—I'm joking—but if you want to solve the big problems of AI, don't just work on LLMs. Work on JEPA.

Thank you very much.

🌐 youtube.com, ai, jepa, lecun, talk