AI Signals From Tomorrow

The End of LLMs: Where AI's True Breakthroughs Are Happening


The AI landscape is undergoing a tectonic shift. While the spotlight has been on scaling large language models to unprecedented size, cutting-edge researchers are quietly pivoting toward something far more fundamental – understanding the physical world. (Source: https://www.youtube.com/watch?v=eyrDM3A_YFc)

This fascinating deep dive reveals why leading AI experts now consider LLMs merely a stepping stone rather than the ultimate destination. The real action is happening across four revolutionary frontiers: machines that genuinely comprehend our physical reality, AI systems with persistent memory, technologies that can truly reason, and frameworks that plan actions within the world they understand. 

Joint Embedding Predictive Architectures (JEPA) emerge as the compelling alternative to token-based language models. Rather than struggling with pixel-level predictions in our messy, continuous world, these architectures work with abstract representations in latent space – enabling the mental simulation capabilities essential for authentic reasoning. It's a complete rethinking of how machines learn, moving away from what one expert calls the "completely hopeless" approach of generating thousands of text sequences to solve complex problems.

The shift extends to terminology as well, with "Advanced Machine Intelligence" (AMI) replacing the perhaps misleading "Artificial General Intelligence." This reflects the recognition that even human intelligence isn't truly general but specialized. While AMI might be achievable within a decade, it won't emerge magically from scaling current approaches – it requires fundamental architectural innovations.

Current AI applications already demonstrate remarkable benefits, from reducing MRI scan times by 75% to preventing vehicle collisions. The vision described isn't one of replacement but augmentation – each of us becoming managers of super-intelligent virtual assistants.

What becomes abundantly clear is that progress demands openness. No single company or country has a monopoly on innovation, and the future of AI likely depends on distributed training across global data centers to ensure diversity and prevent control by a few giants. The question isn't whether we'll build these powerful tools, but whether we'll become effective, ethical managers of what we create.

Speaker 1:

So forget all the hype around large language models, you know, LLMs being the be-all and end-all of AI. Our sources today are actually revealing something pretty startling: for some of the top researchers, these models are becoming the last thing they're interested in.

Speaker 2:

Right.

Speaker 1:

Just you know, offering marginal improvements by piling on more data, more compute.

Speaker 2:

Exactly. The real action, the breakthroughs, seem to be happening somewhere else entirely.

Speaker 1:

That's what we're diving into today. We've got this fascinating conversation straight from NVIDIA GTC 2025, featuring a real leading voice in AI, and our mission here is to get past that buzz, you know, to unpack what the true next frontiers really are. What's it going to take to get there?

Speaker 2:

And what does it all mean for, like, how we interact with machines in the future?

Speaker 1:

Exactly. We're pulling out some surprising insights, challenging some assumptions, straight from the expert's mouth.

Speaker 2:

And what's really interesting, I think, is how the whole conversation shifts. You know, it moves away from just tweaking things, those incremental improvements, and starts tackling the really foundational challenges in AI, especially how machines understand our physical world.

Speaker 1:

It's moving from what we can do maybe easily to what we actually need to do for true intelligence. Okay, so if LLMs are becoming, as you said, the last thing for some key people, where is the real action? What are these frontiers?

Speaker 2:

Well, the focus seems to be shifting pretty dramatically towards four main areas.

Speaker 1:

Four areas. Okay.

Speaker 2:

First, and this might be the biggest one, is getting machines to genuinely understand the physical world around us.

Speaker 1:

Okay, the physical world. Makes sense.

Speaker 2:

Second, developing persistent memory for these AI systems, so they don't just forget everything constantly. Third, enabling machines to actually reason, properly reason. And finally, allowing them to plan actions within that world they understand. It takes us way beyond just shuffling words around.

Speaker 1:

And it kind of clicks why LLMs aren't great for that physical-world stuff, doesn't it? They predict discrete tokens.

Speaker 2:

Yeah.

Speaker 1:

Right, like maybe 100,000 options?

Speaker 2:

Yeah, something like that. Finite possibilities.

Speaker 1:

But the physical world is just messy. It's high dimensional and continuous.

Speaker 2:

Exactly. Trying to train an AI to predict, say, every single pixel in a video feed...

Speaker 1:

Oh, wow.

Speaker 2:

Well, first, it's basically impossible because it's unpredictable at that level.

Speaker 1:

Right.

Speaker 2:

And it's just a complete waste of resources, as the expert put it.

Speaker 1:

Right, and hasn't that approach been tried for ages?

Speaker 2:

Pretty much. The source mentioned, like, 20 years of attempts at self-supervised learning by predicting video pixels, and they've basically failed. It just highlights that fundamental mismatch. Whereas humans...

Speaker 1:

We get this world model super early on, don't we?

Speaker 2:

In the first few months of life, yeah, we just sort of absorb how things work. Like, you push a bottle from the top, it tips over. Push it from the bottom, it slides. We get that intuitively. It really is much more difficult to deal with the real world than to deal with language. Language is neat, discrete. The world is continuous, noisy, complex.

Speaker 1:

Okay. So if the language-based approaches, the LLMs, aren't cutting it for the physical world, what's the alternative? What are these researchers proposing? It sounds like a totally different way for AI to learn.

Speaker 2:

It really is. It's a complete architectural rethink. This is where something called JEPA comes in: Joint Embedding Predictive Architectures. That's the proposed solution for building these crucial world models and, importantly, enabling real reasoning.

Speaker 1:

Okay, JEPA. Not just tweaking the old models, then; this is fundamental. How does it work, conceptually? And what was the big roadblock they had to clear?

Speaker 2:

Right. So the core idea is, instead of predicting pixels, which we know doesn't work well, JEPA learns abstract representations of the data. Think images, video chunks, whatever. It learns these representations in a kind of hidden, latent space.

Speaker 1:

Abstract representations.

Speaker 2:

Okay. So the process goes something like this: you feed in an input, like a bit of video. It goes through an encoder, which squishes it down into this abstract representation. Then you take a later part of the video, or maybe a transformed version, and run that through an encoder too. And here's the key: the system predicts the next abstract representation in that latent space.
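
A minimal sketch of that loop in PyTorch may help make it concrete. Everything here, the module sizes, the two-encoder setup, the plain MSE loss, is an illustrative assumption for the sketch, not a detail taken from the talk or from any actual JEPA implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 128

# Two encoders squish raw inputs (here, fake 64x64 RGB frames) into
# abstract representations; a predictor works purely in that latent space.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512),
                        nn.ReLU(), nn.Linear(512, latent_dim))
target_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 512),
                               nn.ReLU(), nn.Linear(512, latent_dim))
predictor = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU(),
                          nn.Linear(latent_dim, latent_dim))

x_now = torch.randn(8, 3, 64, 64)      # a bit of video
x_later = torch.randn(8, 3, 64, 64)    # a later chunk of the same video

z_now = encoder(x_now)                 # abstract representation of the input
with torch.no_grad():
    z_later = target_encoder(x_later)  # target is a representation, not pixels

z_pred = predictor(z_now)              # the key step: predict in latent space
loss = F.mse_loss(z_pred, z_later)     # no pixel-level reconstruction anywhere
loss.backward()
```

The only point of the sketch is the prediction target: z_later is an abstract representation, never the raw pixels.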

Speaker 1:

Ah, so it's predicting in the abstract space, not the messy real world pixel space?

Speaker 2:

Exactly, that's the trick. And a huge hurdle they overcame, apparently in the last, say, five or six years, was this thing called model collapse.

Speaker 1:

Model collapse?

Speaker 2:

Yeah, where the system would basically just learn to ignore the input and output the same boring average representation all the time. Useless.

Speaker 1:

Right.

Speaker 2:

Fixing that was critical to making JEPA actually work.
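
As a hedged aside on how that fix can look: one published anti-collapse trick, the variance term in VICReg (a joint-embedding method from the same line of research), directly penalizes the encoder whenever latent dimensions stop varying across a batch. A minimal sketch:

```python
import torch

def variance_penalty(z, eps=1e-4):
    # Penalize any latent dimension whose spread across the batch falls
    # below 1, so the encoder can't map every input to one average point.
    std = torch.sqrt(z.var(dim=0) + eps)
    return torch.relu(1.0 - std).mean()

healthy = torch.randn(256, 128)     # varied representations
collapsed = torch.zeros(256, 128)   # everything mapped to the same point
print(variance_penalty(healthy))    # near 0: no penalty
print(variance_penalty(collapsed))  # ~1.0: heavily penalized
```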

Speaker 1:

Okay, so learning and predicting in this abstract space. How does that unlock reasoning and planning, those other key frontiers you mentioned? It's like, you know, Daniel Kahneman's System 1 and System 2 thinking.

Speaker 2:

Yeah, fast and slow thinking, right.

Speaker 1:

Is JEPA aiming for that more deliberate System 2 style, moving beyond just the automatic, intuitive stuff?

Speaker 2:

That's a great way to put it. It connects directly to this idea of agency, of being able to act intelligently. What you really need for intelligence is a predictor that, given the current state of the world and an action you imagine taking, can predict the next state of the world. What happens if I do this?

Speaker 1:

Mental simulation.

Speaker 2:

Precisely. And that ability, that's the real way that all of us do planning and reasoning. It's not, as the expert put it, by kicking tokens around.

Speaker 1:

Yeah. Unlike current LLMs trying to reason by just generating thousands of text sequences and hoping one works.

Speaker 2:

Exactly, which the source called "completely hopeless" for anything complex. Like trying to write good software by just randomly typing code.

Speaker 1:

Okay, got it. So JEPA provides that foundation for real System 2 style reasoning and planning.

Speaker 2:

That's the idea.
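
To make that planning-by-simulation loop concrete, here is a hedged sketch of its simplest form, sometimes called random-shooting planning: sample candidate action sequences, imagine each rollout with the learned predictor, and keep the best first action. The predictor and cost function below are toy stand-ins, not anything specified in the talk:

```python
import torch

def plan(predictor, cost_fn, z0, horizon=5, candidates=64, action_dim=4):
    """Imagine rollouts of random action sequences; return the best first action."""
    actions = torch.randn(candidates, horizon, action_dim)
    total_cost = torch.zeros(candidates)
    z = z0.expand(candidates, -1)        # same start state for every rollout
    for t in range(horizon):
        z = predictor(z, actions[:, t])  # "what happens if I do this?"
        total_cost += cost_fn(z)         # how far is this from the goal?
    return actions[total_cost.argmin(), 0]

# Toy stand-ins so the sketch runs end to end.
predictor = lambda z, a: z + 0.1 * a.sum(dim=-1, keepdim=True)
cost_fn = lambda z: (z - 1.0).pow(2).mean(dim=-1)
first_action = plan(predictor, cost_fn, z0=torch.zeros(1, 8))
```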

Speaker 1:

So this shift moving beyond just language, focusing on world understanding, enabling this deeper reasoning, really changes how we think about AI's goal, doesn't it? Which I guess means we need to redefine intelligence for machines.

Speaker 2:

Indeed, and the expert we're drawing from actually prefers a different term: not AGI, artificial general intelligence, but AMI, advanced machine intelligence.

Speaker 1:

AMI. Why the change?

Speaker 2:

The argument is that human intelligence itself isn't really general, it's actually super specialized. So calling the goal general intelligence is maybe a misnomer.

Speaker 1:

Hmm, interesting point: human intelligence is specialized.

Speaker 2:

And this view directly challenges the hype, you know, the constant claim that AGI is just around the corner.

Speaker 1:

You hear that all the time.

Speaker 2:

Well, the expert points out that, historically, generation after generation of AI researchers, for like 70 years, have claimed human-level AI was just 10 years away. And the feeling is the current wave is also wrong.

Speaker 1:

So the idea that just scaling up LLMs bigger and bigger or generating like thousands of sequences of tokens will somehow magically lead to human-level smarts?

Speaker 2:

Yeah, or this image of a country of geniuses in a data center.

Speaker 1:

That's called nonsense. Pretty blunt.

Speaker 2:

It is.

Speaker 1:

So for the people who do think scaling is the path, what's the blind spot? Why isn't just more data, more compute enough? What are they missing?

Speaker 2:

I think the blind spot is assuming intelligence just emerges from massive text data alone. It kind of ignores the fundamental difference between, you know, language processing and building that robust world model from continuous, messy, real-world sensory input.

Speaker 1:

Right the world model, yeah.

Speaker 2:

So the expert offers a more, let's say, realistic timeline: getting a good handle on these small-scale abstract mental models, for reasoning, for planning, maybe within three to five years.

Speaker 1:

Three to five years for the building blocks.

Speaker 2:

And actual human-level AMI perhaps within a decade or so, which, you know, is not that far in the grand scheme of things.

Speaker 1:

A decade or so, still fast, but not next year.

Speaker 2:

Exactly. It emphasizes that the path isn't just scaling what we have. It's about these deeper architectural changes focusing on how AI perceives and interacts with the actual physical world.

Speaker 1:

So yeah, not quite the sci-fi movie version, but a more nuanced, step-by-step path. Still, even without this future AMI, AI is already doing incredible things. Now let's talk about those tangible benefits.

Speaker 2:

Oh, absolutely. The impact right now, especially in science and medicine, is potentially bigger than we can currently imagine.

Speaker 1:

Like what.

Speaker 2:

Well, protein folding, obviously, drug design, but also really immediate things like using deep learning for pre-screening mammograms, spotting tumors earlier, or reducing MRI scan times by a factor of four. Think about that: because AI can reconstruct high-res images from much less scan data. Huge for people.

Speaker 1:

Factor of four, that's massive. And in cars too, right?

Speaker 2:

Definitely. Driving assistance systems, things like automatic emergency braking, which is now mandatory in Europe, by the way. Those systems are reducing collisions by something like 40%. They are literally saving lives. These are huge real-world impacts happening today.

Speaker 1:

Yeah, it's clear. It's not just about replacing jobs, is it? It's more like AI is making people more productive and more creative.

Speaker 2:

Exactly, it's giving them power tools.

Speaker 1:

That's a great way to put it. You see it with coding assistants helping programmers.

Speaker 2:

Yep. Medicine, art generation, even just writing text.

Speaker 1:

Right. And the vision described is interesting: a future where we each have a staff of super-intelligent virtual people working for us.

Speaker 2:

And we're their boss.

Speaker 1:

We're the managers. An interesting future.

Speaker 2:

But while those benefits are clear, it does bring up the question: how do we make sure AI helps everyone safely and effectively, especially with the concerns about, you know, the dark side, deepfakes, misinformation?

Speaker 1:

Yeah, that's always the fear.

Speaker 2:

Well, interestingly, the expert points out that, despite LLMs and deepfake tech being available for years now, there hasn't actually been this feared big increase in generative content being posted nefariously on social networks.

Speaker 1:

Really? That's surprising.

Speaker 2:

Or if it is posted, it's often labeled as being synthetic. The predicted flood hasn't quite materialized in the way people worried.

Speaker 1:

So why the disconnect between that reality and the public fear? That Galactica versus ChatGPT story you mentioned seems relevant here.

Speaker 2:

Absolutely. It highlights perception, doesn't it? Meta releases Galactica, an LLM trained on science papers, right, and it gets met with just vitriol online. Huge outcry. They pull it down.

Speaker 1:

I remember that.

Speaker 2:

Three weeks later, OpenAI releases ChatGPT, and it's hailed as, like, the second coming of the Messiah.

Speaker 1:

Huh, same basic tech, wildly different reactions.

Speaker 2:

Totally. It shows how much framing and perception matter, but the core belief here is that the best countermeasure against misuse is just better AI.

Speaker 1:

Better AI fixes bad AI.

Speaker 2:

Kind of. Systems that have common sense, the capacity to reason, the ability to check their own answers and assess their reliability. That's the long-term fix. The expert was pretty clear: catastrophic scenarios, "frankly, I don't believe in them." People adapt.

Speaker 1:

People adapt, okay. So the solution lies in advancing the tech responsibly, not just trying to lock down the current flawed versions.

Speaker 2:

That seems to be the argument, which points towards needing broad collaboration, right, to build this better AI.

Speaker 1:

So where does innovation actually come from? Is it all just big labs? And what about open platforms?

Speaker 2:

Well, a key point is that good ideas can come from anywhere. No single company, no single country has a monopoly on good ideas.

Speaker 1:

Right.

Speaker 2:

Real progress comes from the interaction of a lot of people, the exchange of ideas and, crucially, the exchange of code.

Speaker 1:

Code sharing, open source.

Speaker 2:

Exactly, which is why there's this strong advocacy for open source AI platforms, like Meta's philosophy with PyTorch and later with Llama.

Speaker 1:

And you see that global-talent point clearly with something like ResNet, right? That neural network architecture.

Speaker 2:

Oh, absolutely. Published back in 2015 by Chinese scientists at Microsoft Research in Beijing; Kaiming He was the lead author. And now it's the most cited paper in all of science for the last decade. Just shows you talent is everywhere. Innovation is global.

Speaker 1:

And that Llama story is pretty wild too. A small pirate project, basically.

Speaker 2:

Yeah, like a dozen people in Paris working somewhat under the radar initially, and it just exploded. Over a billion downloads.

Speaker 1:

A billion. Wow. What does that tell you about innovation? About just letting people run with ideas?

Speaker 2:

It says a lot about giving people a long leash, right? Trusting teams to explore, even without full top-down blessing at first. Innovation thrives on that freedom. And the benefits of open source, especially open-weights models like Llama, are really significant.

Speaker 1:

How so.

Speaker 2:

Well, for one, it jump-started the entire ecosystem of startups. Suddenly, small companies could prototype ideas, maybe using a paid API, but then actually deploy affordably using an open Llama model. That's huge for competition and innovation.

Speaker 1:

Democratizes access.

Speaker 2:

Exactly, and there's a deeper, more philosophical point too. Think about it: AI assistants are going to mediate almost every single one of our interactions with the digital world.

Speaker 1:

Yeah, that seems likely.

Speaker 2:

Do we want just one or two companies controlling those gatekeepers? The argument is we need extremely diverse assistants.

Speaker 1:

Diverse? In what way?

Speaker 2:

Speaking all the world's languages, understanding all the world's cultures, all the value systems, all the centers of interest, even having different biases, political opinions.

Speaker 1:

Different biases. That's provocative.

Speaker 2:

Well, the idea is a single AI source controlling information would be terrible for democracy, right? Like a single press outlet controlling all the news. Diversity is resilience.

Speaker 1:

A marketplace of AI assistants, almost.

Speaker 2:

Sort of. And looking ahead, this open vision extends to how models are even trained.

Speaker 1:

How so.

Speaker 2:

The proposal is that future foundation models will be open source and trained in a distributed fashion.

Speaker 1:

Distributed? Like, across different places?

Speaker 2:

Yeah. Various data centers around the world, each having access to different data subsets, training a kind of consensus model.

Speaker 1:

Why distributed?

Speaker 2:

Because, realistically, no single company, not even the biggest ones, can possibly collect all the world's data needed for a truly universal, unbiased AI. It has to be a collaborative, distributed effort.
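
The talk doesn't name an algorithm for this, but as a hedged sketch of what "training a consensus model" across sites can look like, here is the skeleton of federated averaging: each data center trains a private copy on its own data, and only the weights travel and get averaged:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_update(global_model, data_loader, lr=1e-3):
    # One site: train a private copy on local data; share only weights.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in data_loader:
        opt.zero_grad()
        F.mse_loss(model(x), y).backward()
        opt.step()
    return model.state_dict()

def consensus(state_dicts):
    # The consensus step: element-wise average of every site's weights.
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg

# One round, with toy data standing in for each site's private subset.
global_model = nn.Linear(4, 1)
site_loaders = [[(torch.randn(16, 4), torch.randn(16, 1))] for _ in range(3)]
global_model.load_state_dict(
    consensus([local_update(global_model, dl) for dl in site_loaders]))
```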

Speaker 1:

That ensures diversity, democratizes it, stops a few giants controlling everything.

Speaker 2:

That's the hope, spreading the benefits and the control more globally.

Speaker 1:

Okay, that vision of an open, diverse, distributed AI future, it's really compelling, but it also screams technical challenges. We're going to need some major breakthroughs, right? Especially in hardware and fundamental concepts.

Speaker 2:

Oh, absolutely, it's not easy.

Speaker 1:

Beyond just needing faster GPUs, what are the big computational hurdles for these next-gen systems like JEPA?

Speaker 2:

Yeah, even these JEPA models, the ones reasoning in abstract space, are expected to be computationally expensive at runtime. So yes, we need cheaper hardware. Continued GPU progress is essential.

Speaker 1:

Right.

Speaker 2:

But it also points back to needing that different architecture for System 2 thinking, especially for the physical-world stuff.

Speaker 1:

Because the physical world is harder than language.

Speaker 2:

Much more difficult, according to the source, because it's continuous; language is discrete. Continuous signals are much more susceptible to noise, while discrete tokens are robust. It's a fundamental difference that requires different approaches.
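
A toy demonstration of that robustness gap, with every number chosen arbitrarily for the sketch: rounding recovers a discrete symbol from small noise exactly, while a continuous value keeps whatever corruption it picks up:

```python
import torch

torch.manual_seed(0)
tokens = torch.randint(0, 10, (10_000,)).float()  # a tiny 10-symbol vocabulary
signal = torch.rand(10_000)                       # continuous values in [0, 1]

noise = 0.2 * torch.randn(10_000)
recovered_tokens = (tokens + noise).round().clamp(0, 9)  # snap to nearest symbol
noisy_signal = signal + 0.1 * noise               # noise scaled to the signal

print((recovered_tokens != tokens).float().mean())  # ~1% of symbols flip
print((noisy_signal - signal).abs().mean())         # every sample stays off
```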

Speaker 1:

Which leads to that really stark comparison about data, right? Why text just isn't enough.

Speaker 2:

Yeah, this was mind-blowing. Current big LLMs are trained on maybe 30 trillion tokens, which is a lot, about 10 to the power of 14 bytes. To read that much text would take one human over 400,000 years. Now compare that to a four-year-old child. By age four, they've been awake for maybe 16,000 hours. In that time, just through their vision, the optic nerve carrying about two megabytes per second, they've processed roughly the same amount of data: 10 to the power of 14 bytes.
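
Those figures hold up as orders of magnitude. Here is the back-of-the-envelope arithmetic as a runnable script; the bytes-per-token, reading-speed, and hours-per-day values are assumptions of ours, since the talk only gives the headline numbers:

```python
# LLM training data: ~30 trillion tokens at a few bytes each.
tokens = 30e12
bytes_per_token = 3                          # assumption: ~3 bytes per token
text_bytes = tokens * bytes_per_token        # ~0.9e14, i.e. ~10^14 bytes

# How long would a human take to read that much?
bytes_per_second = 250 * 5 / 60              # assume 250 words/min, 5 bytes/word
seconds = text_bytes / bytes_per_second
years = seconds / (8 * 3600 * 365)           # assume 8 hours of reading a day
print(f"~{years:,.0f} years of reading")     # ~411,000 years

# A four-year-old's visual input through the optic nerve.
visual_bytes = 16_000 * 3600 * 2e6           # 16,000 waking hours at 2 MB/s
print(f"~{visual_bytes:.1e} bytes by age four")  # ~1.2e14: the same ballpark
```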

Speaker 1:

Whoa. Same amount of data, but through vision, in just four years.

Speaker 2:

Exactly. So the conclusion drawn is pretty stark: we're never going to get to AGI by just training from text. It's just not happening. The bandwidth is too low, the grounding is missing.

Speaker 1:

So if text isn't enough and we need this visual, physical-world understanding, what hardware is promising beyond just faster GPUs? Are things like neuromorphic chips the answer? Quantum?

Speaker 2:

Well, there's some skepticism there, actually. Neuromorphic hardware, the expert doesn't see being practical for general AI anytime soon. There are issues with hardware reuse and with communicating between chips efficiently. Even though the brain uses digital spikes, making it work in silicon at scale is hard.

Speaker 1:

Okay, so neuromorphic is maybe not the immediate future. Quantum?

Speaker 2:

Extreme skepticism for general AI. Probably useful for simulating quantum systems, maybe drug discovery, but not for running these kinds of world models.

Speaker 1:

And optical computing? That used to be hyped.

Speaker 2:

Apparently it never panned out. The practical challenges were too great.

Speaker 1:

So what is promising then?

Speaker 2:

Things like processor-in-memory, or PIM, and related analog or mixed analog-digital processor-in-memory tech.

Speaker 1:

Processor-in-memory. Doing the computing right where the data is stored?

Speaker 2:

Exactly. The idea is to cut down on the huge energy costs of constantly shuffling data between memory and processors. Process it right on the sensor or very close to it.

Speaker 1:

Like a human eye.

Speaker 2:

Precisely. The retina analogy was used: 60 million photoreceptors doing initial processing, compressing all that down to just 1 million optic nerve fibers sending info to the brain. Super-efficient pre-processing, that's the kind of inspiration.

Speaker 1:

So it's less about just raw teraflops and more about smart, efficient, bio-inspired architectures. Which brings up a key question: are we maybe too focused on just scaling existing methods, hoping intelligence just emerges?

Speaker 2:

That's a really important question. Are we missing the forest for the trees, focusing too much on scale, not enough on the core architectural problems?

Speaker 1:

Is that the bottleneck?

Speaker 2:

According to this source, yeah. The main bottleneck right now isn't necessarily a lack of grand theories. It's finding a good recipe, the practical engineering know-how to actually train these new JEPA-style architectures effectively at scale.

Speaker 1:

Good recipe, like cooking.

Speaker 2:

Kind of. Think about past breakthroughs: ResNet in computer vision. The core idea, skip connections, was very simple, but it solved the vanishing gradient problem and let networks go super deep. That was part of the recipe.
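
For the curious, the skip-connection idea really is that small. A minimal residual block might look like this (a simplified fully-connected version; real ResNet blocks use convolutions and batch normalization):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, dim))

    def forward(self, x):
        # The whole trick: add the input back. The identity path gives
        # gradients a direct route through arbitrarily deep stacks.
        return torch.relu(self.body(x) + x)

# Fifty of these stack into a 100-layer network that still trains,
# where a plain MLP of the same depth tends to hit vanishing gradients.
deep = nn.Sequential(*[ResidualBlock(64) for _ in range(50)])
out = deep(torch.randn(8, 64))
```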

Speaker 1:

Or the causal transformer architecture for GPT models?

Speaker 2:

Exactly. That architecture was the recipe that allowed language models to scale massively. It's about finding those crucial engineering tricks: which non-linearity to use, the right optimizers, the correct normalization methods.

Speaker 1:

All the practical details that make the theory actually work in the real world.

Speaker 2:

Precisely. Finding that specific combination, that good recipe, is what often unlocks the next wave of progress, and that's maybe what's needed now for world models and reasoning.

Speaker 1:

Wow, ok, definitely a lot to think about there.

Speaker 2:

Yeah.

Speaker 1:

As we kind of wrap up this deep dive, what's the big picture takeaway for listeners about where AI is headed and maybe their role in it?

Speaker 2:

Well, I think we've seen a clear shift, right? The really cutting-edge research is moving beyond just making LLMs bigger. It's tackling these fundamental problems, understanding the physical world, reasoning, planning, using new ideas like JEPA.

Speaker 1:

Right, a more grounded intelligence.

Speaker 2:

We got a more realistic view of advanced machine intelligence, AMI. It's likely coming, maybe in a decade or so, but not from just scaling today's methods.

Speaker 1:

And we saw the huge benefits AI already provides in medicine and safety.

Speaker 2:

Absolutely, and a calmer perspective on the risks, emphasizing that better AI with common sense is the real countermeasure, and that people adapt. But maybe the most important thread running through it all is openness. The expert strongly believes progress towards human-level AI, AMI, will require contributions from everyone. It has to rely on open research and be based on open-source platforms.

Speaker 1:

It's not going to be one company cracking it in secret.

Speaker 2:

No, and it won't be a single event horizon moment where humanity gets killed within an hour. It'll be a process built by many.

Speaker 1:

So, as we head towards that future maybe one where we are all, indeed, managers of these super intelligent virtual assistants perhaps the final thought for everyone listening is what does that actually mean for us If AI becomes the ultimate power tool? Maybe the biggest challenge isn't just building the AI, but figuring out how we become the best possible bosses guiding these powerful tools effectively and ethically.

Speaker 2:

That's a really profound thought to end on. How do we manage the managers?

Speaker 1:

A lot to mull over. Thank you for joining us on this deep dive into the really fascinating future of AI.