- **Action-conditioned models are necessary for spatial intelligence**
  "The reality is that although the visuals do look fantastic, those visuals actually aren't accompanied by an understanding of the 3D world, understanding how objects can move, what the consequences of different actions are, and that's what's really needed for spatial intelligence. A term we sometimes use is that you need action-conditioned world models, that you only actually have a world model if you can predict, given some action is taken, what is going to change in the world because of it."
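The distinction in the quote can be made concrete with a toy sketch. Everything below is illustrative (the names `State`, `unconditioned_step`, and `action_conditioned_step` are invented for this example, not from the source): a plain video-style predictor maps state to next state, while an action-conditioned world model also takes the agent's action, so different actions yield different predicted futures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    # Toy 1-D world: an object's position and velocity.
    position: float
    velocity: float

def unconditioned_step(state: State) -> State:
    # A pure "next-frame" predictor: extrapolates motion, ignores the agent.
    return State(state.position + state.velocity, state.velocity)

def action_conditioned_step(state: State, push: float) -> State:
    # A world model in the quoted sense: the prediction depends on
    # which action is taken.
    new_velocity = state.velocity + push
    return State(state.position + new_velocity, new_velocity)

s = State(position=0.0, velocity=1.0)
# Same state, different actions -> different predicted consequences.
braked = action_conditioned_step(s, push=-1.0)
pushed = action_conditioned_step(s, push=+1.0)
```

The point is the signature: only the model that consumes an action can answer "what changes in the world because of it".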
- **Prioritize structural abstraction over raw pixel scaling**
  "I think it's fair to say that vision understanding sort of stalled out; you got to object recognition and then progress just wasn't being made. There's really an interesting research question as to why that is, and at heart, the ideas behind Moonlake are an attempt to answer that, believing that there can be a really rich connection between a more symbolic layer of abstracted understanding of visual domains, which aren't in the mainstream vision models, which are still trying to operate on the surface level of pixels."
- **Synthetic data matches real-world utility for multimodal training**
  "When I was actually working with Nvidia on the Synthetic Data Foundation Model Training Project, we were generating a lot of this synthetic data and showing that it is actually as useful as real-world data when it comes to multimodal pre-training. But then, there's a lot of dollars being paid out to external vendors or other folks to manually curate these types of data."
- **Models should mimic human task-directed semantic abstractions**
  "All of the evidence from neuroscience and psychology is that most of what comes into people's eyes is never processed. You're doing fairly fine-grained processing of exactly what you're focusing on, but as soon as it's away from that, you're sort of only processing, top-down, this very abstracted semantic description of the world around you. Human beings are working with semantic abstractions."
- **True world models require long-horizon consequence prediction**
  "If you're simply trying to predict the next video frame, that's not so difficult. But what you actually want to do is understand the likely consequences of actions minutes into the future. And to do that, you actually need much more of an abstracted semantic model of the world."
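A minimal sketch of why long horizons are harder than next-frame prediction: predicting far ahead means chaining a one-step model over many steps, so any per-step error compounds. All names below (`step`, `rollout`, the toy dynamics) are hypothetical illustrations, not anything from the source.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]  # (position, velocity) in a toy 1-D world

def step(state: State, push: float) -> State:
    # One-step action-conditioned prediction with toy dynamics.
    pos, vel = state
    vel += push
    return (pos + vel, vel)

def rollout(model: Callable[[State, float], State],
            state: State, actions: List[float]) -> List[State]:
    # Chain one-step predictions through a planned action sequence to
    # estimate consequences far into the future. Errors compound with
    # each step, which is one argument for predicting over abstract
    # state rather than raw pixels.
    trajectory = [state]
    for a in actions:
        state = model(state, a)
        trajectory.append(state)
    return trajectory

# Ten steps of "push once, then coast":
traj = rollout(step, (0.0, 0.0), [1.0] + [0.0] * 9)
```

Here a single push sets velocity to 1, and coasting carries the object forward one unit per step, so the final predicted state is position 10 with velocity 1.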
