LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Damien Scieur
2026.03.26
· arXiv · by 이호민/AI
#Computer Vision#Deep Learning#JEPA#Representation Learning#World Model

Key Points

  1. LeWorldModel (LeWM) introduces a stable, end-to-end Joint Embedding Predictive Architecture (JEPA) that learns world models from raw pixels using only two loss terms: next-embedding prediction and a Gaussian-distributed latent embedding regularizer.
  2. This approach simplifies training by reducing tunable hyperparameters from six to one, enabling efficient learning of compact models on a single GPU and significantly faster planning (up to 48x) compared to foundation-model-based alternatives.
  3. LeWM demonstrates strong competitive performance across diverse 2D and 3D control tasks, while its latent space encodes meaningful physical structure, as confirmed by probing physical quantities and detecting unphysical events.

LeWorldModel (LeWM) introduces a novel Joint Embedding Predictive Architecture (JEPA) designed to learn world models stably and end-to-end from raw pixel observations. Addressing the common challenges of representation collapse, reliance on complex multi-term losses, and auxiliary supervision in existing JEPA methods, LeWM proposes a streamlined approach with only two loss terms and a single primary tunable hyperparameter. This design facilitates stable training on a single GPU within hours, enabling significantly faster planning compared to foundation-model-based world models while maintaining competitive performance across various 2D and 3D control tasks.

The core methodology of LeWM revolves around jointly optimizing an encoder and a predictor using a simplified objective. The system operates in a fully offline, reward-free setting, learning from unannotated trajectories of pixel observations $o_{1:T}$ and corresponding actions $a_{1:T}$.

Model Architecture:
LeWM consists of two main components:

  1. Encoder ($\text{enc}_\theta$): This maps a raw pixel observation $o_t$ into a compact, low-dimensional latent representation $z_t \in \mathbb{R}^d$. It is implemented as a Vision Transformer (ViT), specifically the tiny configuration (e.g., 5M parameters). The latent embedding $z_t$ is derived from the [CLS] token embedding of the ViT's last layer, followed by a 1-layer MLP with Batch Normalization. This projection step is crucial to ensure the latent representation space is amenable to the subsequent anti-collapse regularization.
  2. Predictor ($\text{pred}_\phi$): This models the environment dynamics within the latent space. It predicts the latent representation of the next frame, $\hat{z}_{t+1}$, given the current latent embedding $z_t$ and action $a_t$. The predictor is a Transformer (e.g., 10M parameters) that incorporates actions via Adaptive Layer Normalization (AdaLN) at each layer. It takes a history of $N$ frame representations and predicts the next frame autoregressively with temporal causal masking. A projector network, similar to the encoder's, is applied after the predictor's output.
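To make the two-component interface concrete, here is a shape-level numpy sketch. The linear maps below are hypothetical stand-ins for the ViT encoder and Transformer predictor (only the input/output shapes match the description above; the latent dimension, action dimension, and image size are illustrative assumptions):

```python
import numpy as np

# Hypothetical stand-ins for enc_theta and pred_phi: only shapes and the
# encode/predict interface mirror LeWM; the real components are a ViT and
# an AdaLN Transformer.
D_LATENT, D_ACTION = 16, 2
rng = np.random.default_rng(0)
W_ENC = rng.normal(size=(64 * 64 * 3, D_LATENT)) * 0.01  # stand-in for the encoder
W_Z = rng.normal(size=(D_LATENT, D_LATENT)) * 0.1        # stand-in predictor weights
W_A = rng.normal(size=(D_ACTION, D_LATENT)) * 0.1

def encode(obs: np.ndarray) -> np.ndarray:
    """enc_theta: (H, W, C) pixel frame -> (d,) latent embedding."""
    return obs.reshape(-1) @ W_ENC

def predict(z: np.ndarray, a: np.ndarray) -> np.ndarray:
    """pred_phi: current latent z_t and action a_t -> predicted next latent."""
    return np.tanh(z @ W_Z + a @ W_A)

obs = rng.uniform(size=(64, 64, 3))           # one raw pixel observation o_t
z_t = encode(obs)                             # z_t in R^d
z_next_hat = predict(z_t, np.array([0.5, -0.5]))
print(z_t.shape, z_next_hat.shape)            # prints (16,) (16,)
```

Everything downstream (the loss and the planner) only ever touches these two functions, which is what makes the latent-space rollouts cheap relative to pixel-space prediction.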

Training Objective:
The LeWM training objective is defined as the sum of a prediction loss and an anti-collapse regularization term:
$$L_{\text{LeWM}} \triangleq L_{\text{pred}} + \lambda \, \text{SIGReg}(Z)$$

  1. Prediction Loss ($L_{\text{pred}}$): This is a mean-squared error (MSE) between the predicted next latent embedding $\hat{z}_{t+1}$ and the true next latent embedding $z_{t+1}$. This term incentivizes the encoder to learn representations that are predictable by the predictor. The predictor operates in a teacher-forcing manner.
$$L_{\text{pred}} \triangleq \left\| \hat{z}_{t+1} - z_{t+1} \right\|_2^2, \quad \text{where } \hat{z}_{t+1} = \text{pred}_\phi(z_t, a_t)$$
  2. Sketched-Isotropic-Gaussian Regularizer (SIGReg): This term is critical for preventing representation collapse, a common failure mode where models map all inputs to a trivial, constant representation. SIGReg encourages the distribution of latent embeddings $Z$ (collected over history length $N$, batch size $B$, and embedding dimension $d$) to match an isotropic Gaussian target distribution. It leverages the Cramér–Wold theorem by enforcing normality along multiple random one-dimensional projections.
Specifically, it projects the latent embeddings $Z \in \mathbb{R}^{N \times B \times d}$ onto $M$ random unit-norm directions $u^{(m)} \in S^{d-1}$, yielding one-dimensional projections $h^{(m)} = Z u^{(m)}$. An Epps–Pulley normality test statistic $T(\cdot)$ is then applied to each projection, and these statistics are aggregated:
$$\text{SIGReg}(Z) \triangleq \frac{1}{M} \sum_{m=1}^{M} T\big(h^{(m)}\big)$$
The number of random projections $M$ typically has a negligible impact on performance, making the regularization weight $\lambda$ the only effective hyperparameter to tune. LeWM trains end-to-end without stop-gradients, exponential moving averages, or other heuristic stabilization tricks, allowing gradients to propagate through all components.
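The two-term objective can be sketched in numpy under some stated assumptions: latents are flattened to a (samples, d) batch, the Epps–Pulley statistic uses its closed form with a standard-normal weight on the characteristic-function gap, and projections are tested directly against N(0, 1) so that a collapsed (constant) batch is penalized. The exact weight function and batching in the paper may differ:

```python
import numpy as np

def epps_pulley(y: np.ndarray) -> float:
    """Closed-form Epps-Pulley statistic testing a 1-D sample against N(0, 1),
    assuming a standard-normal weighting of the characteristic-function gap."""
    n = y.shape[0]
    diff = y[:, None] - y[None, :]
    return float(np.exp(-diff ** 2 / 2).sum() / n
                 - np.sqrt(2.0) * np.exp(-y ** 2 / 4).sum()
                 + n / np.sqrt(3.0))

def sigreg(z: np.ndarray, num_proj: int = 32, seed: int = 0) -> float:
    """SIGReg(Z): mean Epps-Pulley statistic over M random unit projections."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(z.shape[1], num_proj))
    u /= np.linalg.norm(u, axis=0, keepdims=True)  # u^(m) on the unit sphere
    h = z @ u                                      # 1-D projections h^(m)
    return float(np.mean([epps_pulley(h[:, m]) for m in range(num_proj)]))

def pred_loss(z_hat: np.ndarray, z_next: np.ndarray) -> float:
    """L_pred: MSE between predicted and true next latents (teacher-forced)."""
    return float(np.mean(np.sum((z_hat - z_next) ** 2, axis=-1)))

def lewm_loss(z_hat, z_next, z_all, lam=0.5):
    """L_LeWM = L_pred + lambda * SIGReg(Z); lambda is the main tunable knob."""
    return pred_loss(z_hat, z_next) + lam * sigreg(z_all)

rng = np.random.default_rng(0)
z_next = rng.normal(size=(256, 16))                # "true" next latents (toy)
z_hat = z_next + 0.1 * rng.normal(size=(256, 16))  # predictor outputs (toy)
healthy = rng.normal(size=(256, 16))               # Gaussian-distributed batch
collapsed = np.zeros((256, 16))                    # every input mapped to zero
print(lewm_loss(z_hat, z_next, healthy))           # small: fits and no collapse
print(lewm_loss(z_hat, z_next, collapsed))         # large: SIGReg flags collapse
```

Note that the collapsed batch is indistinguishable from the healthy one under $L_{\text{pred}}$ alone (a constant latent is trivially predictable), which is exactly why the second term is needed.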

Latent Planning:
At inference time, LeWM enables trajectory optimization using Model Predictive Control (MPC) in its learned latent space. Given an initial observation $o_1$ and a goal observation $o_g$:

  1. The initial and goal observations are encoded into latent embeddings: $z_1 = \text{enc}_\theta(o_1)$ and $z_g = \text{enc}_\theta(o_g)$.
  2. Candidate action sequences $a_{1:H}$ are iteratively optimized. For a given action sequence, the predictor autoregressively rolls out future latent states: $\hat{z}_{t+1} = \text{pred}_\phi(\hat{z}_t, a_t)$, with $\hat{z}_1 = z_1$.
  3. A terminal latent goal-matching objective is minimized:
$$C(\hat{z}_H) = \left\| \hat{z}_H - z_g \right\|_2^2$$
  4. The optimal action sequence $a^*_{1:H}$ is found by solving:
$$a^*_{1:H} = \arg \min_{a_{1:H}} C(\hat{z}_H)$$
This optimization is performed using the Cross-Entropy Method (CEM). To mitigate prediction error accumulation over long horizons, an MPC strategy is employed where only the first $K$ planned actions are executed before replanning from an updated observation.
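The planning loop above can be sketched with CEM in a toy latent space. Here the learned predictor is replaced by a known linear map (an assumption for illustration only), and the horizon, dimensions, and population sizes are arbitrary choices, not the paper's settings:

```python
import numpy as np

# CEM planning sketch: sample action sequences, roll them out in the latent
# space, refit a Gaussian over the elite sequences, repeat. A linear map
# stands in for pred_phi.
rng = np.random.default_rng(0)
D, H = 4, 5                    # latent dimension, planning horizon
A = np.eye(D) * 0.9            # stand-in dynamics: z' = A z + B a
B = np.eye(D)                  # actions act directly on the latent (toy choice)

def rollout(z1: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Autoregressive latent rollout: z_hat_{t+1} = A z_hat_t + B a_t."""
    z = z1
    for a in actions:
        z = A @ z + B @ a
    return z

def cem_plan(z1, z_goal, iters=20, pop=256, elite=32):
    """Minimize the terminal cost C(z_hat_H) = ||z_hat_H - z_goal||^2 via CEM."""
    mu, sigma = np.zeros((H, D)), np.ones((H, D))
    for _ in range(iters):
        cand = mu + sigma * rng.normal(size=(pop, H, D))  # sample sequences
        costs = [np.sum((rollout(z1, c) - z_goal) ** 2) for c in cand]
        best = cand[np.argsort(costs)[:elite]]            # keep the elites
        mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-6
    return mu                    # refined mean action sequence a*_{1:H}

z1, z_goal = rng.normal(size=D), rng.normal(size=D)
plan = cem_plan(z1, z_goal)
final = rollout(z1, plan)
print(np.sum((final - z_goal) ** 2))  # terminal goal-matching cost C(z_hat_H)
```

In the actual MPC loop, only the first $K$ actions of `plan` would be executed before re-encoding the new observation and replanning, which limits how far rollout errors can compound.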

Results and Contributions:
LeWM achieves strong control performance across diverse 2D and 3D tasks, outperforming existing end-to-end JEPA approaches like PLDM and remaining competitive with foundation-model-based methods such as DINO-WM at a substantially lower computational cost. Notably, LeWM demonstrates up to 48x faster planning times. The method's stability is highlighted by its smooth training curves and robustness to various architectural and hyperparameter choices, making it significantly easier to tune compared to prior methods. Beyond control, LeWM's latent space is shown to encode meaningful physical structure, as evidenced by successful probing of physical quantities and reliable detection of physically implausible events.