With everyone’s attention fixed on powerful chatbots like ChatGPT and Claude, it’s been easy to overlook the growth of another field of artificial intelligence: world models. These systems can grasp three-dimensional space and physics, a capability today’s chatbots lack, and provide the foundation for everything from robots to smart glasses to self-driving cars. In the past two weeks, Nvidia Corp., Alibaba Group and Tencent Holdings Ltd. each released their own world models, signaling that a new cast of characters could pioneer the next AI revolution. The companies at the forefront are chasing different commercial strategies: Tencent’s HY-World 2.0 is open source, while Nvidia’s model is for researchers only. And in this field, China is proving itself much less of a laggard than it was with large language models.
Bots like ChatGPT might seem to grasp the workings of the physical world, but in reality they’re clever mimics with no grounding in material experience or object permanence, the understanding humans develop as babies that a cup or a chair continues to exist even when it can’t be seen. A language model can describe a room in elegant prose, but ask it whether a sofa will fit through a doorway, or where a rolling ball will end up after bouncing off a wall, and it will work from patterns in the text it was trained on rather than any actual grasp of the forces involved, and it may get the answer wrong. World models aim to fill that gap.