
Summary
While the world fixates on chatbots, an AI race is underway to equip machines with spatial and physical awareness. Nvidia, Tencent and others have developed ‘world models’ that can grasp the trajectory of a ball hit by a bat, for example. Who’ll win this market?
With everyone’s attention fixed on powerful chatbots like ChatGPT and Claude, it’s been easy to overlook the growth of another field of artificial intelligence: world models.
These systems can grasp three-dimensional space and physics, a capability today’s chatbots lack, providing the foundation for everything from robots to smart glasses to self-driving cars.
In the past two weeks, Nvidia, Alibaba and Tencent each released their own world models, signalling that a new cast of characters could pioneer the next AI revolution.
The companies at the forefront are chasing different commercial strategies—Tencent’s HY-World 2.0 is open source while Nvidia’s model is for researchers only—and China is proving itself to be much less of a laggard than it was with large language models.
Bots like ChatGPT might seem to grasp the workings of the physical world, but in reality they’re clever mimics with no grounding in material experience and no object permanence: the understanding, which humans develop as babies, that a cup or a chair continues to exist even when it can’t be seen.
A language model can describe a room in elegant prose, but if you ask it whether a sofa will fit through a doorway or where a rolling ball will end up after bouncing off a wall, it’ll work from patterns in the text it’s been trained on rather than any actual grasp of the forces involved, and may get the answer wrong. World models aim to fill that gap.
The effort is quietly gaining momentum, with an array of approaches and business models that capitalize on real-world data—including a viral game that debuted a decade ago.
Remember Pokémon Go, the smartphone app that had millions of people pointing their phones at local cemeteries and street corners to catch Pokémon characters? It has since developed a global trove of mapping data it’s sharing with firms like Coco Robotics, whose delivery robots drive groceries around several cities in the US and Europe. The game’s creator, Niantic Spatial, is building what it calls a Large Geospatial Model (LGM) whose end users will essentially be robots.
DoorDash, meanwhile, is paying its gig workers to film themselves folding laundry or washing dishes to amass data it can sell to robotics firms for training. And Instacart has developed a shopping trolley in collaboration with Nvidia that’s kitted out with sensors and cameras, the goal being not to teach robots, but to collect data for advertising and inventory management.
Some scientists argue this approach to AI is a critical next step to imbue machines with something closer to human intelligence, an objective that OpenAI, Anthropic and Google have been chasing for years. (Google DeepMind, which makes the chatbot Gemini, is also betting on world models with its Genie 3.)
Imagine, for instance, if ChatGPT not only grasped language but also had the ability to drive a car or pour a cup of coffee. The result might be something like the android in Isaac Asimov’s I, Robot or more abstractly, a video game that constantly evolves with the user or an industrial automation system for gas turbines.
The companies sketching out that future also include World Labs, a startup spun out of Stanford University that was founded by Fei-Fei Li, dubbed the Godmother of AI thanks to her pioneering work on vision recognition systems. In February, Li’s company announced it had raised $1 billion in an early funding round.
World Labs, headquartered in San Francisco, uses its model—known as Marble—to generate its own virtual worlds and is looking to eventually pick up customers in gaming, virtual reality and robotics training.
But even after raising money from backers including Nvidia, AMD and Autodesk, the path to profit isn’t obvious. “Wall Street, especially later-stage investors, are still waiting to see the technology mature into the use cases,” Li told me in a recent interview. Still, she is undeterred: “I have total conviction this is just as profound as language intelligence.”
Li is also betting that synthetic data will be “critical” for world models, since unlike language, rich 3D material doesn’t exist in abundance online. In other words, the next wave of AI may be trained largely on footage generated by other AI, and not just videos of DoorDashers folding clothes. That may offer a potential business model in itself.
Today’s era of language models is on course to be won by a small group of deep-pocketed American labs with closed, proprietary models. But world models seem to be shaping up differently, over a wider field of approaches and regions, with more open licensing and no consensus yet on how anyone makes money.
China may well play a bigger role here. Its strength in hardware and manufacturing helped it ship roughly 85% to 90% of the world’s humanoid robots last year, according to researchers at Barclays.
If Chinese world models start to become the default for training robots, the companies that shape the next decade of physical AI won’t be the ones we’re seeing in headlines today, and they may be far from Silicon Valley too. ©Bloomberg
The author is a Bloomberg Opinion columnist covering technology.
