Alibaba released Happy Oyster this week — a model that generates interactive 3D environments from text, voice, images, or keyboard input. It is built by the same team that produced HappyHorse-1.0, which topped the Artificial Analysis video leaderboard earlier this month.
But Happy Oyster is a fundamentally different kind of system. It is not a video generator. It is a world model.
That distinction matters, and it is worth understanding clearly.
What Is a World Model?
A standard AI video generator takes a prompt and renders a finished clip. You describe a scene, the model outputs video, and you work with what it produced. Generation is a one-shot process: the model runs, the clip exists, and there is no going back in.
A world model works differently. Instead of producing a finished artifact, it generates a persistent or continuously evolving environment that responds to ongoing user input. You can move through it, steer it, change direction, and interact with objects in real time.
The clearest analogy is the difference between watching a film and playing a game. A video generator makes films. A world model builds the space the film lives in.
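The structural difference is easy to show in a few lines of code. This is a purely illustrative sketch: none of the systems discussed here exposes a public API, so both classes below are invented stand-ins for the two interaction patterns, not real interfaces.

```python
# Purely illustrative: OneShotVideoGenerator and WorldModelSession are
# hypothetical stand-ins. No product discussed in this article exposes
# a public API with these names or signatures.

class OneShotVideoGenerator:
    """Prompt in, finished clip out. There is no way back into the result."""
    def generate(self, prompt: str) -> list[str]:
        return [f"frame {i} of '{prompt}'" for i in range(3)]  # the whole clip

class WorldModelSession:
    """A live session: each user action shapes the next generated frame."""
    def __init__(self, prompt: str):
        self.prompt, self.t = prompt, 0
    def step(self, action: str) -> str:
        self.t += 1
        return f"t={self.t}: '{self.prompt}' after action '{action}'"

# Video generator: one call, then you work with whatever it produced.
clip = OneShotVideoGenerator().generate("a foggy harbor at dawn")

# World model: generation and interaction happen in the same loop.
session = WorldModelSession("a foggy harbor at dawn")
for action in ["move forward", "turn left", "look up"]:
    print(session.step(action))  # the environment responds as it is generated
```

The first pattern ends when the call returns. The second never produces a finished artifact at all; it produces a session you act inside.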
Happy Oyster is currently in limited early access. There is no public API and no broad pricing. But the capabilities Alibaba demonstrated are concrete.
What Happy Oyster Does
Happy Oyster has two modes:
Directing — you steer a scene in real time for up to three minutes at 480p or 720p resolution. You control camera angle, direction, and story elements through text or voice commands. The model maintains persistent object placement, coherent lighting, and stable motion as you move.
Wandering — you navigate through a generated world for up to one minute at 480p using WASD-style keyboard controls. The environment extends as you move through it, responding to what you explore rather than what you scripted in advance.
Both modes are notable because the environment responds to you during generation, not after. That is the defining characteristic of a world model: generation and interaction happen simultaneously.
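To make the Wandering mode concrete, here is a hedged sketch of what a WASD-driven session loop could look like, reusing the hypothetical WorldModelSession stub from the earlier sketch. Happy Oyster's actual control protocol has not been published, so the key mapping and frame budget below are assumptions for illustration.

```python
# Hypothetical Wandering-style loop: WASD keys become movement actions, and
# each action asks the model for the next frame. The key mapping and the
# one-minute frame budget are assumptions, not documented behavior.

WASD_ACTIONS = {"w": "forward", "a": "left", "s": "backward", "d": "right"}

def wander(session, keystrokes, max_frames=60 * 24):  # ~1 minute at 24 fps
    frames = []
    for key in keystrokes:
        if len(frames) >= max_frames:
            break  # Wandering sessions are capped at roughly one minute
        action = WASD_ACTIONS.get(key)
        if action is None:
            continue  # ignore keys that are not movement input
        # The environment extends in whatever direction you explore.
        frames.append(session.step(f"move {action}"))
    return frames

frames = wander(WorldModelSession("a mossy canyon at dusk"), "wwwadss")
print(len(frames), "frames generated interactively")
```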
This Is Not Just a China Story
China is moving fast in this category, with Alibaba’s Happy Oyster and Tencent’s HY-World pushing interactive environment generation forward.
But some of the most important world-model work outside China is also happening in the US.
Google DeepMind’s Genie 3 is one of the clearest reference points. DeepMind describes it as a general-purpose world model that can generate interactive environments from text, lets users navigate them in real time at 24 fps, and keeps them consistent for a few minutes at 720p. In January 2026, Google also began rolling out Project Genie, a prototype that lets Google AI Ultra users in the US create, explore, and remix interactive worlds built on Genie 3.
Another major US player is World Labs. Its product, Marble, is publicly available and focused on creating persistent 3D worlds from text, images, video, or coarse 3D layouts. It also supports interactive editing and export to formats like Gaussian splats, meshes, or video.
Runway has also entered the race explicitly with GWM-1, which it describes as a general world model built to simulate reality in real time. That matters because it shows the category expanding beyond research demos into products tied directly to creative tooling and media workflows.
Europe is at an earlier stage of public productization, but it is not absent. The most visible entrant is AMI Labs, Yann LeCun’s Paris-based company, which says it is building world models for robotics, manufacturing, wearables, and a model called AMI Video. That is better understood today as a major strategic bet on world models from Europe, rather than a publicly accessible creator tool comparable to Marble or Project Genie.
Happy Oyster vs Marble: Same Category, Different Emphasis
This is where the distinction gets important.
At first glance, Alibaba’s Happy Oyster and World Labs’ Marble can look similar because both move beyond one-shot video generation and into explorable 3D environments. But they are pushing on different parts of the stack.
Happy Oyster is best understood as an interactive world simulator. The value is in live response: you move, speak, steer, and the world keeps generating around you. The core experience is temporal and action-driven.
Marble is closer to a 3D world creation and editing system. Its emphasis is on building a persistent 3D scene from multimodal inputs, then editing, expanding, combining, and exporting that world into usable 3D or video assets.
Google DeepMind has drawn a useful line here in its own Genie materials: it contrasts Genie 3 with “explorable experiences in static 3D snapshots,” arguing that Genie’s key difference is that it generates the path ahead in real time as you move and interact. That framing helps explain the split. Marble is closer to world construction and spatial authoring. Happy Oyster and Genie 3 are closer to live world simulation.
That does not make one approach better than the other. It means they are optimized for different workflows.
- If you want to author, edit, and export a 3D environment, Marble is the more obvious comparison.
- If you want to navigate and continuously generate an environment as it reacts to your actions, Happy Oyster and Genie 3 are the better fit.
- If you want to train agents or simulate embodied interaction, the simulation-first approach of Happy Oyster and Genie 3 is the more relevant one.
The Competitive Context
Happy Oyster competes directly with Tencent’s HY-World 2.0, which also targets AI-generated explorable environments rather than finished clips.
But globally, the competitive set is now much broader:
- Alibaba Happy Oyster — interactive world generation for directing and wandering
- Tencent HY-World 2.0 — explorable AI-generated worlds
- Google DeepMind Genie 3 / Project Genie — real-time simulated environments and interactive world prototyping
- World Labs Marble — multimodal 3D world creation, editing, and export
- Runway GWM-1 — general world model tied to media and simulation ambitions
- AMI Labs — emerging European bet on world models, especially for physical and industrial use cases
That is the bigger signal: world models are no longer a niche side experiment. They are becoming a serious frontier across research labs, creator platforms, and embodied AI companies.
Why This Matters for AI Video Creators
World models do not replace video generators. They address a different point in the production chain.
The most immediate application is pre-visualization. A director or video creator can walk through a generated environment, explore sight lines, test lighting scenarios, and define shots before committing to a final render. Today, that task usually requires either 3D software or a lot of imagination applied to a script.
Longer term, world models could become the source material for video generation itself. Instead of describing a scene only in text, you navigate to a position in a generated world, frame the exact shot you want, and use that as the basis for a final cinematic render.
That opens up an interesting split in future workflows:
- world model for exploration, blocking, and camera discovery
- video model for final rendering and polish
That pipeline feels increasingly plausible now that the field includes both simulation-heavy systems like Genie 3 and Happy Oyster, and authoring-oriented systems like Marble.
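Under the same caveat, that handoff can be sketched as a two-stage pipeline. Everything here is hypothetical: Shot, scout_shots, and render_final are invented names (again reusing the WorldModelSession stub from above), since no current world model or video model exposes such an interface. The sketch only fixes the shape of the workflow.

```python
# Hypothetical two-stage pipeline: explore in a world model, bookmark the
# compositions you like, then hand only the chosen shot to a video model
# for the expensive final render. All interfaces here are invented.

from dataclasses import dataclass

@dataclass
class Shot:
    """A composition discovered by exploring the generated world."""
    position: tuple[float, float, float]
    heading_deg: float
    description: str

def scout_shots(session, moves):
    """Stage 1: wander the world and bookmark a framing after each move."""
    shots = []
    for i, move in enumerate(moves):
        session.step(move)
        shots.append(Shot((float(i), 0.0, 0.0), 90.0, f"framing after {move}"))
    return shots

def render_final(shot):
    """Stage 2: a video model renders the selected composition at full quality."""
    return f"final render: {shot.description} at heading {shot.heading_deg}"

shots = scout_shots(WorldModelSession("a neon night market"), ["forward", "left"])
print(render_final(shots[-1]))  # only the chosen shot gets the costly render
```

The design point is the asymmetry: exploration is cheap and interactive, the final render is expensive and one-shot, so you only pay the render cost for shots you have already chosen.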
It is still early. Happy Oyster is in limited access with no public API. Genie 3 remains a limited research preview, though Project Genie is now rolling out to Google AI Ultra users in the US. Marble is public, but it is still an early category product. The direction is clear, even if the production workflows are not mature yet.
What You Can Do Today
Happy Oyster is not yet available on Tellers, and given its limited early-access state, it is not ready for production video workflows.
What is available on Tellers today is a stack of production-grade video generation models — including Runway Gen 4.5, LTX Video with first/last frame control, and Kling — alongside the agent layer that orchestrates them. You can generate footage, direct camera motion in natural language, and edit the results in the same timeline.
As world models mature, the most interesting opportunity for tools like Tellers is obvious: use them upstream for exploration, previs, and shot planning, then hand off the selected compositions to production-grade render and editing workflows.
Try it at app.tellers.ai or read more about the Tellers platform.
FAQ
What is a world model in AI?
A world model generates a persistent or continuously evolving interactive environment rather than a finished video clip. Users can navigate and steer it while it is being generated.
What is Alibaba’s Happy Oyster?
Happy Oyster is an AI world model released by Alibaba in April 2026. It generates interactive 3D environments that users can explore and direct through text, voice, image, or keyboard input.
What is the main difference between Happy Oyster and World Labs Marble?
Happy Oyster is closer to a live interactive simulator. Marble is closer to a multimodal 3D world authoring system that emphasizes creation, editing, expansion, and export of persistent 3D worlds.
Is Google building a world model too?
Yes. Google DeepMind’s Genie 3 is a general-purpose world model for generating interactive environments in real time, and Project Genie is the prototype product built on top of it.
Are there important world-model efforts outside China and the US?
Yes, but Europe is earlier in product rollout. The clearest current example is AMI Labs, a Paris-based company building world models for physical-world applications.
What does this mean for AI video creation?
World models introduce a new layer in production: interactive pre-visualization, exploration, and shot planning before final video generation. They do not replace video models, but they could become a major upstream interface for directing them.