Wan 2.7 Has a Thinking Mode — and Closed Weights

Alibaba’s Wan series has a history worth noting. Wan 2.1 — the predecessor — was fully open source. The weights were public, the model was deployable, and it ranked among the strongest open video generation models available. That was a meaningful position to hold.

Wan 2.7 is different. The weights are not public. Access is API-only. And that shift, combined with a new architectural feature called Thinking Mode, is worth examining clearly.

Released by Alibaba’s Tongyi Lab on April 6, 2026, Wan 2.7 introduces a reasoning step before video generation: the model plans composition before committing to output. Alibaba reports improvements in character consistency, color precision, and narrative coherence. Based on our own evaluation, those claims are partially supported, but the model’s actual position in the competitive landscape is less impressive than the announcement implies.

What Thinking Mode Actually Does

In standard video generation, a model maps a text description to a latent representation and decodes it into frames. There is no planning step; the model generates without deliberating.

Thinking Mode adds an explicit reasoning stage before generation. The model builds a compositional plan — how to interpret the prompt’s intent, how elements should relate in space and time, what the narrative logic of the sequence should be. Only then does generation begin.

The parallel to language models is direct. Chain-of-thought reasoning improved language model accuracy on structured tasks by separating understanding from execution. Wan 2.7 applies the same logic to video.
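The plan-then-generate flow can be sketched abstractly. To be clear, the function and class names below are illustrative stand-ins, not Wan 2.7’s actual API or internals; the point is only the separation of deliberation from execution:

```python
from dataclasses import dataclass

@dataclass
class CompositionPlan:
    """Illustrative planning output: what a reasoning stage might produce."""
    subjects: list       # characters/objects to keep consistent across frames
    shot_sequence: list  # narrative order of shots

def plan_composition(prompt: str) -> CompositionPlan:
    """Stand-in for the reasoning stage. A real planner would use a
    language model; this stub just splits the prompt into shots."""
    shots = [s.strip() for s in prompt.split(".") if s.strip()]
    return CompositionPlan(subjects=[], shot_sequence=shots)

def generate(prompt: str) -> list:
    # Without a thinking step: prompt -> frames directly.
    # With a thinking step: prompt -> plan -> frames, so every shot is
    # decoded against the same compositional plan.
    plan = plan_composition(prompt)
    return [f"frames for: {shot}" for shot in plan.shot_sequence]
```

The design choice mirrors chain-of-thought in language models: conditioning execution on an explicit intermediate plan rather than on the raw prompt alone.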

Wan is not alone in this direction. The trend of explicit planning in generative models is real; Wan 2.7 is making it a named, front-facing feature in a commercially available video API.

Concrete improvements Alibaba reports from this approach:

  • Character consistency: Individual character appearance holds throughout a clip, addressing the typical “AI same-face” problem
  • Precise color control: Supports HEX codes and color palettes for brand-accurate content
  • Long text rendering: Handles 3,000+ tokens across 12 languages, including tables and formulas embedded in video frames
  • Narrative coherence: Multi-shot prompts produce clips where the scene logic holds

The Four-Model Suite

Wan 2.7 ships as four distinct models:

  1. Text-to-video — 720p or 1080p output at 2–15 seconds per generation, with optional audio and multi-shot narrative control
  2. Image-to-video — animate from a reference image with motion guidance
  3. Reference-to-video — generate from a reference subject for consistent character appearance
  4. Video editing — apply modifications to existing footage

All four are available via API at $0.10 per second of generated 720p video.
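At that rate, per-clip cost is simple arithmetic (the $0.10/second 720p price and the 2–15 second clip range are from the announcement; the helper below is just a worked example, not part of the API):

```python
PRICE_PER_SECOND_720P = 0.10  # USD, serverless API pricing

def clip_cost(seconds: float) -> float:
    """Cost of a single 720p generation of the given length, in USD."""
    return round(seconds * PRICE_PER_SECOND_720P, 2)

# Text-to-video clips run 2-15 seconds, so one generation costs
# between clip_cost(2) and clip_cost(15) dollars.
shortest = clip_cost(2)
longest = clip_cost(15)
```

So a single clip lands between $0.20 and $1.50 before any retries, which is where the iteration economics discussed below come in.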

Our Evaluation

We ran Wan 2.7 against the models we support on Tellers.

It does not outperform them. Generation is significantly slower — approximately 4 minutes for a 5-second clip — making it impractical for workflows that require iteration. At $0.10 per second of 720p output, it is competitively priced but not the cheapest option available.
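The iteration cost is easy to quantify from those figures (the ~4-minute observation is ours; the 10-candidate loop is an arbitrary example, not a benchmark):

```python
GEN_SECONDS_PER_CLIP = 4 * 60   # ~4 minutes observed per generation
CLIP_LENGTH_SECONDS = 5         # length of the generated clip

# Generation runs roughly 48x slower than realtime:
slowdown = GEN_SECONDS_PER_CLIP / CLIP_LENGTH_SECONDS

def wall_clock_minutes(candidates: int) -> float:
    """Sequential wall-clock time to try `candidates` prompt variations."""
    return candidates * GEN_SECONDS_PER_CLIP / 60

# A modest 10-variation iteration loop means ~40 minutes of waiting,
# which is what makes the model impractical for iterative workflows.
ten_variations = wall_clock_minutes(10)
```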

Output quality, while solid, does not exceed what we see from faster models in our stack. The reasoning step produces measurable improvements in coherence over prior Wan versions, but the gap between Wan 2.7 and the alternatives we support is not in Wan 2.7’s favor.

The one area where Wan 2.7’s position is genuinely interesting: the video editing model. For video edit tasks specifically, the price/quality ratio is worth tracking. We are evaluating whether to integrate it as a video editing option — not as a primary generation model, but to broaden coverage for that specific use case.

The text-to-video model is not currently available on Tellers.

Is Wan 2.7 Open Source?

No. The weights have not been released.

This is a direct reversal from Wan 2.1, which was open source and deployable without API dependency. Wan 2.7 is API-only. That changes the calculus for anyone who built on or evaluated the Wan series based on its open-source accessibility.

Alibaba’s decision to close the weights suggests either a shift in commercial positioning, a response to competitive dynamics, or a quality tier they intend to keep proprietary. Whatever the reason, treating Wan 2.7 as a continuation of the open-source Wan lineage would be a mistake.

FAQ

What is Wan 2.7’s Thinking Mode? Thinking Mode is a generation approach where the model reasons about the prompt’s intent and plans the composition before generating video, rather than decoding the prompt directly to output frames.

Is Wan 2.7 open source? No. Unlike Wan 2.1, the weights for Wan 2.7 have not been publicly released. Access is API-only.

What does Wan 2.7 output? 720p or 1080p video at 2–15 seconds per generation, with optional audio. The suite also includes image-to-video, reference-to-video, and video editing models.

How much does Wan 2.7 cost? $0.10 per second of generated 720p video for serverless inference.

How fast is Wan 2.7? Approximately 4 minutes for a 5-second clip — significantly slower than alternatives currently available on Tellers.

Is Wan 2.7 available on Tellers? The text-to-video model is not currently available on Tellers. We are evaluating the video editing model for potential integration as a video edit option.

Who built Wan 2.7? Alibaba’s Tongyi Lab. The text-to-video component became available on April 3, with the full suite announced on April 6, 2026.