Vidu has added its Reference-to-Video capability to the Vidu Q1 model, combining two previously available features in its latest release. Users can upload up to seven reference images, such as characters, scenes, or props, and generate videos that keep style and subject consistent throughout. Unlike traditional image-to-video tools, Vidu Q1 is designed to merge the reference images according to the user’s prompt, aiming for detailed and visually coherent results across a variety of creative contexts.
[Video: Vidu Q1 with Reference-to-Video sample]
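To make the input constraints concrete, here is a minimal sketch of how a client might check a reference-to-video request before submitting it. Only the limit of seven reference images comes from the announcement. The class name, field names, payload keys, and the assumption that at least one image and a non-empty prompt are required are all hypothetical and do not describe Vidu's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The only value taken from the announcement: up to seven reference images.
MAX_REFERENCE_IMAGES = 7


@dataclass
class ReferenceToVideoRequest:
    """Hypothetical request shape for illustration; not Vidu's real API."""

    prompt: str
    reference_images: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Assumption: a text prompt is required to guide how images are merged.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # Assumption: at least one image is needed; the upper bound is from the article.
        count = len(self.reference_images)
        if not 1 <= count <= MAX_REFERENCE_IMAGES:
            raise ValueError(
                f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {count}"
            )


def build_payload(request: ReferenceToVideoRequest) -> Dict[str, object]:
    """Validate the request and return a plain dict (hypothetical key names)."""
    request.validate()
    return {"prompt": request.prompt, "images": list(request.reference_images)}
```

A caller would build a request with a prompt and a handful of image paths, call `build_payload`, and handle `ValueError` when more than seven images are supplied.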
With this approach, uploaded images serve as modular building blocks for video production that can be freely mixed and matched. Users no longer need to rely on detailed scripts or storyboards: the model interprets the intent behind the combined assets and creates scenes that remain visually consistent, even when switching between camera angles or mixing diverse subjects. This flexibility appeals to content creators, marketers, and storytellers who want to prototype ideas quickly or produce narrative-driven videos without the constraints of manual editing or physical production.
[Video: Vidu Q1 with Reference-to-Video tour]
Vidu Q1 is now available to users, with the Reference-to-Video feature accessible on the company’s platform. The update is positioned to serve both individual creators and professional teams looking to produce content across genres, from commercial advertisements to episodic story series.
🎬 The AI video tool Vidu has recently launched an upgraded Reference-to-Video feature. This feature allows users to upload up to seven reference images, such as characters, scenes, or props, to generate videos. Unlike traditional image-to-video tools, Vidu intelligently… pic.twitter.com/J6K4oeAFEm
(Vidu AI, @ViduAI_official, July 8, 2025)
Vidu, the company behind this release, has built its reputation on AI-driven video tools that prioritize quality and stylistic fidelity. This latest combination of features marks a further step toward modular, creator-centric design in the AI video space.