xAI launches Grok Imagine API for text-to-video and editing tools

xAI’s Grok Imagine API now lets developers create text-to-video and image-to-video content with synced audio, supporting multiple aspect ratios.

xAI has introduced Grok Imagine as a public API, promoting it as a unified stack for text-to-video, image-to-video, and prompt-driven video edits with synchronized audio. It is aimed at developers building new products, teams producing ads and social media clips, and enterprises that need to iterate quickly across many variants.

Requests run as deferred jobs: a client submits a generation or edit call, receives a request_id, and retrieves the finished asset once processing completes. The SDK can handle this polling automatically. For generation, creators can set clip lengths from 1 to 15 seconds, choose 480p or 720p resolution, and pick from the 16:9, 4:3, 1:1, 9:16, 3:4, 3:2, and 2:3 aspect ratios. Edits keep the source video’s duration and focus on restyling, adding or removing objects, and tighter motion control.
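The deferred-job flow can be sketched as a submit-then-poll loop. This is a minimal, network-free sketch: `submit` and `fetch_status` are hypothetical stand-ins for real HTTP calls (or for the SDK’s auto-polling), and the `JobStatus` shape with its "pending", "done", and "failed" states is an assumption, not xAI’s documented response format. Only the duration, resolution, and aspect-ratio limits come from the article.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Generation limits as stated in the article.
ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "4:3", "1:1", "9:16", "3:4", "3:2", "2:3"}


@dataclass
class JobStatus:
    # Hypothetical shape: "pending" until ready, then "done" with a video URL.
    state: str
    video_url: Optional[str] = None


def validate_generation(duration: int, resolution: str, aspect_ratio: str) -> None:
    """Reject parameters outside the generation ranges the article lists."""
    if not 1 <= duration <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")


def generate_and_wait(
    submit: Callable[[dict], str],
    fetch_status: Callable[[str], JobStatus],
    payload: dict,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Submit a deferred job, then poll by request_id until the video is ready."""
    validate_generation(payload["duration"], payload["resolution"], payload["aspect_ratio"])
    request_id = submit(payload)  # the service replies at once with an id, not the video
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(request_id)
        if status.state == "done" and status.video_url:
            return status.video_url
        if status.state == "failed":
            raise RuntimeError(f"job {request_id} failed")
        time.sleep(poll_interval)  # wait before asking again
    raise TimeoutError(f"job {request_id} not finished after {timeout} s")


# Offline usage: fake callables replace the real submit and status calls.
states = iter([JobStatus("pending"), JobStatus("done", "https://example.com/clip.mp4")])
url = generate_and_wait(
    submit=lambda p: "req-123",
    fetch_status=lambda rid: next(states),
    payload={"prompt": "a red fox in snow", "duration": 8,
             "resolution": "720p", "aspect_ratio": "9:16"},
    poll_interval=0.0,
)
print(url)
```

Validating locally before submitting avoids spending a round trip on a request the service would reject; in practice, the poll interval and timeout would be tuned to typical job latency.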

xAI makes its case on quality, latency, and cost, citing leaderboard results and rater studies. It claims the top spot in the Artificial Analysis text-to-video rankings and reports human side-by-side evaluations on IVEBench at 1280×720, in which raters preferred Grok Imagine overall over Kling o1 and Runway Aleph.

Distribution extends beyond xAI’s own platform: Fal, ComfyUI, InVideo, Flora, and HeyGen are integrating the generation and prompt-based editing endpoints into their creator pipelines. The API is positioned as OpenAI-compatible, with dedicated routes for video generations and video edits under the http://api.x.ai base.
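To illustrate what an OpenAI-style request to these routes might look like, the sketch below builds (without sending) JSON POST requests using only Python’s standard library. The route paths `/v1/videos/generations` and `/v1/videos/edits`, the Bearer-token header, the model name `grok-imagine-video`, and every body field name are assumptions for illustration; xAI’s API reference is the authority on the actual contract.

```python
import json
import urllib.request

BASE_URL = "http://api.x.ai"  # base named in the article
# Assumed route paths; confirm against xAI's API reference before use.
GENERATIONS_PATH = "/v1/videos/generations"
EDITS_PATH = "/v1/videos/edits"


def build_request(path: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST with an OpenAI-style Bearer token."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical model and field names, for illustration only.
gen_req = build_request(
    GENERATIONS_PATH,
    {"model": "grok-imagine-video", "prompt": "timelapse of a city at dusk",
     "duration": 10, "resolution": "720p", "aspect_ratio": "16:9"},
    api_key="YOUR_XAI_API_KEY",
)
edit_req = build_request(
    EDITS_PATH,
    {"model": "grok-imagine-video", "video_url": "https://example.com/source.mp4",
     "prompt": "remove the car from the background"},
    api_key="YOUR_XAI_API_KEY",
)
print(gen_req.full_url, gen_req.get_method())
```

Passing either request object to `urllib.request.urlopen` would send it; under the deferred-job model, the response would carry a request_id to poll rather than the finished video.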