DeepSeek has released the preview version of its long-anticipated V4 series, pushing its open-source lineup into million-token territory with two Mixture-of-Experts variants. The Hangzhou-based lab announced the drop on April 24, confirming months of speculation after earlier target windows in February and March slipped. V4-Pro ships with 1.6 trillion total parameters and 49 billion active per token, while V4-Flash runs on 284 billion total and 13 billion active; both ship with a 1M context window as the default rather than an optional tier.
The structural headline is a new attention scheme pairing token-level compression with DeepSeek Sparse Attention, which the team credits with cutting long-context compute and memory costs sharply enough to make million-token inputs the baseline across DeepSeek services. Both variants expose Thinking and Non-Thinking modes through the API, with a reasoning_effort parameter letting developers dial how hard the model deliberates per task.
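In practice, toggling the mode and effort level would look something like the sketch below, which assembles a Chat Completions-style request payload. The field names and accepted values here (`reasoning_effort`, the `"low"`/`"medium"`/`"high"` levels, and the model id) are assumptions extrapolated from the announcement, not a confirmed API spec.

```python
# Hedged sketch: building a request that opts into Thinking mode and sets
# reasoning_effort. Field names and values are assumptions, not documented API.
import json


def build_request(prompt: str, thinking: bool = True, effort: str = "medium") -> dict:
    """Assemble a Chat Completions-style payload for a DeepSeek V4 model."""
    payload = {
        "model": "deepseek-v4-pro",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        # Assumed parameter per the article; levels are illustrative guesses.
        payload["reasoning_effort"] = effort
    return payload


# Non-Thinking mode simply omits the reasoning_effort field.
print(json.dumps(build_request("Plan a refactor", thinking=True, effort="high"), indent=2))
```

The appeal of a single dial like this is that the same endpoint serves both quick completions and deliberate multi-step reasoning without routing to a separate model.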
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
— DeepSeek (@deepseek_ai) April 24, 2026
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params.… pic.twitter.com/n1AgwMIymu
Benchmarks released alongside the models put V4-Pro neck-and-neck with Claude Opus 4.6, GPT-5.4 xHigh, and Gemini 3.1 Pro across knowledge, reasoning, and agentic tasks. It posts a Codeforces rating of 3206, claims open-source state-of-the-art on agentic coding, and trails only Gemini on world knowledge among frontier models. Flash sits close behind Pro on reasoning and matches it on simpler agent workflows at a fraction of the inference cost.
DeepSeek has tuned V4 to slot into established agent stacks including Claude Code, OpenClaw, and OpenCode, and the API accepts both OpenAI Chat Completions and Anthropic-format calls; teams only need to swap the model name. The older deepseek-chat and deepseek-reasoner routes now alias to V4-Flash and retire on July 24.
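For teams on the deprecated routes, the migration amounts to retargeting an existing OpenAI-style client config. The sketch below shows the idea; the base URL and the `deepseek-v4-flash` model id are assumptions based on the article and DeepSeek's existing OpenAI-compatible endpoint, not verified values.

```python
# Hedged sketch: retargeting an OpenAI-style client config at DeepSeek V4.
# The base_url and model ids are assumptions, not confirmed documentation.

# Deprecated routes that the article says now alias to V4-Flash.
ALIASES = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-flash",
}


def migrate_config(cfg: dict) -> dict:
    """Return a copy of an OpenAI-format client config pointed at DeepSeek,
    with retired model names swapped for their V4 alias."""
    out = dict(cfg)
    out["base_url"] = "https://api.deepseek.com"  # assumed compatible endpoint
    out["model"] = ALIASES.get(cfg["model"], cfg["model"])
    return out


legacy = {"base_url": "https://api.openai.com/v1", "model": "deepseek-reasoner"}
print(migrate_config(legacy))
```

Because the request and response shapes stay in the OpenAI (or Anthropic) format, nothing downstream of the client config should need to change.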
Weights are live on Hugging Face under an open license, continuing the playbook that turned V3 and R1 into reference points for the open-source camp. With domestic pressure from Alibaba, Xiaomi's MiMo, and Moonshot's Kimi tightening, V4 positions DeepSeek as the lab defining the ceiling for open-weight frontier models rather than chasing closed-source incumbents, and the Huawei Ascend optimization path underscores the parallel push toward a China-native compute stack.