Mistral releases Mistral Small 4 model under Apache 2.0 license

What's new? Mistral Small 4 is an open-source AI model with 119B parameters, a mixture-of-experts architecture, and multimodal text-image support, available through several channels. It cuts latency by 40% and triples throughput versus Small 3.

· 2 min read
Mistral

Mistral has announced the immediate open-source release of Mistral Small 4, a unified AI model designed to serve as a fast instruct assistant, a deep reasoning engine, and a multimodal system in one. Targeted at developers, enterprises, and researchers, the model is available to the public under the Apache 2.0 license. Mistral Small 4 can be accessed through the Mistral API, AI Studio, Hugging Face, and NVIDIA's day-0 NIM containers, with support for major inference frameworks including vLLM and llama.cpp. Enterprises can also deploy it on-premises or fine-tune it for custom needs.

This model features a Mixture-of-Experts architecture with 128 experts (4 active per token), totaling 119 billion parameters but activating only 6 billion per token for efficiency. Supporting both text and images with a 256k context window, it allows users to toggle reasoning depth for speed or complexity as required.
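The efficiency claim follows from sparse activation: each token is routed to only 4 of the 128 experts, so only about 6B of the 119B weights run per token. The toy sketch below illustrates top-k expert gating and the active-parameter arithmetic using the figures from the article; the function names and the router itself are illustrative, not Mistral's implementation.

```python
import math
import random

# Figures from the article: 128 experts, 4 active per token,
# 119B total parameters with only ~6B active per token.
N_EXPERTS = 128
TOP_K = 4
print(f"active fraction: {6 / 119:.1%}")  # roughly 5% of weights per token

def route(router_logits):
    """Toy top-k gating: keep the TOP_K highest-scoring experts and
    softmax-normalize their gate values (illustrative only)."""
    top = sorted(range(len(router_logits)), key=router_logits.__getitem__)[-TOP_K:]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(N_EXPERTS)]
experts, gates = route(logits)
print(len(experts), round(sum(gates), 6))  # 4 experts, gates sum to 1.0
```

Because the other 124 experts' weights are never touched for a given token, compute per token scales with the ~6B active parameters rather than the full 119B, which is the basis of the latency and throughput gains quoted below.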


Compared to Mistral Small 3, it delivers 40% lower latency and triple the throughput, while matching or surpassing the performance of models like GPT-OSS 120B and Qwen in key benchmarks. Shorter, more accurate outputs translate to lower costs and greater scalability. Industry experts and technical teams highlight the model’s impact on enterprise cost control, reliability in long tasks, and ease of integration.

Mistral, known for its rapid advances in open-source AI, has positioned Small 4 as a flagship solution, furthering its commitment to transparency and collaboration. By joining the NVIDIA Nemotron Coalition, the company signals ongoing partnership with major industry players to accelerate the adoption and optimization of large-scale, efficient AI models.

Source