MiniMax M3, a new multimodal model developed by MiniMax, is now available on NVIDIA’s accelerated infrastructure and supports advanced processing of text, images, and video. With 428 billion parameters and a context window of up to one million tokens, the model is engineered for long-context reasoning and complex workflows such as extended coding, video analysis, and design tasks.
The system’s architecture uses MiniMax Sparse Attention, reducing computational overhead and enabling substantially faster prefill and decoding than its predecessor. It trains natively on multimodal data from the outset, setting it apart from models that add these capabilities after initial training.
Congrats to the @MiniMax_AI team on the release of MiniMax M3, a long-context multimodal model for text, image, and video reasoning. 🙌
— NVIDIA AI (@NVIDIAAI) June 12, 2026
Try it today with our free GPU-accelerated endpoint on https://t.co/es07MrU5I0.
Details: https://t.co/89qlcTP3OW https://t.co/3bufMjXpp9 pic.twitter.com/iyMhbW03nQ
This release targets enterprise developers and organizations seeking to streamline AI application pipelines. MiniMax M3 can be deployed publicly via NVIDIA’s API catalog, with support for leading inference engines such as TensorRT LLM, SGLang, and vLLM. The model’s precision formats (BF16 and MXFP8) and support for up to 128 experts per token optimize performance on NVIDIA hardware, particularly Blackwell GPUs.
TestingCatalog POV 👀
MiniMax M3 on NVIDIA is a good chance for everyone to test the model for free. It is especially useful if you want to run a weekend project or save tokens for your 24/7 agents, such as OpenClaw or Hermes.
Early users and technical experts have noted the considerable efficiency gains and the ability to handle large-scale, multimodal workloads natively, putting MiniMax M3 in direct competition with other large language models in the market. The company’s collaboration with NVIDIA underscores a commitment to scalable, production-grade AI solutions for demanding enterprise environments.