MiniMax M3 launches on NVIDIA platform with Free Endpoint

What's new? MiniMax M3 is a multimodal model on NVIDIA accelerated compute with text, image, video support and sparse attention for long tasks available via NVIDIA API;

· 1 min read
MiniMax M3 on NVIDIA

MiniMax M3, a new multimodal model developed by MiniMax, is now available on NVIDIA’s accelerated infrastructure and supports advanced processing of text, images, and video. With 428 billion parameters and a context window of up to one million tokens, the model is engineered for long-context reasoning and complex workflows such as extended coding, video analysis, and design tasks.

The system’s architecture uses MiniMax Sparse Attention, reducing computational overhead and enabling substantially faster prefill and decoding than its predecessor. It trains natively on multimodal data from the outset, setting it apart from models that add these capabilities after initial training.

This release targets enterprise developers and organizations seeking to streamline AI application pipelines. MiniMax M3 can be deployed publicly via NVIDIA’s API catalog, with support for leading inference engines such as TensorRT LLM, SGLang, and vLLM. The model’s precision formats (BF16 and MXFP8) and support for up to 128 experts per token optimize performance on NVIDIA hardware, particularly Blackwell GPUs.

TestingCatalog POV 👀

MiniMax M3 on NVIDIA is a good chance for everyone to test the model for free. It is especially useful if you want to run a weekend project or save tokens for your 24/7 agents, such as OpenClaw or Hermes.

Early users and technical experts have noted the considerable efficiency gains and the ability to handle large-scale, multimodal workloads natively, putting MiniMax M3 in direct competition with other large language models in the market. The company’s collaboration with NVIDIA underscores a commitment to scalable, production-grade AI solutions for demanding enterprise environments.

Source