Z AI releases GLM-4.6V and GLM-4.6V-Flash open models

What's new? Z AI has open-sourced its GLM-4.6V series for cloud and local setups; the models handle 128k-token contexts and multimodal input, with weights available via HuggingFace, ModelScope, and z.ai.


Z AI has announced the open-source release of its GLM-4.6V series, a new generation of multimodal large language models. There are two models available: GLM-4.6V (106B), aimed at cloud and high-performance cluster environments, and GLM-4.6V-Flash (9B), designed for lightweight local deployment and low-latency applications. The models are available to the public, with weights accessible via HuggingFace and ModelScope, and can be integrated into applications using an OpenAI-compatible API. Users can interact with GLM-4.6V on the Z.ai platform or through the Zhipu Qingyan App.
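Because the API is OpenAI-compatible, existing client code can typically be reused by swapping the base URL and model name. The sketch below uses the official `openai` Python client; the endpoint URL and model identifier are assumptions for illustration, so check Z AI's API documentation for the actual values.

```python
# Minimal sketch: calling GLM-4.6V through an OpenAI-compatible endpoint.
# The base_url and model name are assumed placeholders; consult Z AI's docs
# for the real endpoint, model identifier, and how to obtain an API key.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",        # placeholder credential
    base_url="https://api.z.ai/v1",    # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="glm-4.6v",                  # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize the attached quarterly report in three bullet points."},
    ],
)
print(response.choices[0].message.content)
```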

The GLM-4.6V models are capable of processing 128,000 tokens in a single context window, allowing them to handle lengthy and complex documents, images, and videos. Key features include native function calling, multimodal tool use, and context-aware reasoning across text and visual data. The models support direct image, screenshot, and document input, and can output structured, image-rich content. Compared to previous iterations, these models close the loop from perception to action, handle tool invocation natively, and use a large pretraining dataset for broad world knowledge. Early user reports highlight strong performance in:

  1. Document understanding
  2. Code generation from designs
  3. Video summarization

These capabilities place GLM-4.6V among the top open-source models for multimodal reasoning.
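To illustrate the multimodal input path, the following sketch sends a design screenshot together with a text instruction, mirroring the "code generation from designs" use case above. It assumes the same OpenAI-compatible endpoint and model name as the earlier example, and that images can be passed as base64 data URLs in the standard OpenAI-style `image_url` content format; none of these details are confirmed by the announcement.

```python
# Sketch of a multimodal request: a screenshot plus a text instruction.
# Endpoint, model name, and base64 image handling are assumptions based on
# the OpenAI-compatible API mentioned in the release announcement.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_ZAI_API_KEY", base_url="https://api.z.ai/v1")

# Encode a local screenshot as a base64 data URL.
with open("design_mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="glm-4.6v",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Generate HTML/CSS that reproduces this design."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```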


Zhipu AI, the developer behind the GLM series, is recognized for advancing open-source large language models in China. With GLM-4.6V, the company targets enterprise, research, and development communities seeking advanced multimodal AI solutions that rival major global competitors in both capability and scale.
