Moonshot AI unveils Kimi K2 Thinking with 44.9% on HLE

Moonshot AI launches Kimi K2 Thinking today, a public “thinking” variant of K2 tuned for deep reasoning and long-horizon tool use. The model targets developers, research teams, and companies building agents that plan, browse, code, and write across many steps. It is open weights with a 256k context window and native INT4 quantization for lower latency. Reported scores include 44.9 on HLE with tools, 60.2 on BrowseComp, and 71.3 on SWE-bench Verified with tools. K2 Thinking sustains 200 to 300 sequential tool calls while holding task goals.

🚀 Hello, Kimi K2 Thinking!
The Open-Source Thinking Agent Model is here.

🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200 – 300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window

Built… pic.twitter.com/lZCNBIgbV2
— Kimi.ai (@Kimi_Moonshot) November 6, 2025

Under the hood it interleaves chain-of-thought with function calls and exposes a separate reasoning stream. Recommended temperature is 1.0. Tool use follows standard function-calling schemas and can be run end to end. Chat mode on kimi.com currently uses a reduced tool set, so production chats may differ from benchmark runs. Availability starts today via open weights and the platform API.

Compared with K2-Instruct, a reflex-grade model without long thinking, K2 Thinking adds deliberate reasoning and long-horizon agency. Architecture stays MoE with 1T total parameters and 32B activated. Specs list 384 experts, 64 attention heads, MLA attention, and a 160k vocab. On math and coding the team cites 99.1 on AIME25 with Python, 83.1 on LiveCodeBench v6, and 61.1 on SWE-bench Multilingual with tools. INT4 via QAT targets a near 2x speedup without accuracy loss.

Moonshot AI frames K2 as part of an agentic roadmap. The K2 family was pretrained on 15.5T tokens and introduces the MuonClip optimizer with a QK-clip stability tweak. Post-training blends large-scale agentic trajectory synthesis and reinforcement learning. The company says full agent mode on kimi.com will roll out soon and that K2 Thinking’s open release is aimed at builders who need transparent reasoning plus stable tool orchestration at scale.

Source