Anthropic reports distillation of Claude by DeepSeek, Moonshot, MiniMax

What's new? Anthropic detected campaigns by DeepSeek, Moonshot, and MiniMax to distill Claude's abilities; in response, it updated its detection systems and tightened API access controls.

Anthropic

Anthropic has announced the discovery of widespread efforts by DeepSeek, Moonshot, and MiniMax to extract capabilities from its Claude AI model using large-scale, coordinated campaigns. These organizations, operating through over 24,000 fraudulent accounts and generating more than 16 million exchanges, targeted Claude to capture advanced reasoning, coding, and agentic features. The campaigns employed the distillation technique, in which a weaker model is trained on the outputs of a stronger one, effectively transferring unique model strengths without the original safeguards. This activity was traced through IP data, account metadata, and patterns in API usage.
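The distillation technique described above can be illustrated with a toy sketch: a "student" model is trained purely on the recorded outputs of a "teacher" model, never touching the teacher's weights or original training data. Everything below (the models, the data, the training loop) is a hypothetical illustration, not Anthropic's detection method or the actual campaigns' tooling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "teacher": a fixed logistic classifier standing in for a
# capable model behind an API. Its weights are never exposed directly.
w_teacher = np.array([2.0, -1.0])

def teacher(x):
    # Returns soft probabilities, analogous to a model's output distribution.
    return 1.0 / (1.0 + np.exp(-x @ w_teacher))

# The "campaign": query the teacher at scale and record its outputs.
X = rng.normal(size=(2000, 2))
soft_labels = teacher(X)

# The "student": a fresh model trained to imitate the teacher's outputs
# via gradient descent on the cross-entropy between its predictions and
# the teacher's soft labels.
w_student = np.zeros(2)
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w_student))
    grad = X.T @ (p - soft_labels) / len(X)  # cross-entropy gradient
    w_student -= lr * grad

# The student now mirrors the teacher's decisions on unseen inputs,
# despite never seeing the teacher's weights or training data.
X_test = rng.normal(size=(500, 2))
student_pred = 1.0 / (1.0 + np.exp(-X_test @ w_student)) > 0.5
agreement = np.mean((teacher(X_test) > 0.5) == student_pred)
print(f"student/teacher agreement: {agreement:.2f}")
```

The point of the sketch is that high-fidelity imitation requires only query access, which is why the article notes the behavior was traced through API usage patterns rather than any model-level breach.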

The targeted capabilities included agentic reasoning, coding proficiency, and the creation of censorship-safe outputs. Distillation attacks like these threaten to bypass regional restrictions and export controls, exposing advanced AI to unauthorized actors and reducing the effectiveness of regulatory protections. Anthropic’s findings show that these attacks are not only sophisticated but also rapidly adapt to new model releases, as seen when MiniMax shifted its focus within 24 hours of a Claude update.

Anthropic has responded by enhancing detection systems, sharing intelligence with industry partners, and tightening access controls on its API. The company urges collective industry and policy action, warning that models built from illicit distillation may lack critical safeguards and could be misused for offensive cyber operations or disinformation campaigns. Anthropic, a leading US AI developer, has positioned itself as a proponent of export controls and responsible AI deployment, emphasizing the broader security implications of these unauthorized distillation efforts.

Source