New Claude Neptune model undergoes internal red team review

Anthropic appears to be nearing the release of a new AI model codenamed Claude Neptune, currently undergoing internal safety testing on the Anthropic Workbench. Red team exercises are set to run until May 18, with the primary focus on probing the model’s robustness against jailbreaking attempts, especially targeting the constitutional classifiers system that underpins Anthropic’s safety protocols. This suggests that Neptune is likely a more sensitive or capable system requiring rigorous pre-release scrutiny. The existence of a dedicated red team cycle and constitutional bypass testing signals a potential leap beyond incremental improvements.

The upcoming model could benefit developers, researchers, and enterprise users who rely on Claude for secure, high-performance reasoning, particularly in domains like code generation and technical research, where Claude has remained strong in evaluations. However, there’s still uncertainty about whether Neptune represents a new tier in the Claude family or just a refined version of Claude Sonnet or Opus.

Must be. Neptune 8th planet from sun
— Super Dario (@inductionheads) May 13, 2025

From the company’s broader perspective, this comes at a moment when Anthropic is perceived to be trailing OpenAI and Google in feature velocity and public exposure. Yet it continues to stand out on safety-first design and coding benchmarks, maintaining a reputation for stability and alignment. A well-timed release in late May or early June would allow Claude Neptune to compete against upcoming launches like OpenAI’s rumoured GPT-5 and Google’s Gemini Ultra, both expected to consolidate multimodal and agentic capabilities.

On Wednesday, May 14, Anthropic also shared more details on this matter. It is evident that the core aim here is to test Neptune's safety system, but it is yet unclear which model is being used underneath.

We're launching a new bug bounty initiative to stress-test an updated version of our anti-jailbreaking system before it’s publicly deployed.

The program, in partnership with @Hacker0x01, runs through Sunday.
— Anthropic (@AnthropicAI) May 14, 2025

The new Claude model was identified through workbench logs indicating internal testing workflows. While not public, the appearance of Neptune in this phase usually precedes a release within weeks. With Anthropic’s strategic focus on commercial safety and research adoption, Neptune may also introduce architecture-level changes or policy adjustments in how models apply constitutional AI principles in practice.