Claude Neptune v3 shows major math gains in red team testing

What we know so far: Neptune v3 access is provided to red teams through a "free model alias matching the configuration and classifiers currently used for Claude Opus 4". The model reportedly handles math at the level of o3 and "Kingfall".

Alexey Shabanov

6 Jul 2025 · 1 min read

Early reports from red team testers working with Claude Neptune v3 suggest the model can now handle some of the most challenging math problems, the kind that until recently were reliably solved only by models such as o3 or Kingfall. Notably, testers say Neptune v3 achieves these results consistently, something earlier Claude releases struggled to do. The leaked guidance given to red teamers also reveals that access is provided through a "free model alias matching the configuration and classifiers currently used for Claude Opus 4", which raises the question of whether this is a fundamentally new model or an upgrade hidden under an existing label.

Math problem prompt

Arrange the six numbers 2, 0, 1, 9, 20, and 19 in any order to form an 8-digit number (the first digit cannot be 0). How many different 8-digit numbers can be formed?
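
For reference, the prompt above can be checked by brute force. The Python sketch below is not part of the red team material. It assumes the intended reading is that the six numbers are concatenated as whole tokens, and that two orderings producing the same digit string count only once. The function name `count_distinct_numbers` is hypothetical.

```python
from itertools import permutations

# The six numbers from the prompt, kept as strings so they can be concatenated.
TOKENS = ("2", "0", "1", "9", "20", "19")


def count_distinct_numbers(tokens=TOKENS):
    """Count distinct digit strings made by concatenating every token once."""
    seen = set()
    for order in permutations(tokens):
        number = "".join(order)
        # Skip arrangements whose first digit is 0.
        if number[0] == "0":
            continue
        # A set collapses orderings that yield the same digits,
        # e.g. "2","0","20",... and "20","2","0",... both start "2020".
        seen.add(number)
    return len(seen)


if __name__ == "__main__":
    print(count_distinct_numbers())
```

Under these assumptions the count is 498. There are 600 orderings without a leading zero, but any ordering that places 2 directly before 0, or 1 directly before 9, produces the same digits as the ordering that swaps that pair with 20 or 19, and those collisions are what make the problem tricky.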

For developers, researchers, and anyone depending on advanced mathematical reasoning in language models, these capabilities could indicate a significant step forward. The math improvements would surface primarily in Claude’s API and developer-facing platforms, and could reach end users if they are integrated into public-facing Claude versions. While there is no direct evidence yet of a new model launch, a red team phase often precedes a broader rollout in Anthropic’s update cycle.

Anthropic’s product strategy has revolved around regular incremental improvements, but the competitive landscape is heating up with anticipated releases from OpenAI, xAI, and Google. With major updates expected from competitors throughout July, attention is on whether Anthropic turns these internal math advancements into a public announcement or release, especially as Opus 4’s position at the frontier is increasingly challenged. For now, the nature of the red team access and the possible overlap with current Opus 4 configurations remain closely watched within the AI research community.
