OpenAI prepares major ChatGPT voice upgrade with GPT-Bidi-1

OpenAI looks set to give ChatGPT's voice mode its biggest upgrade in months, with preparations underway for a next-generation audio model tentatively tagged GPT-Bidi-1. The name points to the bidirectional, or "BiDi," architecture the company has been building since early this year: a model designed to listen and speak at once, absorb interruptions, and adjust mid-sentence rather than freezing the moment a user says "mm-hm." Signs of the model now span web and mobile, suggesting a consumer rollout is near, though the name may shift before launch.

New OpenAI voice model "GPT-Bidi-1"

Coming soon with a "major leap in intelligence"

- The next generation of Voice
- More natural conversations, powered by our next-generation voice model https://t.co/mvH9TSisgO pic.twitter.com/Ka3Mk2LpXV
— M1 (@M1Astra) June 16, 2026

The wider point is less about voice quality than about a gap OpenAI has let widen. Its text models raced ahead to the GPT-5.5 generation while voice stayed on an older audio stack, leaving spoken conversations a step behind what the same assistant manages in writing. Closing that gap matters for a company betting that speech, not text, will become the main way people reach AI, the wager behind its planned audio-first hardware and its voice-based support tools. GPT-Bidi-1 is built around that bet, promising smoother exchanges plus what is billed as a major jump in reasoning.

🚨 OpenAI is planning to release GPT-Bidi-1 very soon

Their next-generation voice model for more natural conversations

[Final naming of the model might change]

h/t to @M1Astra from DevMode pic.twitter.com/brmD8bUgqb
— Chetaslua (@chetaslua) June 16, 2026

The feature's shape is coming into focus. ChatGPT users would likely keep today's setup, toggling between a new Bidi (Latest) mode and the current Advanced Voice Mode rather than being moved over wholesale. More telling is the choice of intelligence levels: High, Medium, and Instant, mirroring the tiers already offered on the text side and letting people trade speed for depth by task. A recent change that lets the voice bubble be dragged to the middle of the screen reads as an early piece of the same redesign.

Caution is warranted on timing. Whether the rollout starts this week or later is unclear, but the groundwork is plainly being laid.