Hume AI has launched Octave 2, its next-generation text-to-speech model, now in public preview on the company's platform and API. Octave 2 delivers response times under 200 milliseconds and supports 11 languages: Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. The rollout targets developers, enterprises, and creators seeking state-of-the-art speech synthesis. Octave 2 is 40% faster than Octave 1 and costs half as much, and it is optimized for advanced inference hardware through a partnership with SambaNova.
Octave 2 also adds voice conversion and direct phoneme editing, which let users swap voices or fine-tune pronunciation, timing, and emphasis. These features enable applications in dubbing, entertainment, and nuanced voiceovers. The model also brings advanced support for uncommon words, repetition, numbers, and symbols. The core model is available now, while voice conversion and phoneme editing are set to arrive shortly. EVI 4 mini, released alongside it, uses Octave 2 for speech-to-speech tasks but must be paired with an external LLM for full language generation.
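The pairing described for EVI 4 mini, a speech model working together with an external LLM, can be pictured with a minimal Python sketch. Everything here is hypothetical and for illustration only: the three-stage split, the class name `SpeechToSpeechPipeline`, and the stub functions are assumptions, not Hume AI's API or architecture.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical wiring: speech stages handle audio, an external LLM writes the reply."""

    transcribe: Callable[[bytes], str]    # audio in -> user text (assumed stage)
    generate_reply: Callable[[str], str]  # external LLM: user text -> reply text
    synthesize: Callable[[str], bytes]    # reply text -> audio out (assumed stage)

    def respond(self, audio_in: bytes) -> bytes:
        # Pass the input through the three stages in order.
        user_text = self.transcribe(audio_in)
        reply_text = self.generate_reply(user_text)
        return self.synthesize(reply_text)


# Stand-in stubs so the sketch runs without calling any real service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechToSpeechPipeline(fake_transcribe, fake_llm, fake_synthesize)
audio_out = pipeline.respond(b"hello")
```

Keeping the language-generation step as a plain callable reflects the point that the LLM is a separate, swappable component; a real integration would replace the stubs with calls to whatever speech and LLM services are actually in use.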
Introducing Octave 2: our next-generation multilingual text-to-speech model
What’s new:
- Fluent in 11+ languages
- 40% faster (<200ms latency) & 50% cheaper than Octave 1
- Multi-speaker conversation
- More reliable pronunciation
- New voice conversion & phoneme editing… pic.twitter.com/dkj7QElsXL
Hume AI (@hume_ai), October 1, 2025
Hume AI, the company behind Octave 2, specializes in emotionally intelligent AI voice technology. Its focus is on building nuanced vocal expression into AI models to make synthesized voices more realistic. Early industry feedback highlights Octave 2’s gains in speed and multilingual coverage, positioning Hume AI as a strong competitor in voice AI.