Hume AI launches Octave 2 and EVI 4 mini voice models

What's new? Octave 2 is a TTS model with sub-200 ms response times and support for 11 languages, with voice conversion and phoneme editing coming soon; EVI 4 mini handles speech-to-speech tasks when paired with an external LLM.


Hume AI has launched Octave 2, its next-generation text-to-speech model, now in public preview on the company's platform and API. Octave 2 responds in under 200 milliseconds and supports 11 languages: Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Russian, and Spanish. The rollout targets developers, enterprises, and creators seeking state-of-the-art speech synthesis. Compared with Octave 1, Octave 2 is 40% faster and half the price, and it is optimized for advanced inference hardware through a partnership with SambaNova.


Octave 2 adds voice conversion and direct phoneme editing, which let users swap voices or fine-tune pronunciation, timing, and emphasis. These features target dubbing, entertainment, and nuanced voiceovers, and the model offers advanced support for uncommon words, repetition, numbers, and symbols. The core model is available now; voice conversion and phoneme editing are set to arrive shortly. EVI 4 mini, released alongside it, uses Octave 2 for speech-to-speech tasks but must be paired with an external LLM for full language generation.

Hume AI, the company behind Octave 2, specializes in emotionally intelligent AI voice technology. It focuses on integrating nuanced vocal expression into AI models to make synthesized voices more realistic. Early industry feedback highlights Octave 2’s gains in speed and multilingual coverage, positioning Hume AI as a strong competitor in the voice AI landscape.
