Inworld launches TTS-1.5 for real-time voice with 16 languages

Inworld launches TTS-1.5 with two real-time voice models, sub-250 ms P90 latency, support for 16 languages, and flexible deployment for developers and enterprises.

Inworld

Inworld released TTS-1.5 on December 18, 2025, expanding its text-to-speech portfolio with two new models aimed at real-time voice applications. The launch positions Inworld as a direct competitor to established voice AI providers, with a focus on low latency, multilingual coverage, and deployment flexibility for both developers and enterprises.

TTS-1.5 ships in two variants:

  1. TTS-1.5-Max: Targets most production use cases, delivering sub-250 ms P90 latency with a 190 ms median.
  2. TTS-1.5-Mini: Designed for ultra-latency-sensitive scenarios, reaching 160 ms P90 and 120 ms median latency.
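Both variants are specified by a median and a P90 (90th percentile) latency. To check your own measurements against a target such as "sub-250 ms P90", you can summarize them as below. This is a minimal sketch, assuming you have already collected per-request latencies in milliseconds; the article does not say exactly which interval Inworld measures (for example, time to first audio), and the `latency_summary` helper and sample values are hypothetical, not Inworld benchmark data.

```python
import math
import statistics


def latency_summary(samples_ms: list[float]) -> tuple[float, float]:
    """Return (median, P90) for latency measurements in milliseconds.

    P90 uses the nearest-rank method: the smallest sample such that at
    least 90% of all samples are less than or equal to it.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    median = statistics.median(ordered)
    rank = math.ceil(90 * len(ordered) / 100)  # 1-based nearest rank
    p90 = ordered[rank - 1]
    return median, p90


# Hypothetical measurements for illustration only.
samples = [110, 120, 125, 130, 140, 150, 160, 175, 190, 240]
median, p90 = latency_summary(samples)
print(f"median={median} ms, P90={p90} ms, sub-250 ms P90: {p90 < 250}")
```

Reporting the P90 alongside the median matters for real-time voice: a low median can hide a tail of slow responses that users notice as awkward pauses.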

Both models are free through December 31, 2025, after which they move to usage-based pricing: $10 per million characters for Max and $5 per million characters for Mini.
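To budget for the switch to paid usage, the published rates can be turned into a simple cost estimate. This is a minimal sketch, assuming purely linear per-character pricing with no tiers, minimums, taxes, or volume discounts (none of which the announcement addresses); the `estimate_cost_usd` function and the model labels used as dictionary keys are hypothetical and are not API model identifiers.

```python
from datetime import date
from decimal import Decimal

# Published list prices in USD per million characters.
PRICE_PER_MILLION_USD = {
    "TTS-1.5-Max": Decimal("10"),
    "TTS-1.5-Mini": Decimal("5"),
}
# Both models are free through this date, per the announcement.
FREE_UNTIL = date(2025, 12, 31)


def estimate_cost_usd(model: str, characters: int, usage_date: date) -> Decimal:
    """Estimate cost assuming linear pricing; Decimal avoids float rounding."""
    if characters < 0:
        raise ValueError("character count cannot be negative")
    if usage_date <= FREE_UNTIL:
        return Decimal("0")
    rate = PRICE_PER_MILLION_USD[model]
    return rate * Decimal(characters) / Decimal(1_000_000)


# Example: 3 million characters synthesized in January 2026.
for model in PRICE_PER_MILLION_USD:
    print(model, estimate_cost_usd(model, 3_000_000, date(2026, 1, 15)))
```

At these rates, 3 million characters would cost $30 on Max and $15 on Mini, so Mini halves the cost for workloads where its lower latency and quality trade-offs are acceptable.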

The models introduce a reworked, streaming-native audio codec built for real-time generation, alongside quantization-aware training and large-scale reinforcement learning to reduce word errors, cutoffs, and artifacts. Inworld reports top placements on independent, user-voted TTS leaderboards that prioritize naturalness and expressiveness over synthetic metrics.
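Inworld does not describe its quantization-aware training (QAT) setup. As general background, QAT inserts "fake quantization" into the forward pass so a model learns weights that still perform well when later run at low precision. The sketch below shows symmetric, per-tensor fake quantization in plain Python, assuming round-to-nearest with a single shared scale; the `fake_quantize` function is hypothetical and is not Inworld's implementation. In real training, frameworks typically let gradients pass through the rounding step unchanged (the straight-through estimator), which this forward-only sketch does not show.

```python
def fake_quantize(values: list[float], num_bits: int = 8) -> list[float]:
    """Quantize to signed integer levels with one shared scale, then dequantize.

    The output stays floating point but only takes values on the
    low-precision grid, exposing the model to rounding error during training.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit symmetric quantization
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)  # all zeros (or empty): nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp to [-qmax, qmax], map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


weights = [0.5, -1.0, 0.0137, 0.25]
print(fake_quantize(weights))
```

Each output differs from its input by at most half a quantization step, which is the error the model learns to tolerate during QAT.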

TTS-1.5 supports 16 languages and includes updated voice cloning options via the API, as well as on-premises deployment for organizations with strict data residency requirements. The service is available through partners including LiveKit, NLX, Pipecat, Stream Vision Agents, and Vapi.

Founded to support interactive characters and conversational agents, Inworld continues to focus on voice systems that operate at human conversational speeds, targeting use cases ranging from assistants and live translation to accessibility tools and interactive entertainment.
