ICYMI: xAI debuts Grok Voice Agent Builder for Enterprises

xAI launches Voice Agent Builder beta, letting teams create custom phone agents in minutes without coding, with 80+ voices and real-time tools.

xAI

xAI has turned Grok Voice from a model and API offering into a full no-code voice agent platform, launching the Voice Agent Builder in beta. The platform is aimed at operators and developers who want to ship production phone agents without wiring together speech-to-text, reasoning, text-to-speech, telephony, tools, guardrails, MCP support, and observability themselves. Available in the xAI Console, it promises a custom voice agent in under two minutes, built from a plain-language call flow, uploaded documents, and connected tools, then checked with a browser-based test call.

The platform is targeted at high-volume call workflows such as customer support, sales, lead qualification, reception, and scheduling. Agents can access knowledge bases, search documents during calls, and connect to services like Gmail, Google Calendar, Outlook, Linear, Notion, OneDrive, Google Drive, APIs, X search, web search, and remote MCP servers. They can also transfer callers to a human when necessary. Each call can be recorded, transcribed, replayed, and inspected, with tool usage visible for review.

xAI is positioning the launch against the typical voice-agent stack, which chains separate speech recognition, a language model, and speech synthesis. The company says Grok Voice instead uses a more integrated speech-to-speech path, offering sub-second latency, support for more than 25 languages, and robust handling of noisy phone audio, accents, interruptions, and callers who change direction mid-call. xAI also reports that Grok Voice Think Fast 1.0 tops its τ-voice Bench table with a score of 67.3%, ahead of Gemini 3.1 Flash Live at 43.8% and GPT Realtime 1.5 at 35.3% on the same benchmark.

Voice configuration includes over 80 built-in voices in the Builder experience, as well as brand voice cloning from approximately 2 minutes of audio. Businesses have the option to use a free xAI-provisioned phone number, bring an existing number through SIP, or connect their own client over WebSocket. Pricing is structured around xAI’s real-time voice rate of $0.05 per minute of audio, with an additional $0.01 per minute for telephony on an xAI-provisioned number. xAI states that voices are included and there is no separate platform fee.
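The per-minute rates above make it straightforward to estimate what a call volume would cost. The following is a minimal Python sketch, assuming linear per-minute billing with no rounding or minimums, and assuming (the announcement does not say) that calls over SIP or WebSocket pay only the $0.05 voice rate. `estimate_cost` is a hypothetical helper for illustration, not part of any xAI SDK; check xAI's current pricing before relying on these figures.

```python
from decimal import Decimal

# Rates as stated in the announcement, in USD per minute of audio.
VOICE_RATE = Decimal("0.05")      # real-time voice rate
TELEPHONY_RATE = Decimal("0.01")  # surcharge on an xAI-provisioned number


def estimate_cost(minutes: int, xai_number: bool = True) -> Decimal:
    """Estimate the cost of `minutes` of call audio (hypothetical helper).

    Assumption: the telephony surcharge applies only when the call runs on
    an xAI-provisioned number; SIP or WebSocket traffic pays the voice rate
    alone (any carrier fees for a SIP trunk are outside this estimate).
    Decimal is used so that 0.05 + 0.01 is exactly 0.06, avoiding float drift.
    """
    rate = VOICE_RATE + (TELEPHONY_RATE if xai_number else Decimal("0"))
    return rate * minutes


if __name__ == "__main__":
    monthly_minutes = 10_000
    print("xAI number:", estimate_cost(monthly_minutes))
    print("SIP/WebSocket:", estimate_cost(monthly_minutes, xai_number=False))
```

Under these assumptions, 10,000 minutes a month on an xAI-provisioned number comes to $600, versus $500 when the telephony surcharge does not apply; a single five-minute call on an xAI number comes to $0.30.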
