
Voice on Nexus

Nexus brings a substantial voice upgrade to v3 agents: two distinct runtime modes, a custom voice library with cloning, expanded provider choice, and finer control over conversation latency. This section walks you through what's new, how to configure it, and how to test it.

If you're new to building voice bots in general, the foundational concepts (voice as a channel, speech recognition, text-to-speech basics) live under Voice as a Channel. The pages here focus on the v3-specific voice surface.

Two voice modes

Every v3 voice agent runs in one of two modes — pick one in AI Agent → Profile → Voice Settings.

Mode | How it works | When to use
Text & TTS (pipeline) | Speech-to-text → LLM executor → Text-to-speech. The classic three-stage pipeline, with each stage independently swappable and tunable. | Most production voice bots. Best provider choice, best language coverage, finest control over each stage.
Realtime Audio | Realtime audio API (OpenAI Realtime today; multi-provider framework in place for Gemini Live, Anthropic, and MiniMax). Audio in, audio out, no intermediate text. Supports registered tools (DynamicTool / MCP / KB), mid-call agent switch, greet-on-connect, and per-turn traces. | Short interactions where end-to-end latency matters more than fine-grained provider tuning. Newer voice agents that benefit from sub-second response.

Both modes are fully supported. Pipeline mode is the default and is what most production bots use today. Realtime Audio has reached feature parity with pipeline mode for tools, agent switching, and traces, and is ready for new builds where its latency profile matters.
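To make "independently swappable" concrete, here is a minimal sketch of the three-stage pipeline in Python. It is an illustration only: the `VoicePipeline` class, the stage signatures, and the stub stages are hypothetical and are not part of the Nexus API. Real providers stream audio and run asynchronously.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical stage signatures, simplified to plain synchronous functions.
SpeechToText = Callable[[bytes], str]
Executor = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass(frozen=True)
class VoicePipeline:
    """Speech-to-text -> LLM executor -> text-to-speech, one stage per field."""
    stt: SpeechToText
    llm: Executor
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)     # stage 1: audio to text
        reply_text = self.llm(transcript)   # stage 2: text to reply text
        return self.tts(reply_text)         # stage 3: reply text to audio


# Stub stages standing in for real providers (assumptions for the demo).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def shouting_tts(text: str) -> bytes:
    return text.upper().encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)

# Swap only the TTS stage; the STT and LLM stages are left untouched.
louder = replace(pipeline, tts=shouting_tts)
```

Because each stage sits behind its own field, changing the TTS provider does not touch speech recognition or the executor. That is the property the pipeline row in the table describes. Realtime Audio has no equivalent seam, since audio goes in and comes out without intermediate text.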

Figure: Voice Settings sub-page in the super-agent Configuration, showing the Mode selector (Text & TTS), TTS provider (ElevenLabs), Voice (Rachel), Voice instructions, Audio settings, and Advanced VAD controls.

What's new for voice in v3

Compared to v2 voice bots, v3 gives you:

  • A dedicated Voice section — separate top-level routes under /ai-agent/voice/ for settings, custom voices, telephony, and TTS testing. No more digging through deep menus.
  • Custom voices — clone a voice from a short audio sample, manage your library, browse curated global voices.
  • More providers — pipeline mode supports Yellow AI (default), Deepgram, ElevenLabs, MiniMax, Microsoft Azure, Google, and Sarvam for TTS and STT.
  • Realtime Audio mode — OpenAI Realtime audio API for ultra-low-latency conversations. It now supports registered tools (DynamicTool / MCP / KB), mid-call agent switch, greet-on-connect, per-turn traces, and a multi-provider framework (Gemini Live, Anthropic, and MiniMax adapters are in progress). Production bots switch modes by setting voiceOptions.mode = "realtime" in Voice Settings; there is no separate feature flag.
  • Tunable VAD (Voice Activity Detection) — control when the bot decides that the user has finished speaking. The controls are exposed in Voice Settings and the Voice Playground.
  • Per-language voice samples — voice cloning provides a guided sample text per language so users record the right thing the first time.
  • WebRTC test calls — make a real test call from your browser without setting up SIP first.
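To illustrate what tuning VAD means in practice, the sketch below shows one common approach to end-of-speech detection: treat the turn as finished once a run of quiet frames follows some speech. It is a simplified, energy-based illustration, not the detector Nexus uses. The class, the parameter names (`energy_threshold`, `silence_ms`, `frame_ms`), and their default values are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EndOfSpeechDetector:
    """Hypothetical silence-timeout detector; all parameters are illustrative."""
    energy_threshold: float = 0.02  # frames at or above this count as speech
    silence_ms: int = 600           # quiet time required to end the turn
    frame_ms: int = 20              # duration of each audio frame

    def __post_init__(self) -> None:
        self._heard_speech = False
        self._silent_ms = 0

    def feed(self, frame_energy: float) -> bool:
        """Return True once the user is considered to have finished speaking."""
        if frame_energy >= self.energy_threshold:
            self._heard_speech = True
            self._silent_ms = 0     # any speech resets the silence timer
            return False
        if not self._heard_speech:
            return False            # leading silence never ends a turn
        self._silent_ms += self.frame_ms
        return self._silent_ms >= self.silence_ms


def turn_end_index(energies: Iterable[float],
                   detector: EndOfSpeechDetector) -> Optional[int]:
    """Index of the frame at which the turn ends, or None if it never does."""
    for i, energy in enumerate(energies):
        if detector.feed(energy):
            return i
    return None
```

The trade-off behind a setting like `silence_ms` is the usual one. A short timeout makes the bot respond faster but risks cutting in during a natural pause. A long timeout is more patient but adds latency to every turn.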

Where everything lives in the studio

Page | Path | What it does
Voice Settings | AI Agent → Profile → Voice Settings | Provider, mode, voice ID, VAD, voice instructions. The agent-level config.
Voice Library | AI Agent → Voice → Library | Browse your custom voices and Yellow's curated voices. Preview, copy ID, delete.
Clone a Voice | AI Agent → Voice → Clone | Upload or record a sample, name it, and add it to your library.
TTS Playground | AI Agent → Voice → TTS | Type any text, hear it spoken in any voice. Useful for picking a voice.
Telephony | AI Agent → Voice → Telephony | Connect Vonage, Twilio, or another provider. Configure SIP routing.
Voice testing | AI Agent → Voice → Telephony | Make a real test call (browser Web Call or outbound Phone Call), see the live transcript, hear the bot reply.

The first five are configuration; the last is testing.
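As a rough mental model of the agent-level config that Voice Settings manages (provider, mode, voice ID, VAD, voice instructions), the sketch below groups those fields into one Python structure. Only `voiceOptions.mode` and the value `"realtime"` come from this page. Every other field name, the `"pipeline"` value, and the defaults are assumptions for illustration, not the actual Nexus schema.

```python
from dataclasses import dataclass, field, replace

# "realtime" is documented on this page; "pipeline" is an assumed value name.
ALLOWED_MODES = {"pipeline", "realtime"}


@dataclass(frozen=True)
class VadSettings:
    # Hypothetical knob: quiet time before the user's turn is considered over.
    end_of_speech_silence_ms: int = 600


@dataclass(frozen=True)
class VoiceOptions:
    """Illustrative stand-in for the agent-level voice config."""
    mode: str = "pipeline"
    tts_provider: str = "ElevenLabs"   # example value, not a required default
    voice_id: str = "Rachel"           # placeholder; real IDs come from the library
    voice_instructions: str = ""
    vad: VadSettings = field(default_factory=VadSettings)

    def __post_init__(self) -> None:
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"unknown voice mode: {self.mode!r}")


default_options = VoiceOptions()

# Switching modes changes one field; provider, voice, and VAD carry over.
realtime_options = replace(default_options, mode="realtime")
```

Keeping the mode as a single field on the same object as the provider and VAD settings mirrors the page's point that switching to Realtime Audio is a settings change and not a separate feature flag.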

Figure: AI Agent → Voice top-level page showing the five tabs across the top (Synthesize, Library, Clone, Settings, Telephony), with the Synthesize / TTS Playground content underneath.

Where to go next

  • Voice Settings — set the provider, mode, voice, and VAD for an agent.
  • Custom Voices — clone a voice and manage your library.
  • Telephony — wire a phone number to your bot.
  • Testing — Voice Playground walkthrough.
  • Best Practices — picking a mode, language, and provider; latency tuning.