Default voices
OneInbox ships 28 voices out of the box across Cartesia, Deepgram, ElevenLabs, OpenAI, Minimax, and Shisa, with no setup needed. You can use any of them immediately by setting tts.voice_id on your agent.
Browse the full list (with voice IDs) at any time:
Always use the id field (vc_...) from the voice object as tts.voice_id, not the provider_voice_id. Passing the raw provider ID (e.g. a Cartesia UUID or a Deepgram model name) returns a VOICE_NOT_FOUND error.
How voices work
Every agent has a TTS (text-to-speech) configuration that controls how it sounds. A default voice is automatically assigned when you create an agent, so you can hear it immediately with a quick browser call (no phone number needed) or a real phone call. Follow this guide when you want to use a custom voice from ElevenLabs, Cartesia, or another provider. To switch, set the provider and voice_id on the agent. The voice_id is the OneInbox id (vc_abc123) returned when you import a voice (see Step 2).
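As a rough illustration of the switch described above, the sketch below builds the part of an agent update that carries the new voice. It makes no network call. The nesting under tts follows the dotted name tts.voice_id used in this guide; placing provider next to it, the helper name voice_switch_body, and the provider spelling "elevenlabs" are assumptions made for the example, not confirmed API details.

```python
import json


def voice_switch_body(provider: str, voice_id: str) -> dict:
    """Build a partial agent update that changes only the TTS voice.

    Assumed shape: {"tts": {"provider": ..., "voice_id": ...}}.
    """
    return {"tts": {"provider": provider, "voice_id": voice_id}}


# "elevenlabs" is a hypothetical provider identifier; vc_abc123 is the
# placeholder OneInbox voice ID used throughout this guide.
body = voice_switch_body("elevenlabs", "vc_abc123")
print(json.dumps(body, indent=2))
```

Send a body like this with whatever agent update call your OneInbox client exposes; the request itself is left out because its endpoint is not shown here.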
Step 1 — Add an integration for your provider
If you want to import voices beyond the 28 platform voices, you need to add an integration for your third-party provider first. An integration stores your provider API key securely so OneInbox can authenticate on your behalf.
Add the integration for the provider you use, for example ElevenLabs or Cartesia.
The response includes the integration id; save it for Step 2.
Step 2 — Import a voice
Import a specific voice from your provider into OneInbox using its provider voice ID. The provider voice ID is the native ID from your provider’s voice library (not the OneInbox ID).
In the import response, the id field (vc_abc123) is the OneInbox voice ID. This is what you use in the next step when assigning the voice to your agent.
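Because the two IDs are easy to mix up, a small guard can catch the mistake before it surfaces as a VOICE_NOT_FOUND error. The sketch below is one possible approach: it reads the id field from a voice object and checks for the vc_ prefix. The helper name agent_voice_id and the sample response are hypothetical; only the id and provider_voice_id field names come from this guide.

```python
def agent_voice_id(voice: dict) -> str:
    """Return the OneInbox ID (vc_...) to use as tts.voice_id.

    Raises ValueError if the object has no id with the vc_ prefix, which
    usually means a raw provider ID was passed by mistake.
    """
    voice_id = str(voice.get("id", ""))
    if not voice_id.startswith("vc_"):
        raise ValueError(
            f"expected a OneInbox id starting with 'vc_', got {voice_id!r}"
        )
    return voice_id


# Hypothetical import response, trimmed to the two fields named in this guide.
imported = {"id": "vc_abc123", "provider_voice_id": "provider-native-id"}
selected = agent_voice_id(imported)
print(selected)
```

The guard checks only the prefix, so it cannot tell whether the voice actually exists in your account.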
Step 3 — Assign the voice to your agent
Use the OneInbox id from the import response (vc_abc123) to assign the voice to your agent. This tells the agent to use that specific voice for all calls.
| Field | Range | What it does |
|---|---|---|
| voice_id | — | The OneInbox id from the import response (format: vc_abc123) |
speed | 0.5–2.0 | Playback rate. 1.0 is normal speech speed |
stability | 0.0–1.0 | Voice consistency. Higher = more consistent tone, lower = more expressive |
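The ranges in the table can be checked on the client before anything is sent. The following is a minimal sketch of that validation, assuming speed and stability sit alongside voice_id under tts; that nesting, the helper name build_tts_settings, and the choice to omit stability when it is not given are assumptions made for the example.

```python
from typing import Optional

# Ranges copied from the table above.
SPEED_RANGE = (0.5, 2.0)
STABILITY_RANGE = (0.0, 1.0)


def build_tts_settings(
    voice_id: str, speed: float = 1.0, stability: Optional[float] = None
) -> dict:
    """Validate the documented ranges and return a TTS settings body."""
    if not voice_id.startswith("vc_"):
        raise ValueError("voice_id must be the OneInbox id (vc_...)")
    if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
        raise ValueError(f"speed must be within {SPEED_RANGE}, got {speed}")
    settings = {"voice_id": voice_id, "speed": speed}
    if stability is not None:
        if not STABILITY_RANGE[0] <= stability <= STABILITY_RANGE[1]:
            raise ValueError(
                f"stability must be within {STABILITY_RANGE}, got {stability}"
            )
        settings["stability"] = stability
    # Assumed nesting: all voice settings live under the agent's tts key.
    return {"tts": settings}


settings = build_tts_settings("vc_abc123", speed=1.2, stability=0.7)
print(settings)
```

Rejecting out-of-range values instead of clamping them keeps the failure visible to the caller; clamping would be an equally reasonable choice if silent correction suits your application better.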