Design a Voice
Overview
Voice Design creates AI-generated voices from text descriptions—no audio recordings required. Describe the voice you want, choose from three candidates, and start generating speech instantly.
How It Works
Voice Design uses AI to generate synthetic voices based on your text prompt. Instead of cloning an existing voice from audio samples, you describe characteristics like age, accent, tone, and style, and the system creates unique voices matching your description.
Perfect for:
- Rapid prototyping when you don’t have recordings
- Creating fictional or character voices
- Testing different voice styles quickly
- Projects where recording voice talent isn’t feasible
The Complete Flow
Step 1: Write a Descriptive Prompt
Describe the voice you want. Be specific about:
- Demographics: Age, gender
- Accent/Language: British, American Southern, neutral, etc.
- Tone: Warm, authoritative, energetic, calm
- Style: Professional narrator, friendly guide, character voice
Example prompts:
- “A middle-aged female with an Australian accent, friendly and approachable, like a knowledgeable tour guide”
- “A young male with a British accent, energetic and enthusiastic, like a sports commentator”
- “An elderly male with a deep, authoritative voice, calm and trustworthy, like a documentary narrator”
Need help writing effective prompts? See our Prompting Guide for tips and best practices.
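The four kinds of detail listed above can be assembled into a prompt in code. The sketch below is only an illustration: the `VoiceDescription` class and its field names are hypothetical helpers, not part of the Voice Design API, and the sentence template simply mirrors the example prompts.

```python
from dataclasses import dataclass


@dataclass
class VoiceDescription:
    """Hypothetical container for the details a prompt should cover."""
    age: str      # demographics, e.g. "middle-aged"
    gender: str   # demographics, e.g. "female"
    accent: str   # e.g. "an Australian accent"
    tone: str     # e.g. "friendly and approachable"
    style: str    # e.g. "a knowledgeable tour guide"

    def to_prompt(self) -> str:
        # Pick "A" or "An" based on the first letter of the age word.
        article = "An" if self.age[0].lower() in "aeiou" else "A"
        return (
            f"{article} {self.age} {self.gender} with {self.accent}, "
            f"{self.tone}, like {self.style}"
        )


prompt = VoiceDescription(
    age="middle-aged",
    gender="female",
    accent="an Australian accent",
    tone="friendly and approachable",
    style="a knowledgeable tour guide",
).to_prompt()
print(prompt)
```

Keeping the details in separate fields makes it easy to vary one characteristic at a time, for example swapping only the accent while testing different voice styles.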
Step 2: Generate Candidates
Call the Generate Voice Candidates endpoint with your prompt:
You’ll receive 3 different voice candidates that match your description:
Important: All three candidates share the same uuid (they’re from the same generation request). The voice_sample_index (0, 1, or 2) identifies which candidate is which.
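As a rough sketch of how a client might check this invariant, the code below validates a list of candidate entries. The field names `uuid`, `voice_sample_index`, and `audio_url` come from this page; the list-of-dicts layout, the `Candidate` class, the `parse_candidates` helper, and the sample values are assumptions made for illustration, so the real response may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    uuid: str                # shared by all candidates from one request
    voice_sample_index: int  # 0, 1, or 2
    audio_url: str           # location of the preview audio


def parse_candidates(items):
    """Validate and sort candidate entries from one generation request."""
    candidates = [
        Candidate(
            uuid=item["uuid"],
            voice_sample_index=int(item["voice_sample_index"]),
            audio_url=item["audio_url"],
        )
        for item in items
    ]
    # All candidates from one request should carry the same uuid.
    if len({c.uuid for c in candidates}) != 1:
        raise ValueError("candidates must all share one uuid")
    candidates.sort(key=lambda c: c.voice_sample_index)
    if [c.voice_sample_index for c in candidates] != [0, 1, 2]:
        raise ValueError("expected voice_sample_index values 0, 1, and 2")
    return candidates


# Placeholder data only; the real response layout may differ.
sample_items = [
    {"uuid": "example-uuid", "voice_sample_index": 2,
     "audio_url": "https://example.com/sample-2.wav"},
    {"uuid": "example-uuid", "voice_sample_index": 0,
     "audio_url": "https://example.com/sample-0.wav"},
    {"uuid": "example-uuid", "voice_sample_index": 1,
     "audio_url": "https://example.com/sample-1.wav"},
]

candidates = parse_candidates(sample_items)
for c in candidates:
    print(c.voice_sample_index, c.audio_url)
```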
Step 3: Listen and Choose
- Download and listen to all three audio_url samples
- Pick the one that best matches your needs
- Note its voice_sample_index (0, 1, or 2)
- Save the uuid from the response
Step 4: Create the Voice
Convert your chosen candidate into a usable voice with Create Voice from Candidate:
Example (choosing candidate #1):
Response:
The voice builds automatically in the background. You can use it immediately, even while it’s still training.
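One way to prepare the Step 4 request is to pair the saved uuid with the chosen voice_sample_index before sending anything. This is a minimal sketch under assumptions: `build_create_voice_body` is a hypothetical helper, the `voice_name` field is a guess rather than a documented parameter, and the exact request schema should be taken from the Create Voice from Candidate reference.

```python
import json


def build_create_voice_body(uuid, voice_sample_index, voice_name):
    """Assemble a request body for the chosen candidate (assumed schema)."""
    if voice_sample_index not in (0, 1, 2):
        raise ValueError("voice_sample_index must be 0, 1, or 2")
    return {
        "uuid": uuid,                              # from the candidates response
        "voice_sample_index": voice_sample_index,  # the candidate you picked
        "voice_name": voice_name,                  # hypothetical field name
    }


# Choosing candidate #1, with a placeholder uuid.
body = build_create_voice_body("example-uuid", 1, "Tour Guide")
print(json.dumps(body, indent=2))
```

Validating the index locally catches an out-of-range choice before any request is made.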
Step 5: Start Generating Speech
Use the voice_uuid to generate audio with the TTS endpoint:
The voice is immediately usable in:
