Design a Voice Overview

Voice Design creates AI-generated voices from text descriptions—no audio recordings required. Describe the voice you want, choose from three candidates, and start generating speech instantly.

How It Works

Voice Design uses AI to generate synthetic voices based on your text prompt. Instead of cloning an existing voice from audio samples, you describe characteristics like age, accent, tone, and style, and the system creates unique voices matching your description.

Perfect for:

  • Rapid prototyping when you don’t have recordings
  • Creating fictional or character voices
  • Testing different voice styles quickly
  • Projects where recording voice talent isn’t feasible

The Complete Flow

Step 1: Write a Descriptive Prompt

Describe the voice you want. Be specific about:

  • Demographics: Age, gender
  • Accent/Language: British, American Southern, neutral, etc.
  • Tone: Warm, authoritative, energetic, calm
  • Style: Professional narrator, friendly guide, character voice

Example prompts:

  • “A middle-aged female with an Australian accent, friendly and approachable, like a knowledgeable tour guide”
  • “A young male with a British accent, energetic and enthusiastic, like a sports commentator”
  • “An elderly male with a deep, authoritative voice, calm and trustworthy, like a documentary narrator”
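
If you generate prompts in bulk, the four trait groups above can be assembled mechanically. The following is a minimal sketch, not part of the API: `build_voice_prompt` and its argument order are names invented for this example, and the sentence template simply mirrors the example prompts above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Resemble API): assembles a prompt
# from the trait groups listed in Step 1.
#   $1 = demographics, including the article (e.g. "A middle-aged female")
#   $2 = accent or voice quality (e.g. "an Australian accent")
#   $3 = tone (e.g. "friendly and approachable")
#   $4 = style comparison (e.g. "a knowledgeable tour guide")
build_voice_prompt() {
  printf '%s with %s, %s, like %s\n' "$1" "$2" "$3" "$4"
}

build_voice_prompt \
  "A middle-aged female" \
  "an Australian accent" \
  "friendly and approachable" \
  "a knowledgeable tour guide"
```

Running the script prints the first example prompt from the list above, ready to pass as `user_prompt`.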

Need help writing effective prompts? See our Prompting Guide for tips and best practices.

Step 2: Generate Candidates

Call the Generate Voice Candidates endpoint with your prompt:

curl 'https://app.resemble.ai/api/v2/voice-design' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -F 'user_prompt=A middle-aged female with an Australian accent, friendly and approachable, like a knowledgeable tour guide'
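
When the prompt comes from a variable rather than a literal, it may help to wrap the call in a function. The sketch below makes several assumptions: `generate_candidates` and the `RESEMBLE_API_TOKEN` environment variable are names chosen for this example, and the response is written to a file of your choosing. It uses curl's `--form-string` instead of `-F` so that a prompt beginning with `@` or `<`, or containing `;type=`, is sent literally rather than interpreted by curl.

```shell
#!/usr/bin/env bash
# Sketch only: generate_candidates and RESEMBLE_API_TOKEN are names chosen
# for this example; they are not defined by the Resemble API.
#   $1 = prompt text
#   $2 = file to write the JSON response to
generate_candidates() {
  local prompt="$1" out_file="$2"
  # --fail makes curl return non-zero on HTTP errors;
  # --form-string sends the value literally (no @file or ;type= handling).
  curl --fail --silent --show-error \
    'https://app.resemble.ai/api/v2/voice-design' \
    -H "Authorization: Bearer ${RESEMBLE_API_TOKEN:?set RESEMBLE_API_TOKEN first}" \
    --form-string "user_prompt=${prompt}" \
    -o "$out_file"
}

# Usage (performs a real API call):
#   export RESEMBLE_API_TOKEN=YOUR_API_TOKEN
#   generate_candidates "A young male with a British accent" candidates.json
```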

The response contains three voice candidates that match your description:

{
  "voice_candidates": [
    {
      "audio_url": "https://...",
      "voice_sample_index": 0,
      "uuid": "abc123"
    },
    {
      "audio_url": "https://...",
      "voice_sample_index": 1,
      "uuid": "abc123"
    },
    {
      "audio_url": "https://...",
      "voice_sample_index": 2,
      "uuid": "abc123"
    }
  ]
}

Important: All three candidates share the same uuid (they’re from the same generation request). The voice_sample_index (0, 1, or 2) identifies which candidate is which.
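
To pull the shared uuid and the per-candidate sample URLs out of a saved response, something like the following may be enough. It is a sketch under two assumptions: `jq` is installed, and the response body was saved to a file. The function names `list_candidates` and `generation_uuid` are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes jq is installed and the Step 2 response was saved
# to a file. Both function names are hypothetical.

# Prints one "<voice_sample_index><TAB><audio_url>" line per candidate.
list_candidates() {
  jq -r '.voice_candidates[] | [.voice_sample_index, .audio_url] | @tsv' "$1"
}

# All candidates share one uuid, so reading it from the first is enough.
generation_uuid() {
  jq -r '.voice_candidates[0].uuid' "$1"
}

# Usage:
#   list_candidates candidates.json
#   generation_uuid candidates.json
```

The tab-separated output is easy to feed into a `while read` loop if you want to download each sample before listening.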

Step 3: Listen and Choose

  1. Download and listen to all three audio_url samples
  2. Pick the one that best matches your needs
  3. Note its voice_sample_index (0, 1, or 2)
  4. Save the uuid from the response

Step 4: Create the Voice

Convert your chosen candidate into a usable voice with Create Voice from Candidate:

curl 'https://app.resemble.ai/api/v2/voice-design/{uuid}/{voice_sample_index}/create_rapid_voice' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -F 'voice_name=Tour Guide Voice'

Example (choosing the candidate with voice_sample_index 1, with uuid abc123):

curl 'https://app.resemble.ai/api/v2/voice-design/abc123/1/create_rapid_voice' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -F 'voice_name=Tour Guide Voice'

Response:

{
  "voice_uuid": "xyz789"
}

The voice builds automatically in the background. You can use it immediately, even while it’s still training.
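
Because the Step 4 URL embeds both the uuid and the chosen index, a small wrapper can catch an out-of-range index before any request is sent. This is a sketch, not an official client: `create_voice_url`, `create_voice`, and the `RESEMBLE_API_TOKEN` environment variable are names assumed for this example.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and RESEMBLE_API_TOKEN are assumptions made
# for this example.

# Builds the create_rapid_voice URL; rejects any index other than 0, 1, or 2.
create_voice_url() {
  local uuid="$1" index="$2"
  case "$index" in
    0|1|2) ;;
    *) echo "voice_sample_index must be 0, 1, or 2 (got: $index)" >&2
       return 1 ;;
  esac
  printf 'https://app.resemble.ai/api/v2/voice-design/%s/%s/create_rapid_voice\n' \
    "$uuid" "$index"
}

# Sends the Step 4 request and prints the JSON response.
#   $1 = uuid, $2 = voice_sample_index, $3 = voice name
create_voice() {
  local url
  url="$(create_voice_url "$1" "$2")" || return 1
  curl --fail --silent --show-error "$url" \
    -H "Authorization: Bearer ${RESEMBLE_API_TOKEN:?set RESEMBLE_API_TOKEN first}" \
    --form-string "voice_name=$3"
}

# Usage (performs a real API call):
#   create_voice abc123 1 "Tour Guide Voice"
```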

Step 5: Start Generating Speech

Use the voice_uuid to generate audio with the TTS endpoint:

curl --request POST "https://f.cluster.resemble.ai/synthesize" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "voice_uuid": "xyz789",
    "data": "Welcome to Sydney! Today we will explore the Opera House.",
    "sample_rate": 48000,
    "output_format": "wav"
  }'
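
Hand-written JSON breaks as soon as the text contains a double quote or a backslash. One way around that, assuming `jq` is installed, is to let `jq` build the request body. In this sketch `build_synthesis_payload`, `synthesize`, and `RESEMBLE_API_TOKEN` are names invented for the example; the field names and values are the ones shown in the request above.

```shell
#!/usr/bin/env bash
# Sketch only: assumes jq is installed. Function names and
# RESEMBLE_API_TOKEN are assumptions made for this example.

# Prints a JSON request body; jq escapes quotes and backslashes in the text.
#   $1 = voice_uuid, $2 = text to speak
build_synthesis_payload() {
  jq -c -n --arg voice_uuid "$1" --arg text "$2" \
    '{voice_uuid: $voice_uuid, data: $text, sample_rate: 48000, output_format: "wav"}'
}

# Posts the payload to the synthesize endpoint and prints the response.
synthesize() {
  build_synthesis_payload "$1" "$2" |
    curl --fail --silent --show-error --request POST \
      'https://f.cluster.resemble.ai/synthesize' \
      -H "Authorization: Bearer ${RESEMBLE_API_TOKEN:?set RESEMBLE_API_TOKEN first}" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}

# Usage (performs a real API call):
#   synthesize xyz789 'She said "Welcome to Sydney!"'
```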

Quick Reference

| Step | Action | Endpoint |
| --- | --- | --- |
| 1 | Write prompt | - |
| 2 | Generate candidates | POST /voice-design |
| 3 | Listen & choose | Download audio from audio_url |
| 4 | Create voice | POST /voice-design/{uuid}/{index}/create_rapid_voice |
| 5 | Generate speech | POST https://f.cluster.resemble.ai/synthesize |

Voice Design vs Voice Cloning

|  | Voice Design | Voice Cloning |
| --- | --- | --- |
| Input | Text description | Audio recordings |
| Setup time | Instant (no recordings needed) | Requires uploading/recording audio |
| Voice type | AI-generated synthetic voices | Clone of a specific person’s voice |
| Best for | Quick prototyping, fictional voices | High-fidelity recreation of real voices |
| Customization | Limited to prompt descriptions | Exact replication of source voice |
