Clone a Voice Overview | Resemble

Voice cloning creates an AI model of a person’s voice from audio samples. Once trained, you can generate speech in that voice using any text input.

Choose Your Voice Type

The Voice Cloning API requires a Business plan or higher. Upgrade your plan to get started.

Rapid Clone

Create a natural-sounding AI voice from as little as 10 seconds of audio. Provide one clear audio sample and the model returns a fully functional voice clone that is ready to use in under a minute.

Professional Clone

Our professional-grade voice clones are nearly indistinguishable from the authentic source. Ideal for videos, audiobooks, podcasts, video games, and beyond.

	Rapid Clone	Professional Clone
Audio needed	10 seconds – 3 minutes	10 – 25+ minutes
Training time	Under 1 minute	~40 minutes
Voice quality	Excellent	Excellent (nearly indistinguishable from source)
Emotional range	Limited	Full range of emotions
Best for	Quick iterations, demos, prototyping	Videos, audiobooks, podcasts, video games, production apps
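
As a rough illustration of the table above, the following Python sketch picks a voice type from the amount of audio you have. The function name and constants are hypothetical and not part of the Resemble API. The table gives no guidance for audio between 3 and 10 minutes, so the fallback to Rapid in that range is an assumption.

```python
from typing import Optional

RAPID_MIN_S = 10              # table: Rapid needs at least 10 seconds
PROFESSIONAL_MIN_S = 10 * 60  # table: Professional starts at 10 minutes


def suggest_voice_type(audio_seconds: float) -> Optional[str]:
    """Return "rapid", "professional", or None if there is too little audio."""
    if audio_seconds < RAPID_MIN_S:
        return None
    if audio_seconds >= PROFESSIONAL_MIN_S:
        return "professional"
    # Assumption: anything from 10 seconds up to 10 minutes is treated as
    # Rapid, even though the table lists Rapid only up to 3 minutes.
    return "rapid"
```

The returned string matches the voice_type values used in the requests on this page.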

Two Ways to Clone a Voice

Choose the method that fits your workflow:

Method 1: Clone with a Dataset File

Best when you already have audio files ready to upload.

Step 1: Prepare your dataset

Rapid: Single WAV file (≥10 seconds)
Professional: ZIP archive with multiple files (≥10 minutes total)
View supported formats →

Step 2: Create voice with dataset URL

curl 'https://app.resemble.ai/api/v2/voices' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "name": "Alex",
    "voice_type": "professional",
    "dataset_url": "https://example.com/audio-dataset.zip"
  }'

Step 3: Training starts

Professional voices train automatically
Rapid voices require calling Build Voice
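
For readers who prefer Python over curl, here is a minimal sketch of the same Create Voice request using only the standard library. The helper name create_voice_request is hypothetical. The URL, headers, and JSON fields are copied from the curl example above. The request is only built here, not sent.

```python
import json
import urllib.request

API_BASE = "https://app.resemble.ai/api/v2"


def create_voice_request(api_key, name, voice_type, dataset_url=None):
    """Build (but do not send) a POST request that creates a voice."""
    payload = {"name": name, "voice_type": voice_type}
    if dataset_url is not None:
        # Only Method 1 supplies a dataset; Method 2 creates an empty voice.
        payload["dataset_url"] = dataset_url
    return urllib.request.Request(
        f"{API_BASE}/voices",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = create_voice_request(
    "YOUR_API_KEY",
    "Alex",
    "professional",
    "https://example.com/audio-dataset.zip",
)
```

Passing the request to urllib.request.urlopen would send it. Omitting dataset_url produces the empty-voice request used when uploading individual recordings.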

Method 2: Upload Individual Recordings

Best when you want to record and upload samples one by one.

Step 1: Create an empty voice

curl 'https://app.resemble.ai/api/v2/voices' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "name": "Alex",
    "voice_type": "rapid"
  }'

Save the uuid from the response. You’ll need it to upload recordings and to start training.

Step 2: Upload recordings to the voice

curl 'https://app.resemble.ai/api/v2/voices/{voice_uuid}/recordings' \
  -F 'file=@audio.wav' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'name=sample_01' \
  -F 'text=Transcript of the audio' \
  -F 'emotion=neutral' \
  -F 'is_active=true'

Repeat until you have:

Rapid: 3+ recordings (≈10 seconds total)
Professional: 20+ recordings (≥10 minutes)
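
The minimums above can be checked locally before you start training. This is a small sketch, not part of the API: the function name and the MINIMUMS table are hypothetical, and the approximate 10-second Rapid figure is treated as a hard lower bound.

```python
# (minimum number of recordings, minimum total seconds) per voice type,
# taken from the list above.
MINIMUMS = {
    "rapid": (3, 10.0),
    "professional": (20, 600.0),
}


def ready_to_build(voice_type, durations_s):
    """Return True if the uploaded recordings meet the stated minimums."""
    min_count, min_total_s = MINIMUMS[voice_type]
    return len(durations_s) >= min_count and sum(durations_s) >= min_total_s
```

For example, ready_to_build("rapid", [4, 4, 4]) passes both checks, while a single 12-second recording fails the count check.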

Step 3: Start training

Call Build Voice to train the model:

curl 'https://app.resemble.ai/api/v2/voices/{voice_uuid}/build' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{}'
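
The three steps of Method 2 can be sketched end to end as follows. This only illustrates the order of calls and how the uuid is passed along. The send transport and the get_uuid extractor are hypothetical callables you would supply yourself, because this page does not show the shape of the Create Voice response. The paths and form fields mirror the curl examples.

```python
def clone_from_recordings(send, name, voice_type, recordings, get_uuid):
    """Create an empty voice, upload each recording, then start training.

    send(path, fields) is a caller-supplied transport that performs the HTTP
    request and returns the parsed response. get_uuid(response) pulls the
    voice uuid out of the Create Voice response.
    """
    # Step 1: create an empty voice and remember its uuid.
    created = send("/voices", {"name": name, "voice_type": voice_type})
    voice_uuid = get_uuid(created)

    # Step 2: upload every (file path, transcript) pair to that voice.
    for index, (file_path, transcript) in enumerate(recordings, start=1):
        send(
            f"/voices/{voice_uuid}/recordings",
            {
                "file": file_path,
                "name": f"sample_{index:02d}",
                "text": transcript,
                "emotion": "neutral",
                "is_active": "true",
            },
        )

    # Step 3: trigger training with an empty body.
    send(f"/voices/{voice_uuid}/build", {})
    return voice_uuid
```

Injecting the transport keeps the flow testable without network access. A real send would post JSON to the first and last paths and multipart form data to the recordings path.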

After Training Completes

Once training finishes (status = finished), your voice is ready to use:

Generate audio via Text-to-Speech
Use it in Speech-to-Speech
Manage with List Voices and other endpoints

Monitoring Training Progress

Set a callback_uri when creating your voice to receive a webhook notification when training completes. A successful notification looks like this:

{
  "ok": true,
  "id": "voice-uuid",
  "status": "finished"
}

If there’s a dataset issue, you’ll receive details about problematic recordings including quality scores (STOI, PESQ, SI-SDR).
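
A webhook receiver might interpret the notification along these lines. This is a minimal sketch that relies only on the ok, id, and status fields shown above. The function name is hypothetical, and the layout of the dataset-issue details is not documented on this page, so the sketch simply reports such a payload as not finished.

```python
import json


def parse_training_callback(body: bytes):
    """Return (voice_id, finished) for a training webhook payload."""
    event = json.loads(body)
    voice_id = event.get("id")
    # Treat the voice as ready only when both success signals are present.
    finished = event.get("ok") is True and event.get("status") == "finished"
    return voice_id, finished
```

When finished is False, inspect the raw payload for the recording details and quality scores before re-uploading audio.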

Next Steps

Create Voice: Create a new voice via API
Manage Recordings: Upload and manage audio samples
Build Voice: Trigger voice training
Generate Speech: Start generating audio