Clone a Voice Overview

Voice cloning creates an AI model of a person’s voice from audio samples. Once the model is trained, you can generate speech in that voice from any text input.

Choose Your Voice Type

The Voice Cloning API requires a Business plan or higher. Upgrade your plan to get started.

Rapid Clone

Create natural-sounding AI voices with just 10 seconds of audio. The process is designed for simplicity: provide a clear audio sample, and our AI model delivers a fully functional voice clone that’s ready to use immediately.

Professional Clone

Our professional-grade voice clones are nearly indistinguishable from the authentic source. Ideal for videos, audiobooks, podcasts, video games, and beyond.

                | Rapid Clone                          | Professional Clone
Audio needed    | 10 seconds – 3 minutes               | 10 – 25+ minutes
Training time   | Under 1 minute                       | ~40 minutes
Voice quality   | Excellent                            | Excellent (nearly indistinguishable from source)
Emotional range | Limited                              | Full range of emotions
Best for        | Quick iterations, demos, prototyping | Videos, audiobooks, podcasts, video games, production apps
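As a rough illustration of the audio requirements in the comparison above, the sketch below maps a sample's duration to the voice types whose stated range covers it. The function name is hypothetical, and treating the listed ranges as hard limits is an assumption; the API may accept audio outside them.

```python
from typing import List

# Thresholds in seconds, taken from the "Audio needed" row above.
RAPID_MIN_S = 10
RAPID_MAX_S = 3 * 60
PROFESSIONAL_MIN_S = 10 * 60


def eligible_voice_types(audio_seconds: float) -> List[str]:
    """Return the voice types whose stated audio range covers this duration."""
    types = []
    if RAPID_MIN_S <= audio_seconds <= RAPID_MAX_S:
        types.append("rapid")
    if audio_seconds >= PROFESSIONAL_MIN_S:
        types.append("professional")
    return types
```

Under these assumptions, a sample between 3 and 10 minutes long matches neither range, so you would either trim it for a Rapid Clone or record more audio for a Professional Clone.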

Two Ways to Clone a Voice

Choose the method that fits your workflow:

Method 1: Clone with a Dataset File

Best when you already have audio files ready to upload.

Step 1: Prepare your dataset

  • Rapid: Single WAV file (≥10 seconds)
  • Professional: ZIP archive with multiple files (≥10 minutes total)

Step 2: Create voice with dataset URL

curl 'https://app.resemble.ai/api/v2/voices' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "name": "Alex",
    "voice_type": "professional",
    "dataset_url": "https://example.com/audio-dataset.zip"
  }'
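If you are calling the API from a script rather than a shell, the same request could be assembled roughly as follows. This is a minimal sketch using only the Python standard library; the helper name build_create_voice_request is hypothetical, and the endpoint, headers, and JSON fields simply mirror the curl example above. The function builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://app.resemble.ai/api/v2"


def build_create_voice_request(
    api_key: str,
    name: str,
    voice_type: str,
    dataset_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a voice."""
    payload = {"name": name, "voice_type": voice_type}
    # dataset_url is only included when cloning from a dataset file.
    if dataset_url is not None:
        payload["dataset_url"] = dataset_url
    return urllib.request.Request(
        f"{API_BASE}/voices",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and read the JSON response. Omitting dataset_url produces the request body for an empty voice.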

Step 3: Training starts

  • Professional voices train automatically
  • Rapid voices require calling Build Voice

Method 2: Upload Individual Recordings

Best when you want to record and upload samples one by one.

Step 1: Create an empty voice

curl 'https://app.resemble.ai/api/v2/voices' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "name": "Alex",
    "voice_type": "rapid"
  }'

Save the uuid from the response; you’ll need it to upload recordings.

Step 2: Upload recordings to the voice

curl 'https://app.resemble.ai/api/v2/voices/{voice_uuid}/recordings' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@audio.wav' \
  -F 'name=sample_01' \
  -F 'text=Transcript of the audio' \
  -F 'emotion=neutral' \
  -F 'is_active=true'
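The -F flags above send a multipart/form-data body. For readers scripting the upload, here is a hedged sketch of how that body could be assembled with the Python standard library. The helper names are hypothetical, the form field names are copied from the curl example, and the audio/wav content type for the file part is an assumption. The request is built but not sent.

```python
import urllib.request
import uuid
from typing import Dict, Tuple

API_BASE = "https://app.resemble.ai/api/v2"


def encode_multipart(fields: Dict[str, str], filename: str, audio: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{key}"\r\n\r\n'
        parts.append(head.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: audio/wav\r\n\r\n'  # assumed content type for WAV uploads
    )
    parts.append(file_head.encode("utf-8") + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(
    api_key: str, voice_uuid: str, filename: str, audio: bytes, fields: Dict[str, str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request that uploads one recording."""
    body, content_type = encode_multipart(fields, filename, audio)
    return urllib.request.Request(
        f"{API_BASE}/voices/{voice_uuid}/recordings",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

A call would pass the bytes of the WAV file together with a fields dictionary such as {"name": "sample_01", "text": "Transcript of the audio", "emotion": "neutral", "is_active": "true"}.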

Repeat until you have:

  • Rapid: 3+ recordings (≈10 seconds total)
  • Professional: 20+ recordings (≥10 minutes)
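Before starting training, a script could check the uploaded recordings against the minimums listed above. The sketch below is illustrative only: the function name is hypothetical, and it treats the "≈10 seconds" Rapid figure as a lower bound, which is an assumption.

```python
from typing import Sequence

# Minimums from the list above: (recording count, total seconds).
MINIMUMS = {
    "rapid": (3, 10.0),
    "professional": (20, 10 * 60.0),
}


def ready_to_build(voice_type: str, durations_s: Sequence[float]) -> bool:
    """Return True if the recordings meet the count and total-duration minimums."""
    min_count, min_total = MINIMUMS[voice_type]
    return len(durations_s) >= min_count and sum(durations_s) >= min_total
```

Each entry in durations_s is the length of one uploaded recording in seconds; an unknown voice_type raises KeyError rather than guessing at a threshold.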

Step 3: Start training

Call Build Voice to train the model:

curl 'https://app.resemble.ai/api/v2/voices/{voice_uuid}/build' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  --data '{}'

After Training Completes

Once training finishes (status = finished), your voice is ready to use.

Monitoring Training Progress

Set a callback_uri when creating your voice to receive a webhook notification when training completes:

{
  "ok": true,
  "id": "voice-uuid",
  "status": "finished"
}

If there’s a dataset issue, you’ll receive details about problematic recordings including quality scores (STOI, PESQ, SI-SDR).
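On the receiving end, a webhook handler needs to parse that notification and decide whether the voice is ready. The sketch below assumes the callback body is JSON with the ok, id, and status fields shown above; the function name is hypothetical, and because the shape of the failure payload is not shown here, the code only separates "finished" from everything else.

```python
import json
from typing import Optional


def finished_voice_id(body: bytes) -> Optional[str]:
    """Return the voice id if the callback reports a finished build, else None."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(payload, dict):
        return None
    if payload.get("ok") is True and payload.get("status") == "finished":
        return payload.get("id")
    return None
```

A None result means the notification should be inspected further, for example to read any details about problematic recordings, rather than treated as a success.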
