Clone a Voice Overview
Voice cloning creates an AI model of a person’s voice from audio samples. Once the model is trained, you can generate speech in that voice from any text input.
Choose Your Voice Type
The Voice Cloning API requires a Business plan or higher. Upgrade your plan to get started.
Rapid Clone
Create natural-sounding AI voices from as little as 10 seconds of audio. The process is built for simplicity: provide one clear audio sample, and the model produces a fully functional voice clone that is ready to use immediately.
Professional Clone
Our professional-grade voice clones are nearly indistinguishable from the original speaker, which makes them ideal for videos, audiobooks, podcasts, video games, and similar projects.
Two Ways to Clone a Voice
Choose the method that fits your workflow:
Method 1: Clone with a Dataset File
Best when you already have audio files ready to upload.
Step 1: Prepare your dataset
- Rapid: a single WAV file (≥10 seconds)
- Professional: a ZIP archive containing multiple audio files (≥10 minutes total)
- See the supported formats page for accepted file types
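Before uploading, it can help to check the dataset against the minimum durations listed above. The sketch below uses only Python's standard `wave` and `zipfile` modules. It assumes the Rapid sample is an uncompressed PCM WAV and, for the Professional ZIP, it only measures members ending in `.wav` and skips everything else; other supported formats would need a different decoder. The function names are hypothetical helpers, not part of the API.

```python
import wave
import zipfile

# Minimum durations taken from the dataset requirements above.
RAPID_MIN_SECONDS = 10.0
PROFESSIONAL_MIN_SECONDS = 10 * 60.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def zip_wav_total_seconds(zip_source) -> float:
    """Sum the durations of every .wav member in a ZIP archive.

    ASSUMPTION: non-.wav members are skipped rather than decoded.
    """
    total = 0.0
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".wav"):
                with archive.open(name) as member:
                    total += wav_duration_seconds(member)
    return total


def meets_rapid_requirement(wav_source) -> bool:
    """True if a single WAV sample is at least 10 seconds long."""
    return wav_duration_seconds(wav_source) >= RAPID_MIN_SECONDS


def meets_professional_requirement(zip_source) -> bool:
    """True if the WAV files in the archive add up to at least 10 minutes."""
    return zip_wav_total_seconds(zip_source) >= PROFESSIONAL_MIN_SECONDS
```

For example, `meets_rapid_requirement("sample.wav")` returns `False` for a 5-second clip, which tells you to record more before creating the voice.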
Step 2: Create the voice with a dataset URL
Point the new voice at the URL where your dataset file is hosted.
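A minimal sketch of the create-voice request using only the standard library. Several details here are assumptions, not confirmed by this page: the base URL is a placeholder, and the `/voices` path, the `Bearer` authorization header, and the JSON field names `name` and `dataset_url` are hypothetical. Only `callback_uri` is named in this guide. Check the API reference for the exact endpoint and fields.

```python
import json
import urllib.request

# ASSUMPTION: placeholder base URL; replace with the real API host and version.
API_BASE = "https://api.example.com/v2"


def build_create_voice_request(api_token, name, dataset_url, callback_uri=None):
    """Build (but do not send) a POST request that creates a voice from a dataset URL.

    ASSUMPTION: the field names "name" and "dataset_url" and the Bearer auth scheme.
    """
    payload = {"name": name, "dataset_url": dataset_url}
    if callback_uri is not None:
        # Optional webhook for training completion (see Monitoring Training Progress).
        payload["callback_uri"] = callback_uri
    return urllib.request.Request(
        url=f"{API_BASE}/voices",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect and test; call `send_json(build_create_voice_request(...))` to actually create the voice.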
Step 3: Training begins
- Professional voices start training automatically
- Rapid voices start training only after you call Build Voice
Method 2: Upload Individual Recordings
Best when you want to record and upload samples one by one.
Step 1: Create an empty voice
Save the uuid from the response; you’ll need it to upload recordings.
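The sketch below builds the request for an empty voice and pulls the `uuid` out of the decoded response. The base URL is a placeholder, and the `/voices` path, the `Bearer` header, and the response shape (voice object either at the top level or nested under an `item` key) are assumptions for illustration only.

```python
import json
import urllib.request

# ASSUMPTION: placeholder base URL, endpoint path, and auth header.
API_BASE = "https://api.example.com/v2"


def make_create_empty_voice_request(api_token: str, name: str) -> urllib.request.Request:
    """Build a POST request that creates a voice with no dataset attached."""
    return urllib.request.Request(
        f"{API_BASE}/voices",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_voice_uuid(response: dict) -> str:
    """Return the voice uuid from a decoded JSON response.

    ASSUMPTION: the voice object is top-level or nested under "item".
    """
    voice = response.get("item", response)
    voice_uuid = voice.get("uuid")
    if not voice_uuid:
        raise KeyError("response does not contain a voice uuid")
    return voice_uuid
```

Raising an error when the `uuid` is missing makes a failed creation visible immediately instead of surfacing later as a confusing upload error.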
Step 2: Upload recordings to the voice
Repeat until you have:
- Rapid: 3+ recordings (≈10 seconds total)
- Professional: 20+ recordings (≥10 minutes total)
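To know when to stop uploading, you can track recordings locally against the minimums listed above. This is a client-side sketch only; `RecordingTracker` is a hypothetical helper, and it treats the Rapid target of "≈10 seconds total" as a hard 10-second floor, which is an assumption.

```python
from dataclasses import dataclass, field

# (minimum recording count, minimum total seconds) from the list above.
# ASSUMPTION: "≈10 seconds" for Rapid is treated as a 10-second minimum.
REQUIREMENTS = {
    "rapid": (3, 10.0),
    "professional": (20, 600.0),
}


@dataclass
class RecordingTracker:
    """Tracks uploaded recording durations for one voice."""

    voice_type: str
    durations: list = field(default_factory=list)

    def add(self, seconds: float) -> None:
        """Record the duration of one successfully uploaded recording."""
        self.durations.append(seconds)

    def ready_to_build(self) -> bool:
        """True once both the count and the total-duration minimums are met."""
        min_count, min_seconds = REQUIREMENTS[self.voice_type]
        return len(self.durations) >= min_count and sum(self.durations) >= min_seconds
```

Checking both conditions matters: twenty very short clips satisfy the Professional count but not the 10-minute total.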
Step 3: Start training
Call Build Voice to start training the model.
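A minimal sketch of the Build Voice call. The base URL is a placeholder, and the `/voices/{uuid}/build` path and `Bearer` header are hypothetical; substitute the endpoint from the API reference. The `voice_uuid` is the value saved in Step 1.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder base URL and hypothetical build endpoint path.
API_BASE = "https://api.example.com/v2"


def make_build_voice_request(api_token: str, voice_uuid: str) -> urllib.request.Request:
    """Build an empty-body POST request that asks the API to train the voice."""
    path_uuid = urllib.parse.quote(voice_uuid, safe="")  # keep the uuid URL-safe
    return urllib.request.Request(
        f"{API_BASE}/voices/{path_uuid}/build",
        data=b"",
        headers={"Authorization": f"Bearer {api_token}"},
        method="POST",
    )


def start_training(api_token: str, voice_uuid: str) -> int:
    """Send the build request and return the HTTP status code."""
    with urllib.request.urlopen(make_build_voice_request(api_token, voice_uuid)) as response:
        return response.status
```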
After Training Completes
Once training finishes (status = finished), your voice is ready to use:
- Generate audio via Text-to-Speech
- Use it in Speech-to-Speech
- Manage with List Voices and other endpoints
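If you cannot receive webhooks, one alternative is to poll the voice until its status is finished before using it. In this sketch, `fetch_status` is a hypothetical callable you supply (for example, a wrapper around the endpoint that returns a voice). Only the value `finished` comes from this guide; other status values are not covered here, so the loop simply waits until it times out.

```python
import time
from typing import Callable


def wait_until_finished(
    fetch_status: Callable[[], str],
    interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns "finished" or the timeout expires.

    The sleep function is injectable so the loop can be tested without waiting.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "finished":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"voice still {status!r} after {timeout} seconds")
        sleep(interval)
```

Prefer the callback_uri webhook when you can; polling adds requests and latency but works behind firewalls that block inbound traffic.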
Monitoring Training Progress
Set a callback_uri when you create the voice to receive a webhook notification once training completes.
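A minimal webhook receiver using Python's built-in `http.server`, suitable for local testing behind a tunnel rather than production. It assumes the notification arrives as a JSON POST body; the payload fields are not documented on this page, so the handler stores each decoded payload without interpreting it. The `events` list attached to the server is a hypothetical in-memory store for this sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_training_callback(body: bytes) -> dict:
    """Decode a webhook body as JSON. ASSUMPTION: the callback payload is JSON."""
    return json.loads(body.decode("utf-8"))


class TrainingCallbackHandler(BaseHTTPRequestHandler):
    """Accepts POSTs on any path and records each decoded payload on the server."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_training_callback(self.rfile.read(length))
        except ValueError:  # covers bad UTF-8 and malformed JSON
            self.send_response(400)
            self.end_headers()
            return
        self.server.events.append(event)
        self.send_response(200)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Serve until interrupted; expose this host and port as your callback_uri."""
    server = HTTPServer(("0.0.0.0", port), TrainingCallbackHandler)
    server.events = []  # received payloads, kept in memory for this sketch
    server.serve_forever()
```

Call `run(8000)` and pass the public URL that forwards to that port as callback_uri when creating the voice.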
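If there’s a problem with the dataset, you’ll receive details about the problematic recordings, including quality scores: STOI (short-time objective intelligibility), PESQ (perceptual evaluation of speech quality), and SI-SDR (scale-invariant signal-to-distortion ratio).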
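STOI, PESQ, and SI-SDR are all "higher is better" metrics, so a simple triage step is to sort the reported recordings from lowest to highest score and re-record the worst ones first. The input shape below (a list of dicts with keys such as `name`, `stoi`, `pesq`, and `si_sdr`) is hypothetical; adapt the key names to whatever the issue report actually contains.

```python
def worst_recordings(issues, metric="stoi", limit=5):
    """Return up to `limit` recordings ordered from lowest to highest `metric` score.

    ASSUMPTION: each issue is a dict and the score lives under the key `metric`.
    Recordings without that score are left out.
    """
    scored = [r for r in issues if r.get(metric) is not None]
    return sorted(scored, key=lambda r: r[metric])[:limit]
```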
