Voice cloning creates an AI model of a person’s voice from audio samples. Once trained, the model can generate speech in that voice from any text input.
The Voice Cloning API requires a Business plan or higher. Upgrade your plan to get started.
Create natural-sounding AI voices with just 10 seconds of audio. The process is designed to be simple: provide a clear audio sample, and our AI model delivers a fully functional voice clone that’s immediately ready to use.
Our professional-grade voice clones are nearly indistinguishable from the authentic source. Ideal for videos, audiobooks, podcasts, video games, and beyond.
Choose the method that fits your workflow:
Best when you already have audio files ready to upload.
Step 1: Prepare your dataset
Step 2: Create voice with dataset URL
Step 3: Training starts
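The dataset flow above comes down to a single request. The sketch below is illustrative only: the base URL, the `/voices` path, the Bearer auth header, and the `name` and `dataset_url` field names are assumptions, so check them against the API reference before use.

```python
import json
import urllib.request

# Hypothetical values: the real base URL, path, and field names may differ.
API_BASE = "https://api.example.com/v2"


def build_create_voice_request(api_key: str, name: str, dataset_url: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a voice from a dataset URL."""
    payload = {"name": name, "dataset_url": dataset_url}  # assumed field names
    return urllib.request.Request(
        f"{API_BASE}/voices",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_create_voice_request(api_key, "Narrator", dataset_url))` would submit the request. Keeping the builder separate from the sender makes the payload easy to inspect before anything goes over the network.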
Best when you want to record and upload samples one by one.
Step 1: Create an empty voice
Save the uuid from the response; you’ll need it to upload recordings.
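As a rough sketch of this step, the helpers below build the request body for an empty voice and pull the uuid out of the response. The `name` field and the top-level position of `uuid` in the response are assumptions; the real response may nest it differently.

```python
import json


def empty_voice_payload(name: str) -> bytes:
    """Request body for creating a voice with no dataset attached (assumed shape)."""
    return json.dumps({"name": name}).encode("utf-8")


def extract_voice_uuid(response_body: bytes) -> str:
    """Return the voice uuid from a create-voice response.

    Assumes the uuid is a top-level "uuid" field; adjust the lookup if the
    API nests it inside another object.
    """
    data = json.loads(response_body)
    voice_uuid = data.get("uuid") if isinstance(data, dict) else None
    if not voice_uuid:
        raise ValueError("response did not contain a voice uuid")
    return voice_uuid
```

Failing loudly when the uuid is missing avoids uploading recordings to an undefined voice later in the flow.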
Step 2: Upload recordings to the voice
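An upload for one recording might look like the sketch below, which sends the audio as multipart form data using only the standard library. The endpoint path, the Bearer auth header, and the form field names (`file`, `text`) are hypothetical; substitute the names from the API reference.

```python
import uuid
import urllib.request

API_BASE = "https://api.example.com/v2"  # hypothetical base URL


def encode_multipart(fields: dict, file_field: str, filename: str, audio: bytes):
    """Encode text fields plus one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{key}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'
        ).encode("utf-8")
    )
    parts.append(audio)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(api_key: str, voice_uuid: str, filename: str,
                         audio: bytes, transcript: str) -> urllib.request.Request:
    """Build (but do not send) an upload request for a single recording."""
    body, content_type = encode_multipart(
        {"text": transcript},  # assumed field name for the transcript
        "file",                # assumed field name for the audio
        filename,
        audio,
    )
    return urllib.request.Request(
        f"{API_BASE}/voices/{voice_uuid}/recordings",  # assumed path
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

To upload a folder of samples, read each file with `pathlib.Path.read_bytes()` and build one request per recording.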
Repeat until you have:
Step 3: Start training
Call Build Voice to train the model:
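A minimal sketch of the Build Voice call, assuming it is a POST with an empty body to a `/voices/{uuid}/build` path under a hypothetical base URL, authenticated with a Bearer token. None of these details are confirmed by this page.

```python
import urllib.request

API_BASE = "https://api.example.com/v2"  # hypothetical base URL


def build_training_request(api_key: str, voice_uuid: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts training for a voice."""
    return urllib.request.Request(
        f"{API_BASE}/voices/{voice_uuid}/build",  # assumed path
        data=b"",  # assumed: no request body is needed
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to submit it.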
Once training finishes (status = finished), your voice is ready to use:
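If you are not using a callback, one way to wait for training is to poll the voice status. In the sketch below, `fetch_status` is a hypothetical function you supply that returns the voice's current status string. Only the `finished` value comes from this page; the polling interval and limit are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_finished(
    fetch_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the voice reports status "finished" or the poll limit is hit."""
    for _ in range(max_polls):
        status = fetch_status()
        if status == "finished":
            return status
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"voice not finished after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be tested without real delays.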
Set a callback_uri when creating your voice to receive a webhook notification when training completes:
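On the receiving side, a webhook endpoint could be sketched as follows. The payload shape is an assumption: the `uuid` and `status` field names are hypothetical, so inspect a real callback before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body: bytes):
    """Return (voice uuid, status) from a callback body (assumed field names)."""
    event = json.loads(body)
    return event["uuid"], event["status"]


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            voice_uuid, status = parse_callback(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or missing fields: reject the request.
            self.send_response(400)
            self.end_headers()
            return
        print(f"voice {voice_uuid}: {status}")
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the callback receiver until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

The URL you pass as callback_uri must be reachable from the public internet and route to this handler.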
If there’s a dataset issue, you’ll receive details about problematic recordings including quality scores (STOI, PESQ, SI-SDR).
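For all three metrics, higher values indicate cleaner audio: STOI ranges from 0 to 1, PESQ from roughly -0.5 to 4.5, and SI-SDR is measured in decibels. The sketch below flags which metrics fall below a cutoff for a given recording. The thresholds are illustrative assumptions, not values used by the service, and the `QualityScores` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QualityScores:
    stoi: float    # speech intelligibility, 0 to 1
    pesq: float    # perceptual quality, roughly -0.5 to 4.5
    si_sdr: float  # scale-invariant signal-to-distortion ratio, in dB


def failing_metrics(
    scores: QualityScores,
    min_stoi: float = 0.8,     # illustrative threshold
    min_pesq: float = 2.5,     # illustrative threshold
    min_si_sdr: float = 10.0,  # illustrative threshold
) -> list:
    """Return the names of the metrics that fall below their threshold."""
    failures = []
    if scores.stoi < min_stoi:
        failures.append("STOI")
    if scores.pesq < min_pesq:
        failures.append("PESQ")
    if scores.si_sdr < min_si_sdr:
        failures.append("SI-SDR")
    return failures
```

A recording that fails on SI-SDR but passes the others, for example, usually points to background noise rather than unclear speech, so re-recording in a quieter environment is a reasonable first step.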