All voices you clone with Resemble automatically use Chatterbox, our latest and most advanced text-to-speech model. You don’t need to select a model: every new voice is created with Chatterbox by default.
Chatterbox is available in three variants:
Note: Time-to-first-sound (TTFS) values are best-case figures. Cold starts, network latency, and load can all increase the latency you actually observe.
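Because published TTFS values are best-case, it can help to measure the latency you see from your own environment. The following is a minimal sketch, not part of the Resemble API: `measure_ttfs` and `fake_stream` are hypothetical names, and `fake_stream` only simulates a streaming response with a fixed delay. It assumes your client exposes audio as an iterable of byte chunks and that the request is sent when iteration begins.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttfs(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk).

    Hypothetical helper: the timer starts when iteration starts, so any
    work done before the iterable was created is not counted.
    """
    start = clock()
    for chunk in chunks:
        if chunk:  # ignore empty chunks that carry no audio
            return clock() - start, chunk
    raise ValueError("stream ended before any audio arrived")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response (assumption, not a real API)."""
    time.sleep(delay_s)  # simulated cold start plus network latency
    yield b"\x00\x01"
    yield b"\x02\x03"


if __name__ == "__main__":
    ttfs, first = measure_ttfs(fake_stream(0.05))
    print(f"TTFS: {ttfs * 1000:.0f} ms, first chunk: {len(first)} bytes")
```

Running the measurement several times and comparing the first call (likely to include a cold start) against later calls gives a more realistic picture than a single sample.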
The following models are deprecated and no longer available for new voice cloning. They are listed here for reference only.