Voice Settings Presets allow you to save and reuse combinations of voice settings (pace, pitch, temperature, etc.) for text-to-speech synthesis. Presets make it easy to maintain consistent voice styles across multiple synthesis requests without having to specify individual settings each time.
Voice Settings Presets support the following parameters:
Controls speech speed. Default: 1.0
    < 1.0: Slower speech
    1.0: Normal speed
    > 1.0: Faster speech
Controls voice variation and randomness. Default: 0.8
Adjusts voice pitch (only applied for Voice Conversion/STS). Default: 0.0
Enable high-definition audio quality. Default: false
    true: HD quality (may cost more credits)
    false: Standard quality
Controls emotional expressiveness. Default: 0.5
    0.0: Minimal emotion
    1.0: Maximum emotional emphasis
Text prompt describing desired voice style. Default: ""
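As a rough illustration, the defaults listed above could be collected in a small client-side structure like the one below. This is only a sketch: the names pace, pitch, and temperature come from the introduction, while use_hd, exaggeration, and description are hypothetical field names chosen for this example, and the range check assumes that 0.0 and 1.0 are the bounds for emotional expressiveness.

```python
from dataclasses import dataclass, asdict


@dataclass
class VoiceSettingsPreset:
    """Client-side sketch of a preset; not the service's actual schema."""

    pace: float = 1.0          # < 1.0 slower, 1.0 normal, > 1.0 faster
    temperature: float = 0.8   # voice variation and randomness
    pitch: float = 0.0         # only applied for Voice Conversion/STS
    use_hd: bool = False       # hypothetical name; HD may cost more credits
    exaggeration: float = 0.5  # hypothetical name; 0.0 minimal, 1.0 maximum
    description: str = ""      # hypothetical name; text prompt for voice style

    def __post_init__(self):
        # Assumption: emotional expressiveness is bounded by 0.0 and 1.0.
        if not 0.0 <= self.exaggeration <= 1.0:
            raise ValueError("exaggeration must be between 0.0 and 1.0")


# Override only the settings that differ from the defaults.
calm_narrator = VoiceSettingsPreset(pace=0.9, exaggeration=0.2)
print(asdict(calm_narrator))
```

Keeping the defaults in one place like this makes it easier to see which settings a given preset actually changes.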
Once created, include the voice_settings_preset_uuid parameter in your synthesis requests:
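A minimal sketch of such a request body follows. Only the data field and the voice_settings_preset_uuid parameter are named in this document; the voice_uuid field, the placeholder values, and the helper function are assumptions made for illustration, so check the synthesis endpoint reference for the exact request shape.

```python
import json


def build_synthesis_payload(voice_uuid, text, preset_uuid=None):
    """Build a synthesis request body, optionally referencing a preset.

    voice_uuid is an assumed field name; data and
    voice_settings_preset_uuid are the names used in this document.
    """
    payload = {"voice_uuid": voice_uuid, "data": text}
    if preset_uuid is not None:
        payload["voice_settings_preset_uuid"] = preset_uuid
    return payload


# Placeholder identifiers; substitute the UUIDs from your own account.
payload = build_synthesis_payload(
    "<voice-uuid>",
    "Hello from a saved preset.",
    preset_uuid="<preset-uuid>",
)
print(json.dumps(payload, indent=2))
```

Omitting preset_uuid leaves the parameter out of the body entirely, which is the form to use when the settings come from SSML instead.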
When using voice_settings_preset_uuid, preset settings will OVERRIDE any equivalent SSML settings in your data field.
If you need fine-grained SSML control over pace, pitch, or other voice settings, do NOT use a preset UUID. Instead, pass your SSML in the data field without the voice_settings_preset_uuid parameter.
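One way to apply this rule on the client side is to attach the preset only when the text carries no SSML voice settings of its own. The sketch below assumes that such settings are expressed with the standard SSML prosody element; the helper name and the detection rule are illustrative and not part of the API.

```python
import re

# Assumption: an SSML <prosody ...> tag signals that the text sets its own
# pace or pitch, so a preset would override it.
_PROSODY_TAG = re.compile(r"<prosody\b", re.IGNORECASE)


def attach_preset(payload, preset_uuid):
    """Return a copy of payload with the preset added, unless data uses prosody SSML."""
    result = dict(payload)
    if _PROSODY_TAG.search(result.get("data", "")):
        # Keep the SSML in control: send no preset at all.
        result.pop("voice_settings_preset_uuid", None)
        return result
    result["voice_settings_preset_uuid"] = preset_uuid
    return result


plain = attach_preset({"data": "Hello there."}, "<preset-uuid>")
ssml = attach_preset(
    {"data": '<speak><prosody rate="slow">Hello there.</prosody></speak>'},
    "<preset-uuid>",
)
print("preset sent for plain text:", "voice_settings_preset_uuid" in plain)
print("preset sent for SSML text:", "voice_settings_preset_uuid" in ssml)
```

A check like this keeps the two control mechanisms from being mixed by accident in the same request.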