
Audio Enhancement


Enhance audio files by removing background noise, normalizing loudness, and applying studio-quality processing. The API is asynchronous: you submit an audio file, receive a tracking UUID, then poll for the result.

Workflow

  1. Submit an audio file via the Create Enhancement endpoint.
  2. Receive a UUID and poll the Get Enhancement endpoint until status is completed or failed.
  3. Download the enhanced audio from the enhanced_audio_url.
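Steps 2 and 3 amount to a small decision on each Get Enhancement response. The sketch below is one way to express it, assuming the response is parsed JSON with a `status` field and, once completed, an `enhanced_audio_url` field. The `uuid` key, the `result_url` helper, and the `EnhancementFailed` exception are illustrative names, not part of the API.

```python
from typing import Optional


class EnhancementFailed(Exception):
    """Raised when a job reports the terminal status "failed"."""


def result_url(job: dict) -> Optional[str]:
    """Return the download URL once the job is completed, else None.

    `job` is assumed to be the parsed JSON body from Get Enhancement.
    """
    status = job.get("status")
    if status == "failed":
        # "uuid" is an assumed field name, used only for the error message.
        raise EnhancementFailed(f"job {job.get('uuid', '?')} failed")
    if status == "completed":
        return job["enhanced_audio_url"]
    # Any other status is treated as still in progress: keep polling.
    return None
```

A caller would keep polling while `result_url` returns `None`, and download from the returned URL once it does not.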

Enhancement Engines

| Engine | Default | Description |
|---|---|---|
| v2 | ✅ | Noise removal, normalization, and studio sound. Recommended for most use cases. |
| v1 (legacy) | | Loudness normalization with configurable target levels and enhancement intensity. Will be sunset in a future release. |

v2 Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| remove_noise | boolean | true | Remove background noise. |
| normalize | boolean | true | Normalize audio levels. |
| studio_sound | boolean | true | Apply studio-quality enhancement. |
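As a rough illustration, the three v2 flags and their defaults can be modeled as a small options object. `V2Options` and `v2_payload` are hypothetical helpers. How the fields are actually transmitted (JSON body or multipart form fields) is not stated here, so the sketch stops at building a plain dictionary.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class V2Options:
    """v2 engine flags; all default to True, matching the table above."""
    remove_noise: bool = True
    normalize: bool = True
    studio_sound: bool = True


def v2_payload(options: V2Options = V2Options()) -> dict:
    """Convert the options into a plain dict of request parameters."""
    return asdict(options)


# Example: keep noise removal and normalization, skip studio processing.
payload = v2_payload(V2Options(studio_sound=False))
```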

v1 Parameters (Legacy)

| Parameter | Range | Default | Description |
|---|---|---|---|
| enhancement_level | 0.0 – 1.0 | 1.0 | Intensity of processing. Lower values apply a lighter touch. |
| loudness_target_level | -70 – -5 LUFS | -14 | Desired integrated loudness. |
| loudness_peak_limit | -9 – 0 dBTP | -1 | Maximum permitted peak. Lower values add headroom. |
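Because the v1 parameters are range-limited, checking them client-side before submitting can save a round trip. The following is a minimal sketch: `v1_params` is a hypothetical helper, and treating both ends of each range as inclusive is an assumption.

```python
# Ranges and defaults copied from the v1 parameter table.
V1_RANGES = {
    "enhancement_level": (0.0, 1.0),
    "loudness_target_level": (-70.0, -5.0),  # LUFS
    "loudness_peak_limit": (-9.0, 0.0),      # dBTP
}
V1_DEFAULTS = {
    "enhancement_level": 1.0,
    "loudness_target_level": -14.0,
    "loudness_peak_limit": -1.0,
}


def v1_params(**overrides: float) -> dict:
    """Merge overrides into the defaults and validate every value."""
    params = {**V1_DEFAULTS, **overrides}
    for name, value in params.items():
        if name not in V1_RANGES:
            raise ValueError(f"unknown v1 parameter: {name}")
        low, high = V1_RANGES[name]
        # Assumption: range bounds are inclusive.
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return params
```

For example, `v1_params(enhancement_level=0.5)` keeps the default loudness settings and applies lighter processing.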

Accepted Audio Formats

WAV, MP3, M4A, MP4, OGG, AAC, FLAC. Maximum file size is 150 MB.
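A pre-upload check along these lines can reject unsupported files early. This is a sketch under two assumptions: the format is inferred from the file extension alone, and 1 MB is taken as 1024 × 1024 bytes (the page does not say which definition applies). `check_upload` is a hypothetical helper.

```python
from pathlib import Path

ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4", ".ogg", ".aac", ".flac"}
MAX_BYTES = 150 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file looks unsupported or too large."""
    ext = Path(filename).suffix.lower()
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
```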

Polling Best Practices

  • Recommended interval: Poll every 5–10 seconds.
  • Timeout: Jobs time out after approximately 5 minutes. If a job hasn’t completed by then, it will be marked as failed.
  • No retries: Failed jobs are not automatically retried. Submit a new request if needed.
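The guidance above could translate into a polling loop like the one below. It is a sketch, not a reference client: `fetch_job` stands in for whatever function calls the Get Enhancement endpoint and returns the parsed response, and the 330-second client-side deadline is an assumed margin slightly beyond the roughly 5-minute server-side timeout. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def poll_job(
    fetch_job: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 330.0,  # assumption: a little past the ~5 minute job timeout
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job status is "completed" or "failed", then return it."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in ("completed", "failed"):
            return job
        # Stop if the next poll would land after the client-side deadline.
        if clock() + interval > deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval)
```

Since failed jobs are not retried automatically, a caller that receives a job with status `failed` would submit a new request rather than continue polling the same UUID.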

Billing

Audio enhancement is a metered product billed per second of processed audio. If you exceed your usage limit, further requests return 403 Forbidden until the next billing cycle. Failed jobs are not billed.
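For rough usage tracking, the billing rule can be approximated by summing the audio duration of completed jobs only. This is an estimate under stated assumptions: `duration_seconds` is a hypothetical field you would record yourself for each job, and the actual rounding applied to billed seconds is not specified here.

```python
from typing import Iterable


def billable_seconds(jobs: Iterable[dict]) -> float:
    """Sum audio seconds for completed jobs; failed jobs are not billed."""
    # "duration_seconds" is an assumed, locally tracked value per job.
    return sum(j["duration_seconds"] for j in jobs if j["status"] == "completed")
```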