Custom Pronunciations

Create Pronunciation

POST https://app.resemble.ai/api/v2/pronunciations

Upload a single pronunciation with a reference audio file.

Request Body (multipart/form-data)

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| word | string | Yes | The word or phrase (2-100 characters). Letters, accented characters, apostrophes, hyphens, and spaces only. |
| audio | file | Yes | Reference audio file (wav, flac, mp3, m4a, ogg, webm, aac). Duration: 200 ms to 10 s. Max size: 10 MB. |

curl -X POST https://app.resemble.ai/api/v2/pronunciations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "word=abemaciclib" \
  -F "audio=@abemaciclib.wav"
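
The field constraints listed for the request body can be checked on the client before uploading, which avoids a round trip for obviously invalid input. The following is a minimal sketch, not part of the API: the helper names `validate_word` and `validate_audio` are hypothetical, `str.isalpha()` is only an approximation of "letters and accented characters" (the server may be stricter), and the size limit is assumed to mean 10,000,000 bytes as the conservative reading of "10MB". Audio duration (200 ms to 10 s) is not checked here because that requires decoding the file.

```python
import os
import unicodedata

# Extensions taken from the request body description.
ALLOWED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a", ".ogg", ".webm", ".aac"}

# Assumption: "10MB" is read as 10,000,000 bytes (the stricter interpretation).
MAX_AUDIO_BYTES = 10_000_000


def validate_word(word: str) -> list[str]:
    """Return a list of problems with `word`; an empty list means it looks valid."""
    problems = []
    # Compose accents so that "e" + combining accent counts as one letter.
    normalized = unicodedata.normalize("NFC", word)
    if not 2 <= len(normalized) <= 100:
        problems.append("word must be 2-100 characters long")
    # Approximation: any Unicode letter, ASCII apostrophe, hyphen, or space.
    if not all(ch.isalpha() or ch in "'- " for ch in normalized):
        problems.append(
            "word may contain only letters, apostrophes, hyphens, and spaces"
        )
    return problems


def validate_audio(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with the audio file's name and size."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported audio extension: {extension or '(none)'}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("audio file exceeds the 10 MB limit")
    return problems
```

For a file on disk, the size can be obtained with `os.path.getsize(path)` and passed as `size_bytes`. Passing these checks does not guarantee that the server accepts the request; it only filters out input that the documented rules already exclude.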

Response (201 Created)

{
  "success": true,
  "item": {
    "uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "word": "abemaciclib",
    "status": "pending",
    "active": true,
    "audio_url": "https://...",
    "created_at": "2026-03-12T00:00:00.000Z",
    "updated_at": "2026-03-12T00:00:00.000Z"
  }
}

The pronunciation starts with status: "pending" while the audio is being processed. Once processing completes, the status transitions to "ready" (usually within a few seconds) or "failed" if something went wrong. Only pronunciations with status "ready" are applied during synthesis.
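
Because a new pronunciation is not applied until it reaches "ready", a client may want to wait for a terminal status before synthesizing. The sketch below shows one way to poll for it. The function `wait_for_pronunciation` and its parameters are hypothetical, and the `fetch_status` callable is supplied by the caller: it is assumed to return the current `status` string for the pronunciation, for example by calling the Get Pronunciation endpoint, whose URL and response shape are not shown on this page.

```python
import time
from typing import Callable

# Statuses after which no further transition is described.
TERMINAL_STATUSES = {"ready", "failed"}


def wait_for_pronunciation(
    fetch_status: Callable[[], str],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll `fetch_status` until it returns "ready" or "failed".

    Raises TimeoutError if no terminal status is seen within `timeout_s`.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"pronunciation still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

The default timeout and interval are arbitrary choices based on the "usually within a few seconds" guidance above. A caller would treat a returned "failed" as a signal to re-upload or inspect the audio, and a `TimeoutError` as a signal to retry later.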