Speech-to-Text

Transcribe audio or video and ask follow-up questions using Intelligence queries.

Key Capabilities

  • Accepts file uploads, signed tokens, or remote URLs
  • Returns transcripts with speaker labels and word-level timestamps
  • Supports Intelligence queries for summaries and insights
  • Delivers results to your server via webhook callbacks (callback_url)
  • Offers a zero-retention mode that permanently deletes all media and transcript content after delivery
  • Handles files up to 500 MB in size and 20 minutes in duration

Workflow

  1. Create a job – Upload content and optionally include an initial Intelligence query.
  2. Check status – Poll the job or list active submissions, or pass a callback_url to be notified.
  3. Retrieve results – Fetch the completed transcript and Intelligence answers.
  4. Run additional queries – Ask further questions using the job UUID.
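
The "Check status" step can be sketched as a small polling loop. This is a minimal illustration, not the documented client. The per-job URL (the Quick Start endpoint followed by the job UUID), the JSON response, the "status" field, and the "completed" / "failed" values are all assumptions made for the example, and fetch_job and wait_for_job are hypothetical helper names.

```python
import json
import time
import urllib.request

# Endpoint taken from the Quick Start; the per-job path below is an assumption.
BASE_URL = "https://app.resemble.ai/api/v2/speech-to-text"


def fetch_job(api_token, job_uuid):
    """Fetch one job as a dict.

    ASSUMPTION: GET {BASE_URL}/{uuid} exists and returns a JSON object.
    """
    request = urllib.request.Request(
        f"{BASE_URL}/{job_uuid}",
        headers={"Authorization": f"Bearer {api_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(fetch, job_uuid, interval=5.0, timeout=600.0,
                 done_states=("completed", "failed"), sleep=time.sleep):
    """Call fetch(job_uuid) until the job reaches a terminal state.

    ASSUMPTION: the job dict carries a "status" key and the terminal
    values listed in done_states; adjust both to the real response.
    Raises TimeoutError if the job is still running when timeout elapses.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_uuid)
        if job.get("status") in done_states:
            return job
        # Stop before sleeping if the next attempt would exceed the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_uuid} not finished after {timeout} s")
        sleep(interval)
```

A call might look like wait_for_job(lambda u: fetch_job("YOUR_API_TOKEN", u), job_uuid). Passing the fetch function in keeps the loop independent of the endpoint details, and supplying a callback_url instead removes the need to poll at all.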

Zero Retention

For privacy-sensitive workloads, pass zero_retention_mode=true together with a callback_url, which is mandatory in this mode. The uploaded media and any temporary processing copies are permanently deleted as soon as transcription finishes. The transcript is delivered once to your callback, and the content is then purged from Resemble entirely; only a content-free audit stub remains. Zero retention is a plan feature; see Create Transcript Job for details.
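
The pairing of zero_retention_mode with callback_url can be checked on the client before a job is submitted. The sketch below only assembles form fields. The parameter names query, callback_url, and zero_retention_mode come from this page; the helper name build_job_fields and sending the flag as the string "true" are assumptions for illustration.

```python
def build_job_fields(query=None, callback_url=None, zero_retention=False):
    """Return the text form fields for a transcript job as a dict.

    Hypothetical helper: it enforces the rule stated above that
    zero-retention jobs must include a callback_url.
    """
    fields = {}
    if query:
        fields["query"] = query
    if callback_url:
        fields["callback_url"] = callback_url
    if zero_retention:
        if not callback_url:
            raise ValueError("zero_retention_mode requires a callback_url")
        # ASSUMPTION: the flag is sent as the literal string "true".
        fields["zero_retention_mode"] = "true"
    return fields
```

Because the transcript is delivered only once in this mode, the callback receiver should store the payload before acknowledging it; the content is purged afterwards and is presumably not retrievable again.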

Access Requirements

Requirement      Details
Team scope       Operates within the authenticated user’s team.
Authentication   Resemble session or API token.

Quick Start

curl --request POST 'https://app.resemble.ai/api/v2/speech-to-text' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@/path/to/audio.mp3' \
  -F 'query=What are the key points discussed?'
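
For callers who prefer Python, the same request can be approximated with the standard library alone. This is a sketch under the assumptions of the curl example (the file and query form fields, Bearer authentication). The helper names encode_multipart and build_request are hypothetical, and nothing is assumed about the response format.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://app.resemble.ai/api/v2/speech-to-text"  # from the curl example


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_token, audio_path, query=None):
    """Build (but do not send) the POST request for a new transcript job."""
    path = Path(audio_path)
    fields = {"query": query} if query else {}
    body, content_type = encode_multipart(
        fields, "file", path.name, path.read_bytes()
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": content_type,
        },
    )
```

Passing the result to urllib.request.urlopen sends the request, for example urlopen(build_request("YOUR_API_TOKEN", "/path/to/audio.mp3", "What are the key points discussed?")). Building the request separately from sending it makes the encoding easy to inspect before any upload happens.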

Platform-wide rate limits apply. Contact support for higher throughput or dedicated capacity.