Version: 1.19

OpenAI Whisper Large v3

OpenAI Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation.

Model ID

openai/whisper-large-v3

Source

Hugging Face

Modality

  • Input: Audio (e.g., .mp3, .mp4, .wav)
  • Output: Text

Endpoints

Context Limit

Whisper has a receptive field of 30 seconds. Privatemode doesn't provide automatic chunking or a sliding-window implementation for longer audio files, so you have to split your audio file on the client side and send the individual chunks to the Privatemode API. If a clip exceeds the 30-second limit, expect an error response similar to {"object":"error","message":"Maximum clip duration (30s) exceeded.","type":"BadRequestError","param":null,"code":400}.
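The client-side splitting described above could be sketched as follows. This is a minimal illustration, not part of the Privatemode API: it assumes uncompressed WAV input and uses only Python's standard `wave` module. The helper name `split_wav` and the constant `MAX_CLIP_SECONDS` are hypothetical. Compressed formats such as .mp3 or .mp4 would need to be decoded or split with a separate audio tool first.

```python
import io
import wave

# Clip limit taken from the error message above; adjust if the limit changes.
MAX_CLIP_SECONDS = 30


def split_wav(path_or_file, max_seconds=MAX_CLIP_SECONDS):
    """Yield WAV-encoded byte strings, each at most `max_seconds` long.

    `path_or_file` is a file path or a seekable binary file object
    containing uncompressed WAV audio.
    """
    with wave.open(path_or_file, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * max_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy channel count, sample width, and frame rate from the
                # source; the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            yield buf.getvalue()
```

Each yielded value is a complete WAV file in memory that can be sent as one request. Note that cutting at fixed 30-second boundaries may split a word in half; choosing a slightly smaller `max_seconds` or cutting at silences is a possible refinement that this sketch does not attempt.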