OpenAI Whisper Large v3
OpenAI Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation.
Model ID
openai/whisper-large-v3
Source
Modality
- Input: Audio (e.g., .mp3, .mp4, .wav)
- Output: Text
Endpoints
Context Limit
Whisper has a receptive field of 30 seconds.
Privatemode doesn't provide automatic chunking or a sliding-window implementation for processing longer audio files.
You have to split your audio file on the client side and send the individual chunks to the Privatemode API.
If a clip exceeds the 30-second limit, the API returns an error similar to: {"object":"error","message":"Maximum clip duration (30s) exceeded.","type":"BadRequestError","param":null,"code":400}
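The client-side splitting described above could look roughly like the following sketch. It assumes uncompressed WAV input and uses only Python's standard `wave` module; the function name `split_wav` and the `max_seconds` parameter are illustrative and not part of the Privatemode API. Other formats such as .mp3 or .mp4 would need to be decoded or split with a separate tool first, and sending each chunk to the API is left out here.

```python
import io
import wave


def split_wav(wav_bytes: bytes, max_seconds: float = 30.0) -> list[bytes]:
    """Split a WAV file (given as bytes) into standalone WAV chunks.

    Each returned chunk is a complete WAV file of at most `max_seconds`.
    `split_wav` is a hypothetical helper, not a Privatemode function.
    """
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Number of audio frames that fit into one chunk.
        frames_per_chunk = int(src.getframerate() * max_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy channel count, sample width, and sample rate;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Cutting at fixed offsets can split a word across two chunks, so transcripts may be slightly off at chunk boundaries. Splitting at silences, or using a `max_seconds` value a little below 30 as a safety margin, may give better results.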