Speech-to-text API
Use the Privatemode speech-to-text API to generate text from audio files. The API is compatible with the OpenAI transcriptions API. To generate text from audio, send your requests to the privatemode-proxy. Audio requests and responses are encrypted, both in transit and during processing.
Generating transcriptions
Send a POST form request to the following endpoint on your proxy:
POST /v1/audio/transcriptions
This endpoint generates a transcription of the provided audio file.
Request body
- model (string): The name of the model to use for transcription, e.g., openai/whisper-large-v3.
- file (file): The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
- language (string, optional): The language of the audio in ISO-639-1 format (e.g., en). Not setting the correct language can lead to poor accuracy and performance.
- prompt (string, optional): An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

For additional parameters, see the vLLM transcriptions API documentation.
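Before uploading, it can be useful to check client-side that a file's extension is one the endpoint accepts. A minimal sketch using only the Python standard library (the helper name is illustrative; the format set mirrors the list above):

```python
from pathlib import Path

# Formats accepted by the transcriptions endpoint (see the file parameter above).
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one the API accepts."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_FORMATS

print(is_supported_audio("meeting.mp3"))  # True
print(is_supported_audio("notes.txt"))    # False
```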
Returns
The response is a transcription object or a stream of transcription events containing:
- text (string): The transcribed text from the audio.
- Other fields are consistent with the OpenAI API specification.
Examples
Note: To run the examples below, start the privatemode-proxy with a pre-configured API key or add an authentication header to the requests.
Example request
curl localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F 'model=openai/whisper-large-v3' \
-F 'file=@path/to/your/audio/file.mp3'
Example response
{
"text": "Hello World."
}
Available speech-to-text models
To list the available speech-to-text models, call the /v1/models
endpoint or see the models overview.
Privatemode's serving backend only supports files up to 25 MB in size. For larger files, consider splitting the audio into smaller segments, or try compressing the file to reduce its size.
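For uncompressed WAV input, splitting can be sketched with Python's standard wave module; the helper name and the 300-second default are illustrative, and for compressed formats (mp3, ogg, ...) a dedicated tool such as ffmpeg is better suited:

```python
import wave

def split_wav(path: str, chunk_seconds: int = 300) -> list[str]:
    """Split a WAV file into chunks of at most chunk_seconds each."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{path.rsplit('.', 1)[0]}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width, and rate; the frame count
                # is corrected automatically when the chunk is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each resulting chunk can then be sent to /v1/audio/transcriptions separately and the transcripts concatenated.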