Speech-to-text API
Use the Privatemode speech-to-text API to generate text from audio files. The API is compatible with the OpenAI transcriptions API. Send your requests to the privatemode-proxy; audio requests and responses are encrypted both in transit and during processing.
Generating transcriptions
Send a POST request with a multipart/form-data body to the following endpoint on your proxy:
POST /v1/audio/transcriptions
This endpoint generates a transcription of the provided audio file.
Request body
- model (string): The name of the model to use for transcription, e.g., openai/whisper-large-v3.
- file (file): The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
- language (string, optional): The language of the audio in ISO-639-1 format (e.g., en). Not setting the correct language can lead to poor accuracy and performance; see the sketch after this list.
- For additional parameters, see the vLLM transcriptions API documentation.
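For example, a request that sets the optional language field might look like this (a minimal sketch; it assumes the proxy listens on localhost:8080, as in the examples further below):
# transcribe an audio file, pinning the language for better accuracy
curl localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F 'model=openai/whisper-large-v3' \
  -F 'file=@path/to/your/audio/file.mp3' \
  -F 'language=en'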
Returns
The response is a transcription object or a stream of transcription events containing:
- text (string): The transcribed text from the audio.
- Other fields are consistent with the OpenAI API specification.
Examples
Note: To run the examples below, start the privatemode-proxy with a pre-configured API key or add an authentication header to each request.
Example request
curl localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F 'model=openai/whisper-large-v3' \
-F 'file=@path/to/your/audio/file.mp3'
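If you didn't start the proxy with a pre-configured API key, you can pass the key with each request instead. A Bearer token in the Authorization header is the usual convention for OpenAI-compatible APIs, but the exact header shown here is an assumption, so check your deployment's authentication setup:
# the same request, with a per-request API key (header format assumed)
curl localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer <your-api-key>" \
  -F 'model=openai/whisper-large-v3' \
  -F 'file=@path/to/your/audio/file.mp3'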
Example response
{
"text": "Hello World."
}
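The response above is the non-streaming form. To receive a stream of transcription events instead, you can set the stream form field; this is a sketch that assumes the backing model server exposes the streaming option described in the vLLM transcriptions API documentation:
# request a stream of transcription events instead of a single object
curl localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F 'model=openai/whisper-large-v3' \
  -F 'file=@path/to/your/audio/file.mp3' \
  -F 'stream=true'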
Available speech-to-text models
To list the available speech-to-text models, call the /v1/models endpoint or see the models overview.
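For example (assuming the proxy runs on localhost:8080 with a pre-configured API key, as above):
# list the models served through the proxy
curl localhost:8080/v1/models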