Version: Next

Speech-to-text API

Use the Privatemode speech-to-text API to generate text from audio files. The API is compatible with the OpenAI transcriptions API. To generate text from audio, send your requests to the privatemode-proxy. Audio requests and responses are encrypted, both in transit and during processing.

Generating transcriptions

Send a multipart/form-data POST request to the following endpoint on your proxy:

POST /v1/audio/transcriptions

This endpoint generates a transcription of the provided audio file.

Request body

  • model (string): The name of the model to use for transcription, e.g., openai/whisper-large-v3.
  • file (file): The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
  • language (string, optional): The language of the audio in ISO-639-1 format (e.g. en). Omitting the language or setting it incorrectly can reduce transcription accuracy and performance.
  • For additional parameters see the vLLM transcriptions API documentation.

Returns

The response is a transcription object or a stream of transcription events containing:

  • text (string): The transcribed text from the audio.
  • Other fields: The remaining fields follow the OpenAI API specification.

Examples

Note: To run the examples below, start the privatemode proxy with a pre-configured API key or add an authentication header to the requests.

Example request

curl localhost:8080/v1/audio/transcriptions \
-F 'model=openai/whisper-large-v3' \
-F 'file=@path/to/your/audio/file.mp3'

Note that curl sets the Content-Type header, including the required multipart boundary, automatically when -F is used; don't set it manually.

Example response

{
"text": "Hello World."
}
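The response body is plain JSON, so any JSON parser can consume it; a minimal Python sketch using the example response above:

```python
import json

# The JSON body from the example response above.
raw = '{"text": "Hello World."}'

transcription = json.loads(raw)
print(transcription["text"])  # Hello World.
```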

Available speech-to-text models

To list the available speech-to-text models, call the /v1/models endpoint or see the models overview.
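Since the API is OpenAI-compatible, the /v1/models endpoint returns the standard OpenAI list shape; a sketch of filtering it for transcription models (the sample payload is illustrative, and actual model ids depend on your deployment):

```python
import json

# Illustrative /v1/models response in the OpenAI list format.
sample = json.loads(
    '{"object": "list", "data": ['
    '{"id": "openai/whisper-large-v3", "object": "model"},'
    '{"id": "some/chat-model", "object": "model"}]}'
)

# Filter for whisper-family ids; adjust the predicate to your deployment.
whisper_models = [m["id"] for m in sample["data"] if "whisper" in m["id"]]
print(whisper_models)  # ['openai/whisper-large-v3']
```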