Version: Next

Speech-to-text API

Use the Privatemode speech-to-text API to generate text from audio files. The API is compatible with the OpenAI transcriptions API. To generate text from audio, send your requests to the privatemode-proxy. Audio requests and responses are encrypted, both in transit and during processing.

Generating transcriptions

Send a multipart/form-data POST request to the following endpoint on your proxy:

POST /v1/audio/transcriptions

This endpoint generates a transcription of the provided audio file.

Request body

  • model (string): The name of the model to use for transcription, e.g., openai/whisper-large-v3.
  • file (file): The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
  • language (string, optional): The language of the audio in ISO-639-1 format (e.g. en). Omitting the language or setting it incorrectly can reduce transcription accuracy and performance.
  • For additional parameters see the vLLM transcriptions API documentation.

Returns

The response is a transcription object or a stream of transcription events containing:

  • text (string): The transcribed text from the audio.
  • Other fields: The remaining fields follow the OpenAI API specification.

Examples

Note: To run the examples below, start the privatemode proxy with a pre-configured API key or add an authentication header to the requests.

Example request

curl localhost:8080/v1/audio/transcriptions \
-F 'model=openai/whisper-large-v3' \
-F 'file=@path/to/your/audio/file.mp3'

Note that curl sets the Content-Type header, including the required multipart boundary, automatically when -F is used; don't set it manually.

Example response

{
"text": "Hello World."
}
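The response body is plain JSON, so any JSON parser can consume it; a minimal Python sketch using the example response above:

```python
import json

# The JSON body from the example response above.
raw = '{"text": "Hello World."}'

transcription = json.loads(raw)
print(transcription["text"])  # Hello World.
```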

Available speech-to-text models

To list the available speech-to-text models, call the /v1/models endpoint or see the models overview.
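Since the API is OpenAI-compatible, the /v1/models endpoint returns the standard OpenAI list shape; a sketch of filtering it for transcription models (the sample payload is illustrative, and actual model ids depend on your deployment):

```python
import json

# Illustrative /v1/models response in the OpenAI list format.
sample = json.loads(
    '{"object": "list", "data": ['
    '{"id": "openai/whisper-large-v3", "object": "model"},'
    '{"id": "some/chat-model", "object": "model"}]}'
)

# Filter for whisper-family ids; adjust the predicate to your deployment.
whisper_models = [m["id"] for m in sample["data"] if "whisper" in m["id"]]
print(whisper_models)  # ['openai/whisper-large-v3']
```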