Version: 1.23

Speech-to-text API

Use the Privatemode speech-to-text API to generate text from audio files. The API is compatible with the OpenAI transcriptions API. To generate text from audio, send your requests to the privatemode-proxy. Audio requests and responses are encrypted, both in transit and during processing.

Generating transcriptions

Send a multipart/form-data POST request to the following endpoint on your proxy:

POST /v1/audio/transcriptions

This endpoint generates a transcription of the provided audio file.

Request body

  • model (string): The name of the model to use for transcription, e.g., openai/whisper-large-v3.
  • file (file): The audio file to transcribe. Supported formats are flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm.
  • language (string, optional): The language of the audio as an ISO 639-1 code (e.g., en). Omitting it or setting the wrong language can reduce accuracy and performance.
  • prompt (string, optional): Text to guide the model's style or to continue a previous audio segment. The prompt should match the audio language.
  • For additional parameters see the vLLM transcriptions API documentation.
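The parameters above map directly onto the parts of a multipart form. The following Python sketch builds such a request with the standard library only, without sending it. The function name build_transcription_request and the proxy address http://localhost:8080 are assumptions made for illustration; they aren't part of the Privatemode API.

```python
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request


def build_transcription_request(proxy_url, audio_path, model, language=None, prompt=None):
    """Build (but do not send) a multipart/form-data POST for /v1/audio/transcriptions.

    Hypothetical helper for illustration; field values are not escaped,
    so keep them free of quotes and line breaks.
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex  # random separator between form parts

    # Text fields: optional ones are only included when set.
    fields = {"model": model}
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt

    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # File part: guess a content type from the extension, else use a generic one.
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        + path.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    return Request(
        url=proxy_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. If your proxy isn't started with a pre-configured API key, you would also need to add an authentication header before sending.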

Returns

The response is a transcription object or a stream of transcription events containing:

  • text (string): The transcribed text from the audio.
  • Other fields: All remaining fields follow the OpenAI API specification.

Examples

Note: To run the examples below, start the privatemode-proxy with a pre-configured API key or add an authentication header to the requests.

Example request

curl localhost:8080/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F 'model=openai/whisper-large-v3' \
-F 'file=@path/to/your/audio/file.mp3'

Example response

{
"text": "Hello World."
}
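On the client side, a non-streaming response like the one above can be reduced to its text field. The Python sketch below assumes a JSON response body; the helper names parse_transcription and transcribe are hypothetical, and streamed transcription events aren't handled.

```python
import json
from urllib.request import urlopen


def parse_transcription(body):
    """Return the `text` field from a JSON transcription response body."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or "text" not in payload:
        raise ValueError(f"response has no 'text' field: {payload!r}")
    return payload["text"]


def transcribe(request, timeout=300):
    """Send a prepared urllib.request.Request and return the transcribed text."""
    with urlopen(request, timeout=timeout) as response:
        return parse_transcription(response.read())
```

Here, parse_transcription(b'{"text": "Hello World."}') returns "Hello World.". The 300-second default timeout is an arbitrary choice for long audio files, not a documented value.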

Available speech-to-text models

To list the available speech-to-text models, call the /v1/models endpoint or see the models overview.
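As a rough illustration, the model list can be fetched and reduced to model IDs in Python. This sketch assumes the response follows the OpenAI list format, i.e., a JSON object with a data array whose entries carry an id. The helper names and the default proxy address are hypothetical as well.

```python
import json
from urllib.request import urlopen


def model_ids(body):
    """Extract model IDs from an OpenAI-style model list (assumed response shape)."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(proxy_url="http://localhost:8080", timeout=30):
    """Call /v1/models on the proxy and return the IDs it reports."""
    with urlopen(proxy_url.rstrip("/") + "/v1/models", timeout=timeout) as response:
        return model_ids(response.read())
```

The endpoint may return more than just speech-to-text models. To tell them apart, check the IDs against the models overview.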

Warning: Privatemode's serving backend only supports files up to 25 MB in size. For larger files, consider splitting the audio into smaller segments, or try compressing the file to reduce its size.
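A local check before uploading can catch oversized or unsupported files early. The Python sketch below is one possible way to do that. It reads "25 MB" conservatively as 25,000,000 bytes, which is an assumption because the exact cutoff isn't specified, and it inspects only the file extension, not the actual audio container. The name check_audio_file is hypothetical.

```python
from pathlib import Path

# Formats listed for the `file` parameter of the transcription endpoint.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

# Assumption: conservative decimal reading of the documented 25 MB limit.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def check_audio_file(audio_path):
    """Return a list of problems that would likely cause the upload to fail."""
    path = Path(audio_path)
    problems = []

    extension = path.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or '(none)'}")

    size = path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_UPLOAD_BYTES}")

    return problems
```

An empty list means the file passed both checks. Otherwise, split or compress the audio as described above before sending it.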