Embeddings API
Use the Privatemode embeddings API to convert text into multidimensional vector embeddings. The API is compatible with the OpenAI Embeddings API. To create embeddings, send your requests to the Privatemode proxy. Embedding requests and responses are encrypted, both in transit and during processing.
Generating embeddings
Send a POST request to the following endpoint on your proxy:
POST /v1/embeddings
This endpoint returns vector embeddings for your provided text input.
Request body
- input (string or list of strings): The text(s) to embed. The maximum length of each input depends on the model. Note that input formatting also varies by model: documents are typically embedded as-is, while queries should be prefixed with task instructions. Refer to the model's documentation for the correct format, and see the examples below for proper query formatting.
- model (string): The name of the embedding model, e.g., qwen3-embedding-4b.
- dimensions (int, optional): The number of dimensions of the output embedding vector. If not specified, the model's default is used. Note that whether values other than the default are supported depends on the embedding model.
- encoding_format (string, optional): Set to "float" for a list of float values or "base64" for base64-encoded values.
Check the available models for model-specific input requirements.
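To see how the optional parameters fit together, here is a minimal sketch that requests a reduced-dimension, base64-encoded embedding and decodes it. It assumes the proxy runs locally without authentication, that a dimensions value of 1024 is accepted by the model (support depends on the model), and that the base64 payload encodes little-endian float32 values, as in OpenAI's API:

import base64
import struct

import requests

# Request a base64-encoded embedding with a reduced dimension count.
# 1024 is an illustrative value; support for non-default dimensions
# depends on the embedding model.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "input": "The capital of Germany is Berlin.",
        "model": "qwen3-embedding-4b",
        "dimensions": 1024,
        "encoding_format": "base64",
    },
)
resp.raise_for_status()
b64 = resp.json()["data"][0]["embedding"]

# Assumption: the base64 string encodes little-endian float32 values,
# matching the OpenAI embeddings API convention.
raw = base64.b64decode(b64)
vector = struct.unpack(f"<{len(raw) // 4}f", raw)
print(len(vector), vector[:3])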
Returns
Returns an embeddings response object compatible with OpenAI's Embeddings API:
- data: A list of embedding objects, each with an embedding array and an index.
- object: Always "list".
- model: The model used.
- usage: Token usage statistics.
Examples
Note: To run the examples below, start the Privatemode proxy with a pre-configured API key or add an authentication header to the requests.
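For the authentication-header variant, a minimal sketch, assuming the proxy accepts the API key as a standard Bearer token, as OpenAI-compatible endpoints typically do (check your proxy configuration):

import requests

# Assumption: the proxy accepts the API key as a Bearer token, as is
# conventional for OpenAI-compatible endpoints.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"input": "Hello, Privatemode!", "model": "qwen3-embedding-4b"},
)
print(resp.json()["usage"])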
Example request
#!/usr/bin/env bash
curl localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of Germany?",
"model": "qwen3-embedding-4b",
"encoding_format": "float"
}'
Example response
{
"id": "embd-b0f2e2ede7234a83aa5052128a239d9c",
"object": "list",
"created": 1747923707,
"model": "qwen3-embedding-4b",
"data": [
{
"index": 0,
"object": "embedding",
"embedding": [
0.0351, 0.0375, -0.0050, ... // truncated for brevity
]
}
],
"usage": {
"prompt_tokens": 13,
"total_tokens": 13,
}
}
Example request (batch input)
#!/usr/bin/env bash
curl localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": [
"Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of Germany?",
"The capital of Germany is Berlin."
],
"model": "qwen3-embedding-4b"
}'
Example response
{
"id": "embd-584a54ff36c84996b6ce667339ea3f40",
"created": 1747924226,
"model": "qwen3-embedding-4b",
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [ 0.0351, ... ] // truncated
},
{
"object": "embedding",
"index": 1,
"embedding": [ 0.0096, ... ] // truncated
}
],
"usage": {
"prompt_tokens": 22,
"total_tokens": 22
}
}
Example usage with OpenAI Python client
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="http://localhost:8080/v1",
)
def get_query_input(query: str) -> str:
task = "Given a web search query, retrieve relevant passages that answer the query"
return f"Instruct: {task}\nQuery: {query}"
responses = client.embeddings.create(
input=[
get_query_input("What is the capital of Germany?"),
"The capital of Germany is Berlin",
],
model="qwen3-embedding-4b",
)
for r in responses.data:
print(f"dim: {len(r.embedding)}, embedding: {r.embedding[:3]}...")
Output
dim: 2560, embedding: [-0.0004260667192284018, -0.019409293308854103, 0.06954004615545273]...
dim: 2560, embedding: [0.0001560685777803883, -0.021211298182606697, 0.07766252756118774]...
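A typical next step is ranking documents by cosine similarity to the query embedding; this continues the client example above. A minimal sketch with no external dependencies:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Reuses `responses` from the client example above: index 0 is the
# query, index 1 the document.
query_vec = responses.data[0].embedding
doc_vec = responses.data[1].embedding
print(f"similarity: {cosine_similarity(query_vec, doc_vec):.4f}")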
Available embedding models
To list the available embedding models, call the /v1/models endpoint or see the models overview.
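With the OpenAI Python client configured as in the example above, listing models is a single call (a sketch; the returned IDs depend on your deployment):

# Reuses the `client` from the Python example above.
for model in client.models.list():
    print(model.id)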