Embeddings API
Use the Privatemode embeddings API to convert text into multidimensional vector embeddings. The API is compatible with the OpenAI Embeddings API. To create embeddings, send your requests to the Privatemode proxy. Embedding requests and responses are encrypted, both in transit and during processing.
Generating embeddings
Send a POST request to the following endpoint on your proxy:
POST /v1/embeddings
This endpoint returns vector embeddings for your provided text input.
Request body
- input (string or list of strings): The text(s) to embed. The maximum length of each input depends on the model. Note that input formatting also varies by model: documents are typically embedded as-is, while queries should be prefixed with task instructions. Refer to the model's documentation for the correct format, and see the examples below for proper query formatting.
- model (string): The name of the embedding model, e.g., qwen3-embedding-4b.
- dimensions (int, optional): The number of dimensions of the output embedding vector. If not specified, the model's default is used. Note that whether values other than the default are supported depends on the embedding model.
- encoding_format (string, optional): Set to "float" for a list of float values or "base64" for base64-encoded values.
Check the available models for model-specific input requirements.
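To see how the optional parameters fit together, here is a minimal sketch that requests a reduced-dimension, base64-encoded embedding and decodes it. It assumes the proxy runs locally without authentication, that a dimensions value of 1024 is accepted by the model (support depends on the model), and that the base64 payload encodes little-endian float32 values, as in OpenAI's API:

import base64
import struct

import requests

# Request a base64-encoded embedding with a reduced dimension count.
# 1024 is an illustrative value; support for non-default dimensions
# depends on the embedding model.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "input": "The capital of Germany is Berlin.",
        "model": "qwen3-embedding-4b",
        "dimensions": 1024,
        "encoding_format": "base64",
    },
)
resp.raise_for_status()
b64 = resp.json()["data"][0]["embedding"]

# Assumption: the base64 string encodes little-endian float32 values,
# matching the OpenAI embeddings API convention.
raw = base64.b64decode(b64)
vector = struct.unpack(f"<{len(raw) // 4}f", raw)
print(len(vector), vector[:3])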
Returns
Returns an embeddings response object compatible with OpenAI's Embeddings API:
- data: A list of embedding objects, each with an embedding array and an index.
- object: Always "list".
- model: The model used.
- usage: Token usage statistics.
Examples
Note: To run the examples below, start the Privatemode proxy with a pre-configured API key or add an authentication header to the requests.
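For the authentication-header variant, a minimal sketch, assuming the proxy accepts the API key as a standard Bearer token, as OpenAI-compatible endpoints typically do (check your proxy configuration):

import requests

# Assumption: the proxy accepts the API key as a Bearer token, as is
# conventional for OpenAI-compatible endpoints.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"input": "Hello, Privatemode!", "model": "qwen3-embedding-4b"},
)
print(resp.json()["usage"])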
Example request
#!/usr/bin/env bash
curl localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of Germany?",
"model": "qwen3-embedding-4b",
"encoding_format": "float"
}'
Example response
{
"id": "embd-b0f2e2ede7234a83aa5052128a239d9c",
"object": "list",
"created": 1747923707,
"model": "qwen3-embedding-4b",
"data": [
{
"index": 0,
"object": "embedding",
"embedding": [
0.0351, 0.0375, -0.0050, ... // truncated for brevity
]
}
],
"usage": {
"prompt_tokens": 13,
"total_tokens": 13,
}
}
Example request (batch input)
#!/usr/bin/env bash
curl localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": [
"Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of Germany?",
"The capital of Germany is Berlin."
],
"model": "qwen3-embedding-4b"
}'
Example response
{
"id": "embd-584a54ff36c84996b6ce667339ea3f40",
"created": 1747924226,
"model": "qwen3-embedding-4b",
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [ 0.0351, ... ] // truncated
},
{
"object": "embedding",
"index": 1,
"embedding": [ 0.0096, ... ] // truncated
}
],
"usage": {
"prompt_tokens": 22,
"total_tokens": 22
}
}
Example usage with OpenAI Python client
from openai import OpenAI
client = OpenAI(
api_key="YOUR_API_KEY",
base_url="http://localhost:8080/v1",
)
def get_query_input(query: str) -> str:
task = "Given a web search query, retrieve relevant passages that answer the query"
return f"Instruct: {task}\nQuery: {query}"
responses = client.embeddings.create(
input=[
get_query_input("What is the capital of Germany?"),
"The capital of Germany is Berlin",
],
model="qwen3-embedding-4b",
)
for r in responses.data:
print(f"dim: {len(r.embedding)}, embedding: {r.embedding[:3]}...")
Output
dim: 2560, embedding: [-0.0004260667192284018, -0.019409293308854103, 0.06954004615545273]...
dim: 2560, embedding: [0.0001560685777803883, -0.021211298182606697, 0.07766252756118774]...
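A typical next step is ranking documents by cosine similarity to the query embedding; this continues the client example above. A minimal sketch with no external dependencies:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Reuses `responses` from the client example above: index 0 is the
# query, index 1 the document.
query_vec = responses.data[0].embedding
doc_vec = responses.data[1].embedding
print(f"similarity: {cosine_similarity(query_vec, doc_vec):.4f}")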
Available embedding models
To list the available embedding models, call the /v1/models endpoint or see the models overview.
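With the OpenAI Python client configured as in the example above, listing models is a single call (a sketch; the returned IDs depend on your deployment):

# Reuses the `client` from the Python example above.
for model in client.models.list():
    print(model.id)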