Prompting
The prompting and response schema adheres to the OpenAI Chat API specification. Note that we don't use any OpenAI services; we only follow their interface definitions. To send prompts, use the privatemode-proxy as your endpoint. It takes care of end-to-end encryption with our GenAI services for you.
You can't send prompts directly to api.privatemode.ai. Always send your prompts to the privatemode-proxy, which handles encryption and communicates with the actual GenAI endpoint. For you, the proxy effectively acts as your GenAI endpoint.
Examples of a default and a stream-configured prompt, with their respective responses, are given below. This guide assumes the privatemode-proxy is running on localhost:8080.
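Because the proxy implements the OpenAI Chat API, any OpenAI-compatible client library can talk to it. The following is a minimal sketch using the official openai Python package, assuming the proxy runs on localhost:8080; the api_key value here is only a client-side placeholder and may differ depending on your proxy setup.

# Minimal sketch: OpenAI Python client pointed at the privatemode-proxy.
# Assumes the proxy runs on localhost:8080; api_key is a placeholder and
# may be ignored or required depending on your proxy configuration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="latest",  # always resolves to the newest available model
    messages=[{"role": "user", "content": "Tell me a joke!"}],
)
print(response.choices[0].message.content)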
Example prompting
For prompting, use the following proxy endpoint:
POST /v1/chat/completions
This endpoint generates a response to a chat prompt.
Request body
- model (string): Either latest or the name of a currently available model. Note that the models are continuously updated and support for older ones is dropped. Use latest to always get the newest model without having to update this parameter.
- messages (list): The prompts for which a response is generated.
- Additional parameters: These mirror the OpenAI API and are supported based on the model server's capabilities. Options requiring internet access, such as image_url, aren't supported yet.
Returns
The response is a chat completion or chat completion chunk object containing:
- choices (list): The responses generated by the model.
- Other parameters: Other fields are consistent with the OpenAI API specification.
Default
Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "latest",
"messages": [
{
"role": "user",
"content": "Tell me a joke!"
}
]
}'
Example response
{
"id": "chat-6e8dc369b0614e2488df6a336c24c349",
"object": "chat.completion",
"created": 1727968175,
"model": "<model>",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "What do you call a fake noodle?\n\nAn impasta.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 40,
"total_tokens": 54,
"completion_tokens": 14
},
"prompt_logprobs": null
}
Streaming
Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": latest",
"messages": [
{
"role": "user",
"content": "Hi there!"
}
],
"stream" : true
}'
Example response
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"content":"It"},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"content":"'s"},"logprobs":null,"finish_reason":null}]}
...
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":null}]}
Available models
Privatemode currently serves the ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4 model. More models will be available soon.
List models
GET /v1/models
This endpoint lists all currently available models.
Returns
The response is a list of model objects.
For detailed information, refer to the OpenAI API documentation.
Example request
curl localhost:8080/v1/models
Example response
{
"id": "<model>",
"object": "model",
"created": 1727968847,
"owned_by": "vllm",
"root": "<model>",
"parent": null,
"max_model_len": 131072,
"permission": [
{
"id": "modelperm-763c1f8144b745efa4e7dd984faf9517",
"object": "model_permission",
"created": 1727968847,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
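With the OpenAI Python client from the earlier sketch, the endpoint can be queried like this:

# List the currently available models, reusing the client from the first sketch.
for model in client.models.list():
    print(model.id)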
System prompts
The offered model supports setting a system prompt as part of the request's messages field (see example below). You can use this to tailor the model's behavior to your specific needs.
Improving language accuracy
The model may occasionally make minor language mistakes, especially in languages other than English. To optimize language accuracy, you can set a system prompt. The following example significantly improves accuracy for the German language:
{
"role": "system",
"content": "Ensure every response is free from grammar and spelling errors. Use only valid words. Apply correct article usage, especially for languages with gender-specific articles like German. Follow standard grammar and syntax rules, and check spelling against standard dictionaries. Maintain consistency in style and terminology throughout."
}
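For illustration, here is how this system prompt fits into a full request, reusing the Python client from the earlier sketch; the user message is a made-up example:

# Sketch: a request combining the system prompt above with a user message.
response = client.chat.completions.create(
    model="latest",
    messages=[
        {
            "role": "system",
            # Shortened here; use the full system prompt from above.
            "content": "Ensure every response is free from grammar and spelling errors. ...",
        },
        {"role": "user", "content": "Erzähl mir einen Witz!"},
    ],
)
print(response.choices[0].message.content)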