Prompting
The prompting and response schema adheres to the OpenAI Chat API specification. Note that we don't use any OpenAI services; we only follow their interface definitions. To send prompts, use the privatemode-proxy as your endpoint. It takes care of end-to-end encryption with our GenAI services for you.
You can't send prompts directly to api.privatemode.ai. Always send your prompts to the privatemode-proxy, which handles encryption and communicates with the actual GenAI endpoint. For you, the proxy effectively acts as your GenAI endpoint.
Examples of a default and a stream-configured prompt, with their respective responses, are given below. This guide assumes the privatemode-proxy is running on localhost:8080.
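Because the proxy implements the OpenAI Chat API, any OpenAI-compatible client library can talk to it. The following is a minimal sketch using the official openai Python package, assuming the proxy runs on localhost:8080; the api_key value here is only a client-side placeholder and may differ depending on your proxy setup.

# Minimal sketch: OpenAI Python client pointed at the privatemode-proxy.
# Assumes the proxy runs on localhost:8080; api_key is a placeholder and
# may be ignored or required depending on your proxy configuration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="latest",  # always resolves to the newest available model
    messages=[{"role": "user", "content": "Tell me a joke!"}],
)
print(response.choices[0].message.content)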
Example prompting
For prompting, use the following proxy endpoint:
POST /v1/chat/completions
This endpoint generates a response to a chat prompt.
Request body
- model (string): Either latest or the name of a currently available model. Note that the models are continuously updated and support for older ones is dropped. Use latest to always get the newest model without having to update this parameter.
- messages (list): The prompts for which a response is generated.
- Additional parameters: These mirror the OpenAI API and are supported based on the model server's capabilities. Options requiring internet access, such as image_url, aren't supported yet.
Returns
The response is a chat completion or chat completion chunk object containing:
- choices (list): The responses generated by the model.
- Other parameters: Other fields are consistent with the OpenAI API specification.
Default
Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "latest",
"messages": [
{
"role": "user",
"content": "Tell me a joke!"
}
]
}'
Example response
{
"id": "chat-6e8dc369b0614e2488df6a336c24c349",
"object": "chat.completion",
"created": 1727968175,
"model": "<model>",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "What do you call a fake noodle?\n\nAn impasta.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 40,
"total_tokens": 54,
"completion_tokens": 14
},
"prompt_logprobs": null
}
Streaming
Example request
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": latest",
"messages": [
{
"role": "user",
"content": "Hi there!"
}
],
"stream" : true
}'
Example response
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"content":"It"},"logprobs":null,"finish_reason":null}]}
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"content":"'s"},"logprobs":null,"finish_reason":null}]}
...
{"id":"chat-4f0bb41857044f52b5fa03fd3c752c8e","object":"chat.completion.chunk","created":1727968591,"model":"<model>","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":"stop","stop_reason":null}]}
Available models
Privatemode currently serves the ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4 model. More models will be available soon.
List models
GET /v1/models
This endpoint lists all currently available models.
Returns
The response is a list of model objects.
For detailed information, refer to the OpenAI API documentation.
Example request
curl localhost:8080/v1/models
Example response
{
"id": "<model>",
"object": "model",
"created": 1727968847,
"owned_by": "vllm",
"root": "<model>",
"parent": null,
"max_model_len": 131072,
"permission": [
{
"id": "modelperm-763c1f8144b745efa4e7dd984faf9517",
"object": "model_permission",
"created": 1727968847,
"allow_create_engine": false,
"allow_sampling": true,
"allow_logprobs": true,
"allow_search_indices": false,
"allow_view": true,
"allow_fine_tuning": false,
"organization": "*",
"group": null,
"is_blocking": false
}
]
}
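With the OpenAI Python client from the earlier sketch, the endpoint can be queried like this:

# List the currently available models, reusing the client from the first sketch.
for model in client.models.list():
    print(model.id)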
System prompts
The offered model supports setting a system prompt as part of the request's messages field (see example below). You can use this to tailor the model's behavior to your specific needs.
Improving language accuracy
The model may occasionally make minor language mistakes, especially in languages other than English. To optimize language accuracy, you can set a system prompt. The following example significantly improves accuracy for the German language:
{
"role": "system",
"content": "Ensure every response is free from grammar and spelling errors. Use only valid words. Apply correct article usage, especially for languages with gender-specific articles like German. Follow standard grammar and syntax rules, and check spelling against standard dictionaries. Maintain consistency in style and terminology throughout."
}
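For illustration, here is how this system prompt fits into a full request, reusing the Python client from the earlier sketch; the user message is a made-up example:

# Sketch: a request combining the system prompt above with a user message.
response = client.chat.completions.create(
    model="latest",
    messages=[
        {
            "role": "system",
            # Shortened here; use the full system prompt from above.
            "content": "Ensure every response is free from grammar and spelling errors. ...",
        },
        {"role": "user", "content": "Erzähl mir einen Witz!"},
    ],
)
print(response.choices[0].message.content)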