Version: 1.38

Models

Privatemode gives you access to the following models. For pricing and rate limits, see Pricing and Rate limits.

Model	Model ID	Type	Input	Context / limit	Endpoints
Gemma 3 27B	`gemma-3-27b`	Chat	Text, image	128k tokens	`/v1/chat/completions`
gpt-oss-120b	`gpt-oss-120b`	Chat	Text	128k tokens	`/v1/chat/completions`, `/v1/completions`, `/v1/messages`
Kimi K2.5	`kimi-k2.5`	Chat	Text, image	262k tokens	`/v1/chat/completions`, `/v1/completions`, `/v1/messages`
Qwen3-Coder 30B-A3B (deprecated)	`qwen3-coder-30b-a3b`	Chat	Text	128k tokens	`/v1/chat/completions`, `/v1/completions`
Qwen3-Embedding 4B	`qwen3-embedding-4b`	Embedding	Text	32k tokens	`/v1/embeddings`
Voxtral Mini 3B (preview)	`voxtral-mini-3b`	Speech-to-text	Audio	50 MB	`/v1/audio/transcriptions`
Whisper large-v3	`whisper-large-v3`	Speech-to-text	Audio	50 MB	`/v1/audio/transcriptions`

All chat models support streaming, tool calling, and structured outputs.

Kimi K2.5

The model supports images as input. Follow this guide to use the feature. Kimi K2.5 performs reasoning steps to improve response quality. To disable reasoning and reduce latency at the cost of quality, extend your request with:

{
    "chat_template_kwargs": {"thinking": false}
}

Gemma 3 27B

The model supports images as input. Follow this guide to use the feature.

The model supports tool calling with the following constraints:

Gemma requires alternating user/assistant roles, so you can't use multiple user messages without assistant messages between them. For tool calling, the user role can use the tool role instead, i.e., user -> assistant (tool call) -> tool (result) -> assistant.
Gemma doesn't support mixed text and tool call outputs. Make sure you don't ask it to generate both in the same response, but separate requests for tool use and text responses.

Qwen3-Embedding 4B

The model uses Matryoshka training and supports output dimensions of 1024 or 2560, set via the dimensions field in the embeddings request. For most tasks, 1024 dimensions is sufficient. For other dimensionalities, truncate the returned vector client-side and re-normalize it afterward.

Voxtral Mini 3B

Preview

This model is currently offered as a preview and may be removed without a prior deprecation period.

Use sufficiently high-quality audio with adequate bit rates. MPEG with the MP2 codec is too low quality and can result in cut-off or excessively long generations. Consider Whisper large-v3 if you face quality issues.

Qwen3-Coder 30B-A3B

Deprecated

This model is deprecated and will be removed in a future release. Migrate coding workflows to Kimi K2.5.

Kimi K2.5​

Gemma 3 27B​

Qwen3-Embedding 4B​

Voxtral Mini 3B​

Qwen3-Coder 30B-A3B​

Kimi K2.5

Gemma 3 27B

Qwen3-Embedding 4B

Voxtral Mini 3B

Qwen3-Coder 30B-A3B