Gemma 3 27B
Gemma is a family of lightweight, state-of-the-art open models from Google. Gemma 3 models are multimodal, handling text and image input and generating text output. Privatemode provides a variant of this model that is quantized with LLM Compressor from FP16 to FP8, roughly halving the model size.
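You don't need to reproduce the quantization step to use the model. For reference, a weight-only FP8 conversion with LLM Compressor looks roughly like the sketch below; the base checkpoint, scheme, and ignore list are assumptions for illustration, not Privatemode's published recipe, and the oneshot API varies slightly across llm-compressor versions.
# Illustrative FP8 quantization sketch with LLM Compressor.
# Checkpoint, scheme, and ignore list are assumptions, not
# Privatemode's actual recipe; API details vary by version.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",      # quantize the Linear layers...
    scheme="FP8_DYNAMIC",  # ...to FP8 weights with dynamic activation scales
    ignore=["lm_head"],    # keep the output head in higher precision
)

oneshot(
    model="google/gemma-3-27b-it",  # assumed FP16 base checkpoint
    recipe=recipe,
    output_dir="gemma-3-27b-fp8",
)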
Model ID
gemma-3-27b
Source
Modality
- Input: text, image
- Output: text
Features
- Streaming (see the sketch after this list)
- Tool calling (see remarks below)
- Structured outputs
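Streaming works through the standard OpenAI-compatible chat completions API by setting stream=True. A minimal sketch, assuming the privatemode-proxy from the example below is running on localhost:8080; the prompt is illustrative:
# Minimal streaming sketch; assumes the privatemode-proxy from the example
# below is running and PRIVATEMODE_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("PRIVATEMODE_API_KEY"),
    base_url="http://localhost:8080/v1",
)

stream = client.chat.completions.create(
    model="gemma-3-27b",
    messages=[{"role": "user", "content": "Explain confidential computing in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the response text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()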
Context limit
- Context window: 128k tokens
Endpoints
- /v1/chat/completions
Example
- Image input
from openai import OpenAI
import base64
import os

# Start the proxy first:
# docker run --pull=always -p 8080:8080 ghcr.io/edgelesssys/privatemode/privatemode-proxy:latest
# Run this script with:
# PRIVATEMODE_API_KEY=<> uv run --with openai gemma-vision.py

api_key = os.environ.get("PRIVATEMODE_API_KEY")  # insert
api_base = "http://localhost:8080/v1"
image_path = ""  # insert

client = OpenAI(
    api_key=api_key,
    base_url=api_base,
)


def encode_image_to_base64(image_path):
    """Read the image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


if not os.path.exists(image_path):
    print(f"Error: Image file not found at {image_path}")
    exit(1)

base64_image = encode_image_to_base64(image_path)

# Send the image as a data URL alongside the text prompt.
chat_response = client.chat.completions.create(
    model="gemma-3-27b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print("Chat completion output:", chat_response.choices[0].message.content)
Remarks and limitations
- Gemma requires alternating `user`/`assistant` roles, so you can't use multiple user messages without an assistant message between them. For tool calling, the `tool` role can take the place of the `user` role, i.e., `user` -> `assistant` (tool call) -> `tool` (result) -> `assistant`. See the sketch after this list.
- Gemma doesn't support mixed text and tool-call output. Don't ask it to generate both in the same response; send separate requests for tool use and for text responses.
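A minimal sketch of that sequence, reusing the proxy setup from the example above; the get_weather tool and its stubbed result are hypothetical:
# Minimal tool-calling sketch; the get_weather tool and its stubbed result
# are hypothetical. Assumes the privatemode-proxy is running on localhost:8080.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("PRIVATEMODE_API_KEY"),
    base_url="http://localhost:8080/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# user -> assistant (tool call)
messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
response = client.chat.completions.create(
    model="gemma-3-27b", messages=messages, tools=tools
)
assistant_message = response.choices[0].message

if assistant_message.tool_calls:
    call = assistant_message.tool_calls[0]
    # Append the assistant's tool call and the tool result, preserving the
    # alternating pattern: user -> assistant (tool call) -> tool -> assistant.
    messages.append(assistant_message)
    messages.append(
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps({"city": "Berlin", "temperature_c": 18}),  # stubbed result
        }
    )
    final = client.chat.completions.create(model="gemma-3-27b", messages=messages)
    print(final.choices[0].message.content)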