Version: 1.30

Gemma 3 27B

Gemma is a family of lightweight, state-of-the-art open models from Google. Gemma 3 models are multimodal: they handle text and image input and generate text output. Privatemode provides a variant of this model that's quantized with LLM Compressor from FP16 to FP8 to reduce the model size.

Model ID

gemma-3-27b

Source

Hugging Face

Modality

  • Input: text, image
  • Output: text

Features

Context limit

  • Context window: 128k tokens
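As a quick sanity check before sending large prompts, you can estimate whether a request fits the context window. The sketch below uses a crude ~4-characters-per-token heuristic, which is an assumption for illustration only, not Gemma's actual tokenizer; the `fits_context` helper and its defaults are hypothetical.

```python
# Rough pre-check that a prompt fits the 128k-token context window.
# Assumption: ~4 characters per token is a ballpark heuristic, not
# Gemma's real tokenizer, so treat the result as an estimate only.

CONTEXT_WINDOW = 128_000

def fits_context(prompt: str, max_output_tokens: int = 1024) -> bool:
    estimated_prompt_tokens = len(prompt) // 4
    return estimated_prompt_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits_context("Summarize this paragraph."))
```

For exact counts, tokenize with the model's own tokenizer instead of estimating.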

Endpoints

Example

from openai import OpenAI
import base64
import os

# Start the Privatemode proxy:
# docker run --pull=always -p 8080:8080 ghcr.io/edgelesssys/privatemode/privatemode-proxy:latest
# Run this script:
# PRIVATEMODE_API_KEY=<> uv run --with openai gemma-vision.py

api_key = os.environ.get("PRIVATEMODE_API_KEY")  # insert
api_base = "http://localhost:8080/v1"
image_path = ""  # insert

client = OpenAI(
    api_key=api_key,
    base_url=api_base,
)


def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


if not os.path.exists(image_path):
    print(f"Error: Image file not found at {image_path}")
    exit(1)

base64_image = encode_image_to_base64(image_path)

chat_response = client.chat.completions.create(
    model="gemma-3-27b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print("Chat completion output:", chat_response.choices[0].message.content)

Remarks and limitations

  • Gemma requires alternating user/assistant roles, so you can't send multiple user messages without an assistant message between them. For tool calling, the tool role can take the place of the user role, i.e., user -> assistant (tool call) -> tool (result) -> assistant.
  • Gemma doesn't support mixed text and tool-call outputs. Don't ask it to generate both in the same response; use separate requests for tool use and text responses.
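The role-ordering requirement above can be sketched as a message list in the OpenAI chat format. The get_weather tool, its arguments, and the no_consecutive_user_messages helper below are hypothetical, shown only to illustrate the user -> assistant (tool call) -> tool (result) sequence Gemma expects.

```python
# Sketch of a message sequence that satisfies Gemma's alternating-role
# rule: the tool role stands in for the user turn after a tool call.
# The get_weather tool and its arguments are hypothetical.

messages = [
    {"role": "user", "content": "What's the weather in Berlin?"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"city": "Berlin"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
]


def no_consecutive_user_messages(msgs):
    # Gemma rejects back-to-back user messages; check for that case.
    roles = [m["role"] for m in msgs]
    return not any(a == b == "user" for a, b in zip(roles, roles[1:]))


print(no_consecutive_user_messages(messages))
```

Appending this list to a chat.completions.create call would let the model produce the final assistant answer from the tool result.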