Gemma 3 27B
Gemma is a family of lightweight, state-of-the-art open models from Google. Gemma 3 models are multimodal, handling text and image input and generating text output. Privatemode provides a variant of this model that is quantized with LLM Compressor from FP16 to FP8, roughly halving the model size.
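You don't need to reproduce the quantization step to use the model. For reference, a weight-only FP8 conversion with LLM Compressor looks roughly like the sketch below; the base checkpoint, scheme, and ignore list are assumptions for illustration, not Privatemode's published recipe, and the oneshot API varies slightly across llm-compressor versions.
# Illustrative FP8 quantization sketch with LLM Compressor.
# Checkpoint, scheme, and ignore list are assumptions, not
# Privatemode's actual recipe; API details vary by version.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",      # quantize the Linear layers...
    scheme="FP8_DYNAMIC",  # ...to FP8 weights with dynamic activation scales
    ignore=["lm_head"],    # keep the output head in higher precision
)

oneshot(
    model="google/gemma-3-27b-it",  # assumed FP16 base checkpoint
    recipe=recipe,
    output_dir="gemma-3-27b-fp8",
)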
Model ID
gemma-3-27b
Source
Modality
- Input: text, image
- Output: text
Features
- Streaming (see the sketch after this list)
- Tool calling (see remarks below)
- Structured outputs
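Streaming works through the standard OpenAI-compatible chat completions API by setting stream=True. A minimal sketch, assuming the privatemode-proxy from the example below is running on localhost:8080; the prompt is illustrative:
# Minimal streaming sketch; assumes the privatemode-proxy from the example
# below is running and PRIVATEMODE_API_KEY is set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("PRIVATEMODE_API_KEY"),
    base_url="http://localhost:8080/v1",
)

stream = client.chat.completions.create(
    model="gemma-3-27b",
    messages=[{"role": "user", "content": "Explain confidential computing in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the response text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()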
Context limit
- Context window: 128k tokens
Endpoints
- /v1/chat/completions
Example
- Image input
from openai import OpenAI
import base64
import os

# Start the proxy first:
# docker run --pull=always -p 8080:8080 ghcr.io/edgelesssys/privatemode/privatemode-proxy:latest
# Run this script with:
# PRIVATEMODE_API_KEY=<> uv run --with openai gemma-vision.py

api_key = os.environ.get("PRIVATEMODE_API_KEY")  # insert
api_base = "http://localhost:8080/v1"
image_path = ""  # insert

client = OpenAI(
    api_key=api_key,
    base_url=api_base,
)


def encode_image_to_base64(image_path):
    """Read the image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


if not os.path.exists(image_path):
    print(f"Error: Image file not found at {image_path}")
    exit(1)

base64_image = encode_image_to_base64(image_path)

# Send the image as a data URL alongside the text prompt.
chat_response = client.chat.completions.create(
    model="gemma-3-27b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print("Chat completion output:", chat_response.choices[0].message.content)
Remarks and limitations
- Gemma requires alternating `user`/`assistant` roles, so you can't use multiple user messages without an assistant message between them. For tool calling, the `tool` role can take the place of the `user` role, i.e., `user` -> `assistant` (tool call) -> `tool` (result) -> `assistant`. See the sketch after this list.
- Gemma doesn't support mixed text and tool-call output. Don't ask it to generate both in the same response; send separate requests for tool use and for text responses.
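A minimal sketch of that sequence, reusing the proxy setup from the example above; the get_weather tool and its stubbed result are hypothetical:
# Minimal tool-calling sketch; the get_weather tool and its stubbed result
# are hypothetical. Assumes the privatemode-proxy is running on localhost:8080.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("PRIVATEMODE_API_KEY"),
    base_url="http://localhost:8080/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# user -> assistant (tool call)
messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
response = client.chat.completions.create(
    model="gemma-3-27b", messages=messages, tools=tools
)
assistant_message = response.choices[0].message

if assistant_message.tool_calls:
    call = assistant_message.tool_calls[0]
    # Append the assistant's tool call and the tool result, preserving the
    # alternating pattern: user -> assistant (tool call) -> tool -> assistant.
    messages.append(assistant_message)
    messages.append(
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps({"city": "Berlin", "temperature_c": 18}),  # stubbed result
        }
    )
    final = client.chat.completions.create(model="gemma-3-27b", messages=messages)
    print(final.choices[0].message.content)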