Version: 1.31

Llama 3.3 70B

Deprecated

This model is deprecated and will be removed on December 14th, 2025. Please migrate to an alternative model.

Meta's Llama 3.3 70B Instruct is a multilingual, instruction-tuned generative model with 70B parameters (text in/text out). Privatemode provides a variant of this model that was quantized with AutoAWQ from FP16 down to INT4, using GEMM kernels, zero-point quantization, and a group size of 128.
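As a rough sketch, the quantization parameters above correspond to the following AutoAWQ configuration dictionary (the format passed to AutoAWQ's `model.quantize(tokenizer, quant_config=...)`). The exact script used by Privatemode is not published; this only mirrors the parameters stated here.

```python
# Assumed AutoAWQ settings matching the description above:
# INT4 weights, GEMM kernels, zero-point quantization, group size 128.
quant_config = {
    "zero_point": True,   # zero-point (asymmetric) quantization
    "q_group_size": 128,  # weights quantized in groups of 128
    "w_bit": 4,           # FP16 weights reduced to INT4
    "version": "GEMM",    # GEMM kernels for the quantized matmuls
}

print(quant_config["w_bit"])  # → 4
```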

Model ID

llama-3.3-70b

Source

Hugging Face

Modality

  • Input: text
  • Output: text

Features

Context limit

  • Context window: 70k tokens
  • Max output length: 4028 tokens

Endpoints
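As a minimal sketch, a request to this model can be built for an OpenAI-compatible chat completions endpoint. The base URL and port below are assumptions for illustration; check the endpoint documentation for the actual values.

```python
import json

# Assumed base URL of a locally running Privatemode proxy; the real
# address and port may differ -- see the Endpoints documentation.
BASE_URL = "http://localhost:8080"

def build_chat_request(prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON payload for a chat completion with this model."""
    url = f"{BASE_URL}/v1/chat/completions"
    payload = {
        "model": "llama-3.3-70b",  # model ID from this page
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,  # must stay within the max output length
    }
    return url, payload

url, payload = build_chat_request("Summarize AWQ quantization in one sentence.")
print(url)
print(json.dumps(payload)[:60])
```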

Rate limits

This model has a rate limit multiplier of 1.0. The effective rate limit for the Free and Standard tiers is 100,000 prompt tokens/minute and 10,000 completion tokens/minute. The effective monthly quota for the Free tier is 1,000,000 tokens/month.
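To illustrate how the multiplier relates the base tier limits to the effective limits on this page: a common convention is to divide the base limit by the multiplier, and with a multiplier of 1.0 the effective values equal the base values either way. The base numbers and the division convention below are assumptions for illustration only.

```python
# Assumed base tier limits, inferred from the effective numbers on this
# page together with the 1.0 multiplier for this model.
MULTIPLIER = 1.0

BASE_LIMITS = {
    "prompt_tokens_per_minute": 100_000,
    "completion_tokens_per_minute": 10_000,
    "free_monthly_tokens": 1_000_000,
}

# Assumed convention: a higher multiplier reduces the effective limit.
effective = {name: int(base / MULTIPLIER) for name, base in BASE_LIMITS.items()}

print(effective["prompt_tokens_per_minute"])  # → 100000
```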