Version: 1.19

Meta Llama 3.3 70B Instruct

Meta Llama 3.3 70B Instruct is a multilingual, instruction-tuned generative model with 70B parameters (text in/text out). Privatemode provides a variant of this model that was quantized with AutoAWQ from FP16 down to INT4 using GEMM kernels, with zero-point quantization and a group size of 128.
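The memory savings from INT4 quantization can be estimated with back-of-the-envelope arithmetic. This is a rough sketch only: real footprints also include activations, the KV cache, and implementation-specific packing details not covered on this page.

```python
# Rough weight-memory comparison for a 70B-parameter model
# (illustrative only; ignores activations and KV cache).
PARAMS = 70e9

fp16_gb = PARAMS * 2 / 1e9      # 2 bytes per FP16 weight
int4_gb = PARAMS * 0.5 / 1e9    # 4 bits = 0.5 bytes per weight

# Zero-point AWQ with group size 128 stores roughly one scale and one
# zero-point per 128 weights, adding a small overhead to the packed weights.
overhead_gb = (PARAMS / 128) * 4 / 1e9  # ~2 FP16 values (4 bytes) per group

print(f"FP16: ~{fp16_gb:.0f} GB, INT4 (AWQ): ~{int4_gb + overhead_gb:.0f} GB")
```

This is why the INT4 variant fits on far fewer GPUs than the original FP16 weights.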

Model ID

ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4
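The model ID above is the value to pass in the `model` field of a request. As a minimal sketch, assuming an OpenAI-compatible chat-completions API (the message content and `max_tokens` value here are illustrative, not from this page):

```python
import json

# Hypothetical chat-completions request body; only the model ID
# is taken from this page, the rest is an example payload.
request = {
    "model": "ibnzterrell/Meta-Llama-3.3-70B-Instruct-AWQ-INT4",
    "messages": [
        {"role": "user", "content": "Summarize this text in one sentence."}
    ],
    "max_tokens": 256,
}

print(json.dumps(request, indent=2))
```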

Source

Hugging Face

Modality

  • Input: text
  • Output: text

Features

Context limit

  • Context window: 70k tokens
  • Max output length: 4028 tokens
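A client can guard against exceeding these limits with a rough pre-check. This sketch uses the common ~4-characters-per-token heuristic as an assumption; a real check would count tokens with the model's actual tokenizer.

```python
CONTEXT_WINDOW = 70_000  # tokens, as listed above
MAX_OUTPUT = 4028        # tokens, as listed above

def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, max_output: int = MAX_OUTPUT) -> bool:
    # Leave room for the requested output within the context window.
    return rough_token_count(prompt) + max_output <= CONTEXT_WINDOW

print(fits_in_context("Hello, Llama!"))
```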

Endpoints