
Llama 3.3 70B

The Meta Llama 3.3 70B Instruct model is a multilingual, instruction-tuned generative model with 70B parameters (text in/text out). Privatemode provides a variant of this model that was quantized with AutoAWQ from FP16 down to INT4 using GEMM kernels, zero-point quantization, and a group size of 128.
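For reference, a quantization with these settings could be reproduced with AutoAWQ roughly as in the sketch below. This is a minimal sketch assuming the upstream meta-llama/Llama-3.3-70B-Instruct checkpoint and the standard AutoAWQ workflow; it is not the exact procedure used to produce the Privatemode variant, and the paths are placeholders.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Hypothetical paths; the source checkpoint and output directory are assumptions.
model_path = "meta-llama/Llama-3.3-70B-Instruct"
quant_path = "llama-3.3-70b-instruct-awq-int4"

# Settings matching the description above:
# INT4 weights, zero-point quantization, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate and quantize the weights, then save the INT4 model with its tokenizer.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```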

Model ID

llama-3.3-70b

Source

Hugging Face

Modality

  • Input: text
  • Output: text

Features

Context limit

  • Context window: 70k tokens
  • Max output length: 4,028 tokens (see the example request below)
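
As an illustration, a chat completion request that stays within these limits could look like the following sketch. The proxy address and API key handling are assumptions about a typical Privatemode setup with an OpenAI-compatible client; only the model ID llama-3.3-70b and the output cap come from this page.

```python
from openai import OpenAI

# Assumed address of a locally running Privatemode proxy; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

response = client.chat.completions.create(
    model="llama-3.3-70b",  # model ID from this page
    max_tokens=1024,        # must stay below the 4,028-token output limit
    messages=[
        {"role": "user", "content": "Summarize INT4 weight quantization in two sentences."},
    ],
)
print(response.choices[0].message.content)
```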

Endpoints