Version: 1.31

Llama 3.3 70B

Deprecated

This model is deprecated and will be removed on December 14th, 2025. Please migrate to an alternative model.

Meta's Llama 3.3 70B Instruct is a multilingual, instruction-tuned generative model with 70B parameters (text in/text out). Privatemode provides a variant of this model that was quantized with AutoAWQ from FP16 down to INT4, using GEMM kernels, zero-point quantization, and a group size of 128.
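As a rough sketch, the quantization parameters above correspond to the following AutoAWQ configuration dictionary (the format passed to AutoAWQ's `model.quantize(tokenizer, quant_config=...)`). The exact script used by Privatemode is not published; this only mirrors the parameters stated here.

```python
# Assumed AutoAWQ settings matching the description above:
# INT4 weights, GEMM kernels, zero-point quantization, group size 128.
quant_config = {
    "zero_point": True,   # zero-point (asymmetric) quantization
    "q_group_size": 128,  # weights quantized in groups of 128
    "w_bit": 4,           # FP16 weights reduced to INT4
    "version": "GEMM",    # GEMM kernels for the quantized matmuls
}

print(quant_config["w_bit"])  # → 4
```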

Model ID

llama-3.3-70b

Source

Hugging Face

Modality

  • Input: text
  • Output: text

Features

Context limit

  • Context window: 70k tokens
  • Max output length: 4028 tokens

Endpoints
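As a minimal sketch, a request to this model can be built for an OpenAI-compatible chat completions endpoint. The base URL and port below are assumptions for illustration; check the endpoint documentation for the actual values.

```python
import json

# Assumed base URL of a locally running Privatemode proxy; the real
# address and port may differ -- see the Endpoints documentation.
BASE_URL = "http://localhost:8080"

def build_chat_request(prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON payload for a chat completion with this model."""
    url = f"{BASE_URL}/v1/chat/completions"
    payload = {
        "model": "llama-3.3-70b",  # model ID from this page
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,  # must stay within the max output length
    }
    return url, payload

url, payload = build_chat_request("Summarize AWQ quantization in one sentence.")
print(url)
print(json.dumps(payload)[:60])
```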

Rate limits

This model has a rate limit multiplier of 1.0. The effective rate limit for the Free and Standard tiers is 100,000 prompt tokens/minute and 10,000 completion tokens/minute. The effective monthly quota for the Free tier is 1,000,000 tokens/month.
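To illustrate how the multiplier relates the base tier limits to the effective limits on this page: a common convention is to divide the base limit by the multiplier, and with a multiplier of 1.0 the effective values equal the base values either way. The base numbers and the division convention below are assumptions for illustration only.

```python
# Assumed base tier limits, inferred from the effective numbers on this
# page together with the 1.0 multiplier for this model.
MULTIPLIER = 1.0

BASE_LIMITS = {
    "prompt_tokens_per_minute": 100_000,
    "completion_tokens_per_minute": 10_000,
    "free_monthly_tokens": 1_000_000,
}

# Assumed convention: a higher multiplier reduces the effective limit.
effective = {name: int(base / MULTIPLIER) for name, base in BASE_LIMITS.items()}

print(effective["prompt_tokens_per_minute"])  # → 100000
```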