Llama 3.3 70B
Deprecated
This model is deprecated and will be removed on December 14th, 2025. Please migrate to an alternative model.
The Meta Llama 3.3 70B Instruct model is a multilingual, instruction-tuned generative model with 70B parameters (text in/text out). Privatemode provides a variant of this model that was quantized with AutoAWQ from FP16 down to INT4 using GEMM kernels, with zero-point quantization and a group size of 128.
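The quantization settings named above (INT4, GEMM kernels, zero-point, group size 128) map onto an AutoAWQ configuration along these lines; this is a sketch for illustration, assuming AutoAWQ's standard `quant_config` keys:

```python
# AutoAWQ-style quantization config matching the settings described above.
# This is an illustrative reconstruction, not the exact config used by Privatemode.
quant_config = {
    "w_bit": 4,          # quantize FP16 weights down to INT4
    "q_group_size": 128, # group size of 128
    "zero_point": True,  # zero-point (asymmetric) quantization
    "version": "GEMM",   # use GEMM kernels
}
```

With AutoAWQ installed, a config of this shape is what gets passed to the library's quantize step to produce the INT4 weights.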
Model ID
llama-3.3-70b
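For illustration, a request referencing this model ID could be built as follows; a minimal sketch assuming an OpenAI-compatible chat-completions API (the endpoint path and authentication are assumptions, not documented on this page):

```python
import json

# Build an OpenAI-compatible chat-completions payload for this model.
payload = {
    "model": "llama-3.3-70b",  # model ID from this page
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,  # must stay within the model's max output length
}

body = json.dumps(payload)
# This body would be POSTed to the service's chat-completions endpoint
# (e.g. a /v1/chat/completions path) with the appropriate auth headers.
```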
Source
Modality
- Input: text
- Output: text
Features
Context limit
- Context window: 70k tokens
- Max output length: 4028 tokens
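These two limits interact: the prompt and the requested completion together must fit within the context window. A minimal sketch of client-side budgeting under that assumption (the helper name is hypothetical):

```python
CONTEXT_WINDOW = 70_000  # context window in tokens (from this page)
MAX_OUTPUT = 4_028       # max output length in tokens (from this page)

def max_completion_tokens(prompt_tokens: int) -> int:
    """Largest completion request that fits both limits (hypothetical helper)."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))
```

For a short prompt the cap is the max output length; as the prompt approaches the context window, the remaining room shrinks toward zero.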
Endpoints
Rate limits
This model has a rate limit multiplier of 1.0. The effective rate limit for the Free and Standard tiers is 100,000 prompt tokens/minute and 10,000 completion tokens/minute. The effective monthly quota for the Free tier is 1,000,000 tokens/month.
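As a worked illustration of how these three numbers combine, a client could pre-check a request against the per-minute and monthly limits; the accounting function below is a hypothetical sketch, not part of the service:

```python
PROMPT_TPM = 100_000       # prompt tokens per minute (Free and Standard tiers)
COMPLETION_TPM = 10_000    # completion tokens per minute
MONTHLY_QUOTA = 1_000_000  # total tokens per month (Free tier)

def request_allowed(prompt_used: int, completion_used: int, monthly_used: int,
                    prompt_tokens: int, completion_tokens: int) -> bool:
    """Check a prospective request against the limits above.

    *_used arguments are tokens already consumed in the current minute / month.
    Hypothetical client-side check for illustration only.
    """
    return (prompt_used + prompt_tokens <= PROMPT_TPM
            and completion_used + completion_tokens <= COMPLETION_TPM
            and monthly_used + prompt_tokens + completion_tokens <= MONTHLY_QUOTA)
```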