Rate limits
Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.
- You can hit any one of the limits, whichever is reached first. For example, if your limits are 20 requests per minute and 20k prompt tokens per minute, sending 20 requests with only 100 prompt tokens to the ChatCompletions endpoint exhausts your request limit, even though those 20 requests used far fewer than 20k tokens.
- Rate limits are defined at the organization level, not at the API key level.
- Rate limits may vary by the API and model being used.
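The "whichever is reached first" rule from the first bullet can be sketched as a simple check. This is only an illustration: the `MinuteLimits` class and `exhausted_limits` function are hypothetical names, the limit values are taken from the example above, and the 100 prompt tokens are assumed to be per request.

```python
from dataclasses import dataclass


@dataclass
class MinuteLimits:
    """Per-minute limits from the example above (hypothetical container)."""
    requests: int = 20           # requests per minute
    prompt_tokens: int = 20_000  # prompt tokens per minute


def exhausted_limits(requests_sent: int, prompt_tokens_sent: int,
                     limits: MinuteLimits) -> list[str]:
    """Return the names of the limits that have been reached in this minute."""
    hit = []
    if requests_sent >= limits.requests:
        hit.append("requests")
    if prompt_tokens_sent >= limits.prompt_tokens:
        hit.append("prompt_tokens")
    return hit


# 20 requests, assuming 100 prompt tokens each (2,000 tokens in total):
# the request limit is reached long before the token limit.
print(exhausted_limits(20, 20 * 100, MinuteLimits()))  # ['requests']
```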
Subscription Tiers
- Free
- Standard
- Enterprise
The free tier has a limited number of prompt and completion tokens available per month, in addition to the per-minute limits.
| Limit type | Value |
|---|---|
| Prompt tokens | 100,000/min and 1,000,000/month |
| Completion tokens | 10,000/min and 1,000,000/month |
| Requests | 20/min |
| Audio file size | 25 MB/min and 100 MB/month |
The standard tier is a pay-as-you-go subscription and has no monthly usage limit.
| Limit type | Value |
|---|---|
| Prompt tokens | 100,000/min |
| Completion tokens | 10,000/min |
| Requests | 20/min |
| Audio file size | 25 MB/min |
For increased rate limits, please contact us.
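One way to stay under the 20 requests/min limit is to throttle on the client side. The sketch below is an assumption-laden illustration, not part of the API: `SlidingWindowThrottle` is a hypothetical helper, and it assumes a sliding 60-second window, which may be stricter than how the service actually counts requests.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Client-side guard: at most max_requests per window_seconds (hypothetical helper)."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the throttle can be tested without sleeping
        self._sent = deque()    # timestamps of recent requests, oldest first

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a request may be sent now, else how long to wait."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The oldest request in the window must expire before the next is allowed.
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self._sent.append(self.clock())


# Usage: wait as needed, send the request with your own client code, then record it.
throttle = SlidingWindowThrottle()
wait = throttle.seconds_until_allowed()
if wait > 0:
    time.sleep(wait)
throttle.record()
```

Because rate limits are defined at the organization level, a throttle like this only helps if it accounts for every client in the organization that shares the limit.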
Token Counting & Multipliers
Each model has a rate limit multiplier, which we apply to the token count of each request. The effective token usage counted towards your rate limits and monthly quotas is calculated as:
Effective Usage = Token Count × Model Multiplier
Note: Cached tokens don't count towards your rate limits.
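As a minimal sketch, the formula above can be written as a small function. The name `effective_usage` and the `cached_tokens` parameter are hypothetical, and subtracting cached tokens before applying the multiplier is an assumption based on the note above, not a documented calculation.

```python
def effective_usage(token_count: int, multiplier: float,
                    cached_tokens: int = 0) -> float:
    """Effective Usage = Token Count x Model Multiplier.

    Assumption: cached tokens are part of token_count and are excluded
    before the multiplier is applied.
    """
    if not 0 <= cached_tokens <= token_count:
        raise ValueError("cached_tokens must be between 0 and token_count")
    counted = token_count - cached_tokens
    return counted * multiplier


# A 5,000-token request against a model with multiplier 1.0:
print(effective_usage(5_000, 1.0))            # 5000.0
# The same request against a model with multiplier 0.1:
print(round(effective_usage(5_000, 0.1), 2))  # 500.0
```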
Model Multipliers
| Model | Multiplier | Effective Prompt Token Rate Limit (Free & Standard) | Effective Monthly Quota (Free) |
|---|---|---|---|
| Gemma 3 27B | 1.0 | 100,000 tokens/min | 1,000,000 tokens/month |
| gpt-oss-120b | 1.0 | 100,000 tokens/min | 1,000,000 tokens/month |
| Qwen2.5-Coder 14B | 1.0 | 100,000 tokens/min | 1,000,000 tokens/month |
| Qwen3-Coder 30B-A3B | 1.0 | 100,000 tokens/min | 1,000,000 tokens/month |
| Qwen3-Embedding 4B | 0.1 | 1,000,000 tokens/min | 10,000,000 tokens/month |
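The effective limits in this table are consistent with dividing the base limit by the model multiplier (for example, 100,000 / 0.1 = 1,000,000). The sketch below reproduces that arithmetic under this assumption; the constant and function names are illustrative only.

```python
# Base limits from the tier tables above.
BASE_PROMPT_TOKENS_PER_MIN = 100_000
FREE_MONTHLY_QUOTA = 1_000_000

# Multipliers from the table above.
MULTIPLIERS = {
    "Gemma 3 27B": 1.0,
    "gpt-oss-120b": 1.0,
    "Qwen2.5-Coder 14B": 1.0,
    "Qwen3-Coder 30B-A3B": 1.0,
    "Qwen3-Embedding 4B": 0.1,
}


def effective_limit(base_limit: int, multiplier: float) -> int:
    """Raw tokens that fit in a limit, assuming usage is scaled by the multiplier."""
    # round() absorbs floating-point error from dividing by values such as 0.1.
    return round(base_limit / multiplier)


for model, multiplier in MULTIPLIERS.items():
    per_min = effective_limit(BASE_PROMPT_TOKENS_PER_MIN, multiplier)
    per_month = effective_limit(FREE_MONTHLY_QUOTA, multiplier)
    print(f"{model}: {per_min:,} tokens/min, {per_month:,} tokens/month")
```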