Version: 1.35

Rate limits

Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.

You can hit a rate limit on any of the limit types (requests, prompt tokens, or completion tokens), whichever is reached first. For example, if your limit is 20 requests per minute and 20k prompt tokens per minute, sending 20 requests with only 100 prompt tokens to the ChatCompletions endpoint exhausts your request limit, even though those 20 requests used far fewer than 20k prompt tokens.
Rate limits are defined at the organization level, not API key level.
Rate limits may vary by the API and model being used.
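
The "whichever is reached first" behaviour can be illustrated with a minimal client-side sketch. It is not the API's implementation: the names `MinuteUsage` and `first_limit_exceeded` are hypothetical, the limits of 20 requests and 20k prompt tokens per minute are taken from the example above, and the sketch assumes each request carries 100 prompt tokens.

```python
from dataclasses import dataclass


@dataclass
class MinuteUsage:
    """Usage counted inside the current one-minute window (hypothetical helper)."""
    requests: int = 0
    prompt_tokens: int = 0


def first_limit_exceeded(usage, new_prompt_tokens,
                         max_requests=20, max_prompt_tokens=20_000):
    """Return the name of the limit the next request would exceed, or None."""
    if usage.requests + 1 > max_requests:
        return "requests"
    if usage.prompt_tokens + new_prompt_tokens > max_prompt_tokens:
        return "prompt_tokens"
    return None


# Send 20 small requests of 100 prompt tokens each (assumed per-request size).
usage = MinuteUsage()
for _ in range(20):
    assert first_limit_exceeded(usage, 100) is None
    usage.requests += 1
    usage.prompt_tokens += 100

# The 21st request is blocked by the request limit, although only
# 2,000 of the 20,000 prompt tokens have been used.
blocked_by = first_limit_exceeded(usage, 100)
```

Here `blocked_by` is `"requests"`, which shows that the request limit is the one that binds in this scenario.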

Subscription Tiers

Free
Standard
Enterprise

The Free tier has a limited number of prompt and completion tokens available per month.

Limit type	Value
Prompt tokens	100,000/min and 1,000,000/month
Completion tokens	10,000/min and 1,000,000/month
Requests	20/min
Audio file size	100 minutes/month

Standard tier:

Limit type	Value
Prompt tokens	100,000/min
Completion tokens	10,000/min
Requests	20/min

Initial Token Contingent

New organizations start with a one-time Initial Token Contingent of 5,000,000 tokens. While this balance is positive, your usage is deducted from it, and you aren't subject to the standard monthly limits for the Free tier. Once depleted, the standard monthly limits apply.

Audio Usage Conversion

Audio usage (Speech-to-Text) is deducted from the initial token contingent at the following rate:

1 minute of audio ≈ 2,800 tokens
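
As a rough worked example of this conversion, the sketch below estimates how many tokens a transcription deducts from the contingent. The function names are hypothetical, and because the documented rate is approximate (≈), the results should be read as estimates rather than exact billing figures.

```python
TOKENS_PER_AUDIO_MINUTE = 2_800       # approximate rate stated above
INITIAL_TOKEN_CONTINGENT = 5_000_000  # one-time balance for new organizations


def audio_minutes_to_tokens(minutes):
    """Estimate the tokens deducted for the given minutes of audio."""
    return round(minutes * TOKENS_PER_AUDIO_MINUTE)


def remaining_after_audio(balance, minutes):
    """Estimate the contingent left after transcribing; never below zero."""
    return max(balance - audio_minutes_to_tokens(minutes), 0)


# 10 minutes of audio is roughly 28,000 tokens.
cost = audio_minutes_to_tokens(10)
left = remaining_after_audio(INITIAL_TOKEN_CONTINGENT, 10)
```

Under these assumptions, a 10-minute recording leaves about 4,972,000 tokens of a fresh contingent.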

Token Counting & Multipliers

Each model has a rate limit multiplier, which we apply to the token count of each request. The effective token usage counted towards your rate limits and monthly quotas is calculated as:

Effective Usage = Token Count × Model Multiplier

Note: Cached tokens don't count towards your rate limits.
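
The formula can be sketched as a small helper. The function name and the `cached_tokens` parameter are hypothetical; the sketch assumes the raw token count includes cached tokens and that they are subtracted before the multiplier is applied, which the text above does not spell out.

```python
def effective_usage(token_count, multiplier, cached_tokens=0):
    """Effective Usage = Token Count x Model Multiplier.

    Assumption: cached tokens are part of token_count and are
    excluded before the multiplier is applied.
    """
    if cached_tokens > token_count:
        raise ValueError("cached_tokens cannot exceed token_count")
    return (token_count - cached_tokens) * multiplier


# A 5,000-token request to a model with multiplier 0.1 counts as about 500 tokens.
embedding_usage = effective_usage(5_000, 0.1)

# With multiplier 1.0 and 1,000 cached tokens, 4,000 tokens are counted.
chat_usage = effective_usage(5_000, 1.0, cached_tokens=1_000)
```

The result is a float, so compare it with a tolerance rather than with exact equality when the multiplier is fractional.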

Model Multipliers

Model	Multiplier	Effective Prompt Token Rate Limit (Free & Standard)	Effective Monthly Quota (Free)
Gemma 3 27B	1.0	100,000 tokens/min	1,000,000 tokens/month
gpt-oss-120b	1.0	100,000 tokens/min	1,000,000 tokens/month
Qwen3-Coder 30B-A3B	1.0	100,000 tokens/min	1,000,000 tokens/month
Qwen3-Embedding 4B	0.1	1,000,000 tokens/min	10,000,000 tokens/month
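
The effective limits in this table appear to follow from dividing the base limit by the model multiplier. The sketch below reproduces that arithmetic; the dictionary and function names are hypothetical, and the base values (100,000 prompt tokens/min, 1,000,000 tokens/month) are the Free tier figures listed earlier.

```python
BASE_PROMPT_TOKENS_PER_MIN = 100_000  # Free and Standard
BASE_TOKENS_PER_MONTH = 1_000_000     # Free only

# Multipliers copied from the table above.
MODEL_MULTIPLIERS = {
    "Gemma 3 27B": 1.0,
    "gpt-oss-120b": 1.0,
    "Qwen3-Coder 30B-A3B": 1.0,
    "Qwen3-Embedding 4B": 0.1,
}


def effective_limit(base_limit, model):
    """Raw tokens that fit in base_limit once the model multiplier is applied."""
    return round(base_limit / MODEL_MULTIPLIERS[model])


for model in MODEL_MULTIPLIERS:
    per_min = effective_limit(BASE_PROMPT_TOKENS_PER_MIN, model)
    per_month = effective_limit(BASE_TOKENS_PER_MONTH, model)
    print(f"{model}: {per_min:,} tokens/min, {per_month:,} tokens/month")
```

A multiplier below 1.0 therefore raises the number of raw tokens you can send before reaching a limit, as the Qwen3-Embedding 4B row shows.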
