Version: 1.23

Rate limits

Rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time.

  • A rate limit is hit as soon as any one of the limits is reached, whichever occurs first. For example, if your limits are 20 requests per minute and 20k prompt tokens per minute, sending 20 requests of only 100 prompt tokens each to the ChatCompletions endpoint exhausts your request limit, even though those 20 requests used only 2k of your 20k prompt tokens.
  • Rate limits are defined at the organization level, not at the API key level.
  • Rate limits may vary by the API and model being used.
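
The "whichever occurs first" rule can be illustrated with a minimal Python sketch. The names `MinuteLimits` and `first_limit_reached` are hypothetical helpers written for this illustration and are not part of the API; the sketch only counts requests and prompt tokens within a single minute and assumes a limit counts as hit once usage reaches it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class MinuteLimits:
    """Hypothetical container for per-minute limits (illustration only)."""
    requests: int        # maximum requests per minute
    prompt_tokens: int   # maximum prompt tokens per minute


def first_limit_reached(
    prompt_token_counts: Iterable[int], limits: MinuteLimits
) -> Optional[Tuple[str, int]]:
    """Return (limit name, request number) for the first limit that fills up.

    Each entry in prompt_token_counts is the prompt size of one request sent
    within the same minute. Returns None if no limit is reached.
    """
    used_requests = 0
    used_tokens = 0
    for number, tokens in enumerate(prompt_token_counts, start=1):
        used_requests += 1
        used_tokens += tokens
        # Either counter can fill first; check both after every request.
        if used_requests >= limits.requests:
            return ("requests", number)
        if used_tokens >= limits.prompt_tokens:
            return ("prompt_tokens", number)
    return None


limits = MinuteLimits(requests=20, prompt_tokens=20_000)
# Many small requests: the request limit fills first.
small = first_limit_reached([100] * 20, limits)
# A few large requests: the prompt token limit fills first.
large = first_limit_reached([15_000, 6_000], limits)
```

With the limits from the example above, `small` is `("requests", 20)` and `large` is `("prompt_tokens", 2)`.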

The free tier has a limited number of prompt and completion tokens available per minute and per month, along with limits on requests and audio file size:

Limit type          Value
Prompt tokens       10,000/min - 300,000/month
Completion tokens   5,000/min - 200,000/month
Requests            10/min
Audio file size     2 MB/min - 100 MB/month
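
A client can avoid hitting the per-minute limits by tracking its own usage before sending a request. The following is a rough sketch of a client-side sliding-window check, using the free-tier values of 10 requests and 10,000 prompt tokens per minute as defaults. The class name `MinuteWindow` and its methods are hypothetical, and the 60-second sliding window is an assumption: this page does not state how the server measures a minute, so the server may count differently.

```python
import time
from collections import deque


class MinuteWindow:
    """Hypothetical client-side tracker for per-minute limits (a sketch)."""

    def __init__(self, max_requests=10, max_prompt_tokens=10_000,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_prompt_tokens = max_prompt_tokens
        self._clock = clock
        self._events = deque()  # (timestamp, prompt_tokens) per sent request

    def _prune(self, now):
        # Forget requests that are 60 seconds old or older.
        while self._events and now - self._events[0][0] >= 60:
            self._events.popleft()

    def try_acquire(self, prompt_tokens):
        """Record the request and return True if it fits within both limits."""
        now = self._clock()
        self._prune(now)
        used_tokens = sum(tokens for _, tokens in self._events)
        if len(self._events) + 1 > self.max_requests:
            return False  # request limit would be exceeded
        if used_tokens + prompt_tokens > self.max_prompt_tokens:
            return False  # prompt token limit would be exceeded
        self._events.append((now, prompt_tokens))
        return True
```

Before each call, the client would run `window.try_acquire(n)` with the prompt size `n` and wait or retry later when it returns `False`. Because limits are defined at the organization level, a tracker like this only sees one client's traffic and cannot account for other clients in the same organization.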