Azure OpenAI Rate Limit: How to Fix It

Azure OpenAI rate limit refers to the maximum number of requests or tokens that you are allowed to send to the Azure OpenAI API within a specified time. The rate limit is imposed to ensure fair usage, prevent abuse, and maintain the overall stability and performance of the OpenAI service.

When you send requests to the Azure OpenAI API, you are subject to certain limits on the frequency and volume of those requests. If you exceed these limits, you will encounter a rate-limiting error, typically indicated by an HTTP status code 429. This error signifies that you have reached the maximum rate of requests or tokens allowed within a specific timeframe.

In other words, a 429 (Too Many Requests) response from Azure OpenAI means that you have submitted more requests or tokens in a short period than the quota assigned to your subscription or plan allows.

How to Fix Azure OpenAI Rate Limit

To fix the rate limit error in OpenAI Azure, here is what you need to do:

1. Ensure that you are not making requests too frequently or in a loop without proper pacing. Implement a delay between requests to stay within the rate limits imposed by Azure OpenAI.

2. If your application involves making multiple requests, implement retry logic that respects the rate limit and response headers. This can include exponential backoff or other strategies to retry requests after encountering rate-limit errors.

3. Ensure that you are not sharing your API key with others. Rate limits are shared, so every application that uses the same key counts against the same limit, and the collective usage of your organisation affects how quickly you reach it.

4. If you are using a free or low-tier plan, consider upgrading to a pay-as-you-go plan that offers higher rate limits. Paid plans often provide increased capacity and fewer restrictions.

5. If your application requires higher rate limits, you can request an increase by upgrading your usage tier. You can view your current rate limits and upgrade options in the account settings of your Azure OpenAI subscription.
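The retry advice in step 2 can be sketched as follows. This is a minimal illustration, not an official client: `RateLimitError` is a hypothetical exception standing in for an HTTP 429 response, `send` is whatever function your application uses to call the API, and the delay values are assumptions you would tune. If the response includes a Retry-After value, the sketch waits that long; otherwise it doubles the delay on each attempt and adds a little random jitter.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        # Seconds to wait, taken from a Retry-After header when one is present.
        self.retry_after = retry_after


def call_with_backoff(send, max_retries=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send() and retry on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            if err.retry_after is not None:
                # The service told us how long to wait, so respect it.
                delay = err.retry_after
            else:
                # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter.
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay * 0.1)
            sleep(delay)
```

In a real application, `send` would wrap your request and raise (or translate the SDK's own error into) `RateLimitError` whenever the status code is 429. The `sleep` parameter exists only so that the waiting can be replaced in tests.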


Azure OpenAI Rate Limit Best Practices

You can also fix this issue by following the general best practices recommended by Azure:

Avoid Sharp Changes in Workload:

Gradually increase the workload on the Azure OpenAI API instead of making sudden, large-scale changes. Rapid spikes in traffic can lead to rate-limit errors. By introducing changes gradually, you give the system time to adapt and help prevent unexpected surges that may trigger rate-limiting.
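One way to picture a gradual increase is as a schedule of request rates that rises in even steps. The sketch below is only an illustration: the function names, the starting and target rates, and the number of steps are all assumptions, and a linear ramp is just one possible pattern.

```python
def ramp_schedule(start_rpm, target_rpm, steps):
    """Return request rates rising linearly from start_rpm to target_rpm."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    increment = (target_rpm - start_rpm) / (steps - 1)
    return [round(start_rpm + increment * i) for i in range(steps)]


def delay_between_requests(rpm):
    """Seconds to wait between requests to hold a given requests-per-minute rate."""
    return 60.0 / rpm


# Example: move from 10 to 100 requests per minute in four stages.
for rpm in ramp_schedule(10, 100, 4):
    print(rpm, "RPM ->", delay_between_requests(rpm), "seconds between requests")
```

An application would hold each rate for a while, watch for 429 responses, and only then move on to the next stage.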

Test Different Load Increase Patterns:

Experiment with different patterns when increasing the load on the Azure OpenAI API. This involves testing and analysing how your application behaves under various traffic scenarios. By understanding how your system responds to different load patterns, you can optimise it for efficiency and avoid rate-limit issues.

Increase the Quota Assigned to You:

If your application consistently requires higher capacity and rate limits, consider increasing the quota assigned to your deployment. You can request a quota increase through the Azure portal. Additionally, if your organisation has multiple deployments, you may redistribute quotas from one deployment to another based on your needs.
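Redistributing quota between deployments is simple bookkeeping: the total stays the same and only the split changes. The sketch below illustrates that arithmetic under assumed values. The deployment names and token-per-minute figures are hypothetical, and the function changes nothing in Azure; the actual reallocation is done in the portal.

```python
def move_quota(allocations, source, target, amount_tpm):
    """Return a new allocation map with amount_tpm moved from source to target."""
    if amount_tpm <= 0:
        raise ValueError("amount_tpm must be positive")
    if allocations.get(source, 0) < amount_tpm:
        raise ValueError("source deployment does not have that much quota")
    updated = dict(allocations)  # leave the caller's map untouched
    updated[source] -= amount_tpm
    updated[target] = updated.get(target, 0) + amount_tpm
    return updated


# Hypothetical deployments sharing a 240K tokens-per-minute quota.
current = {"chat-prod": 200_000, "chat-dev": 40_000}
planned = move_quota(current, "chat-dev", "chat-prod", 30_000)
print(planned)
```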

How to Check Your Azure Quota Limit

To check your Azure quota limits, you can follow these steps:

1. Go to https://azure.microsoft.com/en-us/ and sign in with your Azure account credentials.


2. In the left sidebar, click on “Subscriptions” to view a list of your subscriptions.

3. Choose the specific subscription for which you want to check the quota limits.

4. Within the selected subscription, click on “Usage + Quotas” in the left navigation pane. This will give you information about your current quota limits and usage.

5. Use the filters provided to select the specific service provider (in this case, OpenAI) and the locations (Azure regions) you are interested in. This will allow you to view the quota limits and usage for the selected provider in the specified regions.

This process will give you insights into your current usage and quota limits for the selected Azure subscription and OpenAI service. If you need to request an increase in your quotas, there should be an option to submit a request directly from the portal.

Note that a quota is not the same as a spending limit. A spending limit is not something you set explicitly; Azure applies it to control costs and prevent unexpected charges, especially on trial or free-tier accounts.

Default Quotas and Limits for Azure OpenAI

The rate limit is based on two primary metrics: Requests Per Minute (RPM) and Tokens Per Minute (TPM). These metrics regulate the frequency and volume of requests made to the Azure OpenAI service. The default quotas and limits are as follows:

| Quota/limit | Value |
| --- | --- |
| OpenAI resources per region per Azure subscription | 30 |
| Default DALL-E 2 quota limits | 2 concurrent requests |
| Default DALL-E 3 quota limits | 2 capacity units (12 requests per minute) |
| Max fine-tuned model deployments | 5 |
| Total number of training jobs per resource | 100 |
| Max simultaneous running training jobs per resource | 1 |
| Max training jobs queued | 20 |
| Max files per resource | 30 |
| Total size of all files per resource | 1 GB |
| Max training job time (job will fail if exceeded) | 720 hours |
| Max training job size (tokens in training file) x (# of epochs) | 2 billion |
| Max size of all files per upload (Azure OpenAI on your data) | 16 MB |
| Maximum prompt tokens per request | Varies per model |
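RPM and TPM are linked: the request limit for a deployment is derived from its token quota. The sketch below assumes a ratio of 6 RPM per 1,000 TPM. That ratio is not stated in the text above, although it matches the DALL-E 3 row (2 capacity units giving 12 requests per minute), so treat it as an assumption and check the current Azure documentation for your model.

```python
def rpm_from_tpm(tpm, rpm_per_1000_tpm=6):
    """Estimate the requests-per-minute limit implied by a tokens-per-minute quota.

    The default ratio of 6 RPM per 1,000 TPM is an assumption for illustration.
    """
    return tpm // 1000 * rpm_per_1000_tpm


print(rpm_from_tpm(240_000))  # 1440
print(rpm_from_tpm(20_000))   # 120
```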

Regional Quota Limits

Regional Quota Limits refer to the specific restrictions imposed on the usage of a service, resource, or feature within a particular geographic region. In Azure OpenAI, these limits are applied to models based on their type and the Azure region in which they are deployed.

Each limit restricts the number of tokens that can be processed per minute for a given model in a specific region. This helps manage the overall usage and distribution of resources, and it prevents a single user from monopolising the available capacity. The limits per model and region are as follows:

| Model | Regions | Tokens per minute |
| --- | --- | --- |
| gpt-35-turbo-instruct | East US, Sweden Central | 240K |
| gpt-35-turbo | East US, South Central US, West Europe, France Central, UK South | 240K |
| gpt-35-turbo | North Central US, Australia East, East US 2, Canada East, Japan East, Sweden Central, Switzerland North | 300K |
| gpt-35-turbo-16k | East US, South Central US, West Europe, France Central, UK South | 240K |
| gpt-35-turbo-16k | North Central US, Australia East, East US 2, Canada East, Japan East, Sweden Central, Switzerland North | 300K |
| gpt-4 | East US, South Central US, West Europe, France Central | 20K |
| gpt-4 | North Central US, Australia East, East US 2, Canada East, Japan East, UK South, Sweden Central, Switzerland North | 40K |
| gpt-4-32k | East US, South Central US, West Europe, France Central | 60K |
| gpt-4-32k | North Central US, Australia East, East US 2, Canada East, Japan East, UK South, Sweden Central, Switzerland North | 80K |
| text-embedding-ada-002 | East US, South Central US, West Europe, France Central | 240K |
| text-embedding-ada-002 | North Central US, Australia East, East US 2, Canada East, Japan East, UK South, Sweden Central, Switzerland North | 350K |
| Fine-tuning models (babbage-002, davinci-002, gpt-35-turbo-0613) | North Central US, Sweden Central | 50K |
| All other models | East US, South Central US, West Europe, France Central | 120K |
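To stay under a tokens-per-minute limit such as those listed above, an application can keep its own running total of tokens sent in the last 60 seconds and hold back requests that would exceed it. The sketch below is a client-side approximation only: the class name and its parameters are hypothetical, and the service may count tokens differently from your own estimate.

```python
import time
from collections import deque


class TokenBudget:
    """Track tokens sent in a trailing time window against a TPM limit."""

    def __init__(self, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the window can be tested
        self.events = deque()   # (timestamp, tokens) pairs, oldest first
        self.used = 0           # tokens spent inside the current window

    def _expire(self, now):
        # Drop entries that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens):
        """Record the tokens and return True if they fit, otherwise return False."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.tpm_limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


# Example with the 20K limit listed for gpt-4 in East US.
budget = TokenBudget(tpm_limit=20_000)
if budget.try_spend(1_500):
    print("send the request")
else:
    print("wait before sending")
```

When `try_spend` returns False, the caller would wait and try again rather than send a request that is likely to come back with a 429.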

How to Increase Quota Limit in Azure

To change the quota limit for a specific resource in Azure, follow these steps. The example below uses the storage account quota:


1. Go to https://azure.microsoft.com/en-us/ and sign in with your Azure account.

2. In the left sidebar, find and click on “Quotas” under the “All Services” section. If you don’t see “Quotas” directly, you can use the search bar to find it.

3. In the Quotas blade, select the specific service for which you want to change the quota. In this case, select “Storage.”

4. Select the subscription for which you want to increase the storage account quota.

5. Find the region for which you want to increase the storage account quota.

6. In the selected region, locate the “Request increase” icon and click on it.

7. In the “Request quota increase” dialogue, enter the desired number up to 500 (or the maximum allowed value).

8. Submit the request for the quota increase.

Depending on your subscription type and Azure policies, the request may go through an approval process. Azure will review and respond to your request. You can check the status of your quota increase request in the Azure portal.