11. Rate Limiting and Throttling

Rate limiting protects your API from abuse, prevents accidental runaway scripts from overwhelming your infrastructure, and ensures fair access for all consumers.

How It Works

In its simplest form, the server counts the requests each client makes within a fixed time window. When a client exceeds the allowed number, the server responds with 429 Too Many Requests, and the client must wait for the window to reset before retrying.

A common starting point is 60 requests per minute per API key, though you should tune this based on your traffic patterns and infrastructure capacity.
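The counting scheme described above can be sketched as a small in-memory limiter. This is a minimal illustration, not a production implementation: the class name `FixedWindowLimiter`, the `check` method, and the per-process dictionary are all assumptions made for the example, and a real deployment would usually keep the counters in a shared store.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical limiter: counts requests per key in fixed, aligned windows."""

    limit: int = 60   # requests allowed per window
    window: int = 60  # window length in seconds
    _counts: dict = field(default_factory=dict)  # key -> (window_start, count)

    def check(self, key: str, now: float | None = None) -> tuple[bool, int, int]:
        """Record one request for `key`; return (allowed, remaining, reset_at)."""
        now = time.time() if now is None else now
        # Align windows to multiples of `window` seconds since the epoch.
        window_start = int(now // self.window) * self.window
        start, count = self._counts.get(key, (window_start, 0))
        if start != window_start:
            # A new window has begun, so the counter starts again from zero.
            start, count = window_start, 0
        reset_at = start + self.window
        if count >= self.limit:
            self._counts[key] = (start, count)
            return False, 0, reset_at  # the caller should answer with 429
        count += 1
        self._counts[key] = (start, count)
        return True, self.limit - count, reset_at
```

Passing `now` explicitly keeps the sketch easy to test; in a request handler you would normally omit it and let the limiter read the clock.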

Response Headers

Include rate-limit information in every response so that well-behaved clients can pace themselves without guessing:

Header	Value
`X-RateLimit-Limit`	The maximum number of requests allowed in the current window (e.g. `60`).
`X-RateLimit-Remaining`	How many requests the client has left in the current window (e.g. `42`).
`X-RateLimit-Reset`	The Unix timestamp at which the current window resets (e.g. `1706000000`).

Example response headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1706000000
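As a rough sketch of how a server might assemble these headers, the helper below turns the limiter's current state into header values. The function name `rate_limit_headers` and its parameters are hypothetical; how you attach the result to a response depends on your web framework.

```python
from __future__ import annotations


def rate_limit_headers(limit: int, remaining: int, reset_at: float) -> dict[str, str]:
    """Build the three rate-limit headers from the current window state."""
    return {
        "X-RateLimit-Limit": str(limit),
        # Never report a negative count, even if the caller overshoots.
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        # The reset time is a whole-second Unix timestamp.
        "X-RateLimit-Reset": str(int(reset_at)),
    }
```

Calling `rate_limit_headers(60, 42, 1706000000)` yields the header values shown in the example above.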

Retry-After on 429

When the server returns a 429, include a Retry-After header that tells the client how long to wait, either as a number of seconds (as below) or as an HTTP date:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
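The sketch below shows both sides of this exchange under some assumptions: a server helper that derives the wait from the window's reset time, and a client helper that interprets the header value. The function names `too_many_requests` and `seconds_to_wait` are hypothetical, and rounding the wait up to at least one second is a design choice for this example rather than a requirement.

```python
from __future__ import annotations

import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def too_many_requests(reset_at: float, now: float) -> tuple[int, dict[str, str]]:
    """Server side: build a 429 status and Retry-After header (in seconds)."""
    # Round up so the client never retries before the window has reset.
    wait = max(1, math.ceil(reset_at - now))
    return 429, {"Retry-After": str(wait)}


def seconds_to_wait(retry_after: str, now: datetime) -> float:
    """Client side: interpret Retry-After as delay-seconds or an HTTP date."""
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        # Treat a date without an explicit zone as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (retry_at - now).total_seconds())
```

A client would typically sleep for the returned number of seconds before retrying the request.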

Choosing a Strategy

Fixed window: simple to implement; the counter resets at regular intervals. Clients can burst up to the limit at the start of each window, and up to twice the limit across a window boundary.
Sliding window: tracks requests over a rolling period, producing smoother throttling behaviour and fewer burst spikes.
Token bucket: allows short bursts above the average rate while still enforcing a long-term cap. Well suited to APIs with occasional spikes in legitimate traffic.
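To make the difference between the last two strategies concrete, here is a minimal single-client sketch of each. The class names `SlidingWindowLog` and `TokenBucket` and their parameters are assumptions for illustration. The sliding window shown is the timestamp-log variant, one of several ways to implement a rolling window.

```python
from collections import deque


class SlidingWindowLog:
    """Allows at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the rolling window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            return False
        self.stamps.append(now)
        return True


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # long-term average rate (tokens/second)
        self.tokens = capacity          # start with a full bucket
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In the token bucket, `capacity` sets how large a burst is tolerated and `refill_rate` sets the sustained rate, so the two can be tuned independently.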

Document your chosen strategy and limits in your API specification so clients can design their request patterns accordingly.