Any API exposed to the public will eventually get hammered, whether by a buggy client looping on an error, a scraper, or someone outright abusing it. Rate limiting is how you protect your service and keep it fair for everyone. I have added it to plenty of APIs, and the concepts are simpler than the jargon suggests.
Pick an algorithm
There are a few classic approaches, and they trade accuracy against cost. A fixed window counts requests per time bucket, say 100 per minute, and resets on the boundary. It is dead simple but allows bursts at the edges: a client can fire 100 requests at the end of one minute and 100 at the start of the next, which is 200 requests in a few seconds against a nominal limit of 100 per minute. A sliding window smooths that out by counting the previous window's requests in proportion to how much of that window still overlaps the last 60 seconds. A token bucket refills tokens at a steady rate and lets clients spend them in bursts up to a cap, which feels the most natural for real traffic.
- Fixed window: easiest to build, allows edge bursts.
- Sliding window: more accurate, slightly more bookkeeping.
- Token bucket: handles bursts gracefully, my usual default.
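To make the sliding window idea concrete, here is a minimal sketch of the weighted-counter variant. The function name and the shape of `state` (`windowStart`, `current`, `previous`) are my own choices for illustration, and this is one common way to approximate a sliding window rather than the only one.

```javascript
// Sliding window counter: a sketch, not a production limiter.
// `state` holds { windowStart, current, previous }; all names are illustrative.
function slidingWindowAllow(state, now, limit, windowMs) {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  if (windowStart !== state.windowStart) {
    // Moved to a new window. The old count only carries over if the
    // windows are adjacent; after a longer gap it no longer matters.
    const adjacent = windowStart - state.windowStart === windowMs;
    state.previous = adjacent ? state.current : 0;
    state.current = 0;
    state.windowStart = windowStart;
  }
  // Fraction of the previous window still covered by the sliding window.
  const overlap = 1 - (now - windowStart) / windowMs;
  const estimated = state.previous * overlap + state.current;
  if (estimated >= limit) {
    return false;
  }
  state.current += 1;
  return true;
}
```

With a limit of 100 per minute, a client that used all 100 requests in the previous window and comes back one second into the new one starts with an estimate of about 98.3, so only two more requests pass. A fixed window would have let through another 100.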
A simple token bucket
The idea is that each client has a bucket of tokens. Every request costs one token, and tokens refill over time. If the bucket is empty, the request is rejected. Here is the core logic, ignoring storage for a moment:
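function allow(bucket, now, rate, capacity) {
  // Seconds since the last call; `now` and `bucket.last` are millisecond
  // timestamps. Clamp at zero in case the clock steps backwards.
  const elapsed = Math.max(0, now - bucket.last) / 1000;
  // Refill at `rate` tokens per second, never beyond `capacity`.
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * rate);
  bucket.last = now;
  // Spend one token if one is available.
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true;
  }
  return false;
}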
That function refills based on how much time has passed, caps the bucket so tokens do not accumulate forever, and spends one token per allowed request. It is maybe fifteen lines and it covers the vast majority of real needs.
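In practice each client needs its own bucket. The sketch below wraps the same refill-and-spend logic in a per-client lookup. `createLimiter` and `checkRequest` are hypothetical names, and the plain `Map` assumes a single process; it also never evicts idle clients, which a real service would need to handle.

```javascript
// One bucket per client key, created lazily and starting full.
// `createLimiter` is an illustrative name, not from a library.
function createLimiter(rate, capacity) {
  const buckets = new Map();
  return function check(clientKey, now = Date.now()) {
    let bucket = buckets.get(clientKey);
    if (!bucket) {
      bucket = { tokens: capacity, last: now };
      buckets.set(clientKey, bucket);
    }
    // Same refill-then-spend steps as the token bucket logic above.
    const elapsed = Math.max(0, now - bucket.last) / 1000;
    bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * rate);
    bucket.last = now;
    if (bucket.tokens < 1) {
      return false;
    }
    bucket.tokens -= 1;
    return true;
  };
}

// Usage: 2 tokens per second with bursts of up to 5, keyed per client.
const checkRequest = createLimiter(2, 5);
checkRequest("user-42"); // true on the first call, since buckets start full
```

Starting each bucket full is a design choice: it lets a new client burst immediately. Starting empty would be stricter but penalizes the first request.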
Identify the client correctly
Rate limiting is only as good as your idea of who a client is. For authenticated traffic, key the limit on the API key or user ID, which is reliable. For anonymous traffic you fall back to IP address, which is imperfect: users behind the same NAT or corporate network share an IP, and an abuser can rotate through proxies to appear as many clients. If you are behind a proxy or CDN, read the forwarded header (commonly X-Forwarded-For), but only trust it when the request genuinely came through your infrastructure, because any client can set that header itself. Getting the key wrong means you either punish innocent users or fail to stop abusers.
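Here is a rough sketch of that decision order. The request shape (`apiKey`, `remoteAddress`, `headers`) and the `TRUSTED_PROXIES` set are assumptions for illustration; map them onto whatever your framework and infrastructure actually provide.

```javascript
// Hypothetical request shape: { apiKey, remoteAddress, headers }.
// TRUSTED_PROXIES is a placeholder for the addresses of your own proxy or CDN.
const TRUSTED_PROXIES = new Set(["10.0.0.1"]);

function clientKey(req) {
  // Authenticated traffic: the API key is the most reliable identity.
  if (req.apiKey) {
    return "key:" + req.apiKey;
  }
  // Only believe X-Forwarded-For when the direct peer is our own proxy.
  const forwarded = req.headers["x-forwarded-for"];
  if (forwarded && TRUSTED_PROXIES.has(req.remoteAddress)) {
    // Assumes a single trusted proxy that appends the address it saw,
    // so the last entry is the one our infrastructure vouches for.
    const hops = forwarded.split(",").map((hop) => hop.trim());
    return "ip:" + hops[hops.length - 1];
  }
  // Otherwise fall back to the address of the connection itself.
  return "ip:" + req.remoteAddress;
}
```

Prefixing the key with `key:` or `ip:` keeps the two namespaces apart, so an API key that happens to look like an IP address cannot collide with an anonymous client.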
Respond the right way
When you reject a request, do it with the correct HTTP status and helpful headers so well-behaved clients can adapt. The status is 429 Too Many Requests. Always include a Retry-After header telling the client how long to wait, and the RateLimit headers so clients can self-throttle before hitting the wall. Those headers come from an IETF draft rather than a finished standard, and the field names have changed between draft versions, so document which ones you send:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 30
Content-Type: application/json

{ "error": "rate limit exceeded, retry in 30 seconds" }
A good client reads those headers and backs off politely. A bad one ignores them, but at least you gave it the chance, and you have a clean record of why you said no.
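If you are using a token bucket, the wait time falls out of the bucket state: it is the time needed to refill up to one whole token. The helper below is a sketch under that assumption. `rejection` is a hypothetical name, and reporting the bucket capacity as `RateLimit-Limit` and the same wait as `RateLimit-Reset` is my interpretation, not something the headers mandate.

```javascript
// Build the status, headers, and body for a 429 from token bucket state.
// `bucket.tokens` is the current (possibly fractional) token count and
// `rate` is tokens refilled per second. Illustrative helper only.
function rejection(bucket, rate, capacity) {
  // Time until the bucket holds one whole token again, rounded up.
  const waitSeconds = Math.max(1, Math.ceil((1 - bucket.tokens) / rate));
  return {
    status: 429,
    headers: {
      "Retry-After": String(waitSeconds),
      "RateLimit-Limit": String(capacity),
      "RateLimit-Remaining": String(Math.floor(bucket.tokens)),
      "RateLimit-Reset": String(waitSeconds),
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      error: `rate limit exceeded, retry in ${waitSeconds} seconds`,
    }),
  };
}
```

For example, a bucket holding 0.25 tokens that refills at 0.5 tokens per second needs 1.5 seconds to reach one token, so the client is told to wait 2 seconds.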
Where to store the counters
An in-memory counter works for a single server and falls apart the moment you scale to two, because each instance has its own view. For anything distributed you need shared state. Redis is the classic choice; it is fast and has atomic increment operations that make this easy. On the edge, Cloudflare’s Durable Objects do the same job close to the user, since each object serializes access to its own state. Workers KV is a weaker fit: it is eventually consistent and has no atomic increment, so it only suits approximate limits. Whatever you pick, the counter update must be atomic, or two simultaneous requests can both read the same count and both get through.
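The shape of an atomic check is easier to see in code. This sketch uses a fixed window counter because it maps directly onto a single increment. `MemoryStore` is a stand-in I made up so the example runs on its own; with Redis, the INCR and EXPIRE commands would play the roles of `incr` and `expire`, and the client calls would be asynchronous.

```javascript
// In-memory stand-in for a shared store, for illustration only.
class MemoryStore {
  constructor() {
    this.counts = new Map();
  }
  // Increment and return the new value in one step, like Redis INCR.
  incr(key) {
    const next = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  // A real store would delete the key after `seconds`; omitted here.
  expire(key, seconds) {}
}

// Fixed window check built on the atomic increment. Illustrative names.
function allowFixedWindow(store, clientKey, now, limit, windowSeconds) {
  const windowId = Math.floor(now / 1000 / windowSeconds);
  const key = `rl:${clientKey}:${windowId}`;
  // Decide from the value the increment returned, never from a separate
  // read, so two concurrent requests cannot both see the same count.
  const count = store.incr(key);
  if (count === 1) {
    // First hit in this window: schedule cleanup of the counter.
    store.expire(key, windowSeconds * 2);
  }
  return count <= limit;
}
```

The important detail is that the decision uses the value returned by the increment. A read followed by a separate write is exactly the race described above.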
Layer it with other defenses
Rate limiting is one layer, not the whole wall. Pair it with authentication, input validation, and sensible timeouts. Put it as early in the request lifecycle as you can, ideally before any expensive work, so a flood of blocked requests costs you almost nothing. If you are deploying this kind of service yourself, the same edge platform I describe in deploying to Cloudflare Pages can run the API and its rate-limit store together, and you can ship it confidently through a GitHub Actions pipeline. Build the limiter once, keep it boring, and it quietly does its job under load.