
API Rate Limiting: Token Bucket, Leaky Bucket, Fixed Window Explained

May 2026 · 12 min read

Rate limiting protects your API from runaway clients, abuse, and accidental DoS. Three algorithms are commonly used: token bucket, leaky bucket, and fixed window. They differ in how they handle bursts and how easy they are to implement in Redis. Here's the practical comparison and a working Redis implementation.

1. Fixed window

Simplest. Count requests per identifier (IP, user, API key) within a fixed time window (e.g., per minute). Reset the counter at the start of each window.

Redis implementation:

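import time
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def check_rate_limit(key: str, limit: int = 100, window_seconds: int = 60) -> bool:
    current_window = int(time.time()) // window_seconds  # index of the current window
    redis_key = f"ratelimit:{key}:{current_window}"  # one counter per key per window
    count = r.incr(redis_key)  # atomic increment; creates the key at 1
    if count == 1:
        # First request in this window. Not atomic with INCR: use a pipeline or
        # Lua script if a crash here (leaving a key with no TTL) matters to you.
        r.expire(redis_key, window_seconds)
    return count <= limit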

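Pros: trivial to implement, and cheap: one Redis command per request (plus an EXPIRE on a window's first request). Cons: bursts at window boundaries. With a limit of 100 per minute, a client can send 100 requests in the last second of one window and 100 more in the first second of the next: 200 requests in about two seconds, twice the intended rate over that rolling minute.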
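To make the boundary effect concrete, here is a minimal in-memory sketch. It is an illustration only: the `FixedWindowLimiter` class and the explicit `now` parameter are hypothetical stand-ins for the Redis version, using the same window bucketing. It replays 100 requests just before a minute boundary and 100 just after:

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed-window counter (hypothetical stand-in for the Redis INCR version)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> request count; old windows are never evicted here
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)  # same bucketing as the Redis version
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests in the last second of window 0 (t in [59, 60)),
# then 100 in the first second of window 1 (t in [60, 61)).
late = [limiter.allow("client", 59 + i / 100) for i in range(100)]
early = [limiter.allow("client", 60 + i / 100) for i in range(100)]
accepted = sum(late) + sum(early)
print(accepted)  # 200 accepted within about two seconds
```

Every one of the 200 requests is accepted, even though no single window exceeded its limit.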

2. Token bucket

A bucket holds tokens. Each request consumes 1 token. Tokens refill at a constant rate. Burst-friendly: you can use all tokens at once, but must wait for refill.

Math: bucket of capacity C, refill rate R tokens per second. Steady-state max rate is R, burst max is C.
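The refill math can be tried in memory before moving it into Redis. This is a minimal single-process sketch, assuming the caller supplies `now` in seconds; the `TokenBucket` class is hypothetical, and it follows the same lazy-refill logic as the Lua script below (refill on each request, capped at capacity):

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: float     # C: maximum burst size
    refill_rate: float  # R: tokens added per second
    tokens: float = field(init=False)
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # a new bucket starts full

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazily refill for the time elapsed since the last call, capped at C.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=10, refill_rate=2)  # burst of 10, then 2/second
burst = [bucket.allow(now=0.0) for _ in range(12)]
print(burst.count(True))      # 10: the full burst, then rejections
print(bucket.allow(now=0.5))  # True: 0.5 s * 2 tokens/s = 1 token refilled
print(bucket.allow(now=0.5))  # False: that token is already spent
```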

Redis implementation (using Lua script for atomicity):

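-- Token bucket check. KEYS[1] = bucket key.
-- ARGV: capacity, refill rate (tokens per second), now (Unix time in seconds,
-- may be fractional; supplied by the caller, so app servers' clocks must agree), cost.
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])

local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity  -- missing key: start with a full bucket
local last_refill = tonumber(bucket[2]) or now

-- refill for the time elapsed since the last update, capped at capacity
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

-- consume if enough tokens are available
local allowed = 0
if tokens >= cost then
    tokens = tokens - cost
    allowed = 1
end

-- persist state on both paths; HSET accepts multiple fields (HMSET is deprecated)
redis.call("HSET", key, "tokens", tokens, "last_refill", now)
-- an idle bucket is full again after capacity / refill_rate seconds,
-- so expiring the key then loses nothing
redis.call("EXPIRE", key, math.max(1, math.ceil(capacity / refill_rate)))
return allowed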

Pros: handles bursts gracefully, smooth rate over time. Cons: requires Lua script or careful transaction handling. Slightly more Redis state.

3. Leaky bucket

Requests enter a queue. The queue drains at a constant rate. If the queue is full, new requests are rejected.

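Conceptually it mirrors the token bucket, but it enforces a smooth output rate: a burst can fill the queue, yet requests leave it no faster than the drain rate. In practice, APIs that reject excess requests instead of queueing them often approximate it with a sliding window log, which is what the Redis example below implements.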
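The queue form can be sketched in memory. This is a minimal illustration, assuming a hypothetical `LeakyBucketQueue` class and caller-supplied timestamps in seconds. Instead of storing a real queue, it tracks the water level (requests still waiting), which drains at `leak_rate` per second; each accepted request is told how long it waits before being forwarded, which is what keeps the output evenly spaced:

```python
from typing import Optional


class LeakyBucketQueue:
    """Leaky bucket as a queue: accepted requests drain at a fixed rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests forwarded per second
        self.level = 0.0            # requests currently waiting (fractional while draining)
        self.last = 0.0             # timestamp of the last update

    def offer(self, now: float) -> Optional[float]:
        """Return the delay before this request is forwarded, or None if rejected."""
        # Drain the queue for the time elapsed since the last update.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return None  # queue full: reject
        delay = self.level / self.leak_rate  # wait behind everything already queued
        self.level += 1
        return delay


q = LeakyBucketQueue(capacity=3, leak_rate=1.0)  # forwards 1 request/second
print([q.offer(now=0.0) for _ in range(4)])  # [0.0, 1.0, 2.0, None]
print(q.offer(now=1.0))                      # 2.0: forwarded at t=3, right after the others
```

Four simultaneous requests produce outputs at t=0, 1, 2 and a rejection, rather than a burst of four.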

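Redis implementation with a sorted set (timestamps as scores):

import time
import uuid
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def sliding_window_log(key: str, capacity: int, window_seconds: int) -> bool:
    now = time.time()
    member = str(uuid.uuid4())  # unique member, so identical timestamps don't collide
    pipe = r.pipeline()  # transactional (MULTI/EXEC) by default in redis-py
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # prune entries outside the window
    pipe.zadd(key, {member: now})
    pipe.zcard(key)  # count now includes this request
    pipe.expire(key, window_seconds + 1)
    _, _, current_count, _ = pipe.execute()
    if current_count > capacity:
        r.zrem(key, member)  # rejected requests must not use up quota
    return current_count <= capacity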

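Each request adds a timestamped entry, and entries older than the window are pruned before counting. Pros: smooth, exact enforcement over a rolling window, accurate to the millisecond. Cons: more Redis memory (one sorted-set entry per request in the window) and more CPU (sorted-set operations are logarithmic rather than constant-time).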
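Replaying the boundary scenario from the fixed window section (100 requests just before a minute boundary, 100 just after) against a sliding log shows the difference. This is a minimal in-memory sketch, assuming a hypothetical `SlidingWindowLog` class with a deque of timestamps and caller-supplied `now`; it prunes with the same rule as the Redis version (drop entries at or before `now - window`):

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log: one timestamp per accepted request."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Prune timestamps that have fallen out of the window (now - window, now].
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False  # rejected requests are not logged
        self.log.append(now)
        return True


log = SlidingWindowLog(limit=100, window_seconds=60)
results = [log.allow(59 + i / 100) for i in range(100)]   # just before t=60
results += [log.allow(60 + i / 100) for i in range(100)]  # just after t=60
print(sum(results))  # 100: the second burst is rejected
```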

4. Choosing

  • Fixed window: internal tools, low-stakes endpoints, throwaway prototypes
  • Token bucket: public APIs where bursts are okay (Stripe uses this style)
  • Leaky bucket / sliding window: strict rate enforcement, fairness across users, anti-abuse

5. The 429 response

When a client exceeds the limit, return HTTP 429 (Too Many Requests). Include these headers:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717891200
Retry-After: 42
Content-Type: application/json

{"error": "rate_limited", "message": "Too many requests. Retry after 42 seconds."}

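X-RateLimit-Reset is the Unix timestamp at which the limit resets; Retry-After is the number of seconds to wait. Retry-After is a standard HTTP header (RFC 9110, and RFC 6585 allows it on 429 responses), while the X-RateLimit-* headers are a widespread convention rather than a standard. Emit both, since clients may rely on either.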
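A minimal sketch of assembling this response from limiter state. The `rate_limit_headers` and `too_many_requests` helpers are hypothetical, and the sketch assumes the limiter can report its limit, remaining quota, and reset time as a Unix timestamp; wiring the result into a web framework is left out:

```python
import json
import math


def rate_limit_headers(limit: int, remaining: int, reset_at: float, now: float) -> dict[str, str]:
    """Build rate-limit headers; reset_at and now are Unix timestamps in seconds."""
    retry_after = max(0, math.ceil(reset_at - now))  # whole seconds, never negative
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
        "Retry-After": str(retry_after),
    }


def too_many_requests(limit: int, reset_at: float, now: float) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a 429 response."""
    headers = rate_limit_headers(limit, 0, reset_at, now)
    headers["Content-Type"] = "application/json"
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Too many requests. Retry after {headers['Retry-After']} seconds.",
    })
    return 429, headers, body


status, headers, body = too_many_requests(limit=100, reset_at=1717891200, now=1717891158)
print(status, headers["Retry-After"])  # 429 42
```

Deriving Retry-After from the same reset timestamp keeps the two headers consistent with each other.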

6. Identifier selection

What you rate-limit by matters as much as the algorithm:

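  • IP address: easy to rotate (proxies, botnets), easy to spoof if read from X-Forwarded-For without a trusted proxy, and unfair to users who share one address behind NAT (offices). Use as a fallback.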
  • API key: best for authenticated APIs.
  • User ID: rate-limits a logged-in user across all their sessions.
  • Combined key + endpoint: different limits per endpoint within the same key (search 100/min, write 10/min).
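One way to encode that priority order, as a hypothetical sketch: prefer the API key, then the user ID, fall back to the client IP, and always scope by endpoint. The `rate_limit_key` function, the `ratelimit:` prefix, and hashing the API key (so raw secrets never appear in Redis key names) are assumptions, not a prescribed scheme:

```python
import hashlib
from typing import Optional


def rate_limit_key(endpoint: str, ip: str,
                   api_key: Optional[str] = None,
                   user_id: Optional[str] = None) -> str:
    """Build a per-identifier, per-endpoint rate-limit key."""
    if api_key:
        # Hash the key so the secret itself is not stored in Redis key names.
        ident = "key:" + hashlib.sha256(api_key.encode()).hexdigest()[:16]
    elif user_id:
        ident = f"user:{user_id}"
    else:
        ident = f"ip:{ip}"  # fallback for unauthenticated traffic
    return f"ratelimit:{ident}:{endpoint}"


print(rate_limit_key("search", "203.0.113.7", user_id="42"))
# ratelimit:user:42:search
```

Because the endpoint is part of the key, search and write traffic from the same user get independent counters, each of which can carry its own limit.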

7. Distributed rate limiting

When several API servers enforce the same limits, Redis is the natural place for the shared state. A single Redis instance can typically handle 50k-100k rate-limit checks per second, depending on hardware and how many commands each check issues. Beyond that, move to Redis Cluster, or keep per-shard counters and reconcile them periodically.

An alternative is edge-based rate limiting at the CDN/WAF layer (Cloudflare, AWS WAF). Basic rules are inexpensive (some providers include them on free plans), and abusive traffic is dropped before it reaches your app servers. Use it for IP-based DoS protection; per-user limits still belong in your application.

Comparison

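Algorithm              | Burst handling                               | Redis ops/request                                        | Use case
-----------------------|----------------------------------------------|----------------------------------------------------------|---------------------
Fixed window           | Up to 2× the limit across a window boundary  | 1 (INCR, plus EXPIRE on a window's first request)        | Internal, low-stakes
Sliding window log     | Strict, no burst                             | 4 in one pipeline (ZREMRANGEBYSCORE, ZADD, ZCARD, EXPIRE) | Anti-abuse
Sliding window counter | Approximate, small boundary burst            | 2 (INCR current window, GET previous window)             | Balanced public APIs
Token bucket           | Bursts up to the bucket capacity             | 1 Lua script call                                        | Stripe-style APIs
Leaky bucket           | None on output; drains at a fixed rate       | Similar to sliding window log                            | Webhook senders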
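The sliding window counter is the one row without an example above. A common formulation, sketched here in memory as an assumption rather than a reference implementation, weights the previous fixed window's count by how much of it still overlaps the rolling window, assuming those requests were spread evenly. The `SlidingWindowCounter` class is hypothetical; in Redis the two counts would live in the current and previous fixed-window keys:

```python
import math
from collections import defaultdict


class SlidingWindowCounter:
    """Approximate rolling-window limit from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # window index -> accepted requests (never evicted here)

    def allow(self, now: float) -> bool:
        window = math.floor(now / self.window_seconds)
        # Fraction of the current fixed window that has elapsed, in [0, 1).
        elapsed = (now - window * self.window_seconds) / self.window_seconds
        # Count only the share of the previous window still inside the rolling window.
        estimate = self.counts[window - 1] * (1 - elapsed) + self.counts[window]
        if estimate >= self.limit:
            return False
        self.counts[window] += 1
        return True


swc = SlidingWindowCounter(limit=100, window_seconds=60)
first = sum(swc.allow(30.0) for _ in range(120))   # window 0: capped at 100
second = sum(swc.allow(90.0) for _ in range(120))  # halfway into window 1
print(first, second)  # 100 50
```

Halfway into the next window, half of the previous window's 100 requests still count, so only 50 more are admitted. That needs two counters per key instead of one entry per request.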

FAQ

Where should I rate limit: API gateway or app?
Both. The CDN or gateway handles IP-based abuse cheaply and quickly; the application enforces per-user and per-key business limits. Don't rely on the application layer to stop a DoS.

What about distributed systems with multiple Redis instances?
Pick a sharding key (user ID, IP) and hash it to a Redis shard. Each user's rate-limit state then lives on exactly one shard, so no cross-shard coordination is needed. The trade-off: a very active user concentrates load on a single shard (a hot shard).
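A minimal sketch of picking a shard by hand, assuming a fixed list of independent Redis instances; the `shard_for` helper and the shard count of 4 are hypothetical. It uses a stable hash from `hashlib` because Python's built-in `hash()` for strings is randomized per process, which would send the same user to different shards from different API servers. Redis Cluster performs this mapping itself with CRC16 hash slots, so this only matters when you shard manually:

```python
import hashlib


def shard_for(identifier: str, num_shards: int) -> int:
    """Map an identifier (user ID, IP, ...) to a shard index in [0, num_shards)."""
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards = [shard_for(f"user:{n}", 4) for n in range(1000)]
print(shard_for("user:42", 4) == shard_for("user:42", 4))  # True: same user, same shard
```

Plain modulo remaps most identifiers when the shard count changes, which briefly resets their limits; consistent hashing reduces that churn.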

How do I rate limit logged-out users?
Combine IP + endpoint + User-Agent fingerprint. Err on the lenient side compared with authenticated limits: many legitimate visitors simply haven't logged in yet, and several may share one IP. A common starting point is 30 requests per minute per IP for unauthenticated endpoints.

Should I burst-allow first-time API users?
Yes, but also cap the total daily quota. A token bucket with capacity C=50 and rate R=10/min lets a new user make 50 calls immediately, then settles them into 10/min. Good developer experience, and you're still protected.
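A quick worked bound shows why the daily quota is still needed. Assuming each call costs one token and the bucket starts full, a token bucket admits at most its capacity plus whatever refills during any interval of length T; with C = 50 and R = 10 per minute, a day is 1,440 minutes:

```latex
N(T) \le C + R\,T
\qquad\Longrightarrow\qquad
N(1440\ \text{min}) \le 50 + 10 \times 1440 = 14\,450
```

So the bucket alone would still allow a single client up to 14,450 calls in any 24-hour period; a separate daily quota caps that total.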

Need rate limiting designed for your API?

We've built rate limiters for SaaS APIs handling 1M+ requests/day. Token bucket, Redis, Cloudflare integration.

Book a discovery call
