API Rate Limiting: Token Bucket, Leaky Bucket, Fixed Window Explained

Rate limiting protects your API from runaway clients, abuse, and accidental DoS. Three algorithms are commonly used: token bucket, leaky bucket, and fixed window. They differ in how they handle bursts and how easy they are to implement in Redis. Here's the practical comparison and a working Redis implementation.

1. Fixed window

Simplest. Count requests per identifier (IP, user, API key) within a fixed time window (e.g., per minute). Reset the counter at the start of each window.

Redis implementation:

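import time

import redis

r = redis.Redis()  # shared client; adjust host/port for your deployment

def check_rate_limit(key: str, limit: int = 100, window_seconds: int = 60) -> bool:
    current_window = int(time.time() // window_seconds)  # index of the current window
    redis_key = f"ratelimit:{key}:{current_window}"
    count = r.incr(redis_key)  # atomic; creates the key at 1 if it doesn't exist
    if count == 1:
        r.expire(redis_key, window_seconds)  # first hit in this window sets the TTL
    return count <= limit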

Pros: trivial, and cheap on Redis (one INCR per request, plus an EXPIRE on the first request of each window). Cons: bursts at window boundaries. A client can send 100 requests in the last second of one window and 100 more in the first second of the next: 200 requests within about two seconds, twice the per-window limit.
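To see the boundary problem concretely, here is a minimal in-memory sketch of the same fixed-window counter, with the clock passed in as an argument so the timing is explicit. The class name `FixedWindowLimiter` is hypothetical, and the sketch never evicts old windows (the Redis version relies on EXPIRE for that).

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed window counter; a stand-in for the Redis INCR version."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # window index -> request count (never evicted here)

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        self.counts[window] += 1
        return self.counts[window] <= self.limit


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests at t=59.5s (end of window 0), then 100 at t=60.5s (start of window 1)
burst = [limiter.allow(59.5) for _ in range(100)] + [limiter.allow(60.5) for _ in range(100)]
print(sum(burst))  # 200: all accepted within one second
```

Every one of the 200 requests is accepted, because each half lands in a different window with a fresh counter.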

2. Token bucket

A bucket holds tokens. Each request consumes 1 token. Tokens refill at a constant rate. Burst-friendly: you can use all tokens at once, but must wait for refill.

Math: bucket of capacity C, refill rate R tokens per second. Over any interval of T seconds a client can make at most C + R·T requests, so the sustained maximum rate is R and the largest burst is C.
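A minimal in-memory sketch of that math, assuming timestamps are passed in explicitly. The hypothetical `TokenBucket` class mirrors the refill-then-consume logic of the Lua script below; it is not any particular library's API.

```python
class TokenBucket:
    """In-memory token bucket: capacity C, refill rate R tokens/second."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity   # a new bucket starts full
        self.last_refill = None  # timestamp of the last update

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # refill for the time elapsed since the last call, capped at capacity
        if self.last_refill is not None:
            elapsed = max(0.0, now - self.last_refill)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # consume if enough tokens are available
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=5, refill_rate=1)  # C=5, R=1 token/second
print(sum(bucket.allow(0.0) for _ in range(8)))  # 5: the full burst of C
print(sum(bucket.allow(3.0) for _ in range(8)))  # 3: only R * 3s worth of refill
```

Over the three seconds the client gets 8 requests through, exactly C + R·T = 5 + 1·3.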

Redis implementation (using Lua script for atomicity):

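-- KEYS[1] = bucket key; ARGV = capacity, refill_rate (tokens/sec), now (seconds), cost
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])  -- supplied by the caller, so keep app-server clocks in sync
local cost = tonumber(ARGV[4])

local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity  -- a new bucket starts full
local last_refill = tonumber(bucket[2]) or now

-- refill based on elapsed time (clamped at 0 against clock skew), never above capacity
local elapsed = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

-- consume if enough tokens are available
local allowed = 0
if tokens >= cost then
    tokens = tokens - cost
    allowed = 1
end

-- persist state on both paths; multi-field HSET replaces the deprecated HMSET
redis.call("HSET", key, "tokens", tokens, "last_refill", now)
-- an idle bucket is full again after capacity / refill_rate seconds, so it can expire then
redis.call("EXPIRE", key, math.ceil(capacity / refill_rate) + 1)
return allowed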

Pros: handles bursts gracefully, smooth rate over time. Cons: requires Lua script or careful transaction handling. Slightly more Redis state.

3. Leaky bucket

Requests enter a queue. The queue drains at a constant rate. If the queue is full, new requests are rejected.

Conceptually similar to token bucket, but it enforces a smooth output rate: requests leave the queue at a fixed pace, so no bursts reach the backend. In practice it is often approximated with a sliding window log, which is what the sorted-set code below does.
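For contrast with the sorted-set version below, here is a minimal sketch of the queue-style leaky bucket described above. Instead of holding requests in a real queue, it assigns each accepted request a release time spaced `1 / leak_rate` seconds apart and rejects a request once it would have to wait `capacity` slots. `LeakyBucketQueue` and its parameters are illustrative assumptions.

```python
from typing import Optional


class LeakyBucketQueue:
    """Leaky bucket as a queue: requests leave at a fixed pace of leak_rate per second."""

    def __init__(self, capacity: int, leak_rate: float) -> None:
        self.capacity = capacity         # max departure slots a request may wait
        self.interval = 1.0 / leak_rate  # seconds between two departures
        self.next_free = 0.0             # earliest time the next request can leave

    def offer(self, now: float) -> Optional[float]:
        """Return the time this request is released downstream, or None if rejected."""
        release_at = max(now, self.next_free)
        slots_waited = (release_at - now) / self.interval
        if slots_waited >= self.capacity:
            return None                  # bucket full: reject
        self.next_free = release_at + self.interval
        return release_at


bucket = LeakyBucketQueue(capacity=3, leak_rate=1)  # 1 request/second out
print([bucket.offer(0.0) for _ in range(5)])  # [0.0, 1.0, 2.0, None, None]
```

Five simultaneous requests come out evenly spaced rather than as a burst, and the overflow is rejected. A caller would wait until the returned time (or hand the request to a worker that does).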

Redis with sorted set (timestamps as scores):

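import time
import uuid
import redis

r = redis.Redis()

def leaky_bucket(key: str, capacity: int, window_seconds: int) -> bool:
    now = time.time()
    member = str(uuid.uuid4())  # unique member so same-timestamp requests don't collide
    pipe = r.pipeline()  # MULTI/EXEC by default: these commands run as one atomic batch
    pipe.zremrangebyscore(key, 0, now - window_seconds)  # prune entries older than the window
    pipe.zcard(key)  # requests still in the window, not counting this one
    pipe.zadd(key, {member: now})
    pipe.expire(key, window_seconds + 1)
    _, current_count, _, _ = pipe.execute()
    if current_count >= capacity:
        r.zrem(key, member)  # rejected requests shouldn't keep counting against the client
        return False
    return True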

Each request adds a timestamped entry. Before counting, prune old entries. Pros: smooth rate, accurate to the millisecond. Cons: more Redis memory (one entry per request) and CPU (set operations).
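The same sliding-window-log logic without Redis, as a minimal in-memory sketch. The `SlidingWindowLog` class is hypothetical; a `deque` of timestamps plays the role of the sorted set, and only accepted requests are recorded.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log: one timestamp per accepted request."""

    def __init__(self, capacity: int, window_seconds: float) -> None:
        self.capacity = capacity
        self.window_seconds = window_seconds
        self.log = deque()  # timestamps, oldest first

    def allow(self, now: float) -> bool:
        # prune entries that fell out of the window (ZREMRANGEBYSCORE in the Redis version)
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) >= self.capacity:  # ZCARD
            return False
        self.log.append(now)  # ZADD, only for accepted requests
        return True


log = SlidingWindowLog(capacity=3, window_seconds=10)
print([log.allow(t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(log.allow(10))  # True: the request at t=0 has left the window
```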

4. Choosing

Fixed window: internal tools, low-stakes endpoints, throwaway prototypes
Token bucket: public APIs where bursts are okay (Stripe uses this style)
Leaky bucket / sliding window: strict rate enforcement, fairness across users, anti-abuse

5. The 429 response

When a client exceeds the limit, return HTTP 429 (Too Many Requests). Include these headers:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717891200
Retry-After: 42
Content-Type: application/json

{"error": "rate_limited", "message": "Too many requests. Retry after 42 seconds."}

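X-RateLimit-Reset is the Unix timestamp when the limit resets. Retry-After is the number of seconds to wait. Retry-After is a standard HTTP header (RFC 9110; RFC 6585 allows it on 429 responses), while the X-RateLimit-* headers are a widespread convention, not a standard. Emit both.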
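A minimal, framework-agnostic sketch of building that response, assuming you already know the Unix timestamp at which the client's limit resets. `rate_limit_response` is a hypothetical helper; plug the returned status, headers, and body into whatever web framework you use.

```python
import json
import math


def rate_limit_response(limit: int, reset_at: float, now: float) -> tuple[int, dict[str, str], str]:
    """Build status, headers, and JSON body for a rate-limited request.

    reset_at: Unix timestamp (seconds) at which the client's limit resets.
    now: current Unix timestamp, passed in so the function is easy to test.
    """
    retry_after = max(0, math.ceil(reset_at - now))  # whole seconds, never negative
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Too many requests. Retry after {retry_after} seconds.",
    })
    return 429, headers, body


status, headers, body = rate_limit_response(limit=100, reset_at=1717891200, now=1717891158)
print(status, headers["Retry-After"])  # 429 42
```

Deriving Retry-After from the same reset timestamp keeps the two headers consistent with each other.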

6. Identifier selection

What you rate-limit by matters as much as the algorithm:

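IP address: easy to rotate (proxies, botnets) and easy to forge if you blindly trust X-Forwarded-For; it also hits NAT'd users (offices, mobile carriers) unfairly. Use it as a fallback.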
API key: best for authenticated APIs.
User ID: rate-limits a logged-in user across all their sessions.
Combined key + endpoint: different limits per endpoint within the same key (search 100/min, write 10/min).
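One way to turn that list into a rate-limit key, as a sketch: prefer the API key, then the user ID, fall back to the IP, and scope the result to the endpoint. The function name, the precedence order, and the `key:`/`user:`/`ip:` prefixes are illustrative assumptions. The result can be passed as the `key` argument of `check_rate_limit` from the fixed-window section.

```python
from typing import Optional


def rate_limit_key(endpoint: str, client_ip: str,
                   api_key: Optional[str] = None,
                   user_id: Optional[str] = None) -> str:
    """Pick the most specific identifier available and scope it to one endpoint."""
    if api_key:
        identity = f"key:{api_key}"   # authenticated API clients
    elif user_id:
        identity = f"user:{user_id}"  # logged-in users, across all their sessions
    else:
        identity = f"ip:{client_ip}"  # fallback for anonymous traffic
    return f"{identity}:{endpoint}"


print(rate_limit_key("search", "203.0.113.7", api_key="abc123"))  # key:abc123:search
print(rate_limit_key("write", "203.0.113.7"))                     # ip:203.0.113.7:write
```

Because the endpoint is part of the key, search and write get independent counters, so each can carry its own limit (for example 100/min and 10/min).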

7. Distributed rate limiting

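Across multiple API servers, Redis is the natural shared state. A single Redis instance can comfortably handle 50k-100k rate-limit checks per second. Beyond that, use Redis Cluster, or keep per-shard counters and reconcile them periodically. In Redis Cluster, all keys a Lua script touches must hash to the same slot; the single-key token bucket script above satisfies this.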

Alternative: edge-based rate limiting at the CDN/WAF layer (Cloudflare, AWS WAF). Basic rules are cheap (sometimes free), and they take load off your app servers. Use this for IP-based DoS protection; per-user limits still belong in your application.

Comparison

Algorithm	Burst handling	Redis ops/req	Use case
Fixed window	Allows 2x burst at boundaries	1 (INCR + maybe EXPIRE)	Internal, low-stakes
Sliding window log	Strict, no burst	4 in one pipeline (ZREMRANGEBYSCORE, ZCARD, ZADD, EXPIRE)	Anti-abuse
Sliding window counter	Approximate, low burst	2 (counts of 2 buckets)	Public API balanced
Token bucket	Configurable burst capacity	1 Lua script	Stripe-style API
Leaky bucket	Output rate enforced	Similar to sliding log	Webhook senders
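The table lists a sliding window counter, which the sections above don't show. It keeps one counter per fixed window and estimates the rolling count by weighting the previous window's count by how much of it still overlaps the last `window_seconds`. A minimal in-memory sketch follows; the class name is hypothetical, and in Redis the two counts would come from two fixed-window keys (one GET, one INCR).

```python
from collections import defaultdict


class SlidingWindowCounter:
    """Approximate rolling-window limiter built from two fixed-window counts."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # window index -> accepted requests

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        elapsed = now - window * self.window_seconds  # seconds into the current window
        weight = (self.window_seconds - elapsed) / self.window_seconds
        estimated = self.counts[window - 1] * weight + self.counts[window]
        if estimated >= self.limit:
            return False
        self.counts[window] += 1
        return True


limiter = SlidingWindowCounter(limit=100, window_seconds=60)
print(sum(limiter.allow(59.5) for _ in range(100)))  # 100
print(sum(limiter.allow(60.5) for _ in range(100)))  # 1
```

Replaying the fixed-window boundary burst: at t=60.5s the previous window still carries about 99% weight, so only one more request gets through instead of another 100.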

FAQ

Where should I rate limit: API gateway or app?
Both. The CDN/gateway handles IP-based abuse (cheap, fast). The application handles per-user/per-key business limits. Don't rely on the app layer to stop DoS.

What about distributed systems with multiple Redis instances?
Pick a sharding key (user ID, IP) and hash it to a Redis shard. Each user's rate limit then lives on exactly one shard, so no coordination is needed. Trade-off: a single very active user concentrates load on one shard (a hot shard).
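A minimal sketch of that shard selection, assuming a fixed list of Redis shards indexed from 0. `shard_for` is hypothetical. It uses `zlib.crc32` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would send the same user to different shards from different API servers.

```python
import zlib


def shard_for(identifier: str, num_shards: int) -> int:
    """Map a rate-limit identifier to a shard index, stably across processes."""
    # crc32 gives the same value on every server, unlike the salted built-in hash()
    return zlib.crc32(identifier.encode("utf-8")) % num_shards


shard = shard_for("user:42", num_shards=4)
print(shard)  # some index in 0..3, identical on every API server
```

Plain modulo hashing remaps most identifiers when the shard count changes. For short-lived rate-limit state that mostly means some counters reset, which is usually acceptable.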

How do I rate limit logged-out users?
Combine IP + endpoint + User-Agent fingerprint. Be more lenient than with authenticated users, since many legitimate visitors simply haven't logged in yet. A common limit is 30 requests per minute per IP on unauthenticated endpoints.

Should I burst-allow first-time API users?
Yes, but also cap the total daily quota. A token bucket with capacity C=50 and rate R=10/min lets a new user immediately try 50 calls, then settle into 10/min. Good DX, still protected.

Need rate limiting designed for your API?

We've built rate limiters for SaaS APIs handling 1M+ requests/day. Token bucket, Redis, Cloudflare integration.

