Rate limiting protects your API from runaway clients, abuse, and accidental DoS. Three algorithms are commonly used: token bucket, leaky bucket, and fixed window. They differ in how they handle bursts and how easy they are to implement in Redis. Here's the practical comparison and a working Redis implementation.
1. Fixed window
Simplest. Count requests per identifier (IP, user, API key) within a fixed time window (e.g., per minute). Reset the counter at the start of each window.
Redis implementation:
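
    import time
    import redis

    r = redis.Redis()  # assumes a local Redis; adjust host/port as needed

    def check_rate_limit(key: str, limit: int = 100, window_seconds: int = 60) -> bool:
        # All requests in the same window share one counter key.
        current_window = int(time.time() // window_seconds)
        redis_key = f"ratelimit:{key}:{current_window}"
        count = r.incr(redis_key)
        if count == 1:
            # First hit in this window: expire the key once the window is over.
            r.expire(redis_key, window_seconds)
        return count <= limit
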
Pros: trivial, low Redis cost (one INCR per request, plus an EXPIRE on the first request in each window). Cons: bursts at window boundaries. A client can send 100 requests in the last second of one window and 100 more in the first second of the next: 200 requests in about two seconds, double the nominal per-minute limit.
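The boundary problem is easy to reproduce without Redis. The sketch below is an in-memory stand-in for the fixed-window counter (the `FixedWindow` class and its explicit `now` argument are illustrative, not part of any library). It replays 100 requests just before a window boundary and 100 just after it.

```python
from collections import defaultdict


class FixedWindow:
    """In-memory fixed-window counter; a stand-in for the Redis version."""

    def __init__(self, limit: int = 100, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        self.counts[(key, window)] += 1
        return self.counts[(key, window)] <= self.limit


limiter = FixedWindow(limit=100, window_seconds=60)

# 100 requests at t=59.5s (end of window 0), then 100 more at t=60.5s (start of window 1).
late = sum(limiter.allow("client-a", now=59.5) for _ in range(100))
early = sum(limiter.allow("client-a", now=60.5) for _ in range(100))
accepted_in_one_second = late + early  # every request passes despite the 100/minute limit
```

Both batches fall within one second of wall-clock time, yet each lands in a different window, so neither counter ever exceeds the limit.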
2. Token bucket
A bucket holds tokens. Each request consumes 1 token. Tokens refill at a constant rate. Burst-friendly: you can use all tokens at once, but must wait for refill.
Math: bucket of capacity C, refill rate R tokens per second. Steady-state max rate is R, burst max is C.
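To make the roles of C and R concrete, here is a minimal in-memory sketch of the same arithmetic. The `TokenBucket` class and the injected `now` values are assumptions made for illustration; a production limiter would keep this state in shared storage.

```python
class TokenBucket:
    """In-memory token bucket with capacity C and refill rate R (tokens/second)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full, so an initial burst of C is allowed
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=10, refill_rate=2)  # C = 10, R = 2 tokens/second

burst = sum(bucket.allow(now=0.0) for _ in range(15))     # only C = 10 requests pass at once
after_3s = sum(bucket.allow(now=3.0) for _ in range(15))  # R * 3 s = 6 tokens were refilled
```

The first wave is capped by the capacity and the second by what the refill rate restored in the meantime.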
Redis implementation (using Lua script for atomicity):
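
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])     -- max tokens (burst size)
    local refill_rate = tonumber(ARGV[2])  -- tokens added per second
    local now = tonumber(ARGV[3])          -- current time in seconds, supplied by the caller
    local cost = tonumber(ARGV[4])         -- tokens this request consumes

    -- a missing key means a new client: start with a full bucket
    local bucket = redis.call("HMGET", key, "tokens", "last_refill")
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- refill for the elapsed time, capped at capacity
    local elapsed = math.max(0, now - last_refill)
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    -- consume if enough tokens are available
    local allowed = 0
    if tokens >= cost then
        tokens = tokens - cost
        allowed = 1
    end

    -- persist state and refresh the TTL on both paths, so a rejected first
    -- request cannot leave a key behind without an expiry
    -- (multi-field HSET needs Redis 4.0+; it replaces the deprecated HMSET)
    redis.call("HSET", key, "tokens", tokens, "last_refill", now)
    redis.call("EXPIRE", key, 3600)
    return allowed
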
Pros: handles bursts gracefully, smooth rate over time. Cons: requires Lua script or careful transaction handling. Slightly more Redis state.
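To call a script like the one above from Python, redis-py's `register_script` returns a callable that accepts `keys` and `args` lists. The wrapper below is a sketch that only assumes such a callable; the function name `allow_request` and the `ratelimit:` key prefix are illustrative choices.

```python
import time
from typing import Callable, Optional


def allow_request(
    script: Callable[..., int],
    identifier: str,
    capacity: int,
    refill_rate: float,
    cost: int = 1,
    now: Optional[float] = None,
) -> bool:
    """Run the token-bucket script for one request and interpret its 1/0 result."""
    if now is None:
        now = time.time()  # app-server clock; servers should be time-synced
    # Argument order must match ARGV[1..4] in the Lua script.
    result = script(
        keys=[f"ratelimit:{identifier}"],
        args=[capacity, refill_rate, now, cost],
    )
    return result == 1
```

With redis-py this could be wired up roughly as `script = redis.Redis().register_script(LUA_SOURCE)`, where `LUA_SOURCE` is a hypothetical variable holding the script text. Passing the callable in also makes the wrapper easy to unit-test with a stub.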
3. Leaky bucket
Requests enter a queue. The queue drains at a constant rate. If the queue is full, new requests are rejected.
Conceptually similar to token bucket but enforces a smoother output rate (no bursts allowed on the output side). Often implemented as a sliding window in practice.
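For comparison, here is a minimal in-memory sketch of the leaky bucket in its "meter" form: the level rises by one per accepted request and drains at a constant rate, and a full bucket rejects. The `LeakyBucket` class is illustrative; note that this variant rejects excess requests rather than queueing and delaying them.

```python
class LeakyBucket:
    """Leaky bucket as a meter: level rises by 1 per request and drains at a fixed rate."""

    def __init__(self, capacity: float, leak_rate: float, now: float = 0.0):
        self.capacity = capacity    # maximum number of queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.level = 0.0
        self.last_leak = now

    def allow(self, now: float) -> bool:
        # Drain whatever leaked out since the last call.
        elapsed = max(0.0, now - self.last_leak)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1  # request joins the queue
            return True
        return False  # queue is full, reject


bucket = LeakyBucket(capacity=5, leak_rate=1)  # holds 5, drains 1 request/second

first_wave = sum(bucket.allow(now=0.0) for _ in range(8))   # 5 fit, 3 are rejected
second_wave = sum(bucket.allow(now=2.0) for _ in range(8))  # 2 slots drained in 2 s
```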
Redis implementation of the sliding-window variant, using a sorted set with timestamps as scores:
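
    import time
    import uuid
    import redis

    r = redis.Redis()  # assumes a local Redis; adjust host/port as needed

    def leaky_bucket(key: str, capacity: int, window_seconds: int) -> bool:
        now = time.time()
        member = str(uuid.uuid4())  # unique member so same-timestamp requests don't collide
        pipe = r.pipeline()  # MULTI/EXEC by default, so the steps run atomically
        pipe.zremrangebyscore(key, 0, now - window_seconds)  # prune entries older than the window
        pipe.zcard(key)  # count what remains, before adding this request
        pipe.zadd(key, {member: now})
        pipe.expire(key, window_seconds + 1)
        _, current_count, _, _ = pipe.execute()
        allowed = current_count < capacity
        if not allowed:
            r.zrem(key, member)  # don't let rejected requests count against the limit
        return allowed
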
Each request adds a timestamped entry. Before counting, prune old entries. Pros: smooth rate, accurate to the millisecond. Cons: more Redis memory (one entry per request) and CPU (set operations).
4. Choosing
- Fixed window: internal tools, low-stakes endpoints, throwaway prototypes
- Token bucket: public APIs where bursts are okay (Stripe uses this style)
- Leaky bucket / sliding window: strict rate enforcement, fairness across users, anti-abuse
5. The 429 response
When a client exceeds the limit, return HTTP 429 (Too Many Requests). Include these headers:

    HTTP/1.1 429 Too Many Requests
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1717891200
    Retry-After: 42
    Content-Type: application/json

    {"error": "rate_limited", "message": "Too many requests. Retry after 42 seconds."}

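X-RateLimit-Reset is the Unix timestamp when the limit resets. Retry-After is the number of seconds to wait. Retry-After is a standard HTTP header (RFC 9110), and the 429 status code itself is defined in RFC 6585; the X-RateLimit-* headers are a widely used convention rather than a standard. Emit both.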
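A framework-agnostic sketch of assembling that response follows. The function name `rate_limit_response` and its `(status, headers, body)` return shape are assumptions; adapt them to whatever response object your framework uses.

```python
import json
import math


def rate_limit_response(limit: int, reset_at: float, now: float):
    """Return (status, headers, body) for a rate-limited request."""
    retry_after = max(0, math.ceil(reset_at - now))  # whole seconds, rounded up
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_at)),
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Too many requests. Retry after {retry_after} seconds.",
    })
    return 429, headers, body


status, headers, body = rate_limit_response(limit=100, reset_at=1717891200, now=1717891158)
```

Deriving Retry-After and X-RateLimit-Reset from the same `reset_at` value keeps the two headers consistent with each other.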
6. Identifier selection
What you rate-limit by matters as much as the algorithm:
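- IP address: easy to evade (rotating proxies, or a forged X-Forwarded-For if you trust that header blindly) and unfair to users behind a shared NAT, such as offices. Use as a fallback.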
- API key: best for authenticated APIs.
- User ID: rate-limits a logged-in user across all their sessions.
- Combined key + endpoint: different limits per endpoint within the same key (search 100/min, write 10/min).
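One way to turn these choices into a Redis key is sketched below. The limits table, the default limit, and the `rate_limit_key` helper are all hypothetical; the priority order (API key, then user ID, then IP) follows the list above.

```python
from typing import Optional, Tuple

# Hypothetical per-endpoint limits (requests per minute), mirroring the example above.
ENDPOINT_LIMITS = {"search": 100, "write": 10}
DEFAULT_LIMIT = 60  # assumed fallback for endpoints without an explicit limit


def rate_limit_key(
    endpoint: str,
    ip: str,
    api_key: Optional[str] = None,
    user_id: Optional[str] = None,
) -> Tuple[str, int]:
    """Pick the strongest available identifier and pair it with the endpoint's limit."""
    if api_key:
        identity = f"key:{api_key}"
    elif user_id:
        identity = f"user:{user_id}"
    else:
        identity = f"ip:{ip}"  # fallback only: shared by everyone behind the same NAT
    limit = ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)
    return f"ratelimit:{identity}:{endpoint}", limit
```

Including the endpoint in the key gives each endpoint its own counter, so heavy search traffic cannot exhaust a client's write budget.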
7. Distributed rate limiting
Across multiple API servers, Redis is the natural shared state. One Redis instance can comfortably handle 50k-100k rate-limit checks per second. For higher scale, use Redis Cluster or per-shard counters with periodic reconciliation.
Alternative: edge-based rate limiting at the CDN/WAF layer (Cloudflare, AWS WAF). Basic rules are inexpensive (and free on some plans, though pricing varies by provider), and this decouples rate limiting from your app servers. Use it for IP-based DoS protection. Per-user limits still need to live in your application.
Comparison
| Algorithm | Burst handling | Redis ops/req | Use case |
|---|---|---|---|
| Fixed window | Allows 2x burst at boundaries | 1-2 (INCR, plus EXPIRE on first hit) | Internal, low-stakes |
| Sliding window log | Strict, no burst | 3-4 in one pipeline (ZREMRANGEBYSCORE, ZCARD, ZADD, EXPIRE) | Anti-abuse |
| Sliding window counter | Approximate, low burst | 2 (counts of 2 buckets) | Public API balanced |
| Token bucket | Configurable burst capacity | 1 Lua script | Stripe-style API |
| Leaky bucket | Output rate enforced | Similar to sliding log | Webhook senders |
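The sliding window counter row is not covered in the sections above, so here is a minimal in-memory sketch of the usual approximation: the current window's count plus a share of the previous window's count, weighted by how much of that window still overlaps the sliding window. The `SlidingWindowCounter` class is illustrative; in Redis the two counts would be two ordinary counter keys.

```python
from collections import defaultdict


class SlidingWindowCounter:
    """Approximate sliding window: current count plus a weighted share of the previous one."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> accepted requests

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window_seconds) / self.window_seconds
        estimate = self.counts[(key, window - 1)] * overlap + self.counts[(key, window)]
        if estimate + 1 > self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


limiter = SlidingWindowCounter(limit=100, window_seconds=60)
late = sum(limiter.allow("client-a", now=59.0) for _ in range(100))   # fills window 0
early = sum(limiter.allow("client-a", now=61.0) for _ in range(100))  # 1 s into window 1
```

Unlike the fixed window, the second batch is almost entirely rejected, because nearly all of the previous window's 100 requests still count toward the estimate.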
FAQ
Where should I rate limit: API gateway or app?
Both. The CDN/gateway handles IP-based abuse (cheap, fast). The application handles per-user and per-key business-logic limits. Don't rely on the application layer to stop DoS.
What about distributed systems with multiple Redis instances?
Pick a sharding key (user ID, IP) and hash it to a Redis shard. Each user's rate limit lives on one shard, so no cross-shard coordination is needed. Trade-off: a very active user concentrates all of their load on a single shard.
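A minimal sketch of that hashing step is shown below. The shard addresses and the `shard_for` helper are hypothetical; the one firm requirement is a hash that is stable across processes, which rules out Python's built-in `hash()` for strings.

```python
import zlib


def shard_for(identifier: str, shard_count: int) -> int:
    """Map an identifier to a shard index with a hash that is stable across processes."""
    # zlib.crc32 is deterministic, unlike Python's built-in hash() for strings,
    # which is randomized per process.
    return zlib.crc32(identifier.encode("utf-8")) % shard_count


# Hypothetical shard addresses; in practice each would back a Redis client.
SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]
shard = SHARDS[shard_for("user:42", len(SHARDS))]
```

Plain modulo hashing remaps most identifiers when the shard count changes. For rate limiting that usually just resets some counters, but consistent hashing is worth considering if that matters for you.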
How do I rate limit logged-out users?
Combine IP + endpoint + UA fingerprint. Be more lenient than for authenticated users (legitimate users aren't yet logged in). Common: 30 requests per minute per IP for unauthenticated endpoints.
Should I burst-allow first-time API users?
Yes, but limit total daily quota. Token bucket with capacity C=50 and rate R=10/min lets a new user immediately try 50 calls, then settle into 10/min. Good DX, still protected.
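The arithmetic behind that answer: a client starting with a full bucket can have at most C + R·t calls accepted in its first t minutes, capped by the daily quota. The sketch below uses the C and R values from the answer; the `daily_quota` of 5000 is an assumed figure chosen only for illustration.

```python
def max_calls(
    minutes: float,
    capacity: int = 50,
    rate_per_min: float = 10.0,
    daily_quota: int = 5000,
) -> int:
    """Upper bound on accepted calls in the first `minutes`, starting from a full bucket."""
    return int(min(daily_quota, capacity + rate_per_min * minutes))


immediately = max_calls(0)       # the initial burst: C
after_5_min = max_calls(5)       # C + R * 5
after_a_day = max_calls(24 * 60) # the daily quota is the binding limit
```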