What is a Rate Limiter?
A rate limiter is a mechanism designed to control the frequency of requests or actions in a system. It ensures that a service doesn’t get overwhelmed by too many requests in a short period, protecting it from overuse or abuse. Imagine a bucket that holds a limited number of tokens - each request takes one, and if the bucket runs dry, further requests are delayed or denied until it refills. Rate limiters are widely used for:
- Preventing Overload: Keeping servers stable under high traffic.
- Security: Mitigating denial-of-service (DoS) attacks by capping request rates.
- Fair Usage: Ensuring equitable resource access among users or applications.
You might use a rate limiter in APIs, web services, or microservices to maintain performance and reliability. For a broad overview, see Rate Limiting on Wikipedia. I also drew inspiration from practical examples like Rate Limiters at Stripe and How a Rate Limiter Took Down GitHub, which highlight real-world applications and pitfalls.
Circuit Breaker
While rate limiters control request rates, circuit breakers handle failure resilience. A circuit breaker monitors a service for failures (e.g., timeouts, errors) and “trips” open to block requests when a threshold is crossed, preventing cascading failures across a system. Once the service recovers, it “closes” again to resume normal operation. Think of it as a safety switch in your home’s electrical system - cutting power to avoid overload.
They’re critical in distributed systems where one failing component can drag down others. I explored this concept alongside rate limiting because both contribute to system resilience, a topic well-covered in tools like Resilience4j, which I investigated during my experiments.
Concurrency Limiter
Before diving into my solutions, I also tackled a concurrency limiter. Unlike a rate limiter (which caps requests over time), a concurrency limiter restricts the number of simultaneous active requests. It’s perfect for scenarios where a service can only handle a limited number of concurrent resource-heavy operations. I implemented this alongside my rate limiter because both address resilience: one prevents overload over time, the other in the moment.
Why a Hands-On Approach?
I decided to build a rate limiter from scratch because abstract discussions only get you so far. Coding a proof of concept (PoC) forces you to wrestle with real-world details - headers, concurrency, exceptions - that don’t surface in theory. It’s the best way to clarify technical expectations and refine requirements. Plus, I wanted to explore how rate limiting ties into broader resilience strategies, like circuit breaking, in a tangible way. Resources like An Alternative Approach to Rate Limiting and Rate Limiting in Node.js fueled my curiosity to experiment hands-on.
Implementation Challenges: Going Distributed
A rate limiter in a single app is straightforward - you can track requests in memory with a counter. But in a distributed system, where multiple apps or instances serve requests, that approach falls apart. Each app might have its own count, leading to inconsistent limits. For true rate limiting, you need a centralized system that all instances can query, ensuring a unified view of request rates.
This is where distributed design becomes critical. I found Understanding Rate Limiting (YouTube) incredibly helpful for grasping these challenges. It emphasizes the need for a shared store - like Redis - to synchronize limits across services, a strategy also detailed in Scaling GitHub’s API with a Sharded, Replicated Rate Limiter in Redis, which I referenced for inspiration.
My Solutions
To tackle these challenges, I leaned heavily on Redis for its speed and atomicity, guided by Redis’s rate limiting patterns. Here’s how I approached it:
- Centralized Storage: I stored all rate-limiting state in Redis, making it accessible to any app in the system. This ensures consistent limits across distributed instances.
- Scripts in Redis: I embedded the core logic (e.g., token bucket refills, counter updates) in Redis Lua scripts. This minimizes network calls - everything happens in one atomic operation on the server side. For example, my token bucket checks tokens, updates counts, and sets expiration in a single script.
- Concurrency Limiter: I implemented this with a simple counter (decremented when requests end) and a timestamp set for cleanup, all managed in Redis with termination middleware to avoid leaks (see the sketch just after this list).
- Refill Strategy: I opted for an application-scheduled Redis script (schedulePeriodic) over periodic mset calls or Redis-side cron jobs. This gave me control over errors, visibility into timing (via getNextTimer), and atomic execution without excessive network overhead.
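To make the concurrency limiter concrete, here is a minimal sketch of an acquire/release pair as Redis Lua scripts. The key name, argument order, and the plain TTL safety net are illustrative assumptions for this post - the repo itself uses a timestamp set for cleanup rather than a simple expiration.

```lua
-- acquire: KEYS[1] = counter key, ARGV[1] = maxConcurrent, ARGV[2] = ttl (seconds)
-- Returns 1 if the request may proceed, 0 if the concurrency limit is reached.
local active = redis.call('INCR', KEYS[1])
if active == 1 then
  -- Safety net (illustrative): expire the counter so a crashed app cannot leak slots forever.
  redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end
if active > tonumber(ARGV[1]) then
  -- Over the limit: undo the increment and reject.
  redis.call('DECR', KEYS[1])
  return 0
end
return 1
```

```lua
-- release: KEYS[1] = counter key
-- Called from the termination middleware when the request finishes, success or failure.
local active = redis.call('DECR', KEYS[1])
if active < 0 then
  -- Never let the counter go negative (e.g., a release after the key expired).
  redis.call('SET', KEYS[1], 0)
end
return active
```

Wiring the release script into termination middleware is what keeps slots from leaking when requests error out or time out.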
All scripts and configs are in my rate-limiter repo. The trade-off? Managing Lua scripts adds operational complexity, but the scripts themselves stay simple - akin to the Simple Rate Limiting Algorithm Gist - which keeps bugs at bay, and the performance gains are worth it.
Rate Limiter Algorithm Descriptions
Token Bucket
This algorithm maintains a fixed number of tokens (maxRequests) in a bucket that refills at a constant rate. Each request consumes one token, and if tokens remain (remaining >= 0), the request is allowed. The bucket is refilled by a Lua script executed periodically by the app. Learn more about its mechanics at Token Bucket Algorithm.
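As a rough illustration, the consume step could look like the Lua sketch below. The key name, argument order, and header-friendly return value are assumptions for this example; the periodic refill is a separate app-scheduled script (the schedulePeriodic path above) that simply resets the key to maxRequests.

```lua
-- Consume one token. KEYS[1] = bucket key, ARGV[1] = maxRequests, ARGV[2] = windowSize (s)
local tokens = tonumber(redis.call('GET', KEYS[1]))
if tokens == nil then
  -- First request since the last refill/expiry: start a full bucket with a fixed expiration.
  redis.call('SETEX', KEYS[1], tonumber(ARGV[2]), tonumber(ARGV[1]))
  tokens = tonumber(ARGV[1])
end
local remaining = tokens - 1
if remaining >= 0 then
  redis.call('DECR', KEYS[1])
  return remaining   -- allowed; remaining tokens are handy for rate-limit headers
end
return -1            -- denied; the bucket is empty until the next refill
```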
Timestamp Token Bucket
An enhanced token bucket that uses timestamps to calculate token replenishment dynamically. It tracks the last request time and adds tokens based on elapsed time relative to the window size, capping at maxRequests. Requests are allowed if tokens remain after consumption. In the current implementation, the user starts accruing tokens from the first request.
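A minimal sketch of the timestamp-based variant, assuming a hash with tokens and last fields and the current time passed in from the app - illustrative choices, not necessarily the repo's exact layout:

```lua
-- KEYS[1] = bucket hash, ARGV[1] = maxRequests, ARGV[2] = windowSize (ms), ARGV[3] = now (ms)
local max    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
local data   = redis.call('HMGET', KEYS[1], 'tokens', 'last')
local tokens = tonumber(data[1])
local last   = tonumber(data[2])
if tokens == nil then
  -- First request: start with a full bucket and begin accruing from now.
  tokens, last = max, now
else
  -- Replenish proportionally to the time elapsed since the last request, capped at max.
  tokens = math.min(max, tokens + (now - last) * (max / window))
  last = now
end
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', last)
redis.call('PEXPIRE', KEYS[1], window)
return allowed
```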
Fractional Token Bucket
A variation of the token bucket that divides the maxRequests into smaller fractional tokens (e.g., 1/60th). Tokens replenish incrementally over time based on elapsed time, allowing finer granularity. Requests are permitted if fractional tokens are available.
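A sketch of the fractional variation, assuming 60 fractions per window and the same hash layout as the previous example - again illustrative rather than the repo's exact script:

```lua
-- KEYS[1] = bucket hash, ARGV[1] = maxRequests, ARGV[2] = windowSize (ms), ARGV[3] = now (ms)
local max      = tonumber(ARGV[1])
local window   = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local steps    = 60              -- assumed number of fractions per window
local fraction = max / steps     -- tokens added per elapsed step
local step_ms  = window / steps  -- length of one step
local data   = redis.call('HMGET', KEYS[1], 'tokens', 'last')
local tokens = tonumber(data[1])
local last   = tonumber(data[2])
if tokens == nil then
  tokens, last = max, now
else
  -- Only whole elapsed steps add tokens, giving finer but still discrete refills.
  local elapsed_steps = math.floor((now - last) / step_ms)
  if elapsed_steps > 0 then
    tokens = math.min(max, tokens + elapsed_steps * fraction)
    last = last + elapsed_steps * step_ms
  end
end
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', last)
redis.call('PEXPIRE', KEYS[1], window)
return allowed
```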
Leaky Bucket
This algorithm treats requests as water flowing into a bucket that leaks at a constant rate (maxRequests/windowSize). It tracks the number of requests (count) and removes them over time. If the bucket isn’t full (count <= maxRequests), the request is allowed.
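A minimal leaky bucket sketch under the same assumptions (hash storage, app-supplied timestamp):

```lua
-- KEYS[1] = bucket hash, ARGV[1] = maxRequests, ARGV[2] = windowSize (ms), ARGV[3] = now (ms)
local max    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
local data  = redis.call('HMGET', KEYS[1], 'count', 'last')
local count = tonumber(data[1]) or 0
local last  = tonumber(data[2]) or now
-- Leak at a constant rate of maxRequests per windowSize.
count = math.max(0, count - (now - last) * (max / window))
local allowed = 0
if count + 1 <= max then
  count = count + 1   -- the new request "pours" into the bucket
  allowed = 1
end
redis.call('HSET', KEYS[1], 'count', count, 'last', now)
redis.call('PEXPIRE', KEYS[1], window)
return allowed
```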
Fixed Window Counter
This divides time into fixed windows (e.g., seconds) and tracks requests per window using a concatenated key. Each request decrements the counter, and the window resets after expiration. Requests are allowed if the counter hasn’t hit zero.
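A sketch of the fixed window counter; the base key, argument order, and building the concatenated key inside the script are illustrative (in a Redis Cluster setup you would compute the full key app-side):

```lua
-- KEYS[1] = base key (e.g. a hypothetical "ratelimit:user42"),
-- ARGV[1] = maxRequests, ARGV[2] = windowSize (s), ARGV[3] = now (s)
local max    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
-- Concatenate the base key with the fixed-window index.
local windowIndex = math.floor(now / window)
local key = KEYS[1] .. ':' .. windowIndex
local remaining = tonumber(redis.call('GET', key))
if remaining == nil then
  -- New window: start the counter at maxRequests and expire it with the window.
  redis.call('SETEX', key, window, max)
  remaining = max
end
if remaining > 0 then
  redis.call('DECR', key)
  return remaining - 1   -- allowed
end
return -1                -- denied: counter already at zero for this window
```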
Sliding Window Log
This logs each request’s timestamp in a Redis sorted set and removes entries outside the sliding window. It counts requests within the window, allowing new ones if the total is below maxRequests. The log expires after the window duration.
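A sketch of the sliding window log, assuming the app passes the current time and a unique request id to use as the sorted-set member:

```lua
-- KEYS[1] = log key (sorted set), ARGV[1] = maxRequests,
-- ARGV[2] = windowSize (ms), ARGV[3] = now (ms), ARGV[4] = unique request id
local max    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
-- Drop log entries that have slid out of the window.
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
if redis.call('ZCARD', KEYS[1]) < max then
  -- Record this request's timestamp and keep the log from lingering past the window.
  redis.call('ZADD', KEYS[1], now, ARGV[4])
  redis.call('PEXPIRE', KEYS[1], window)
  return 1   -- allowed
end
return 0     -- denied
```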
Sliding Window Counter
This approximates a sliding window by dividing it into sub-windows (e.g., 1/60th of windowSize) and tracking request counts per sub-window. Old sub-windows are pruned, and requests are allowed if the total count across the window is below maxRequests.
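A sketch of the sliding window counter, assuming 60 sub-windows per window and a hash keyed by sub-window index - illustrative values, not necessarily the repo's:

```lua
-- KEYS[1] = counter hash, ARGV[1] = maxRequests, ARGV[2] = windowSize (ms), ARGV[3] = now (ms)
local max     = tonumber(ARGV[1])
local window  = tonumber(ARGV[2])
local now     = tonumber(ARGV[3])
local buckets = 60                       -- assumed sub-windows per window
local span    = window / buckets         -- length of one sub-window
local current = math.floor(now / span)   -- index of the current sub-window
local oldest  = current - buckets + 1    -- oldest index still inside the window
-- Prune sub-windows that have slid out of the window and sum the rest.
local total = 0
local fields = redis.call('HGETALL', KEYS[1])
for i = 1, #fields, 2 do
  if tonumber(fields[i]) < oldest then
    redis.call('HDEL', KEYS[1], fields[i])
  else
    total = total + tonumber(fields[i + 1])
  end
end
if total < max then
  -- Count this request against the current sub-window.
  redis.call('HINCRBY', KEYS[1], tostring(current), 1)
  redis.call('PEXPIRE', KEYS[1], window)
  return 1   -- allowed
end
return 0     -- denied
```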
Comparison Table
| Algorithm | Granularity | Storage Method | Time Handling |
|---|---|---|---|
| Token Bucket | Coarse | Single key (SETEX) | Fixed expiration |
| Timestamp Token Bucket | Medium | Hash (HSET) | Timestamp-based |
| Fractional Token Bucket | Fine | Hash (HSET) | Timestamp-based |
| Leaky Bucket | Medium | Hash (HSET) | Timestamp-based |
| Fixed Window Counter | Coarse | Concatenated key (SETEX) | Window index |
| Sliding Window Log | Fine | Sorted set (ZADD) | Timestamp-based |
| Sliding Window Counter | Medium | Hash (HSET) | Sub-window index |
Wrapping Up: It’s All About Resilience
This project was about more than just rate limiting - it’s about building resilient systems. Rate limiters prevent overload, concurrency limiters manage real-time capacity, and circuit breakers handle failures. By implementing these hands-on, I learned how they interplay and how distributed systems demand careful design. Check out the code at my rate-limiter repo and try it yourself.