5 min read · DirtFleet team

Why DirtFleet's rate limiter is 40 lines — and what we left for later

60 requests per minute per API key, sliding-window counter, in-memory, no Redis. Here's why each non-choice (token bucket, per-endpoint caps, burst quotas) was deliberate and what ships next when we scale.

DirtFleet's public API runs at 60 requests per minute per API key, with the budget surfaced in real time via X-RateLimit-* headers on every response. The limiter is small (40 lines), in-memory, per-process — and deliberately so. Here's the thinking, the tradeoffs, and what we'll change when we scale past the current shape.

The shape

Sliding-window counter, bucketed by API key id. Each bucket is an array of timestamps; on each request we drop stamps older than the window, count the rest, and either admit + push or deny + compute Retry-After. No Redis, no downstream call. Single function, ~30 lines including the sweep helper for periodic GC.
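
In code, the shape looks roughly like this. It's a sketch rather than the literal lib/rate-limiter.ts — the function name matches, but the signature, return shape, and sweep cadence here are illustrative:

```typescript
// A minimal sketch of the sliding-window counter described above.
// The name matches createRateLimiter, but the signature and return shape
// are assumptions for illustration, not the real lib/rate-limiter.ts.
type LimitResult =
  | { ok: true; remaining: number; resetAt: number }
  | { ok: false; retryAfterSec: number; resetAt: number };

export function createRateLimiter(limit = 60, windowMs = 60_000) {
  // One bucket per key; each bucket is an array of admit timestamps (ms).
  const buckets = new Map<string, number[]>();

  function check(key: string): LimitResult {
    const now = Date.now();
    // Drop stamps older than the window, count the rest.
    const stamps = (buckets.get(key) ?? []).filter((t) => t > now - windowMs);
    // The oldest surviving stamp decides when the window effectively resets.
    const resetAt = Math.ceil(((stamps[0] ?? now) + windowMs) / 1000); // unix seconds

    if (stamps.length >= limit) {
      buckets.set(key, stamps);
      // Deny + compute Retry-After from the oldest stamp in the window.
      const retryAfterSec = Math.ceil((stamps[0] + windowMs - now) / 1000);
      return { ok: false, retryAfterSec, resetAt };
    }

    // Admit + push.
    stamps.push(now);
    buckets.set(key, stamps);
    return { ok: true, remaining: limit - stamps.length, resetAt };
  }

  // Periodic GC: forget keys that have gone quiet so memory stays flat.
  function sweep() {
    const cutoff = Date.now() - windowMs;
    for (const [key, stamps] of buckets) {
      const live = stamps.filter((t) => t > cutoff);
      if (live.length === 0) buckets.delete(key);
      else buckets.set(key, live);
    }
  }

  return { check, sweep };
}
```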

The same primitive (createRateLimiter in lib/rate-limiter.ts) backs the login flow, signup flow, password reset, org archive download, VIN decode, and a handful of others. The bucketing key differs per flow (email, user id, IP, org id, API key id), but the math is identical.
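
Wiring it up per flow is just a matter of picking the bucket key. Building on the sketch above — the limits and key formats below are examples, not the real numbers for those flows:

```typescript
// Hypothetical wiring — per-flow limits and key formats are illustrative only.
const apiLimiter = createRateLimiter(60, 60_000);   // public API: 60/min per API key
const loginLimiter = createRateLimiter(10, 60_000); // login: e.g. 10/min per email

const apiKeyId = "key_123";          // example values
const email = "owner@example.com";

// Same math, different bucket key per flow.
apiLimiter.check(`api:${apiKeyId}`);
loginLimiter.check(`login:${email.toLowerCase()}`);
```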

What we deliberately don't do

  • Token bucket. Smoother and burst-tolerant, but harder to reason about in a debugger and harder for integrators to grok in headers. Sliding window has visible semantics — you know exactly when the window resets.
  • Redis or Upstash from day one. Behind a single web container in front of one Postgres, in-memory is correct: the counts live with the process that handles the requests. When we horizontally scale to N replicas, the adversary can multiply by N — at that point we swap to lib/rate-limiter-redis.ts (drop-in, same shape, already written). Premature distributed coordination is worse than no coordination.
  • Per-endpoint caps. One bucket per key, all paths share. The alternative — separate caps for reads vs writes vs exports — is a rich source of inconsistent behavior ("why does GET /assets 429 but POST /hours doesn't") without buying much. Customers want predictable budget; one number per key is what they want to think about.
  • Burst quotas. No "you can burst to 120 but sustained 60" — the model is just "60 in any rolling 60s." Burst quotas exist because token buckets accidentally allow them; sliding windows don't. The deny is cleaner for everyone.

The headers

Every /api/v1/* response carries:

  • X-RateLimit-Limit — the per-key cap (60).
  • X-RateLimit-Remaining — calls left in the current window.
  • X-RateLimit-Reset — unix epoch seconds when the window resets.
  • On 429: Retry-After — recommended back-off in seconds. Body is { ok: false, error: "rate_limited", retryAfterSec: N }.
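
Turning the limiter's result into those headers takes only a few lines. This sketch builds on the createRateLimiter sketch above and uses the web-standard Response and Headers types; the rateLimitGuard helper and its return shape are illustrative, while the header names and 429 body are exactly the ones listed:

```typescript
// Sketch only — rateLimitGuard is a hypothetical helper, not DirtFleet code.
// Returns a ready-made 429 Response when over the cap, or the headers to
// attach to the real response when under it.
function rateLimitGuard(
  result: LimitResult,
  limit = 60,
): { allow: true; headers: Headers } | { allow: false; response: Response } {
  const headers = new Headers({
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Reset": String(result.resetAt), // unix epoch seconds
  });

  if (!result.ok) {
    headers.set("X-RateLimit-Remaining", "0");
    headers.set("Retry-After", String(result.retryAfterSec));
    const body = JSON.stringify({
      ok: false,
      error: "rate_limited",
      retryAfterSec: result.retryAfterSec,
    });
    return { allow: false, response: new Response(body, { status: 429, headers }) };
  }

  headers.set("X-RateLimit-Remaining", String(result.remaining));
  return { allow: true, headers };
}
```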

Header names match the de facto industry convention (GitHub and Stripe ship the same names), so existing client libraries Just Work. Receivers reading the response can show a real countdown to a user; CI runners can sleep precisely; misbehaving cron loops self-correct without our intervention.
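
On the receiving side, a well-behaved client only needs the Retry-After value. A sketch of that handling (fetchWithBudget is not a DirtFleet SDK function, and Bearer auth is assumed here purely for illustration):

```typescript
// Hypothetical client-side handling: sleep exactly as long as the server
// asked on 429, then retry. Names and auth scheme are assumptions.
async function fetchWithBudget(
  url: string,
  apiKey: string,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Respect the server's recommended back-off, then try again.
    const retryAfterSec = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000));
  }
}
```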

What 429 reveals

Three classes of integrator hit the cap, and they want different responses:

  1. Misbehaving polling loop. Someone polling GET /assets every second "just to see if anything changed." The right fix is webhooks for them, which we tell them in the response body and the docs. The 429 is the prompt to switch.
  2. Burst sync. An integration syncing 600 assets at startup. Honest use, not abusive. The Retry-After tells them to pace; the sync takes longer but doesn't fail.
  3. Genuine high-throughput need. A dispatcher dashboard refreshing every 10 seconds across 50 fleets. These get higher caps commercially — Professional and Enterprise tiers negotiate the limit as part of the contract. We don't shape it via tags; we shape it via per-key overrides (planned, not yet shipped — current implementation is one cap for all customers).

What ships next

  • Per-key override. A nullable rateLimitPerMinute column on the ApiKey row. Falls back to the global 60 when null. Lets enterprise contracts negotiate higher caps without a code change (sketched after this list).
  • Redis backend. When we cut over to horizontal scaling. The drop-in already exists; the activation is environment-flag-flip + cutover.
  • Per-endpoint inspection. A debug header (X-RateLimit-Path-Counts?) showing the breakdown of recent calls by path. Not on by default — it adds latency to every response — but enabled for keys with debugging mode turned on.
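
The per-key override is the smallest of the three. A sketch of the planned fallback — only the nullable rateLimitPerMinute column and the global 60 come from the bullet above; the ApiKey shape and helper name are illustrative:

```typescript
// Sketch of the planned per-key override fallback. The ApiKey shape and
// resolveLimit helper are assumptions; null means "use the global default".
interface ApiKey {
  id: string;
  rateLimitPerMinute: number | null;
}

const GLOBAL_LIMIT_PER_MINUTE = 60;

function resolveLimit(key: ApiKey): number {
  return key.rateLimitPerMinute ?? GLOBAL_LIMIT_PER_MINUTE;
}
```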

Build the simplest thing that ships honest budget info on every response. Plumbing for the next iteration is fine to leave unwritten until the iteration shows up. The headers themselves — same names everyone else uses — are the part that matters; the storage backend is an implementation detail.

→ API reference · → Receiver examples · → Idempotency keys