50 iterations — what shipped, what we skipped, what's next — DirtFleet blog

Fifty iterations. From one endpoint to a public API with 33+ routes, two machine-readable specs (OpenAPI + AsyncAPI), three feed formats (RSS + Atom + JSON), 24 blog posts, 60+ marketing pages, and 787 passing tests. Here's the inventory, the non-choices, and what the cadence taught us.

The inventory

Roughly grouped — exact counts as of this post:

API surface (~33 v1 routes): /me, /me/usage, /version, /changelog, /health; /assets + /assets/batch + /assets/[id]; /flags + /flags/[id]; /hours + /hours/batch + /hours/[id]; /work-orders + /work-orders/[id]; /tools + /tools/batch + /tools/[id]; /projects + /projects/batch + /projects/[id]; /repairs + /repairs/[id]; /yards + /yards/batch + /yards/[id]; /webhooks + /webhooks/[id] + /webhooks/[id]/test + /webhooks/[id]/deliveries.
Machine specs: OpenAPI 3.1 at /openapi.yaml, AsyncAPI 2.6 at /asyncapi.yaml, Postman 2.1 collection at /postman_collection.json.
Feeds: RSS 2.0 / Atom 1.0 / JSON Feed 1.1 for both /blog and /changelog (where applicable).
Integration landing pages: /integrations, /directory, /zapier, /slack.
Doc pages: /docs/api, /example, /openapi, /sdk, /webhooks, /recipes.
Tests: 787 across 112 files. Mostly contract tests next to the routes they protect; a smaller integration tier hits real Postgres for the bugs unit tests structurally miss.

The five patterns that hardened

Covered in detail in the 37-iteration design retrospective — the short version:

{ ok } envelope on every response, stable error codes.
Cursor pagination, never offset.
Cross-tenant 404, never 403.
Forgiving on enums (unknown values drop to undefined), strict on required fields.
Idempotency-Key header (or clientMutationId body) on every write endpoint that creates rows. Partial-success arrays on every /batch endpoint.
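Three of these patterns fit in a few lines. What follows is a minimal sketch, not the production code: the `data` and `error` field names, the error codes, the `Asset` shape, and the status values are all assumptions made for illustration. Only the behaviors come from the list above: an `ok` envelope, unknown enum values dropping to undefined, and a 404 for another tenant's id.

```typescript
// Illustrative sketch only: field names, error codes, and the Asset shape are assumptions.

type Envelope<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: string; message: string } };

const ASSET_STATUSES = ["active", "idle", "retired"] as const;
type AssetStatus = (typeof ASSET_STATUSES)[number];

interface Asset {
  id: string;
  organizationId: string;
  name: string;
  status: AssetStatus | undefined;
}

// Forgiving on enums: an unknown value drops to undefined instead of failing the request.
function parseStatus(raw: unknown): AssetStatus | undefined {
  return ASSET_STATUSES.find((s) => s === raw);
}

// Strict on required fields: a missing or blank name is a validation error.
function parseCreateAsset(
  body: Record<string, unknown>,
): Envelope<{ name: string; status: AssetStatus | undefined }> {
  const name = body["name"];
  if (typeof name !== "string" || name.trim() === "") {
    return { ok: false, error: { code: "validation_error", message: "name is required" } };
  }
  return { ok: true, data: { name, status: parseStatus(body["status"]) } };
}

// Cross-tenant 404: the lookup is scoped by orgId (first argument), so another
// tenant's id is indistinguishable from an id that does not exist.
function getAsset(
  orgId: string,
  id: string,
  rows: Asset[],
): { status: number; body: Envelope<Asset> } {
  const row = rows.find((a) => a.organizationId === orgId && a.id === id);
  if (!row) {
    return {
      status: 404,
      body: { ok: false, error: { code: "not_found", message: "Asset not found" } },
    };
  }
  return { status: 200, body: { ok: true, data: row } };
}
```

In this sketch the response for a foreign id and for a missing id is byte-for-byte the same, which is the point of answering 404 rather than 403: the caller learns nothing about which ids exist in other tenants.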

The non-choices, gathered

We've documented each of these elsewhere; the consolidated list is useful as a single page of context.

No GraphQL. REST + cursor pagination is simpler. Reconsider when integrators ask for it.
No published SDK package. Generating a client from the spec with openapi-typescript, oapi-codegen, etc. beats a versioning trap.
No token-bucket rate limiter. Sliding window has visible semantics; no burst quotas hidden in the algorithm.
No per-endpoint rate caps. One number per key. Predictable budget > clever shaping.
No Postgres row-level security. organizationId on every row, every lib function takes orgId first, every detail endpoint 404s on cross-tenant ids. We revisit RLS at 1000-tenant scale.
No GraphQL-style field selection. Most clients want every column; the cache story gets messy otherwise.
No browser end-to-end tests. Contract + DB-integration cover what we need. Add Playwright when a regression makes the case.
No snapshot tests. They become "update the snapshot" PRs that ratchet through without anyone reading the diff. Explicit assertions only.
No webhook secret rotation in PATCH. Rotate by delete + recreate. Because PATCH never touches the secret, receivers can move URLs without losing their Slack signing key. Operationally easier than the alternatives.
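To make the "visible semantics" point about sliding windows concrete, here is a minimal in-memory sketch of one sliding-window variant (a timestamp log per key). It is an illustration under assumptions, not the limiter that shipped: the class name, the return shape, and the choice of the log variant are all hypothetical. The current time is passed in so the behavior is easy to reason about.

```typescript
// Hypothetical sketch of a sliding-window (log) limiter: one number per key,
// no burst allowance hidden in the algorithm.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, now: number): { allowed: boolean; remaining: number; retryAfterMs: number } {
    // Keep only the requests that are still inside the window ending at `now`.
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);

    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      // The caller can retry once the oldest request leaves the window.
      const oldest = recent[0] ?? now;
      return { allowed: false, remaining: 0, retryAfterMs: oldest + this.windowMs - now };
    }

    recent.push(now);
    this.hits.set(key, recent);
    return { allowed: true, remaining: this.limit - recent.length, retryAfterMs: 0 };
  }
}
```

With a limit of 2 per 1000 ms, requests at t=0 and t=100 pass, a request at t=200 is refused with a retry hint of 800 ms, and a request at t=1001 passes again because the t=0 request has left the window. The answer to "how many requests do I have left" is always the limit minus the requests in the last window, which is what keeps the budget predictable.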

What the cadence taught us

50 iterations at ~5 minutes per iteration is about 250 minutes, a little over 4 hours of work, split into chunks small enough that each can fully land. The constraint shaped what shipped:

Schema-first, batched. Every iteration that touched DB started with the Prisma migration. The schema changes are committed before the routes; the routes are committed before the tests. Each layer settles before the next one builds on it.
One vertical slice per iteration. Schema + lib + route + tests + docs in one commit. No half-finished features. The pattern means even an iteration interrupted at 5 minutes leaves the codebase coherent.
Abstractions emerge from the third copy. The startApiRequest helper (iter 27) absorbed CORS (iter 30), rate-limit headers (iter 27), per-key attribution (iter 23) with zero per-route changes. We didn't design it on day one; we factored it when the third copy of the auth dance felt obviously wrong.
Documenting non-choices pays interest. Every big section of the codebase has a comment explaining why we didn't do the obvious alternative. That's what makes future-us comfortable changing the call. Every time we've revisited a non-choice (RLS, token-bucket, GraphQL) the comment was the load-bearing context that settled the discussion in five minutes.
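For readers who have not seen a helper like startApiRequest, the sketch below shows the general shape of factoring auth, CORS, and rate-limit headers into one entry point. Only the helper's name comes from this post. The signature, the header names, the error codes, and the injected `lookupKey` / `checkLimit` dependencies are assumptions for illustration, and the real helper may look quite different.

```typescript
// Hypothetical shape only; the real startApiRequest helper is not shown in this post.
type HeaderMap = Record<string, string>;

interface KeyLookup { keyId: string; orgId: string }
interface LimitCheck { allowed: boolean; limit: number; remaining: number }

// Injected so the sketch stays self-contained (assumed names).
interface RequestDeps {
  lookupKey(token: string): KeyLookup | undefined;
  checkLimit(keyId: string): LimitCheck;
}

type StartResult =
  | { ok: true; orgId: string; keyId: string; headers: HeaderMap }
  | { ok: false; status: number; headers: HeaderMap; error: { code: string; message: string } };

function startApiRequest(requestHeaders: HeaderMap, deps: RequestDeps): StartResult {
  // CORS header set once here instead of in every route (value is an assumption).
  const headers: HeaderMap = { "Access-Control-Allow-Origin": "*" };

  // Resolve the API key; keyId gives per-key attribution, orgId gives tenant scope.
  const auth = requestHeaders["authorization"] ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : "";
  const key = token ? deps.lookupKey(token) : undefined;
  if (!key) {
    return { ok: false, status: 401, headers, error: { code: "unauthorized", message: "Invalid API key" } };
  }

  // Rate-limit headers are attached whether or not the request is allowed.
  const limit = deps.checkLimit(key.keyId);
  headers["X-RateLimit-Limit"] = String(limit.limit);
  headers["X-RateLimit-Remaining"] = String(limit.remaining);
  if (!limit.allowed) {
    return { ok: false, status: 429, headers, error: { code: "rate_limited", message: "Rate limit exceeded" } };
  }

  return { ok: true, orgId: key.orgId, keyId: key.keyId, headers };
}
```

A route written against a helper of this shape only handles the `ok: true` branch, so adding another cross-cutting concern later means changing one function rather than every route.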

What's next

The surface is comprehensive enough to do real integration work. The next iterations probably aren't more API endpoints — they're the things you can't check off a feature list:

Real customer integrations — Zapier-style + dedicated. The /integrations/zapier and /integrations/slack pages are templates; the real value is the first 3-5 shipped integrations that turn the templates into Stripe metering.
Per-key rate-limit overrides — a nullable column on the ApiKey row. Enterprise contracts negotiate caps without a code change. Already designed; ships on first ask.
Redis-backed limiter on horizontal scale. Drop-in replacement at lib/rate-limiter-redis.ts; environment-flag flip + cutover when needed.
SOC 2 prep — most of the work is documentation + evidence collection, not code change. The /trust hub, /security posture, /sla, audit log, and security.txt are the foundation.
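The per-key override and the limiter switch are both small enough to sketch. The code below is a guess at the shape, not the designed implementation: the `rateLimitOverride` column name, the `RATE_LIMITER_BACKEND` flag name, and the function names are hypothetical. The environment is passed in as a plain object so the example does not depend on any runtime.

```typescript
// Hypothetical sketch; column, flag, and function names are assumptions.

interface ApiKeyRow {
  id: string;
  // Nullable column: null means "use the default cap".
  rateLimitOverride: number | null;
}

// A negotiated cap wins over the default. `??` (not `||`) so that an explicit
// override of 0 is respected rather than silently replaced by the default.
function effectiveLimit(key: ApiKeyRow, defaultLimit: number): number {
  return key.rateLimitOverride ?? defaultLimit;
}

type LimiterBackend = "memory" | "redis";

// Environment-flag flip: anything other than the literal "redis" stays in memory.
function pickLimiterBackend(env: Record<string, string | undefined>): LimiterBackend {
  return env["RATE_LIMITER_BACKEND"] === "redis" ? "redis" : "memory";
}
```

With this shape an enterprise cap is a data change on one row, and the cutover to Redis is a deploy-time flag with the in-memory limiter as the fallback.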

Build the smallest things that ship correctly, every iteration. Document what you didn't do. Factor the third copy. Cross-tenant 404, never 403.
