Per-Tenant Rate Limiting in Multi-Tenant SaaS: A 6-Layer Approach

In a multi-tenant platform, one misbehaving tenant can degrade the experience for everyone. We built 6 layers of rate limiting to prevent this. Here's how each layer works and why you need all of them.

Layer 1: Per-API-Key (Requests per Minute)

The first line of defense. Each API key has a per-minute request limit based on the tenant's plan. Starter keys get 60/min, Pro gets 300/min, Enterprise gets 1,000/min. This prevents a single integration from flooding the system with requests.
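To make the per-key check concrete, here is a minimal sketch of a sliding 60-second window keyed by API key. It is an in-memory stand-in, not our Redis-backed implementation; the class name, the plan labels used as dictionary keys, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque

# Per-minute limits for each plan, as listed above.
PLAN_LIMITS_PER_MIN = {"starter": 60, "pro": 300, "enterprise": 1000}


class PerKeyLimiter:
    """In-memory sliding-window limiter, one window per API key (hypothetical)."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # api_key -> timestamps of accepted requests

    def allow(self, api_key, plan):
        limit = PLAN_LIMITS_PER_MIN[plan]
        now = self.clock()
        window_hits = self.hits[api_key]
        # Drop timestamps that have aged out of the window.
        while window_hits and window_hits[0] <= now - self.window:
            window_hits.popleft()
        if len(window_hits) >= limit:
            return False  # the caller would answer with a 429
        window_hits.append(now)
        return True
```

Because each key has its own deque, one noisy integration exhausts only its own window; other keys belonging to the same tenant are unaffected at this layer.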

Layer 2: Per-Tenant Monthly Aggregate

Even if individual keys stay within limits, a tenant with many keys could collectively overwhelm the platform. Layer 2 tracks total monthly API calls across all of a tenant's keys. When the monthly quota is reached, all keys for that tenant receive 429 responses.
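A rough sketch of the aggregate counter follows. The important detail is that the counter is keyed by tenant and period, never by API key. The class name, the quota values, and the use of UTC calendar months as the period are assumptions; a real system might align periods with the billing cycle instead.

```python
from collections import defaultdict
from datetime import datetime, timezone


class MonthlyTenantQuota:
    """Counts calls per tenant per month across all of that tenant's keys."""

    def __init__(self, quotas):
        self.quotas = quotas            # tenant_id -> monthly quota (hypothetical values)
        self.counts = defaultdict(int)  # (tenant_id, "YYYY-MM") -> calls so far

    def record(self, tenant_id, api_key, when):
        """Return 200 if the call was counted, 429 once the quota is used up."""
        # api_key is deliberately not part of the counter key:
        # every key of the tenant draws from the same bucket.
        bucket = (tenant_id, when.strftime("%Y-%m"))
        if self.counts[bucket] >= self.quotas[tenant_id]:
            return 429
        self.counts[bucket] += 1
        return 200


quota = MonthlyTenantQuota({"acme": 3})
january = datetime(2024, 1, 15, tzinfo=timezone.utc)
statuses = [quota.record("acme", key, january) for key in ("k1", "k2", "k1", "k2")]
```

In this example the fourth call is rejected even though it comes from a different key than the third, which is the behavior described above.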

Layer 3: Per-Endpoint Limits

Not all endpoints are equal. Write operations (activations, usage recording) are more expensive than reads (license validation). Layer 3 applies endpoint-specific limits: validation gets 1,000/min, but activation creation gets 100/min.
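As an illustration, the endpoint limits can live in a small lookup table that the limiter consults before the request is handled. The endpoint labels below are hypothetical, and treating unlisted endpoints as unlimited at this layer is an assumption of the sketch.

```python
# Limits from the text; the endpoint labels are hypothetical names.
ENDPOINT_LIMITS_PER_MIN = {
    "license.validate": 1000,   # read, cheap
    "activation.create": 100,   # write, more expensive
}


def endpoint_allows(endpoint, requests_in_last_minute):
    """Return True if one more request fits under the endpoint's limit.

    Endpoints without an entry are not limited by this layer (an assumption).
    """
    limit = ENDPOINT_LIMITS_PER_MIN.get(endpoint)
    if limit is None:
        return True
    return requests_in_last_minute < limit
```

Keeping the table separate from the counting logic means a limit can be tuned per endpoint without touching the limiter itself.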

Layer 4: Usage-Based Enforcement

Integrated with the entitlement system, Layer 4 hard-stops tenants that exceed their plan's usage limits. This is business logic enforcement rather than rate limiting in the narrow sense. When a tenant reaches 100% of its monthly quota, the API returns a clear error explaining what happened and how to resolve it (upgrade or wait for the next billing cycle).
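One possible shape for that error is sketched below. The field names, the error code string, and the function signature are assumptions for illustration, not our actual response schema.

```python
from datetime import date


def usage_error(plan, used, quota, next_cycle_start):
    """Return None while under quota, otherwise an explanatory error body."""
    if used < quota:
        return None
    return {
        "error": "usage_limit_exceeded",  # hypothetical error code
        "message": (
            f"Your {plan} plan's quota of {quota} calls "
            "is used up for this billing cycle."
        ),
        "used": used,
        "quota": quota,
        # The two ways out that the error should spell out.
        "resolution": ["upgrade_plan", "wait_for_next_billing_cycle"],
        "next_cycle_start": next_cycle_start.isoformat(),
    }
```

Including the date of the next cycle lets a client decide between upgrading and waiting without a second API call.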

Layer 5: Developer/Test Mode

Test mode applies limits 10x higher than production and keeps completely separate counters, so developers can hammer the API during integration testing without affecting their production quotas. Test mode is detected via ephemeral API keys or the X-DevTools-Mode header.
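The sketch below shows one way to pick the limit and the counter for a request. How ephemeral keys are recognized is not specified here, so the "eph_" prefix is purely hypothetical, as is the "rl:test:" counter namespace; the sketch also assumes that the mere presence of the header enables test mode.

```python
TEST_KEY_PREFIX = "eph_"   # hypothetical marker for ephemeral keys
TEST_MULTIPLIER = 10       # test mode gets 10x the production limit


def is_test_mode(api_key, headers):
    # HTTP header names are case-insensitive, so compare in lowercase.
    header_names = {name.lower() for name in headers}
    return api_key.startswith(TEST_KEY_PREFIX) or "x-devtools-mode" in header_names


def limit_and_counter(api_key, headers, base_limit):
    """Return (limit, counter_key); test traffic never touches production counters."""
    if is_test_mode(api_key, headers):
        return base_limit * TEST_MULTIPLIER, f"rl:test:key:{api_key}"
    return base_limit, f"rl:key:{api_key}"
```

Using a different counter key for test traffic is what keeps the two quotas independent; raising the limit alone would still let test requests eat into production counts.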

Layer 6: Webhook Outbound

Rate limiting isn't just for inbound requests. Our webhook delivery system limits outbound deliveries to 100/min per tenant. If a webhook endpoint starts failing, a circuit breaker activates after 10 consecutive failures, pausing delivery for 30 minutes to prevent cascading issues.
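The circuit breaker part can be sketched as a small state machine per webhook endpoint. The class and method names are hypothetical, the 100/min delivery limit is left out, and resetting the failure count when the pause ends is an assumption of this sketch.

```python
import time


class WebhookCircuitBreaker:
    FAILURE_THRESHOLD = 10    # consecutive failures before pausing
    PAUSE_SECONDS = 30 * 60   # pause delivery for 30 minutes

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.consecutive_failures = 0
        self.paused_until = None

    def can_deliver(self):
        if self.paused_until is None:
            return True
        if self.clock() >= self.paused_until:
            # Pause elapsed: resume and start counting failures afresh.
            self.paused_until = None
            self.consecutive_failures = 0
            return True
        return False

    def record(self, success):
        if success:
            # Any success breaks the streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.FAILURE_THRESHOLD:
            self.paused_until = self.clock() + self.PAUSE_SECONDS
```

A delivery worker would call can_deliver() before each attempt and record() after it, so a dead endpoint stops consuming delivery capacity once the tenth failure in a row is seen.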

Implementation: Redis Sliding Windows

All layers use Redis-backed sliding window counters. Each layer has its own key namespace (for example rl:key:, rl:tenant:, rl:endpoint:), and every 429 response includes the conventional rate-limit headers X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset, plus the standard Retry-After header.
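As a rough sketch of the response side, the helper below builds the four headers for a rejected request. It assumes the limiter knows the timestamp of the oldest hit still inside the window, since that is when a slot frees up. Reporting X-RateLimit-Reset as epoch seconds and Retry-After as a delay in seconds are assumptions, as are the function names and the key suffix layout.

```python
import math

# Key prefixes named in the text; what follows the prefix is an assumption.
NAMESPACES = {"key": "rl:key:", "tenant": "rl:tenant:", "endpoint": "rl:endpoint:"}


def counter_key(layer, identifier):
    return NAMESPACES[layer] + identifier


def headers_for_429(limit, oldest_hit_epoch, now_epoch, window_seconds=60):
    """Headers for a rejected request in a sliding window.

    The window frees a slot when its oldest recorded hit ages out,
    that is, at oldest_hit_epoch + window_seconds.
    """
    reset_epoch = oldest_hit_epoch + window_seconds
    retry_after = max(1, math.ceil(reset_epoch - now_epoch))
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(math.ceil(reset_epoch)),
        "Retry-After": str(retry_after),
    }
```

For example, with a limit of 60, an oldest hit at epoch 1000.0, and the current time at 1045.5, the helper reports a reset at 1060 and a Retry-After of 15 seconds.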