Autonomous Runtime Guardrails

Summary

AtlasBurn's guardrail engine is a server-side, multi-layer defense system that detects and kills runaway agents autonomously — without manual review. It replaces the legacy SDK-level Auto Kill, which only enforced simple financial caps.

Detection runs in the ingestion server. Enforcement runs at the edge via Cloudflare KV with <5ms latency impact.

System maturity: Stable

All five layers are production-ready and active on every paid plan.

Cloudflare KV Architecture

Detection runs server-side in the ingestion pipeline. When any layer trips, the engine writes a global SUSPENDED flag for the organization to the GUARDRAIL_FLAGS Cloudflare KV namespace. Every edge worker reads the flag on every request, so subsequent calls are blocked at the edge in under 5ms — before the upstream provider is ever contacted.

Law 4 — Never crash the proxy (fail open)

If KV is unreachable, the AtlasBurn ingest is down, or any internal dependency fails, the edge worker forwards the request unmodified. We will never take down your AI traffic because of an AtlasBurn outage.

The 5-Layer Defense Engine

Layer 1 — Financial Limits

Hard and soft caps on daily and monthly spend per organization. Soft caps emit a warning to the dashboard and notification channels; hard caps write a SUSPENDED flag to KV and reject subsequent calls at the edge with 429 Budget Exceeded.

Layer 2 — Dumb Loop Detector (Identical Token Fingerprinting)

Computes a deterministic fingerprint of the request body (model + messages + tools) for every call. If 4 consecutive requests share an identical prompt-token fingerprint, the agent is killed. This catches the classic "agent stuck in a while-true loop sending the exact same context" failure mode.

OpenRouter caveat

OpenRouter's load balancer can route the same prompt to different upstream tokenizers, causing fingerprint drift (e.g. 24 vs 48 tokens for the same input). Layer 2 is best-effort on OpenRouter — rely on Layer 4 as the safety net.

Layer 3 — Smart Loop Detector (Context Inflation)

Tracks chat-history token counts per agent session. If counts grow monotonically and linearly — the classic signature of an agent re-appending a repeating JSON-parse error to its context window before each retry — the engine triggers the kill switch before the context window (and the bill) explodes.

Layer 4 — RPS Burst Protection (Frequency Limits)

Per-key request rate limiter. If any API key fires more than 10 requests per minute sustained, the kill flag is set. The window is sliding-minute-based, not bucketed, which eliminates the concurrent database race conditions that plague fixed-window limiters. This is the ultimate safety net when fingerprinting can't help — for example, with OpenRouter where load-balancing defeats exact-match detection.

Layer 5 — Premium Model Concentration

Detects sudden spikes where >90% of spend concentrates on the most expensive models (e.g. Claude 3.5 Opus, GPT-5). This pattern is highly indicative of prompt injection, hijacked routing, or a misconfigured model selector falling back to the most expensive option.

Cloudflare KV Kill Switch Architecture

Detection — the ingestion server (/api/ingest) processes every telemetry event in real time. When a layer triggers, it calls /api/guardrails/enforce.
Flag Push — the enforcement endpoint writes a SUSPENDED flag for the organization to the global GUARDRAIL_FLAGS KV namespace.
Edge Enforcement — every proxy worker reads the flag from KV on every request. Because KV is replicated at the edge, the latency impact is <5ms.
Auto-Recovery (TTL) — flags carry a TTL (default 2 hours). After expiry the system heals itself and the org resumes automatically.
Manual Override — users can clear the flag instantly from the AtlasBurn dashboard.

Provider Coverage

Provider	Layer 2 (Exact Match)	Layer 4 (RPS Burst)
OpenAI	Reliable	Reliable
Anthropic	Reliable	Reliable
OpenRouter	Best-effort (tokenizer drift)	Primary safety net

Ingestion API Endpoints

POST /api/ingest — telemetry collection (SDK and Edge Proxy both post here)
POST /api/guardrails/enforce — internal webhook that writes kill flags to Cloudflare KV

Next Steps

AtlasBurn Edge Proxy — the enforcement layer
Advanced Concepts — Monte Carlo forecasting and retry cascades