Auto-Recovery & Manual Override

When a guardrail trips, you have two paths back online

Every kill flag carries a TTL so the platform heals itself, and the dashboard provides a one-click manual override for when you need traffic restored immediately.

Auto-Recovery (TTL expiry)

When a guardrail layer triggers, the ingestion server writes a SUSPENDED flag for the affected organization to the global GUARDRAIL_FLAGS Cloudflare KV namespace with a 2-hour TTL (configurable per plan).

  • Every edge worker reads the flag on every request, so there is no polling interval to wait out.
  • When the TTL expires, KV removes the key automatically and traffic resumes globally.
  • No engineering action is required for the system to come back online.
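The flag lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not AtlasBurn's actual implementation: the key scheme (`suspended:<org_id>`), the function names, and the `MemoryKV` stand-in are assumptions made for the example. The `get`, `put` with `expirationTtl`, and `delete` calls mirror the Workers KV binding API, and the in-memory class only imitates TTL expiry so the sketch runs outside a Worker.

```typescript
// Minimal subset of the Workers KV binding surface used in this sketch.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
  delete(key: string): Promise<void>;
}

// 2-hour default TTL from the text (configurable per plan).
const DEFAULT_TTL_SECONDS = 2 * 60 * 60;

// Hypothetical key scheme; the real key layout is not documented here.
function flagKey(orgId: string): string {
  return `suspended:${orgId}`;
}

// Ingestion side: write the SUSPENDED flag with a TTL so it expires on its own.
async function suspendOrg(
  kv: KVLike,
  orgId: string,
  ttlSeconds: number = DEFAULT_TTL_SECONDS
): Promise<void> {
  await kv.put(flagKey(orgId), "SUSPENDED", { expirationTtl: ttlSeconds });
}

// Edge side: consult the flag on every request before forwarding.
async function isSuspended(kv: KVLike, orgId: string): Promise<boolean> {
  return (await kv.get(flagKey(orgId))) === "SUSPENDED";
}

// In-memory stand-in for a KV namespace, with an injectable clock for testing.
class MemoryKV implements KVLike {
  private entries = new Map<string, { value: string; expiresAtMs: number | null }>();
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (entry.expiresAtMs !== null && this.now() >= entry.expiresAtMs) {
      this.entries.delete(key); // expired: behave as if KV removed the key
      return null;
    }
    return entry.value;
  }

  async put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void> {
    const ttl = options?.expirationTtl;
    const expiresAtMs = ttl === undefined ? null : this.now() + ttl * 1000;
    this.entries.set(key, { value, expiresAtMs });
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }
}
```

Because expiry is handled by the store, neither `suspendOrg` nor `isSuspended` needs any cleanup logic; once the TTL lapses, the read simply returns no flag.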

Auto-recovery is a safety net, not a strategy

If a flag fires twice in 24 hours, fix the root cause. The TTL is designed to absorb a single incident, not to mask a misbehaving agent in production.
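One way to turn the "twice in 24 hours" rule into an alert is sketched below. The function name and the idea of feeding it kill timestamps (for example, pulled from your own incident log) are assumptions for illustration; the platform may not expose the data in this shape.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true when at least `threshold` kills fall within the 24 hours ending at nowMs.
// Timestamps are milliseconds since the Unix epoch.
function needsRootCauseFix(
  killTimestampsMs: number[],
  nowMs: number,
  threshold: number = 2
): boolean {
  const recent = killTimestampsMs.filter((t) => t <= nowMs && nowMs - t < DAY_MS);
  return recent.length >= threshold;
}
```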

Manual Override (instant resume)

When you have already diagnosed the issue and want traffic back immediately, hit Resume in the AtlasBurn dashboard. This fires a DELETE to the enforcement endpoint, which removes the flag from Cloudflare KV.

Manual override under the hood (HTTP)
DELETE /api/guardrails/enforce
Authorization: Bearer <dashboard-session-token>
Content-Type: application/json

{
  "org_id": "org_xxx",
  "reason": "manual_resume"
}

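Because the worker reads KV on every request, the first call that observes the deletion sees no flag and forwards normally. Global propagation is designed to be sub-second; note that Cloudflare KV is eventually consistent, so an edge location holding a cached value may briefly still see the flag.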
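For runbooks that need to issue the same request outside the dashboard, the call above could be assembled like this. It is a sketch only: the path, headers, and body fields are taken from the request shown above, while `buildResumeRequest`, the `ResumeRequest` type, and the base URL are hypothetical and must be replaced with whatever your deployment actually uses.

```typescript
// Plain description of the HTTP call, kept separate from the transport.
interface ResumeRequest {
  url: string;
  method: "DELETE";
  headers: Record<string, string>;
  body: string;
}

// baseUrl and sessionToken are supplied by the caller; neither is specified by this page.
function buildResumeRequest(
  baseUrl: string,
  sessionToken: string,
  orgId: string
): ResumeRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/guardrails/enforce`,
    method: "DELETE",
    headers: {
      Authorization: `Bearer ${sessionToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ org_id: orgId, reason: "manual_resume" }),
  };
}
```

The returned object maps directly onto the standard `fetch` options: `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.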

Override via toggle configuration

For programmatic recovery (CI hooks, on-call runbooks), flipping the org's enforcement toggle in the dashboard has the same effect: the toggle calls the same /api/guardrails/enforce endpoint and clears the flag from KV identically.

Audit trail

Every kill, auto-recovery, and manual override is logged to the Forensic Ledger with actor, timestamp, triggering layer, and the request fingerprint that caused the kill.
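To make the logged fields concrete, a ledger entry might be modeled along these lines. The type and field names here are illustrative assumptions; only the set of recorded attributes (event kind, actor, timestamp, triggering layer, request fingerprint) comes from the description above.

```typescript
// The three event kinds named in the text.
type LedgerEvent = "kill" | "auto_recovery" | "manual_override";

// Hypothetical shape of one Forensic Ledger record.
interface LedgerEntry {
  event: LedgerEvent;
  actor: string; // who or what caused the event, e.g. a user or the system
  timestamp: string; // ISO 8601, UTC
  triggeringLayer: string; // guardrail layer that tripped
  requestFingerprint: string; // fingerprint of the request that caused the kill
}

// Builds an entry, stamping it with the given time (defaults to now).
function makeLedgerEntry(
  event: LedgerEvent,
  actor: string,
  triggeringLayer: string,
  requestFingerprint: string,
  at: Date = new Date()
): LedgerEntry {
  return {
    event,
    actor,
    timestamp: at.toISOString(),
    triggeringLayer,
    requestFingerprint,
  };
}
```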

Next steps