Operational Troubleshooting

Summary

Operational guide for diagnosing live AtlasBurn issues — guardrail kills, stuck KV flags, missing telemetry, and edge-proxy errors. For SDK install and framework-specific setup questions, see SDK Integration or Proxy Troubleshooting.

First check: dashboard health

Open the AtlasBurn dashboard. If your org shows a red SUSPENDED banner, a guardrail layer has triggered. The banner names the layer and the offending fingerprint.

My traffic is being rejected with 429

The edge proxy is returning 429 because a SUSPENDED flag is present in Cloudflare KV for your org.

  1. Open the dashboard → Forensic Ledger and find the kill event. It will name the triggering layer (Financial Limits, Dumb Loop, Smart Loop, RPS Burst, Premium Concentration).
  2. Fix the root cause in your agent (the ledger shows the offending fingerprint and the full request that tripped detection).
  3. Click Resume to clear the KV flag instantly, or wait for the 2-hour TTL to auto-expire. See Auto-Recovery & Manual Override.
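
On the client side, it can help to stop retrying as soon as a 429 comes back, because retries cannot succeed until the flag is cleared or expires. The following is a minimal sketch, not AtlasBurn code: the function names and action labels are hypothetical, and only the 429 status and the 2-hour TTL come from this guide.

```typescript
// Hypothetical client-side helper; names are illustrative, not an AtlasBurn API.
type ProxyAction = "proceed" | "stop-and-check-ledger" | "investigate";

// The 2-hour TTL after which a kill flag auto-expires (from this guide).
const KILL_FLAG_TTL_MS = 2 * 60 * 60 * 1000;

// Decide what the caller should do with a response status from the edge proxy.
function actionForStatus(status: number): ProxyAction {
  if (status === 429) {
    // A SUSPENDED flag is set for the org: retrying will keep failing.
    return "stop-and-check-ledger";
  }
  if (status >= 200 && status < 300) {
    return "proceed";
  }
  return "investigate";
}

// Latest time a flag set at `killedAt` can still block traffic if nobody clicks Resume.
function latestAutoExpiry(killedAt: Date): Date {
  return new Date(killedAt.getTime() + KILL_FLAG_TTL_MS);
}
```

An agent loop could call actionForStatus on every response and halt on "stop-and-check-ledger" instead of backing off and retrying.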

The kill flag won't clear after I clicked Resume

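DELETE /api/guardrails/enforce is idempotent, so repeating it is safe. The delete normally propagates to Cloudflare edges in under a second, but Cloudflare KV is eventually consistent, so an edge location that recently read the flag can keep serving it for up to a minute or so. If traffic is still being rejected after that: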

  • Check that the request really is hitting the proxy at proxy.atlasburn.com and not a stale cached baseURL.
  • Confirm the AtlasBurn-API-Key header is set. Requests without it are rejected as unauthenticated, not as guardrail kills.
  • Re-trigger Resume. The dashboard surfaces the KV response — a 404 means the flag is already gone and the issue is elsewhere.
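
The three checks above can be folded into one triage function. This is a sketch under stated assumptions: the Observation shape and the result labels are hypothetical, and treating any 2xx as "cleared" is an assumption, since this guide only documents the meaning of a 404.

```typescript
// Hypothetical triage helper; the field names and result labels are illustrative.
interface Observation {
  requestHost: string;      // host the client actually sent the request to
  hasApiKeyHeader: boolean; // whether AtlasBurn-API-Key was present on the request
  resumeStatus: number;     // HTTP status surfaced after re-triggering Resume
}

type Diagnosis =
  | "wrong-host"
  | "missing-api-key"
  | "flag-already-gone"
  | "flag-cleared"
  | "resume-failed";

function diagnoseStuckKill(o: Observation): Diagnosis {
  // A stale baseURL means the request never reached the proxy at all.
  if (o.requestHost !== "proxy.atlasburn.com") return "wrong-host";
  // Without the key the rejection is an auth failure, not a guardrail kill.
  if (!o.hasApiKeyHeader) return "missing-api-key";
  // 404 on Resume: the flag no longer exists, so the cause lies elsewhere.
  if (o.resumeStatus === 404) return "flag-already-gone";
  // Assumption: any 2xx means the delete was accepted.
  if (o.resumeStatus >= 200 && o.resumeStatus < 300) return "flag-cleared";
  return "resume-failed";
}
```

The order of the checks mirrors the list: rule out routing and authentication first, then read the Resume response.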

No telemetry events are arriving

  • Using the Edge Proxy? Confirm baseURL points to https://proxy.atlasburn.com/<provider>/v1 and the AtlasBurn-API-Key header is set on every request.
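  • Using the SDK? Confirm initAtlasBurnAuto runs at boot (see SDK Integration) and that the process stays alive long enough to flush. Short-lived scripts and serverless handlers can exit before pending events are sent.
  • Check the Forensic Ledger time filter. It defaults to the last hour, so older events stay hidden until you widen the range.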
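
For the Edge Proxy case, the baseURL and header checks can be automated before the client is constructed. A minimal sketch, assuming the URL pattern stated above is exact; checkProxyConfig is a hypothetical helper, not part of the AtlasBurn SDK.

```typescript
// Pattern taken from this guide: https://proxy.atlasburn.com/<provider>/v1
const PROXY_URL = /^https:\/\/proxy\.atlasburn\.com\/[^\/]+\/v1\/?$/;

// Hypothetical pre-flight check: returns a list of problems, empty when the config looks right.
function checkProxyConfig(
  baseURL: string,
  headers: Record<string, string>
): string[] {
  const problems: string[] = [];

  if (!PROXY_URL.test(baseURL)) {
    problems.push(
      "baseURL should look like https://proxy.atlasburn.com/<provider>/v1"
    );
  }

  // HTTP header names are case-insensitive, so compare in lower case.
  const hasKey = Object.entries(headers).some(
    ([name, value]) =>
      name.toLowerCase() === "atlasburn-api-key" && value.trim() !== ""
  );
  if (!hasKey) {
    problems.push("AtlasBurn-API-Key header is missing or empty");
  }

  return problems;
}
```

Running this once at startup and logging the returned problems catches a stale baseURL or a missing key before any request is sent.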

Guardrail keeps tripping on legitimate traffic

Tune the layer thresholds from Settings → Guardrails. Common cases:

  • Batch jobs: raise the Layer 4 (RPS Burst) ceiling, or run the job under a dedicated key whose limits match its burst profile.
  • Identical prompts by design (eval suites, prompt-regression tests): add the fingerprint to the Layer 2 (Dumb Loop) allowlist.
  • Premium-only workloads: disable Layer 5 (Premium Concentration) for the relevant API key.

Latency increased after enabling the proxy

The edge proxy adds less than 5 ms in steady state (one KV read plus the forward). If you see more:

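  • Verify that your client reuses connections (HTTP keep-alive) instead of opening a new TCP and TLS connection for every request.
  • Check that the request is hitting the nearest Cloudflare PoP, not a custom region pinned far from your runtime.
  • Inspect the cf-ray response header to confirm edge termination. Its suffix is the code of the Cloudflare data center that handled the request.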
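
To see which data center served a request, the cf-ray value can be split into its ray ID and location code. A minimal sketch, assuming the usual "<hex id>-<three-letter code>" shape of the header; parseCfRay is a hypothetical helper.

```typescript
interface CfRay {
  rayId: string; // opaque request identifier
  colo: string;  // three-letter code of the Cloudflare data center
}

// Parse a cf-ray header value such as "230b030023ae2822-SJC".
// Returns null when the header is absent or does not match the expected shape.
function parseCfRay(header: string | null | undefined): CfRay | null {
  if (!header) return null;
  const match = /^([0-9a-f]+)-([a-z]{3})$/i.exec(header.trim());
  if (!match) return null;
  const [, rayId = "", colo = ""] = match;
  return { rayId, colo: colo.toUpperCase() };
}
```

If the parsed code is consistently a location far from your runtime, the request is probably not terminating at the nearest PoP. A null result suggests the response did not come through Cloudflare at all.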

Self-hosted proxy isn't reading KV

Re-check wrangler.toml:

[[kv_namespaces]]
binding = "GUARDRAIL_FLAGS"
id = "<your-kv-namespace-id>"

The binding name must be GUARDRAIL_FLAGS — the worker looks up that exact identifier. After editing, redeploy with npx wrangler deploy.
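
A binding name mismatch shows up in the worker as an undefined property on env, which can be checked explicitly. The sketch below is not the AtlasBurn worker: the key scheme in flagKey and both helper functions are hypothetical, and KVLike is a local stand-in for the part of the Workers KV interface used here.

```typescript
// Minimal structural type for the KV method used below, so the sketch
// compiles without @cloudflare/workers-types.
interface KVLike {
  get(key: string): Promise<string | null>;
}

interface Env {
  // Undefined when the binding name in wrangler.toml does not match.
  GUARDRAIL_FLAGS?: KVLike;
}

// Hypothetical key scheme; the real worker's key format is not documented here.
const flagKey = (orgId: string): string => `suspended:${orgId}`;

// Returns a human-readable problem, or null when the binding is present.
function bindingProblem(env: Env): string | null {
  if (env.GUARDRAIL_FLAGS === undefined) {
    return "env.GUARDRAIL_FLAGS is undefined: check the binding name in wrangler.toml and redeploy";
  }
  return null;
}

// Reads the flag for one org; KV get() resolves to null when the key is absent.
async function isSuspended(env: Env, orgId: string): Promise<boolean> {
  const kv = env.GUARDRAIL_FLAGS;
  if (kv === undefined) {
    throw new Error("GUARDRAIL_FLAGS binding is missing");
  }
  return (await kv.get(flagKey(orgId))) !== null;
}
```

Logging the result of bindingProblem(env) at the top of the fetch handler makes a misnamed binding visible in the worker logs instead of failing silently.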

Still stuck?

Open a support ticket from the dashboard and include your org ID and a sample kill event ID, so that we can replay the full guardrail evaluation against your event stream.