Splunk Agentic Ops Hackathon · Observability
Evals tell you how smart an agent is. Warrant tells production how much rope to give it — and takes the rope back. Agents earn revocable, per-action licenses by making falsifiable predictions and being graded by reality, not by a demo that went well.
The industry’s answer to agent safety is “human-in-the-loop” — an analyst approves every action, forever. That isn’t a safety model; it’s a bottleneck with good intentions. And the moment teams get tired of clicking approve, autonomy gets granted the way it always does: a demo went well, a prompt was tweaked, someone flipped auto‑approve. No number. No evidence. No way to take it back.
A driver’s license doesn’t certify that you’re smart. It certifies you were tested on the road you’ll actually drive — and it can be taken away.
Read the full case — and how Warrant compares to evals, guardrails and human-in-the-loop →
Warrant sits between any AI agent and the systems it wants to touch. Before every action it asks one question: does this brain hold a valid license for this action class?
Manufactured incidents, accelerated exams. The agent commits to a falsifiable prediction before each fix; reality grades it. Fifteen graded outcomes in seconds — no cold start, no anecdotes.
How exams work →A Wilson lower bound that clears threshold, a minimum body of evidence, and Brier-scored calibration — an agent that is confidently wrong fails even with a passing hit-rate.
The licensing math →Every license is pinned to model ID + prompt version. Your vendor updates the model overnight? Every license drops to PROVISIONAL before the new brain acts once.
Drift detection →And every earned license is issued as a real, printable certificate — bound to the ledger hash →
Four acts against a live fault-injection sandbox. The kill-shot is Act II: a decoy incident fools the agent exactly the way it would fool an engineer — and its own falsifiable prediction catches the mistake, rolls it back, and suspends the license. Trustworthy when wrong is the property everything else is missing.
Fifteen manufactured incidents across three action classes. Wilson bounds climb until the registry shows three LICENSED stamps.
Acting alone, the agent resolves real incidents — until a decoy mimics a familiar signature. Its fix misses the band it predicted. It declares itself wrong, rolls back, escalates, and loses the license.
The fingerprint changes — no failure has happened yet. Warrant doesn’t wait for one: the track record belonged to a brain that no longer exists.
The new brain re-earns each license under its own fingerprint. Autonomy restored — with evidence, not a vibe.
Warrant consumes the Splunk MCP Server for every read of Splunk data — and ships its own MCP server, so the trust gate becomes infrastructure any agent in the ecosystem can call: a SOAR playbook, Splunk’s own triage agents, a Claude agent.
Built solo for the Splunk Agentic Ops Hackathon 2026 · Observability track.