A dialectical AI framework that turns hallucinations into schema violations — not mysterious behavior. Built with structured paranoia and adversarial thinking.
Traditional AI security tools have a fatal flaw: they can confidently fabricate evidence. When you deploy a single LLM to analyze security threats, it doesn't just make mistakes — it makes them with conviction.
In cybersecurity, a hallucinated threat assessment isn't just wrong. It's dangerous. A false positive wastes resources. A false negative lets an attacker walk through the front door. And the model gives no signal that it's making things up.
ARES was born from a single question: What if we could make hallucinations physically impossible?
We built a multi-agent debate system expecting the truth to emerge from structured argument. Instead, we discovered something the AI research community is only beginning to understand.
When pushed back by the opposing agent, the Architect systematically retreated, dropping confidence by an average of 30 points per round. Even when its initial threat assessment was perfectly correct, it erased its own answers to appease the challenger, like a smart student caving to a bully.
The Skeptic became entirely rigid. Assigned the role of challenger, it simply crossed its arms and said no — refusing to update its stance regardless of counter-evidence. When given explicit calibration prompts, it ignored them completely.
"LLM agents do not negotiate toward truth. They perform social behaviors that mimic negotiation — which includes capitulation, rigidity, and over-correction."
This finding was independently corroborated by researchers at ETH Zurich in their paper "Can AI Agents Agree?"
The problem is inside the black box. The solution is entirely outside of it. ARES treats the LLM as a chaotic, flawed reasoning engine and places it inside a strict, deterministic cage.
The Architect identifies anomaly patterns aligned to MITRE ATT&CK and generates grounded assertions — every claim must cite a fact_id from the frozen evidence. It cannot invent evidence.
The Skeptic challenges every threat hypothesis by constructing benign explanations from the same evidence: maintenance windows, admin activity, scheduled tasks. It cannot introduce external knowledge.
The Oracle is split in two: the Judge (pure math, no LLM) computes the verdict deterministically, while the Narrator (a constrained LLM) explains the verdict but cannot modify it. A mathematical judge cannot be tricked by rhetoric.
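To make "pure math, no LLM" concrete, here is a minimal sketch of a deterministic verdict function. The names (`Assertion`, `judge_verdict`) and the evidence-weighted scoring rule are illustrative assumptions, not the real OracleJudge's formula; the point is that the verdict is a pure function of the cited evidence and confidences, with no language model in the loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    fact_ids: tuple      # evidence citations backing this claim
    confidence: float    # agent's self-reported confidence in [0, 1]

def judge_verdict(thesis: list[Assertion], antithesis: list[Assertion],
                  threshold: float = 0.5) -> str:
    """Deterministic verdict: weight each side's confidence by how much
    evidence it cites, then compare shares. Same inputs, same output."""
    def weight(side: list[Assertion]) -> float:
        return sum(a.confidence * len(a.fact_ids) for a in side)
    t, a = weight(thesis), weight(antithesis)
    if t == 0.0 and a == 0.0:
        return "INCONCLUSIVE"
    score = t / (t + a)  # thesis share of evidence-weighted confidence
    return "THREAT" if score >= threshold else "BENIGN"
```

Because the function is deterministic, a persuasive but evidence-free argument moves the score by exactly zero.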
The preprint documents our core discovery: why multi-agent debate degrades accuracy, and how deterministic scaffolding solves it.
ARES doesn't try to prevent AI from hallucinating. Instead, it makes hallucinations mechanically impossible by converting them into catchable validation errors.
Every agent is bound to a cryptographically frozen Evidence Packet. All assertions must reference a fact_id that exists in this packet. A deterministic Coordinator — the "Bouncer" — rejects any message containing non-existent references. An AI hallucination is no longer mysterious behavior. It's contempt of court.
Architecture visualization candidates generated during the research process. Each diagram captures a different perspective of the ARES pipeline.
ARES is modeled after the biological immune system — specifically, the mechanisms that prevent autoimmune overreaction.
Antigens → Facts in the EvidencePacket
T-Helper cells → Architect (identifies threats)
Regulatory T-cells → Skeptic (prevents overreaction)
T-Killer cells → Coordinator (enforces rules, terminates violations)
MHC restriction → Packet binding (respond only to bound evidence)
Autoimmune prevention → Closed-world principle (cannot attack self)
"The Builder lives with Ankylosing Spondylitis — an autoimmune disease where the immune system attacks the spine. ARES was born from the question: what if we could build the failsafe that biology couldn't?"
Multi-turn debate degrades accuracy in ALL configurations tested. Zero good flips occurred. The debate chapter is formally closed.
9 failures classified: 4 confidence calibration (44%), 3 evidence gaps (33%), 2 ambiguity mismatches (22%). Every failure has a fix path.
Build the logic, the math, and the failsafes first (the Iron Skeleton), then drop the LLM brains into that highly restricted cage.
Foundational architecture documents: dialectical reasoning cycle, five attack scenarios, ethical framework aligned to NIST AI RMF. External critique identified three missing prerequisites before any code should be written: data schemas, agent I/O contracts, and a testing framework.
All documentation submitted to Claude for independent assessment. Verdict: "comprehensive, thoughtful, and architecturally sound." Key gaps identified: frozen data structures, agent I/O contracts, API specs. The build-in-public journey begins.
Graph Schema (6 node types, 7 edge types, 110 tests). EvidencePacket & DialecticalMessage protocol (292 tests). Agent Foundation with three hard invariants: packet binding, phase enforcement, evidence tracking. Concrete agents: Architect, Skeptic, OracleJudge, OracleNarrator. Cumulative: 570 tests passing.
First sensor layer: Windows Security Event XML parser (Event IDs 4624, 4672, 4688). Three golden pipeline scenarios validated end-to-end from raw XML to verdict. 130 new tests. Cumulative: 700.
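A minimal sketch of what such a sensor-layer extractor looks like, using the standard Windows Event XML namespace. The function name and the flat-dict output shape are assumptions; the real extractor emits fact_ids into an EvidencePacket rather than returning a dict.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
TRACKED = {"4624", "4672", "4688"}  # logon, special privileges, process creation

def extract_facts(event_xml: str) -> dict[str, str]:
    """Parse one Windows Security Event and flatten its EventData
    fields into fact candidates; ignore untracked event types."""
    root = ET.fromstring(event_xml)
    event_id = root.findtext("e:System/e:EventID", namespaces=NS)
    if event_id not in TRACKED:
        return {}
    facts = {"event_id": event_id}
    for data in root.findall("e:EventData/e:Data", NS):
        name = data.get("Name")
        if name:
            facts[name] = data.text or ""
    return facts
```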
DialecticalOrchestrator: single run_cycle(packet) call automating the full THESIS → ANTITHESIS → SYNTHESIS pipeline. Tamper-evident Memory Stream with SHA256 hash-chained audit log. Pre-session review caught a critical bug: content hash must cover the full CycleResult, not a subset. Cumulative: 861 tests.
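The hash-chaining scheme, and the bug the review caught, can be sketched as follows. Function and field names here are illustrative, not the Memory Stream's real API; the key property is that each entry's hash covers the full serialized result plus the previous hash, so tampering with any entry breaks every later link.

```python
import hashlib
import json

def append_entry(log: list[dict], cycle_result: dict) -> dict:
    """Append a tamper-evident entry. The digest covers the FULL serialized
    cycle result (the reviewed bug was hashing only a subset of it)."""
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(cycle_result, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    entry = {"prev": prev, "result": cycle_result, "hash": digest}
    log.append(entry)
    return entry

def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any modified entry invalidates the chain."""
    prev = "0" * 64
    for e in log:
        payload = json.dumps(e["result"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True
```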
Strategy Pattern enabling rule-based and LLM-backed implementations to swap without changing agent interfaces. Closed-world validation silently filters any LLM-cited fact_id that doesn't exist in the EvidencePacket. First live LLM cycle: zero validation errors. Architect confidence 0.90 (vs 0.49 rule-based). Cost: $0.03 per cycle. Cumulative: 1,104 tests.
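A compact sketch of both ideas together: a strategy interface that lets rule-based and LLM backends swap freely, and the closed-world filter that silently drops citations absent from the packet. Class names and the toy heuristic are assumptions for illustration.

```python
from typing import Protocol

class AnalysisStrategy(Protocol):
    """Interface both backends satisfy; agent code depends only on this."""
    def analyze(self, facts: dict[str, str]) -> tuple[float, set[str]]: ...

class RuleBasedStrategy:
    """Toy heuristic stand-in for the real rule engine."""
    def analyze(self, facts):
        hits = {fid for fid, text in facts.items() if "4672" in text}
        return (0.49 if hits else 0.10, hits)

class MockLLMStrategy:
    """Stand-in for the LLM backend; note it cites a nonexistent fact."""
    def analyze(self, facts):
        return (0.90, {"f1", "f_ghost"})

def run_agent(strategy: AnalysisStrategy, facts: dict[str, str]):
    confidence, cited = strategy.analyze(facts)
    # Closed-world validation: silently filter citations not in the packet.
    return confidence, cited & facts.keys()
```

Swapping backends changes only which object is passed in; the agent interface and the validation stay identical.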
12-scenario gauntlet across four difficulty tiers. LLM accuracy: 91.7% on the initial 12 scenarios (up from 50% rule-based). Benchmark runner hardened with per-scenario error isolation and real cost tracking. Cumulative: 1,190 tests.
Multi-turn debate experiment: accuracy dropped from 91.7% to 83.3%. Zero "good flips," 25% "bad flips." Agents re-analyzed the same packet from scratch each round; the termination condition (NO_NEW_EVIDENCE) fired correctly after round 2. SC-012 (Supply Chain) regressed due to confidence inflation without new reasoning. The multi-turn debate chapter is formally closed.
Battle Plan and Compendium submitted to GPT-5.4 Pro, Gemini 3.1 Pro, and Perplexity for independent review. Unanimous consensus: ship single-turn as the production path. The failure mode is architectural (asymmetric calibration), not fixable by prompting. Independently corroborated by ETH Zurich's "Can AI Agents Agree?" paper.
Syslog extractor (8 message types: SSH, firewall, sudo, systemd) and NetFlow extractor (8 flow types, 14 facts per record). Three independent telemetry sources feeding richer cross-source evidence to the dialectical agents. Cumulative: ~1,488 tests.
Built confidence-band escalation gate at [0.35, 0.70]. Critical finding: all 7 actual errors are MISCALIBRATED — the system is confidently wrong, not uncertainly wrong. The gate treats the wrong disease. Pivot to miscalibration detection via per-claim evidence audit. Cumulative: 1,736 tests.
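The band gate and the pivot can be sketched side by side. Both function names and the evidence-support heuristic are hypothetical; the point is that a confidence-band gate only fires when the model admits uncertainty, while a per-claim evidence audit can flag high confidence resting on thin citations.

```python
def should_escalate(confidence: float,
                    low: float = 0.35, high: float = 0.70) -> bool:
    """Band gate: escalate to a human only inside the uncertainty band.
    Confidently wrong answers (e.g. 0.90) sail straight through."""
    return low <= confidence <= high

def miscalibration_score(confidence: float,
                         cited: set[str], packet: set[str]) -> float:
    """Hypothetical audit: confidence in excess of evidence support is
    the red flag the band gate misses."""
    support = len(cited & packet) / max(len(packet), 1)
    return max(0.0, confidence - support)
```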
33-scenario corpus regenerated at 72.7% baseline. OracleJudgeV2 (delta-based scoring), v3 prompts (exhaustive fact citation), and threshold sweep. v4 prompt calibration confirmed the Architect hits a 0.75 confidence floor regardless of instructions — a structural property of LLM confidence quantization. Final trajectory: 50% → 91.7% (12 scenarios) → 72.7% (33 scenarios) → 81.8% (v3 prompts) → 87.9% (OracleJudgeV2, best config).
WebSocket event emitter and 3D evidence graph. Corpus replay runner validated event sequences across all 33 scenarios deterministically. 1,948 tests passing with zero regressions.
Benchmark replay pipeline consuming real LLM data. Standalone HTML/Three.js visualizer rendering evidence facts as particle clusters with citation lines and live confidence bars. Strategic pivot: nw_wrld abandoned in favor of direct WebSocket rendering for full domain control. Final test count: 1,927 passing, 65 skipped, 0 failures.
Three open tracks: harden single-turn accuracy past 90%, publish the asymmetric calibration finding as a formal research contribution, and expand ARES VISION into a live operational interface.