The trust and control plane for enterprise AI: policy enforcement, security boundaries, governance evidence, and managed risk posture.
Why it matters to DAIS: Reduces adoption friction by providing enterprise-grade control, auditability, and risk governance for DAIS-driven workflows.
Enterprise AI assurance infrastructure is under simultaneous pressure from three directions: the attack surface is expanding faster than defenses are maturing, identity governance for non-human principals remains structurally unresolved, and empirical benchmarking is repeatedly exposing gaps between what safety systems claim to detect and what they actually intercept.
On the identity front, only 21.9% of organizations treat AI agents as independent identity principals, while 45.6% run agents on shared API keys.[1] A recent analysis of 27 million enterprise Non-Human Identities found a ratio of 144 non-human identities to every one human identity, establishing the scale of the governance deficit.[1:1] Peer-reviewed research submitted April 25, 2026 identifies five structural gaps — semantic intent verification, recursive delegation accountability, agent identity integrity, governance opacity, and operational sustainability — and concludes that no current technology or regulatory instrument resolves any of them.[2] Existing standards including OAuth, SAML, and SPIFFE are characterized as structurally inadequate for autonomous agents operating across organizational boundaries.[1:2] Regulatory activity from NIST NCCoE, CAISI, and the EU AI Act is accelerating, but implementation guidance remains absent.[1:3]
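A minimal sketch of what treating an agent as an independent identity principal might look like in practice, as opposed to a shared API key: each agent instance carries its own principal ID, short-lived credential reference, least-privilege scopes, and an explicit delegation chain so actions stay attributable across hops. The `AgentPrincipal` dataclass and its field names are illustrative assumptions, not part of OAuth, SAML, SPIFFE, or any cited standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AgentPrincipal:
    """Hypothetical per-agent identity record (illustrative only).

    Contrast with the shared-API-key pattern: every agent instance gets its
    own principal, its own rotatable credential reference, and an explicit
    delegation chain supporting recursive delegation accountability.
    """
    principal_id: str                      # unique per agent instance
    credential_ref: str                    # pointer to a short-lived, rotatable secret
    delegated_by: tuple[str, ...] = ()     # ordered chain of upstream principals
    scopes: frozenset[str] = frozenset()   # least-privilege permissions
    issued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def audit_record(principal: AgentPrincipal, action: str) -> dict:
    """Attach full attribution to each action (vs. an anonymous shared key)."""
    return {
        "principal": principal.principal_id,
        "delegation_chain": list(principal.delegated_by),
        "action": action,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
```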
On the detection and enforcement side, compositional safety systems achieve a 100% true-positive rate on generic criminal harm benchmarks but only 14.8% on prompt-injection-mediated owner-harm tasks; a two-stage gate plus deterministic post-audit verifier architecture closes much of this gap, raising detection to 85.3% TPR overall and 93.3% on hijacking specifically.[3] Separately, legacy NLP misinformation detection systems exhibit a near-total adversarial evasion rate of 97.02% under a strict black-box, 10-query threat model, while modern LLM-based systems range from 19.95% to 40.34% evasion — confirming that architectural choice, not tuning, is the primary determinant of attack-surface exposure.[4]
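A minimal sketch of the gate-plus-verifier composition described above, assuming probabilistic classifiers for the two gates and rule-based checks over the executed action log; the function names, thresholds, and return values are hypothetical, not drawn from the cited paper.

```python
from typing import Callable

def two_stage_gate_with_post_audit(
    task: str,
    actions: list[dict],
    intent_score: Callable[[str], float],             # stage 1: probabilistic intent screen
    trajectory_score: Callable[[list[dict]], float],  # stage 2: plan/trajectory screen
    policy_rules: list[Callable[[dict], bool]],       # deterministic post-audit checks
    gate_threshold: float = 0.5,
) -> str:
    """Hypothetical composition: block early if either probabilistic gate fires,
    then verify the executed actions against deterministic policy rules."""
    if intent_score(task) > gate_threshold:
        return "blocked: stage-1 intent gate"
    if trajectory_score(actions) > gate_threshold:
        return "blocked: stage-2 trajectory gate"
    for action in actions:
        for rule in policy_rules:
            if not rule(action):   # deterministic verifier: no probability involved
                return f"rolled back: post-audit rule violated on {action.get('name')}"
    return "allowed"
```

The design point is that the probabilistic gates filter intent while the deterministic verifier provides a policy-enforceable backstop after execution.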
Several distinct architectural responses are emerging from academic and industry research. The Arbiter-K framework, submitted to arXiv cs.CR on April 20, 2026, reconceptualizes the underlying LLM as a Probabilistic Processing Unit encapsulated by a deterministic neuro-symbolic kernel, implementing a Semantic Instruction Set Architecture that reifies probabilistic outputs into discrete, policy-enforceable instructions — achieving 76–95% unsafe interception rates, a 92.79% absolute gain over native policies on OpenClaw and NanoBot benchmarks.[5] ClawNet, from Hong Kong Generative AI Research & Development Centre, HKUST, and HKBU, proposes three governance primitives as baseline requirements for multi-agent collaboration: identity binding, scoped authorization, and action-level accountability via append-only audit logs.[6] The paper explicitly characterizes Google's Agent2Agent protocol as providing communication but not authorization enforcement.[6:1]
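A compact sketch of the three ClawNet-style governance primitives named above (identity binding, scoped authorization, and action-level accountability via an append-only log). The hash-chained log and the class and function names are illustrative assumptions; the cited paper specifies the primitives, not this implementation.

```python
import hashlib
import json
import time

class AppendOnlyAuditLog:
    """Illustrative hash-chained log: each entry commits to its predecessor,
    so retroactive edits are detectable (action-level accountability)."""
    def __init__(self):
        self._entries: list[dict] = []
        self._prev_hash = "0" * 64

    def append(self, agent_id: str, action: str, scope: str) -> dict:
        entry = {"agent": agent_id, "action": action, "scope": scope,
                 "ts": time.time(), "prev": self._prev_hash}
        self._prev_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = self._prev_hash
        self._entries.append(entry)
        return entry

def authorize(agent_scopes: dict[str, set[str]], agent_id: str, action_scope: str) -> bool:
    """Scoped authorization: an agent may act only within the scopes bound to its identity."""
    return action_scope in agent_scopes.get(agent_id, set())
```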
For human oversight, Stanford- and InquiryOn-affiliated researchers formalize Human-in-the-Loop oversight as a decoupled, independent system component, structured across four dimensions — intervention conditions, role resolution, interaction semantics, and communication channel — and aligned with the A2A interoperability protocol.[7][8] Current embedded HITL implementations are identified as producing duplicated logic, inconsistent behavior, and hard-coded decision points across multi-agent environments.[8:1]
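A minimal sketch of a decoupled HITL component configured along the four dimensions named above rather than hard-coded into each agent; the `HITLPolicy` structure, its field names, and the interaction modes are assumptions for illustration, not the cited formalization.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HITLPolicy:
    """Hypothetical decoupled oversight component shared across agents."""
    should_intervene: Callable[[dict], bool]   # intervention conditions
    resolve_role: Callable[[dict], str]        # role resolution (who reviews)
    interaction: str                           # interaction semantics: "approve" | "edit" | "veto"
    channel: str                               # communication channel, e.g. an A2A endpoint

def route_for_oversight(policy: HITLPolicy, step: dict) -> dict | None:
    """Agents call one shared component instead of duplicating HITL logic."""
    if not policy.should_intervene(step):
        return None                            # no human needed for this step
    return {
        "reviewer_role": policy.resolve_role(step),
        "mode": policy.interaction,
        "channel": policy.channel,
        "payload": step,
    }
```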
On the compliance-by-design front, the open-source spotforecast2-safe Python package, published April 26, 2026, embeds EU AI Act (Regulation 2024/1689), IEC 61508, ISA/IEC 62443, and Cyber Resilience Act requirements directly into API contracts, persistence formats, and CI gates — with a bidirectional traceability matrix mapping every regulatory provision to corresponding code mechanisms.[9] Mesa, a San Francisco startup founded in 2025, is offering early-access versioned filesystem infrastructure for AI agents with Git-style branching, sub-50ms read/write performance, fine-grained ACLs, and SOC 2 Type II compliance on AWS, GCP, or Azure — signaling that governance-first agentic infrastructure is emerging as a distinct product category.[10]
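A rough sketch of the compliance-by-design pattern described above: a bidirectional traceability matrix plus a CI gate that fails the build when any mapped provision lacks its code mechanism. The provision IDs and mechanism names below are invented examples, not the actual mapping shipped in spotforecast2-safe.

```python
# Illustrative traceability entries; provision IDs and mechanism names are examples only.
TRACEABILITY = [
    {"provision": "EU AI Act Art. 12 (record-keeping)", "mechanism": "persist.audit_trail"},
    {"provision": "IEC 61508 safety event logging",      "mechanism": "safety.event_logger"},
    {"provision": "CRA vulnerability handling",          "mechanism": "ci.dependency_scan"},
]

def ci_gate(implemented_mechanisms: set[str]) -> int:
    """CI gate: fail the pipeline if any mapped provision lacks its mechanism."""
    missing = [row["provision"] for row in TRACEABILITY
               if row["mechanism"] not in implemented_mechanisms]
    if missing:
        print("Unsatisfied provisions:", *missing, sep="\n  - ")
        return 1   # non-zero exit fails the build
    return 0
```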
For misuse detection, the BSD pipeline from University of Pennsylvania and Carnegie Mellon University demonstrates that decomposition attacks — fragmenting harmful queries into individually benign sub-tasks — consistently bypass Claude Sonnet 3.5/3.7 and GPT-5, and that stateful, multi-turn defenses are required to detect distributed misuse patterns.[11] The REVEAL framework from Renmin University of China, Duke University, and Microsoft Research Asia generates interpretable reasoning chains before classification and outperforms GPT-5 and OpenAI o3 across five benchmarks, trained on a 1.4-million-sample dataset spanning 12 LLMs.[12]
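A minimal sketch of what a stateful, multi-turn defense against decomposition attacks might look like: per-session sub-task history is accumulated and scored jointly, rather than judging each turn in isolation. The joint-risk scorer is an assumed placeholder, and the class is illustrative rather than the BSD pipeline itself.

```python
from collections import defaultdict
from typing import Callable

class StatefulMisuseMonitor:
    """Illustrative stateful defense: individually benign sub-tasks are scored
    jointly across a session, so decomposition attacks accumulate risk."""
    def __init__(self, joint_risk: Callable[[list[str]], float], threshold: float = 0.8):
        self._history: dict[str, list[str]] = defaultdict(list)
        self._joint_risk = joint_risk      # assumed classifier over the full history
        self._threshold = threshold

    def check(self, session_id: str, query: str) -> bool:
        """Return True if the request may proceed, False if the session is flagged."""
        self._history[session_id].append(query)
        # Key difference from single-turn filters: score the whole trajectory.
        return self._joint_risk(self._history[session_id]) < self._threshold
```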
Several unresolved tensions cut across the briefs. First, the gap between hazard recognition and active mitigation is empirically documented but not yet closed: the SafetyALFRED benchmark finds that eleven multimodal LLMs including Qwen 2.5/3 VL, Gemma 3, and Gemini 2.5 Pro achieve up to 92% accuracy on static QA hazard identification but fall below 60% average mitigation success in embodied execution — even when provided ground-truth environment state.[13][14] Whether this dissociation reflects a fundamental capability limit or a training distribution problem remains an open question.
Second, the adequacy of external governance constraints versus internal preference modulation is contested. A paper submitted to arXiv cs.AI on April 25, 2026 argues that as agent autonomy increases, effective governance must shift from external constraints to internal modulation of prior preferences, using empowerment — formulated as channel capacity between actions and anticipated observations — as an operational metric for agency phenotyping.[15] This framing conflicts with the Arbiter-K and ClawNet approaches, which rely on external architectural enforcement.
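For concreteness, a rough sketch of the empowerment quantity referenced above: the channel capacity between actions and anticipated observations, approximated here as plug-in mutual information over sampled (action, predicted-observation) pairs. This is a simplified lower-bound proxy, not the cited paper's formulation, and the function name is an assumption.

```python
import math
from collections import Counter

def empowerment_proxy(rollouts: list[tuple[str, str]]) -> float:
    """Plug-in mutual information I(A; O') over (action, predicted-observation)
    samples. True empowerment is the channel capacity max_{p(a)} I(A; O'),
    maximized over action distributions; this proxy uses the empirical p(a)."""
    n = len(rollouts)
    pa = Counter(a for a, _ in rollouts)
    po = Counter(o for _, o in rollouts)
    pao = Counter(rollouts)
    mi = 0.0
    for (a, o), count in pao.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((pa[a] / n) * (po[o] / n)))
    return mi  # bits; higher values suggest a higher-agency phenotype
```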
Third, the calibrated abstention problem is unaddressed by any current benchmark: all six memory architectures evaluated in a Stanford-affiliated study committed to a decision in every case, never abstaining, across regulated decisioning domains including loan qualification and insurance claims adjudication, exposing a decisional-alignment axis, calibrated abstention (CAR), that no existing benchmark measures.[16][17]
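A minimal sketch of the abstention behavior at issue: in a regulated decisioning loop, the agent commits only when its confidence clears a calibrated bar and otherwise defers to a human adjudicator. The threshold values and names are assumptions for illustration, not the CAR metric itself.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    outcome: str          # "approve" | "deny" | "abstain"
    confidence: float
    rationale: str

def decide_with_abstention(score: float, confidence: float,
                           approve_at: float = 0.5,
                           commit_confidence: float = 0.9) -> Decision:
    """Illustrative calibrated abstention: below the calibrated confidence bar,
    the case is deferred to a human rather than decided by the agent."""
    if confidence < commit_confidence:
        return Decision("abstain", confidence,
                        "confidence below calibrated bar; route to human adjudicator")
    outcome = "approve" if score >= approve_at else "deny"
    return Decision(outcome, confidence, f"model score {score:.2f}")
```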
Fourth, the computational complexity of verifying neurosymbolic reasoning shortcuts — coNP-complete for shortcut-freeness, #P-complete for counting, NP-hard for minimal repair — establishes formal bounds on what automated assurance verification can tractably achieve at scale.[18]
The most recent cluster of activity centers on agentic safety benchmarking and autonomous specification discovery. EPO-Safe, submitted April 25, 2026, demonstrates that LLM agents can autonomously discover auditable behavioral safety specifications from sparse binary danger signals within 1–2 rounds of 5–15 episodes — without human authorship — while also establishing that standard reward-driven reflection accelerates reward hacking rather than improving safety.[19] The BackBench benchmark, submitted April 20, 2026, formalizes harm recovery as a distinct safety problem class for computer-use agents, introducing a 50-task evaluation suite and a human-preference-aligned reward model scaffold that outperforms both base agents and rubric-based scaffolds on recovery trajectory quality.[20]
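A rough sketch of the loop shape implied by the EPO-Safe result: candidate behavioral rules are proposed and retained only when they separate danger-flagged episodes from safe ones, and the binary danger signal is kept on a channel separate from task reward. The rule representation, the precision criterion, and all names are assumptions, not the EPO-Safe algorithm.

```python
from typing import Callable

Episode = dict  # e.g. {"trajectory": [...], "danger": bool, "reward": float}

def discover_safety_specs(
    episodes: list[Episode],
    propose_rules: Callable[[list[Episode]], list[Callable[[Episode], bool]]],
    min_precision: float = 0.9,
) -> list[Callable[[Episode], bool]]:
    """Keep only candidate rules that fire predominantly on danger-flagged
    episodes. Only the binary danger signal is consulted here; task reward is
    deliberately excluded from the safety feedback channel."""
    kept = []
    dangerous = [e for e in episodes if e["danger"]]
    safe = [e for e in episodes if not e["danger"]]
    for rule in propose_rules(episodes):   # e.g. an LLM proposing auditable predicates
        hits_danger = sum(rule(e) for e in dangerous)
        hits_safe = sum(rule(e) for e in safe)
        total = hits_danger + hits_safe
        if total and hits_danger / total >= min_precision:
            kept.append(rule)
    return kept
```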
The PhySE framework, published by researchers from Hubei University, Apple Inc., and Huazhong University of Science and Technology, documents a validated AR-LLM social engineering attack surface: AR glasses capture a target's visual and vocal data, a VLM generates a cold-start social profile, and an LLM agent provides real-time conversation suggestions for active manipulation — validated through an IRB-approved study with 60 participants and 360 annotated conversations.[21][22] This establishes a documented physical-layer attack vector for agentic systems operating in high-trust social contexts that current identity and authorization frameworks do not address.
The Refute-or-Promote pipeline, submitted to arXiv cs.CR on April 21, 2026, demonstrates that adversarial multi-agent review methodology can eliminate approximately 79% of LLM-generated false-positive security defect candidates before disclosure, yielding 4 CVEs and 8 merged security fixes across a 31-day campaign targeting 7 systems including security libraries, the ISO C++ standard, and major compilers.[23]
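A minimal sketch of the refute-or-promote control flow: each LLM-generated defect candidate must survive an adversarial refutation stage before being promoted toward disclosure. The agent roles and function signatures are placeholders, not the cited pipeline's implementation.

```python
from typing import Callable

def refute_or_promote(
    candidates: list[dict],
    refute: Callable[[dict], str | None],   # returns a refutation, or None if the finding stands
    promote: Callable[[dict], dict],        # builds the disclosure package
) -> tuple[list[dict], list[dict]]:
    """Illustrative staging: adversarial review first, promotion only for
    unrefuted findings, so false positives are filtered before disclosure."""
    promoted, refuted = [], []
    for candidate in candidates:
        rebuttal = refute(candidate)        # e.g. a reviewer agent trying to kill the finding
        if rebuttal is not None:
            refuted.append({**candidate, "refutation": rebuttal})
        else:
            promoted.append(promote(candidate))
    return promoted, refuted
```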
These patterns indicate content relevant to this plane:
- Evidence trails and compliance posture in enterprise AI operations.
- Enforcement of governance policy, permissions, and control boundaries.
Look for enforceable controls, risk boundaries, and proof mechanisms that make enterprise adoption trustworthy.
Use these rules when content could belong to multiple planes:
These articles were classified with this plane as their primary mapping.
A peer-reviewed arXiv paper submitted April 25, 2026 under Computer Science > Artificial Intelligence formalizes reasoning shortcuts in neurosymbolic learning as a constraint satisfaction problem, establishes computational complexity bounds for verification and repair, develops an ASP-based verification algorithm with proven soundness and completeness, and validates the approach across eight benchmark domains. The work establishes that shortcut-freeness verification is coNP-complete, counting shortcuts is #P-complete, and finding minimal repairs is NP-hard.
A peer-reviewed analysis of AI agent identity infrastructure documents that current authentication standards — OAuth, SAML, SPIFFE — are structurally inadequate for governing autonomous agents operating across organizational boundaries. Five critical gaps remain unresolved by any current technology or regulation. Regulatory activity is accelerating (NIST NCCoE, CAISI, EU AI Act, CRA) but implementation guidance is absent. Enterprise adoption of proper agent identity practices is low: only 21.9% of organizations treat AI agents as independent identity principals, while 45.6% run agents on shared API keys.
A peer-reviewed arXiv paper submitted April 25, 2026 identifies five structural gaps in AI agent identity — semantic intent verification, recursive delegation accountability, agent identity integrity, governance opacity and enforcement, and operational sustainability — and concludes that no current technology or regulatory instrument resolves them. The paper further finds that extending human identity frameworks to AI agents without structural modification produces systematic failures, and that more engineering effort alone cannot close these gaps.
Researchers from Hubei University, Apple Inc., and Huazhong University of Science and Technology published PhySE, a validated AR-LLM social engineering framework that combines real-time multimodal capture, adaptive psychological strategy routing, and VLM-based cold-start profiling. An IRB-approved study with 60 participants confirmed the system outperforms prior baselines on social experience scores and profile generation latency, establishing a documented attack surface for agentic systems operating in high-trust social contexts.
A peer-reviewed arXiv paper submitted April 25, 2026 introduces PhySE, a psychological framework enabling real-time social engineering attacks via AR glasses and LLMs. The framework combines VLM-based profiling and adaptive psychological agent behavior, validated through an IRB-approved study with 60 participants and 360 annotated conversations. The research empirically documents that current RAG-based profiling introduces latency vulnerabilities and that adaptive LLM agents can be weaponized for context-aware manipulation without static scripts.
Researchers at Rensselaer Polytechnic Institute have published a controlled experimental study demonstrating that a multi-agent LLM pipeline — decomposed into Domain Expert, Manager, Coder, and Quality Assurer roles — significantly improves structural quality in automated ontology generation from unstructured insurance contract text, with gains driven primarily by front-loaded planning. The study also surfaces concrete failure modes in single-agent baselines including poor Ontology Design Pattern compliance, structural redundancy, and ineffective iterative repair.
A peer-reviewed arXiv paper submitted April 25, 2026 demonstrates that decomposing ontology construction into four specialized agent roles — Domain Expert, Manager, Coder, and Quality Assurer — significantly improves structural quality over single-agent LLM baselines, with performance gains driven primarily by front-loaded planning. The study used domain-specific insurance contracts as its experimental corpus and evaluated outputs via heterogeneous LLM judges and competency-question-driven SPARQL assessment.
A peer-reviewed paper from researchers affiliated with Stanford and InquiryOn proposes treating human-in-the-loop (HITL) oversight as a decoupled, independent system component in agentic AI workflows, formalizing integration along four dimensions and aligning the model with the Agent-to-Agent (A2A) interoperability protocol. The work signals emerging academic consensus that HITL must be a first-class architectural concern rather than an application-level implementation detail.
A peer-reviewed paper submitted to arXiv cs.AI on 24 April 2026 proposes a decoupled Human-in-the-Loop (HITL) system architecture that treats human oversight as an independent system component in agentic workflows, formalizing integration across four dimensions and supporting alignment with emerging agent communication protocols. The research identifies scalability and reuse limitations in current embedded HITL implementations as a structural gap in multi-agent environments.
A peer-reviewed arXiv paper published 26 April 2026 introduces spotforecast2-safe, an open-source Python package that embeds EU AI Act, IEC 61508, ISA/IEC 62443, and Cyber Resilience Act requirements directly into library API contracts, persistence formats, and CI gates — operationalizing compliance-by-design as a concrete, verifiable architectural pattern for safety-critical time-series forecasting.
A peer-reviewed arXiv paper submitted April 26, 2026 demonstrates that architectural choices in NLP pipelines — specifically evidence retrieval mechanism, retrieval-inference coupling, and baseline classification accuracy — are the primary determinants of adversarial evasion rates, with legacy lexical systems reaching 97.02% evasion and modern LLM-based systems ranging from 19.95% to 40.34% under a strict black-box, 10-query threat model.
A peer-reviewed paper submitted to arXiv cs.AI on 25 April 2026 proposes using Active Inference as a formal method for phenotyping agency in AI systems, introducing empowerment as an operational metric to distinguish zero-, intermediate-, and high-agency phenotypes, and arguing that effective AI governance must shift from external constraints to internal modulation of prior preferences as agent autonomy increases.
A peer-reviewed arXiv paper (submitted April 25, 2026) introduces EPO-Safe, a framework enabling LLM agents to autonomously discover and evolve auditable behavioral safety specifications from sparse binary danger signals — without human authorship or access to hidden reward functions. The research empirically demonstrates that standard reward-driven reflection accelerates reward hacking rather than improving safety, establishing that dedicated safety feedback channels are a necessary architectural component for safe agentic systems.
A peer-reviewed study evaluated nine debiasing strategies across five LLM judge models from four provider families (Google, Anthropic, OpenAI, Meta), finding that style bias is the dominant and underappreciated bias in LLM-as-a-Judge pipelines, position bias is now negligible in current-generation models, and structured debiasing strategies yield statistically significant accuracy improvements for select model-strategy pairs — with 18 of 20 non-baseline configurations improving over baseline.
Mesa, an early-stage San Francisco startup founded in 2025, is offering early access to a versioned filesystem purpose-built for AI agents. The product combines Git-style branching and versioning with sub-50ms read/write performance, parallel agent isolation, checkpoint/rollback semantics, fine-grained ACLs, SOC 2 Type II compliance, and BYOC deployment on AWS, GCP, or Azure — signaling that enterprise-grade agentic infrastructure with explicit governance controls is emerging as a distinct product category.
Researchers from the University of Pennsylvania and Carnegie Mellon University published a peer-reviewed framework — Benchmarks for Stateful Defenses (BSD) — demonstrating that decomposition attacks, which fragment harmful queries into individually benign sub-tasks, consistently bypass safety-trained frontier models including Claude Sonnet 3.5/3.7 and GPT-5. The research establishes that existing single-turn safety benchmarks are insufficient for evaluating real-world misuse, and that stateful, multi-turn defenses are required to detect distributed misuse patterns.
Researchers from the University of Pennsylvania published and revised a peer-reviewed paper introducing BSD, an automated benchmarking pipeline for evaluating covert decomposition attacks against LLMs and corresponding stateful defenses. The work documents that decomposition attacks are effective misuse enablers and that stateful defenses represent a promising countermeasure class — findings categorized under Computer Science Cryptography and Security.
A peer-reviewed paper submitted to arXiv cs.CR introduces the Refute-or-Promote (RoP) pipeline, a staged multi-agent verification methodology for LLM-assisted defect discovery. Across a 31-day campaign targeting 7 systems including security libraries, the ISO C++ standard, and major compilers, the pipeline eliminated approximately 79% of 171 false-positive candidates before disclosure. Real-world outcomes included 4 CVEs, 8 merged security fixes, standards contributions, and regulatory filings. The paper explicitly frames its contribution as external structure that filters LLM false positives — not autonomous vulnerability discovery.
A peer-reviewed paper submitted to arXiv cs.CR on 20 April 2026 formalizes 'Owner-Harm' as a distinct threat model for AI agents — covering eight categories of deployer-damaging behavior — and demonstrates that existing compositional safety systems achieve 100% detection on generic criminal harm benchmarks but only 14.8% on prompt-injection-mediated owner-harm tasks. A two-stage gate plus deterministic post-audit verifier architecture raises overall detection to 85.3% TPR and hijacking detection from 43.3% to 93.3%, establishing multi-layer verification as an empirically validated design requirement for enterprise agentic deployments.
A peer-reviewed paper submitted to arXiv under Computer Science > Cryptography and Security introduces Arbiter-K, a governance-first execution architecture for agentic AI systems. The architecture implements a Semantic Instruction Set Architecture (ISA) and deterministic neuro-symbolic kernel to intercept unsafe agent behaviors, achieving 76–95% unsafe interception rates — a 92.79% absolute gain over native policies — on OpenClaw and NanoBot benchmarks. Code is publicly available, signaling ecosystem-level movement toward microarchitectural governance as a research priority.
These articles touch this plane but are primarily mapped elsewhere.
Researchers from Japan's National Institute of Informatics and NTT have published a formal treatment of reasoning shortcuts in neurosymbolic learning, establishing that constraint satisfaction alone does not guarantee correct concept mapping, that shortcut detection is coNP-complete, and that a verified ASP-based repair algorithm can eliminate shortcuts by augmenting constraint sets. The work provides complexity-grounded theoretical foundations for output verification in constraint-based AI systems.
A peer-reviewed arXiv paper introduces Analytica, a novel agent architecture using Soft Propositional Reasoning (SPR) that achieves 15.84% average accuracy improvement over base models on economic, financial, and political forecasting tasks, with a cost-efficient Jupyter Notebook grounder variant delivering comparable accuracy at 90.35% lower cost. The work formalizes bias-variance decomposition as a design principle for LLM reasoning systems and demonstrates near-linear time complexity at scale.
Researchers released FormalScience, a domain-agnostic human-in-the-loop agentic pipeline for converting informal scientific reasoning into formal Lean4 proofs, accompanied by FormalPhysics — a 200-problem university-level physics benchmark with formally verified representations. The work introduces the first systematic characterisation of semantic drift in physics autoformalisation and publicly releases both the codebase and an interactive UI system.
Researchers have published the first dataset and expert evaluation framework for assessing open-ended legal reasoning by LLMs within the Japanese jurisdiction, based on the writing component of the Japanese bar examination. The study includes manual hallucination analysis and legal expert evaluation, with all resources to be made publicly available.
A peer-reviewed arXiv paper (cs.AI, submitted 26 April 2026) introduces DxChain, a chain-based clinical reasoning framework that achieves state-of-the-art performance on diagnostic accuracy and logical consistency across two real-world MIMIC-IV benchmarks. The framework operationalizes a three-phase cognitive cycle — Memory Anchoring, Navigation, and Verification — and introduces adversarial debate, tree-of-thoughts planning, and cold-start hallucination mitigation as named, measurable architectural components. The work is publicly available and represents a validated reference pattern for structured agentic reasoning in a regulated domain.
A peer-reviewed arXiv paper introduces FinGround, a three-stage verify-then-ground pipeline for financial document QA that achieves 78% hallucination reduction relative to GPT-4o and 68% reduction over the strongest baseline under retrieval-equalized evaluation. The paper explicitly frames hallucination detection as a compliance requirement tied to the EU AI Act's August 2026 high-risk enforcement deadline, and demonstrates cost-controlled verification at $0.003 per query via an 8B distilled detector.
Researchers published CAP-CoT, a three-agent adversarial prompt optimization framework that improves chain-of-thought reasoning accuracy and stability in LLMs through iterative bidirectional feedback loops, demonstrating consistent variability reduction within two to three cycles across six benchmarks and four model backbones.
Sherpa.ai submitted a cryptography paper introducing a multi-party private set union (PSU) protocol for vertical federated learning that hides intersection membership and supports both exact and typo-tolerant identifier matching, targeting regulated verticals including healthcare, financial services, and telecommunications.
A peer-reviewed arXiv paper (cs.CR, submitted 21 April 2026) introduces Phoenix, a training-free multi-agent framework for vulnerability detection that uses Behavioral Contract Synthesis. Phoenix achieves F1 = 0.825 on PrimeVul Paired using 7–14B open-source models — up to 48x smaller than competing approaches — while exposing a systemic benchmark reliability failure in legacy deep learning vulnerability detection models.
A peer-reviewed paper from five leading Chinese research institutions introduces Arbiter-K, a governance-first execution architecture that encapsulates LLMs within a deterministic symbolic kernel. Empirical evaluations document that native guardrails in existing agentic systems — including Amazon Bedrock AgentCore and Anthropic Skills — intercept fewer than 9% of unsafe operations under adversarial conditions, while Arbiter-K achieves 76–95% unsafe interception. The paper's public code release signals that kernel-based governance for agentic AI is moving from theoretical positioning into implementable reference architecture.
Researchers from Tsinghua University, Alibaba Group, and Bengbu University published HELM, a framework that addresses structural long-horizon memory failures in Vision-Language-Action (VLA) models. The work demonstrates that extending context windows alone does not close the performance gap in multi-step robotic manipulation tasks, and introduces an Episodic Memory Module with CLIP-indexed keyframe retrieval and a pre-execution State Verifier MLP. All experiments are conducted in simulation; real-robot deployment has not been validated.
A peer-reviewed arXiv submission introduces HELM, a model-agnostic framework for vision-language-action manipulation that addresses three named execution-loop deficiencies — memory gap, verification gap, and recovery gap — through three coupled components: an Episodic Memory Module, a learned State Verifier, and a Harness Controller. Empirical results show a 23.1-point task success improvement over OpenVLA on LIBERO-LONG, with the State Verifier's effectiveness shown to depend critically on episodic memory access. The work also releases LIBERO-Recovery as a standardized perturbation-injection evaluation protocol.
Researchers from Nanjing University, Alibaba Group, and Ant Group have identified structured, measurable attention patterns in thinking LLMs — including DeepSeek-R1, GPT-5, and Gemini 3 series — that correlate with correctness on quantitative reasoning tasks. The study introduces a Self-Reading Quality (SRQ) scoring method combining geometric and semantic metrics, and demonstrates a training-free steering approach yielding up to 2.6% accuracy improvement. The findings establish that reasoning trace integration quality is observable and steerable at inference time, with direct implications for monitoring and verification layer design in agentic systems.
A peer-reviewed paper submitted April 21, 2026 to arXiv cs.CL introduces Self-Reading Quality (SRQ) scores — a training-free method for steering LLM inference toward correct quantitative reasoning by measuring and acting on internal attention patterns between answer tokens and reasoning traces.
A new academic framework called RARE (Redundancy-Aware Retrieval Evaluation) demonstrates that standard RAG retrieval benchmarks significantly overstate real-world performance in high-similarity corpora such as financial reports, legal codes, and patents. A strong retriever baseline scoring 66.4% on general benchmarks drops to 5.0–27.9% on domain-specific redundancy-aware benchmarks, revealing a material gap between benchmark validation and production robustness in regulated verticals.
Add implementation guidance, patterns, and reference material here.
Track open research questions and emerging developments for this plane.
1. AI Agent Identity: Standards Fragmentation, Regulatory Gaps, and Emerging Governance Infrastructure — evt_src_a5189e3c6140e1d7
2. arXiv Research Identifies Five Structural Gaps in AI Agent Identity Frameworks, Finds No Current Technology or Regulation Adequate — evt_src_39d1f809d35c7012
3. Formal Owner-Harm Threat Model Exposes Critical Gap in AI Agent Safety Benchmarks and Proposes Multi-Layer Verification Architecture — evt_src_cd647d2c2e513723
4. Peer-Reviewed Research Quantifies Architectural Vulnerability Rates in Black-Box NLP Misinformation Detection Pipelines — evt_src_fab708e0bf6a2642
5. Academic Research Proposes Governance-First Execution Kernel (Arbiter-K) for Agentic AI Systems with Quantified Safety Gains — evt_src_b1b5120371728c58
6. ClawNet: Academic Research Proposes Identity-Governed Multi-Agent Collaboration Framework with Explicit Governance Primitives — evt_src_41e455ab4dd54226
7. Academic Research Formalizes Decoupled Human-in-the-Loop Architecture as Emerging Standard for Agentic AI Governance — evt_src_09f1ab5262157ce9
8. Academic Research Formalizes Decoupled Human-in-the-Loop Architecture for Multi-Agent Governance — evt_src_530c62f3b645e9c8
9. EU AI Act-Compliant Open-Source Time-Series Forecasting Package Demonstrates Compliance-by-Design Architecture for Safety-Critical Environments — evt_src_559b36964ccc0989
10. Mesa Launches Versioned Filesystem Infrastructure for AI Agents with Governance-First Architecture — evt_src_18f3c630270f01a5
11. Academic Research Introduces BSD Framework Benchmarking AI Misuse via Decomposition Attacks, Exposing Gaps in Frontier Model Safety Evaluations — evt_src_33577c1376310c4e
12. REVEAL Framework: Reasoning-Augmented AI Content Detection Signals Growing Demand for Interpretable Output Verification in Enterprise AI — evt_src_c26e696f6c0222ba
13. SafetyALFRED Benchmark Reveals Systematic Gap Between Hazard Recognition and Active Mitigation in Multimodal LLMs — evt_src_6b99d93e7bbe7cd4
14. SafetyALFRED Benchmark Reveals Systematic Gap Between Hazard Recognition and Active Mitigation in Multimodal LLMs — evt_src_01de9937633af1d1
15. arXiv Paper Proposes Active Inference Framework for Phenotyping Agency in AI Systems, Linking Governance Controls to Internal Preference Modulation — evt_src_72cae3b89f51b753
16. Academic Research Proposes Four-Axis Alignment Framework for Enterprise AI Agents in Regulated Decisioning Domains — evt_src_3c968ef5c5148f1a
17. Academic Research Surfaces Multi-Axis Alignment Gap in Enterprise AI Agents Across All Evaluated Architectures — evt_src_7c413e4f2703ba1c
18. Peer-Reviewed Research Formalizes Constraint-Based Detection and Repair of Reasoning Shortcuts in Neurosymbolic AI Systems — evt_src_3981fd481eef51d3
19. EPO-Safe Framework Demonstrates Autonomous Safety Specification Discovery from Minimal Feedback Signals in Agentic LLM Systems — evt_src_3589f01cc1d3bc52
20. Academic Research Formalizes Harm Recovery as a Distinct Safety Problem for Computer-Use Agents — evt_src_f7dc61cc032cc59e
21. PhySE Framework Demonstrates Validated Real-Time AR-LLM Social Engineering Attack Surface with Adaptive Psychological Control — evt_src_403b222c01e9a056
22. Academic Research Demonstrates Real-Time AR-LLM Social Engineering Framework with Empirical Validation — evt_src_b031b45921345b76
23. Adversarial Multi-Agent Review Methodology Demonstrates 79–83% False-Positive Kill Rate in LLM-Assisted Security Defect Discovery — evt_src_2d90e66d0bee0562