I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems
Evaluates whether multi-agent LLM governance systems follow institutional rules when granted authority, finding integrity requires pre-deployment safeguards.
Studies cross-model alignment of LLM representations for downstream objectives with applications in privacy-preserving and security-constrained settings.
Research manifesto proposing Agentic Business Process Management paradigm extending BPM for governing autonomous agents executing organizational processes.
Extends structural causal models with an intentional interventions operator for teleological inference about goal-directed agent behavior in causal systems.
Evaluates PPS (5W3H-based structured prompting framework) for reducing intent transmission loss between users and LLMs across business, technical, and travel domains.
GAN-based simulation framework measuring racial bias propagation in predictive policing systems across multiple US cities with temporal analysis.
Uses Stochastic Gumbel AlphaZero to evaluate difficulty in Tetris Block Puzzle variants, applying game-playing AI as evaluator for puzzle design.
Analyzes online resource allocation among interacting modules with endogenous costs under uniform, gated, and competitive allocation paradigms with regret bounds.
Stability Monitor: behavioral fingerprinting system that tracks LLM endpoint identity changes caused by model updates, quantization, or inference-engine changes, going beyond traditional uptime metrics.
Evaluates whether cross-domain mapping interventions increase creativity equally in humans and LLMs through product feature generation experiments.
LuMamba: self-supervised Mamba architecture for EEG modeling with topology-invariant electrode handling and improved computational efficiency over Transformers.
Studies how uncertainty estimation scales with parallel sampling in reasoning models using self-consistency and verbalized confidence across mathematics and STEM tasks.
Large-scale trace-level study showing that multi-pass LLM reasoning in binary vulnerability analysis exhibits structured, token-level exploration patterns across hundreds of steps.
D5P4: generalized beam-search framework using determinantal point processes for diverse parallel decoding in discrete diffusion text generation models.
cuGenOpt: GPU-accelerated metaheuristic framework for combinatorial optimization balancing generality, performance, and usability across logistics and scheduling problems.
Box Maze framework enforces LLM reasoning integrity through process-control architecture to mitigate hallucination and unreliable reasoning under adversarial prompting.
OS-Themis: scalable multi-agent critic framework using decomposed trajectory milestones for training robust GUI agents with reinforcement learning.
Uses optimal transport as alignment objective for fine-tuning multilingual contextualized embeddings to improve cross-lingual word representations.
Comparative study evaluating whether LLMs demonstrate Theory of Mind capabilities using psychological paradigms.
TherapyGym evaluation framework for therapy chatbots measuring clinical fidelity and safety using psychotherapy rating scales.
Uncertainty-calibrated prompt optimization framework for LLM classification that measures model confidence to improve reliability.
LLM-based agent framework for automated extraction of structured political biography data from unstructured sources at scale.
DynaRAG framework extending RAG with dynamic API calls for time-sensitive queries; includes sufficiency classification and reranking.
Analysis of explainability in harmful content detection models, examining predictions on borderline and contextual cases.
MineDraft framework for batch parallel speculative decoding to accelerate LLM inference by parallelizing draft and verification stages.
Tool for collecting granular metadata about language model benchmarks to verify alignment with practitioner goals and test coverage.
Multi-task learning framework for personalized open-vocabulary keyword spotting with privacy and customization for voice assistants.
Keyword spotting framework integrating phoneme learning with personalized prosody modeling for speaker-specific voice recognition.
Study examining relationship between firms' AI technology innovation investments and consumer complaint patterns.
Adaptive Extended Kalman Filter using knowledge distillation for improved UWB/PDR indoor localization under NLOS conditions.
Method for increasing transformer modularity and interpretability through per-layer supervision to overcome distributed redundancy.
Quine runtime that implements LLM agents as native POSIX processes using OS-level isolation and scheduling instead of application-layer frameworks.
Method for distinguishing between system failures and domain shifts in industrial data streams using anomaly detection.
Study of poisoning attacks against RAG systems where adversaries corrupt retrieval corpora to manipulate LLM outputs; includes defenses.
Research on multi-agent LLM routing systems showing that quality-based delegation can fail when agents misreport performance; proposes delegation contracts to address this.
NANOZK: Zero-knowledge proof system enabling cryptographic verification that proprietary LLM API outputs were actually produced by the claimed model.
S3T-Former: Energy-efficient spike-driven state-space transformer for skeleton-based action recognition on resource-constrained edge devices.
MCP-38: Protocol-specific threat taxonomy with 38 threat categories for Model Context Protocol systems derived through systematic methodology.
Synthesizable RTL implementation of predictive coding networks enabling online, distributed hardware learning as alternative to backpropagation.
Lightweight LLM adaptation framework for technical service agents using latent logic augmentation and noise reduction techniques.
SLEA-RL: Step-level experience augmentation for multi-turn LLM agent training enabling dynamic retrieval and leveraging accumulated episode experiences.
Meta-BayFL: Probabilistic federated learning framework with Bayesian neural networks for heterogeneous data and model personalization.
Interpretability study uncovering latent phase structures and branching logic in deep RL locomotion policies for the HalfCheetah control task.
Dynamic constraints framework for reinforcement learning fine-tuning that adapts constraints based on model capabilities to balance stability and optimization.
CytoSyn: Foundation diffusion model for computational histopathology enabling cell segmentation and tumor analysis from digitized slides.
Trace-based assurance framework for agentic AI orchestration with contracts, testing, and governance for LLM-coordinated multi-agent systems.
Training-only framework for few-shot CLIP adapters using heterogeneous image-patch-text graph supervision without inference cost overhead.
ARTEMIS: Neuro-symbolic framework combining neural operators and SDEs for interpretable, arbitrage-free quantitative finance models.
Discovery of bimodal drift rate structure in fast radio burst FRB 20240114A using unsupervised machine learning for astrophysics analysis.
Tula: Optimization framework for distributed large-batch training balancing communication overhead, computation cost, and generalization performance.