Isolater - Feed

Ax Gil Harari, Yoel Zimmermann, Ola Tangen Kulseng, Laura Zichi, Chuin Wei Tan, Marc L. Descoteaux, Boris Kozinsky 9d ago

Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials

Systematic comparison of matrix-structured optimizers (Muon, SOAP) versus Adam for training machine learning interatomic potentials with improved efficiency.

Ax Yunhe Li, Hao Shi, Wenhao Liu, Mengzhe Ruan, Hanxu Hou, Zhongxiang Dai, Shuang Qiu, Linqi Song 9d ago

DemoPSD: Disagreement-Modulated Policy Self-Distillation

DemoPSD method improves LLM reasoning via self-distillation with disagreement modulation to reduce overfitting and improve cross-domain generalization.

Ax Yuxuan Li, Lingxi Xie, Xinyue Huo, Jihao Qiu, Jiacheng Shao, Pengfei Chen, Jiannan Ge, Kaiwen Duan, Qi Tian 9d ago

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

Benchmark dataset (DramaSR-532K) and reasoning LLM approach for speaker recognition in TV dramas using dialogue attribution tasks.

Ax Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng 9d ago

Program-as-Weights: A Programming Paradigm for Fuzzy Functions

Program-as-Weights paradigm compiles natural-language specifications into compact, locally-executable neural artifacts for fuzzy functions.

Ax Matteo Boglioni, Thibault Rousset, Siva Reddy, Marius Mosbach, Verna Dankers 9d ago

LACUNA: A Testbed for Evaluating Localization Precision for LLM Unlearning

LACUNA testbed evaluates localization precision in LLM unlearning methods that remove sensitive training data and PII.

Ax Róisín Luo, James McDermott, Colm O'Riordan 9d ago

Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition

Model-agnostic global interpretability method for understanding perturbation robustness in image models via spectral analysis.

Ax Hana Chockler, David A. Kelly, Daniel Kroening, Youcheng Sun 9d ago

Causal Explanations for Image Classifiers

Black-box method for computing image classifier explanations using formal causal theory and actual causality definitions.

Ax Yuhan Li, Wei Zhang, Juan Chen, Jiangjia Yan, Peng Xiangli, Liangze Yin 9d ago

ADMC: Attention-based Diffusion Model for Missing Modalities Feature Completion

Attention-based diffusion model for completing missing modalities in multimodal emotion and intent recognition tasks.

Ax Saurabh Ranjan, Brian Odegaard 9d ago

Psychological Imagination Networks Show Cross-Population Centrality and Clustering Alignment in Humans That Large Language Models Fail to Replicate

Psychological network analysis comparing mental imagery structure between humans and LLMs across populations and languages.

Ax Hanyu Wang, Ruohan Xie, Yutong Wang, Guoxiong Gao, Xintao Yu, Bin Dong 9d ago

Aria: An Agent For Retrieval and Iterative Auto-Formalization via Dependency Graph

Aria agent system for theorem formalization in Lean using retrieval and iterative auto-formalization to improve LLM accuracy in mathematics.

Ax Raj Ghugare, Roger Creus Castanyer, Catherine Ji, Kathryn Wantlin, Jin Schofield, Karthik Narasimhan, Benjamin Eysenbach 9d ago

BuilderBench: The Building Blocks of Intelligent Agents

BuilderBench benchmark for developing AI agents that learn through interaction and exploration rather than mimicry alone.

Ax Yankai Jiang, Yujie Zhang, Peng Zhang, Wenjie Li, Yichen Li, Jintai Chen, Xiaoming Shi, Shihui Zhen 9d ago

Ophiuchus: Incentivizing Tool-augmented "Think with Images" for Joint Medical Segmentation, Understanding and Reasoning

Ophiuchus tool-augmented framework enabling medical MLLMs to dynamically focus on fine-grained visual regions for clinical reasoning tasks.

Ax Masum Hasan, Junjie Zhao, Ehsan Hoque 9d ago

HAL: Inducing Human-likeness in LLMs with Alignment

HAL framework for aligning language models to human-likeness through interpretable, data-driven alignment methods.

Ax Peixin Huang, Yaoxin Wu, Yining Ma, Cathy Wu, Wei Zhang, Wen Song 9d ago

A General Neural Backbone for Mixed-Integer Linear Optimization via Dual Attention

Attention-driven neural backbone for solving mixed-integer linear programming using graph neural networks with improved representation power.

Ax Menglin Xia, Xuchao Zhang, Shantanu Dixit, Paramaguru Harimurugan, Rujia Wang, Victor Ruhle, Robert Sim, Chetan Bansal, Saravan Rajmohan 9d ago

Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity

Memora: harmonic memory representation for agents that balances abstraction and specificity for efficient, context-aware retrieval.

Ax Fengyuan Liu, Jay Gala, Nilaksh, Dzmitry Bahdanau, Siva Reddy, Hugo Larochelle 9d ago

BRIDGE: Predicting Human Task Completion Time From Model Performance

BRIDGE psychometric framework predicting human task completion time from model performance without direct human annotations.

Ax Anirudh Ajith, Amanpreet Singh, Jay DeYoung, Nadav Kunievsky, Austin C. Kozlowski, Oyvind Tafjord, James Evans, Daniel S. Weld, Tom Hope, Doug Downey 9d ago

PreScience: A Dataset and Benchmark for Scientific Forecasting

PreScience dataset and benchmark for forecasting scientific advances using 98K AI papers with citations and author histories.

Ax Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Haoxuan Li, Hao Wang, Shijian Wang, Guanting Dong, Jiajie Jin, Yinuo Wang, Yuan Lu, Ji-Rong Wen, Zhicheng Dou, Zhouchen Lin 9d ago

OmniGAIA: Towards Native Omni-Modal AI Agents

OmniGAIA benchmark evaluating omni-modal AI agents with vision, audio, and language integration for complex reasoning and tool usage.

Ax Giona Fieni, Joschua Wüthrich, Marc-Philippe Neumann, Christopher H. Onder 9d ago

Learning-based Multi-agent Race Strategies in Formula 1

Reinforcement learning approach for multi-agent Formula 1 race strategy optimization, modeling energy, tire degradation, and competitor behavior.

Ax Drew Prinster, Clara Fannjiang, Ji Won Park, Kyunghyun Cho, Anqi Liu, Suchi Saria, Samuel Stanton 9d ago

Conformal Policy Control

Conformal policy control method using safe reference policies to regulate untested agent policies, balancing exploration and safety constraints.

Ax Boyuan Guan, Wencong Cui, Levente Juhasz 9d ago

A Dual-Helix Governance Approach Towards Reliable Agentic Artificial Intelligence for WebGIS Development

Dual-helix governance framework stabilizing agentic AI for WebGIS by using knowledge graphs and protocol enforcement to address context and instruction failures.

Ax Andreas Schlapbach 9d ago

Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

Formal verification framework for LLM agent protocols, comparing Schema-Guided Dialogue and Model Context Protocol for agent-tool integration.

Ax Pablo de los Riscos, Fernando J. Corbacho, Michael A. Arbib 9d ago

Working Paper: Towards a Category-theoretic Comparative Framework for Artificial General Intelligence

Category-theoretic framework for defining and comparing AGI systems, addressing lack of formal AGI definitions and benchmarking approaches.

Ax Niklas Herbster, Martin Zborowski, Alberto Tosato, Gauthier Gidel, Tommaso Tosato 9d ago

Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence

Activation steering methods to prevent LLM misalignment at runtime by manipulating linear structures in activation space.

Ax Trilok Padhi, Ramneet Kaur, Krishiv Agarwal, Adam D. Cobb, Daniel Elenius, Manoj Acharya, Colin Samplawski, Alexander M. Berenbeim, Nathaniel D. Bastian, Susmit Jha, Ugur Kursuncu, Anirban Roy 9d ago

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

Framework for interpreting temporal evolution of concepts in LLM agents using conformal inference, improving transparency of sequential behavior.

Ax Sen Cui, Jingheng Ma 9d ago

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

Hamiltonian-based approach to generative world modeling combining video synthesis, 3D scene reconstruction, and latent predictive models.

Ax Zenghui Zhou, Man Li, Xiaoke Fang, Xinyi Zhou, Weibin Lin, Zheng Zheng 9d ago

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs

LGMT framework uses first-order logic for oracle-free evaluation of LLM reasoning robustness under logically equivalent transformations.

Ax Pengyu Zhu, Lijun Li, Yaxing Lyu, Qianxin Luo, Jingyi Yang, Yi Liu, Tingfeng Hui, Xinyu Yuan, Li Sun, Sen Su, Jing Shao 9d ago

A Unified Framework for the Evaluation of LLM Agentic Capabilities

Unified evaluation framework for LLM agentic capabilities that separates model capability from benchmark implementation choices for fair cross-benchmark comparison.

Ax Tong Bai, Zhenglin Wan, Pengfei Zhou, Xingrui Yu, Yang You, Ivor W. Tsang 9d ago

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale

SkillDAG framework models inter-skill relationships as typed directed graphs for LLM agent skill selection at scale, improving over similarity-matching approaches.

Ax Wojciech Zarzecki, Jan Dubiński, Sebastian Cygert 9d ago

The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

Analysis of benchmark contamination detection methods for LLMs, showing limitations of statistical tools in realistic auditing scenarios with distribution shift.

Ax Muhammad Zia Hydari, Raja Iqbal 9d ago

The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents

Study of stochasticity sources in AI agents, examining how foundation models and orchestration loops produce variability in planning, tool calls, and outputs.

Ax Xinbao Qiao, Xianglong Du, Wei Liu, Jingqi Zhang, Peihua Mai, Meng Zhang, Yan Pang 9d ago

When Sample Selection Bias Precipitates Model Collapse

Research on model collapse from recursive training on synthetic data and how sample selection bias affects model verification in low-resource regimes.

Ax Tingyang Chen, Shuo Lu, Kang Zhao, Weicheng Meng, Hanlin Teng, Tianhao Li, Chao Li, Xule Liu, Jian Liang, Zhizhong Zhang, Yuan Xie, Heng Qu, Kun Shao, Jian Luan 9d ago

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

HarnessX: foundry for composable, adaptive agent harnesses combining prompts, tools, memory, and control flow with systematic evolution from execution traces.

Ax Sergei Trashchenkov 9d ago

Power Systems Agent Benchmark: Executable Evaluation of AI Agents in Electric Power Engineering

Power Systems Agent Benchmark: executable evaluation framework for tool-using AI agents applied to power engineering tasks with concrete outcome verification.

Ax Jeffrey Flynt 9d ago

GroundEval: A Deterministic Replacement for LLM-as-Judge in Stateful Agent Evaluation

GroundEval: deterministic alternative to LLM judges for agent evaluation, verifying agent search, retrieval, and citation behavior through execution traces.

Ax Xinyuan Song, Zekun Cai 9d ago

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

Grounded Iterative Language Planning: parameterized world models for LLM agents reducing hallucination propagation through measurable transition prediction.

Ax Tianlong Wang, Yuhang Wang, Weibin Liao, Xin Gao, Xinyu Ma, Yang Lin, Yasha Wang, Liantao Ma 9d ago

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

Dynamic representation editing framework steering LLM reasoning trajectories toward truth by analyzing geometry of correctness in reasoning chains.

Ax Tianyu Jin, Shuo Chen, Yida Wang, Liuyu Xiang, Yingzhuo Liu, Zhiyao Jiang, Yexin Li, Zhaofeng He 9d ago

SAGA: Scene-Aware, Goal-Evolving Agents for Long-Horizon CivRealm Strategy Planning

SAGA: scene-aware multi-agent system for long-horizon strategy planning in CivRealm addressing scene blindness, context overflow, and cross-game learning.

Ax Anuj Kaul, Qianlong Lan, Pranay Gupta 9d ago

Behavioral Governance for Autonomous AI Agents: The AgentBound Framework

AgentBound: behavioral governance framework for autonomous AI agents controlling consequential actions (transactions, communications) based on operational context.

Ax Arshia Soltani Moakhar, Iman Gholami, Max Springer, Mahdi JafariRaviz, MohammadTaghi Hajiaghayi 9d ago

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

Framework for autoformalization: LLM agents automatically translate natural-language mathematics into verifiable Lean 4 code, reaching beyond standard libraries.

Ax Kaiwen Xiong, Haonian Ji, Shi Qiu, Zeyu Zheng, Cihang Xie, Xinyu Ye, Huaxiu Yao 9d ago

ClawArena-Team: Benchmarking Subagent Orchestration and Dynamic Workflows in Language-Model Agents

ClawArena-Team: benchmark for evaluating LLM agents managing subagents through dynamic workflows with parallel asynchronous orchestration.

Ax Shreya Rajpal, Tanawan Premsri, Parisa Kordjamshidi 9d ago

Spatial Reasoning via Modality Switching Between Language and Symbolic Representation

Framework for spatial reasoning via switching between language and symbolic representations (layouts, grids) to improve multi-hop reasoning in LLMs.

Ax Yankai Jiang, Weiting Tang, Haoran Sun, Zhenyu Tang, Yuejie Hou, Yingnan Han, Rubo Wang, Yueyuxiao Yang, Cheng Liang, Lilong Wang, Wenjie Lou, Xiaosong Wang, Lei Bai, Meng Yang 9d ago

A Self-Evolving Agentic System for Automated Generation and Execution of Biological Protocols

ProtoPilot: self-evolving multi-agent system for automated generation and execution of biological lab protocols with alignment between design and physical execution.

Ax Mathilde Noual 9d ago

The MMM Data Model -- A Normative Specification for Knowledge Interoperability in a Decentralisable Knowledge Commons

MMM Data Model: normative specification for knowledge interoperability in decentralized systems, addressing limitations of document-centric design.

Ax Huang Hu, Xianchao Wu, Bingfeng Luo, Chongyang Tao, Can Xu, Wei Wu, Zhan Chen 9d ago

Playing 20 Question Game with Policy-Based Reinforcement Learning

Policy-based reinforcement learning approach for the 20 Questions game, where the agent acts as the questioner and selects questions strategically.

Ax Tong Xiao, Jingbo Zhu 9d ago

Introduction to Transformers: an NLP Perspective

Introduction to Transformer architecture covering basic concepts, model refinements, and NLP applications.

Ax Kaustubh Chakradeo (University of Copenhagen, Section of Epidemiology, Department of Public Health, Copenhagen, Denmark), Pernille Nielsen (Technical University of Denmark, Department of Applied Mathematics and Computer Science, Denmark), Lise Mette Rahbek Gjerdrum (Department of Pathology, Copenhagen University Hospital- Zealand University Hospital, Roskilde, Denmark, Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark), Gry Sahl Hansen (Department of Pathology, Copenhagen University Hospital- Zealand University Hospital, Roskilde, Denmark), David A Duchêne (University of Copenhagen, Section of Epidemiology, Department of Public Health, Copenhagen, Denmark), Laust H Mortensen (University of Copenhagen, Section of Epidemiology, Department of Public Health, Copenhagen, Denmark, Danmarks Statistik, Denmark), Majken K Jensen (University of Copenhagen, Section of Epidemiology, Department of Public Health, Copenhagen, Denmark), Samir Bhatt (University of Copenhagen, Section of Epidemiology, Department of Public Health, Copenhagen, Denmark, Imperial College London, United Kingdom) 9d ago

Comparative Analysis of Lightweight CNNs for Resource-Constrained Devices: Predictive Performance, Efficiency Trade-offs, and Initialization Effects

Controlled benchmark comparing seven lightweight CNNs on image classification tasks under unified training protocol, measuring accuracy and efficiency trade-offs.