Neurosymbolic approach combining LLMs with Logic Tensor Networks for auditable offer validation in regulated procurement, ensuring factually correct and legally verifiable decisions.
COSMO-Agent tool-augmented RL framework teaching LLMs to bridge the CAD-CAE gap by translating simulation feedback into valid geometric edits for iterative industrial design optimization.
ResearchEVO framework for automated scientific discovery using LLMs to conduct undirected experimentation and generate explanations, instantiating discover-then-explain paradigm computationally.
Research on LLM-as-a-Judge showing, via counterfactual design and eye-tracking, that both humans and LLMs favor content labeled as human-authored over identical content labeled as AI-generated.
Philosophical critique of behavioral evaluation paradigms for AI systems and proposal for cognitive assessment methods.
PECKER algorithm for efficient machine unlearning in diffusion models with directed gradient updates.
CuraLight framework combining RL and LLMs for traffic signal control with debate-guided data curation.
LudoBench benchmark evaluating LLM strategic reasoning in Ludo board game with 480 handcrafted scenarios.
Quality-aware mixture of experts for multimodal sentiment analysis robust to noise and modality missingness.
Unlearn-and-Reinvent pipeline testing whether LLMs can rediscover foundational algorithms after those algorithms have been removed via machine unlearning.
Study on cultural evolution showing minimal social learning can transmit higher-level representations without inference.
Hierarchical RL framework (STEP-HRL) for LLM agents using step-level transitions to reduce computational cost and history length.
Vision-language model critic for automated iterative refinement of frontend code generation with visual feedback loops.
Open-source framework for autonomous LLM agents conducting deep learning experiments with hypothesis formation, training, and iterative refinement.
Diagnostic framework determining when LLMs are necessary for contextual multi-armed bandits with text and numerical context.
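As background for the entry above: the classical numeric-context baseline such a diagnostic would compare an LLM policy against is LinUCB. A minimal sketch follows; the class and function names are illustrative, not from the paper.

```python
import numpy as np

class LinUCBArm:
    """One arm of the classical LinUCB algorithm: a ridge-regression
    reward model per arm plus an exploration bonus. Illustrative only."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)        # ridge-regularized design matrix
        self.b = np.zeros(d)      # reward-weighted context accumulator
        self.alpha = alpha        # exploration strength

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                     # ridge estimate of reward weights
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def choose(arms, x):
    # Pick the arm with the highest upper confidence bound for context x.
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```

The diagnostic question in the entry is essentially whether a text context carries signal this linear model cannot capture after featurization.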
JTON format, JSON superset with Zen Grid encoding for token-efficient structured data processing in LLMs.
Joint knowledge base completion and QA using combined large and small language models for KB-related tasks.
KV cache compression technique for multimodal LLM inference, reducing memory overhead and latency with hybrid compression strategy.
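One common family of hybrid KV-cache compression combines a recency window with importance-based retention. The sketch below shows that generic eviction idea, assuming accumulated attention scores are available per position; it is not the paper's specific strategy.

```python
def evict_kv(scores, recent_window, k):
    """Generic hybrid KV-cache eviction sketch (illustrative, not the
    paper's method): always keep the last `recent_window` positions,
    and among older positions keep the `k` with the highest accumulated
    attention scores. Returns the sorted indices to retain."""
    n = len(scores)
    recent = set(range(max(0, n - recent_window), n))
    older = [i for i in range(n) if i not in recent]
    older.sort(key=lambda i: scores[i], reverse=True)
    return sorted(recent | set(older[:k]))
```

Cache memory then scales with `recent_window + k` rather than with full sequence length.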
Architecture for value-driven LLM agents addressing behavioral rigidity through context-value-action design.
Foundation model enabling single GPT-based agent to perform across diverse multi-agent reinforcement learning tasks and environments.
Research agent framework for generating trustworthy reports with confidence estimation and calibration mechanisms.
Multi-objective preference alignment for LLMs using Pareto-lenient consensus to handle diverse human values in model training.
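The multi-objective setting above rests on Pareto dominance: a candidate survives if no other candidate is at least as good on every objective and strictly better on one. A minimal filter illustrating that standard notion (the paper's "lenient consensus" refinement is not reproduced here):

```python
def pareto_front(points):
    """Return the Pareto-optimal subset of `points`, maximizing every
    objective. A point is dropped if some other point weakly dominates
    it (>= everywhere, and different, hence > somewhere)."""
    return [p for p in points
            if not any(q != p and all(q[i] >= p[i] for i in range(len(p)))
                       for q in points)]
```

Alignment over diverse human values would score responses on each value dimension and keep only this non-dominated set as consensus candidates.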
AI agents for retail supply chain operations, automating demand forecasting, procurement, and inventory replenishment in supermarket chains.
Proposes epistemic blinding, an inference-time auditing protocol to separate memorized priors from data-driven inference in LLM-assisted agentic analysis systems.
Investigates instruction-following mechanisms in LLMs through diagnostic probing, finding evidence for compositional skill deployment over universal mechanism.
Proposes ACE-Bench, agent evaluation benchmark with unified grid-based planning tasks, lightweight environments, and configurable difficulty/horizon control.
Introduces Claw-Eval, an end-to-end evaluation suite for autonomous agents addressing trajectory-opaque grading, safety, and interaction modality coverage.
Theoretical analysis of contextuality in quantum information systems as external bookkeeping cost under classical simulation.
Proposes Web Retrieval-Aware Chunking (W-RAC) for efficient RAG document chunking to balance retrieval quality, latency, and cost on web-scale content.
Proposes Task-Driven Alignment (TDA-RC) for improving reasoning chains in LLMs by bridging logical gaps between CoT and multi-round thought paradigms.
Evaluates bidirectional training objectives (MLM, masked attention) to mitigate the reversal curse in autoregressive language models.
Introduces Inclusion-of-Thoughts (IoT), a strategy to reduce LLM instability on multiple-choice questions by filtering irrelevant distractors.
Proposes SUMMIR framework for ranking sports insights extracted by LLMs, addressing hallucinations with a 7,900-article dataset across four sports.
Evaluates four open-source PDF-to-Markdown conversion frameworks (Docling, MinerU, Marker, DeepSeek OCR) for RAG document preprocessing impact on QA accuracy.
Studies how to design information retrieval systems for LLM agents versus humans, proposing learning-to-rank methods for agent trajectories.
Analysis of how generative AI enables social engineering fraud and trust manipulation attacks in financial crime scenarios.
Surveys transition from heuristic-based to generative synthesis methods for automatic video trailer generation using LLMs and diffusion models.
Opinion piece on environmental and computational costs of scaling LLM agents and implications for planetary boundaries.
Self-supervised foundation model (CalM) trained on neuronal calcium traces for neuroscience task transfer learning.
Proposes MG²-RAG, a multi-granularity graph approach for retrieval-augmented generation in multimodal LLMs to improve cross-modal reasoning without costly text translation.
Independent evaluation of Claude Code's auto mode permission system for AI coding agents, testing security gates on ambiguous authorization scenarios.
Introduces Squeez, a method for pruning tool outputs in coding agents by identifying minimal relevant evidence blocks. Includes an 11,477-example benchmark derived from SWE-bench.
CURE enables privacy-preserving unlearning in LLM-based recommendation systems using circuit-aware techniques for removing user data.
Cactus improves speculative sampling for LLM decoding by relaxing strict distribution matching to allow acceptable variations like top-k sampling.
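For context, the "strict distribution matching" being relaxed is the standard speculative-sampling acceptance rule: accept a draft token x with probability min(1, p(x)/q(x)), else resample from the normalized residual max(0, p − q). The sketch below shows that baseline rule only, not Cactus's relaxation.

```python
import random

def speculative_accept(p, q, x):
    """Standard speculative-sampling acceptance step (background only,
    not Cactus): p is the target model's distribution, q the draft
    model's, and x the token sampled from q. Accept x with probability
    min(1, p[x]/q[x]); on rejection, resample from the residual."""
    if random.random() < min(1.0, p[x] / q[x]):
        return x
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    r = random.random() * sum(residual)
    acc = 0.0
    for i, w in enumerate(residual):
        acc += w
        if w > 0 and r <= acc:   # skip zero-mass tokens
            return i
    return len(p) - 1
```

This rule makes the accepted-token distribution exactly p; relaxations like the one described above trade that exactness for higher acceptance rates.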
Prune-Quantize-Distill pipeline for neural network compression optimizing wall-clock inference time rather than parameter count or FLOPs.
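The distinction in the entry above matters because pruning that shrinks FLOPs can break hardware-friendly shapes and speed nothing up, so candidates must be ranked by measured latency. A minimal sketch of that selection loop, with all names hypothetical:

```python
import time

def wall_clock_ms(fn, warmup=3, iters=20):
    """Median wall-clock latency of a zero-argument model call, in ms.
    Warmup runs absorb one-time costs (JIT, cache fills)."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]

def pick_fastest(candidates, min_accuracy):
    """candidates: list of (name, run_fn, accuracy) tuples (hypothetical
    schema). Keep only candidates meeting the accuracy floor, then pick
    the one with the lowest measured latency."""
    ok = [(wall_clock_ms(f), name) for name, f, acc in candidates
          if acc >= min_accuracy]
    return min(ok)[1] if ok else None
```

The same harness applies after each pipeline stage (prune, quantize, distill) so that only stages that actually reduce measured time are kept.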
Analysis of implicit architectural decisions made by AI coding agents, identifying five mechanisms and six prompt-architecture coupling patterns.
FreakOut-LLM framework investigates whether emotionally charged prompts compromise safety alignment in ten LLMs using psychological stimuli.
Comparative evaluation of embedding-based and generative models for document classification, showing Vision-Language Models with CoT achieve 82% zero-shot accuracy.
PRIME enables multimodal self-supervised pretraining for cancer prognosis with missing modalities by combining histopathology, gene expression, and reports.
Case study of closed-loop software development system managing backlog via deterministic pipeline with Jira integration and safety constraints.