Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries
Evaluates LLM performance on long-form text understanding by comparing human- and model-generated novel summaries, assessing conceptual engagement patterns.
Introduces the Graded Color Attribution dataset to evaluate whether Vision-Language Models follow the reasoning rules they introspectively report, compared against human behavior.
Transformer-based NER and entity linking approach for medical symptom recognition in SympTEMIST shared task using RoBERTa and SapBERT.
Proposes Neural Computers (NCs), a model architecture unifying computation, memory, and I/O as learned runtime state, aiming toward completely neural computing systems.
Research on limits of latent reasoning in LLMs, testing whether models can discover multi-step planning strategies without supervision using graph path-finding tasks.
Benchmark and solutions for visual anomaly detection on edge devices with continual learning to adapt to evolving data distributions.
Theoretical proof that no continuous wrapper defense can prevent all prompt injections in LLMs with connected prompt space, characterizing defense failure modes.
Graph embedding-based anomaly detection system identifies under-represented services in microservice architectures using unsupervised learning.
Multi-objective evolutionary merging approach to reduce computational overhead of reasoning models while maintaining accuracy with fewer tokens.
Hybrid ResNet-1D-BiGRU-MHA model for intrusion detection in Industrial IoT systems, achieving 98.71% accuracy on the Edge-IIoTset dataset.
Practical implementation of activation-level interpretability and steering techniques for large language models distributed across multiple GPUs.
Symbolic Equivalence Partitioning uses symbolic execution for inference-time code selection in LLM-based code generation without expensive verifiers.
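The selection idea behind such equivalence partitioning can be sketched with concrete probe inputs standing in for symbolic execution (a deliberate simplification of the paper's approach; all names here are hypothetical): candidates that behave identically on the probes fall into one class, and the largest class supplies the answer.

```python
# Hypothetical sketch: partition LLM-generated candidate functions by
# behavioral equivalence on probe inputs (a concrete-execution stand-in
# for symbolic execution) and select from the largest equivalence class.
from collections import defaultdict

def partition_and_select(candidates, probe_inputs):
    """candidates: list of callables; probe_inputs: list of argument tuples."""
    classes = defaultdict(list)
    for fn in candidates:
        signature = []
        for args in probe_inputs:
            try:
                signature.append(repr(fn(*args)))
            except Exception as e:  # crashing candidates form their own class
                signature.append(f"<error:{type(e).__name__}>")
        classes[tuple(signature)].append(fn)
    # Majority-vote heuristic: the largest equivalence class is most
    # likely to implement the intended semantics.
    largest = max(classes.values(), key=len)
    return largest[0]

# Usage: three candidates for "absolute value", one buggy.
cands = [lambda x: abs(x), lambda x: x if x >= 0 else -x, lambda x: x]
best = partition_and_select(cands, [(3,), (-4,), (0,)])
print(best(-4))  # 4
```

No test cases or verifier are needed at selection time; only the partition structure of candidate behavior is used, which is what makes the approach cheap at inference.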
DoMinO framework unifies reinforcement learning fine-tuning of Discrete Flow Matching models by reformulating sampling as a multi-step MDP.
MedConclusion benchmark dataset of 5.7M PubMed abstracts for evaluating LLMs on biomedical conclusion generation from structured evidence.
Efficient quantization method for Mixture-of-Experts models with theoretical generalization guarantees to reduce inference memory overhead.
Adaptive differential privacy approach for federated medical image segmentation across diverse imaging modalities and clinical sites.
Soft-Quantum Algorithms explores classical simulation of variational quantum circuits for few-qubit problems with large datasets.
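Why few-qubit variational circuits are classically tractable can be seen from a minimal statevector sketch (not the paper's implementation): a one-qubit circuit applying RY(θ) to |0⟩ and measuring ⟨Z⟩, which analytically equals cos θ.

```python
# Minimal sketch of classically simulating a one-qubit variational circuit:
# |psi> = RY(theta)|0>, then measure the expectation of Pauli-Z.
# Few-qubit statevectors like this are exact and cheap on classical hardware.
import numpy as np

def ry(theta):
    # Single-qubit Y-rotation gate as a 2x2 real matrix.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expect_z(theta):
    psi = ry(theta) @ np.array([1.0, 0.0])   # apply RY to |0>
    z = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli-Z observable
    return psi @ (z @ psi)                   # <psi| Z |psi>

theta = 0.7
print(expect_z(theta))   # matches the analytic value cos(theta)
print(np.cos(theta))
```

An n-qubit statevector needs 2^n amplitudes, so this exact approach stays feasible precisely in the few-qubit regime the line above refers to.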
SkillSieve is a three-layer detection framework for identifying security vulnerabilities in AI agent skills, addressing both code and natural language prompt injection attacks.
AI-Driven Research for Systems uses LLMs to automate database performance optimization through code generation rather than manual design.
Guardian Parser Pack uses LLMs to parse and normalize heterogeneous investigative documents for missing-person cases with varying layouts and data quality.
SciDC method reduces LLM hallucination by incorporating scientific knowledge and rules as decoding constraints to improve reliability.
TwinLoop framework uses simulation-in-the-loop digital twins for online multi-agent reinforcement learning to adapt policies when operating conditions change.
Research finding that 52-88% of chain-of-thought tokens in reasoning models are generated after the answer is already recoverable, revealing a detection-extraction gap in model behavior.
CubeGraph: efficient retrieval-augmented generation system for hybrid queries combining vector similarity search with spatio-temporal filters for RAG workloads.
Logical Robots: declarative multi-agent programming platform using logic programming language Logica for robot behavior specification combining reactive control and planning.
SubFLOT: Federated learning method using optimal transport for efficient submodel extraction, addressing heterogeneity and enabling client-side personalization.
SHAPE: Framework for improving LLM reasoning through process supervision, formalizing reasoning as a state-space trajectory with stage-aware advantage estimation.
Parameter-efficient multitask prompt distillation framework for clinical NLP adapting shared metaprompts across diverse medical tasks.
Audience segmentation approach for LLM-based social simulation restoring demographic heterogeneity in behavioral modeling.
Fake news detection framework combining graph analysis with LLM-retrieved evidence for explainable veracity assessment.
Graph-based analysis of semantic change in Persian poetry across centuries using aligned word embeddings.
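Aligning word embeddings trained on different time periods is commonly done with orthogonal Procrustes; a minimal sketch with synthetic data (not the paper's pipeline; all data here is randomly generated):

```python
# Sketch of orthogonal Procrustes alignment between two embedding matrices
# (rows = shared vocabulary), a standard step before measuring semantic
# change across time periods. Data below is synthetic.
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal R minimizing ||X R - Y||_F, via SVD of X^T Y."""
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 8))                   # "modern" embeddings
true_r = np.linalg.qr(rng.normal(size=(8, 8)))[0]
X = Y @ true_r.T                               # "historical" embeddings, rotated
R = procrustes_align(X, Y)
print(np.allclose(X @ R, Y))                   # True: rotation recovered
```

After alignment, the per-word distance between a word's historical and modern vectors serves as its semantic-change score, which can then feed graph-based analyses.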
Chemical vision-language model emphasizing reasoning over perception for understanding molecular reactions and mechanisms.
Hybrid quantum-classical network for remote sensing image segmentation combining multi-scale feature fusion.
Confidence calibration methods for LLM-generated code revisions enabling developers to assess output correctness at the instance level.
Traveling thief problem variant with time window constraints, benchmarks, and heuristics for multi-component optimization.
Multimodal fusion method for sarcasm detection addressing unreliable modalities through uncertainty-aware weighting.
Open-source Chinese legal language model built on Baichuan foundation using continued pretraining and instruction tuning.
CLI-Tool-Bench benchmark for evaluating LLM agents' end-to-end software generation from intent without predefined scaffolds.
Framework enabling multi-LLM collaboration with role-based team structure for solving complex multi-step contextualized tasks.
Pipeline for extracting procedural knowledge and directed graphs from maintenance flowchart images using vision-language models.
Multi-faceted preference alignment approach for conversational query rewriting using feedback from retrieval and generation components.
Benchmark for evaluating LLM-generated repository documentation using question answering, addressing limitations of LLM-as-judge evaluation methods.
FedDAP addresses domain shift in federated learning using prototype learning for privacy-sensitive applications.
Instance-adaptive variational autoencoders reduce the amortization gap in latent variable models for deep generative modeling.
MoBiE: binarization framework for efficient inference of mixture-of-experts LLMs using post-training quantization.
SkillTrojan: backdoor attack framework targeting skill-based agent systems through malicious skill implementations.
OmniTabBench: largest tabular data benchmark comparing GBDTs, neural networks, and foundation models at scale.
ESG sentiment analysis dataset and models for Slovene news, addressing corporate performance assessment in emerging markets.
WRAP++ improves LLM pretraining through synthetic data rephrasing that captures cross-document relationships and associative context.
Privacy-preserving LLM inference method enabling text-free processing through alignment and adaptation, reducing privacy risks without computational overhead.