ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
ShadowNPU system co-design for efficient on-device LLM inference on NPUs, addressing quantization sensitivity in attention operators.
Benchmarking study of deep learning segmentation models for carotid artery structures in histopathological images with limited datasets.
DoubleAgents system for human-agent alignment in coordination tasks, pairing a coordination agent with a dashboard for preference elicitation and feedback.
Neural-MedBench reasoning-intensive benchmark for evaluating clinical reasoning ability of vision-language models beyond classification accuracy.
Vid-Freeze defense mechanism against malicious image-to-video generation using temporal-freezing adversarial techniques.
MedIRT psychometric framework for evaluating LLM medical competency rather than benchmark-specific performance using Item Response Theory.
ACT system combines decision trees with LLMs to provide transparent, interpretable, and auditable AI decisions on unstructured data.
Study of how autonomy levels in LLM agents affect user privacy concerns and trust, with implications for personalization design.
FURINA-Builder multi-agent pipeline for automatically constructing customizable role-playing benchmarks at scale for evaluating LLM agent behavior.
Security analysis of LLM pruning methods showing vulnerabilities in popular inference engines like vLLM when models are pruned before deployment.
Survey of image and video restoration techniques for adverse weather conditions in intelligent transportation systems and autonomous driving.
IoT and wireless sensor networks for industrial monitoring and control using NRF transceivers and Arduino microcontrollers.
Watermarking technique for LLMs using syntactic predictability to balance text quality against detection robustness for governance and trustworthiness.
XModBench benchmark measures cross-modal consistency and modality-specific biases in omni-modal large language models across audio, vision, and text.
Game-theoretic framework for evaluating LLMs on subjective and open-ended tasks beyond fixed-format benchmarks with reference answers.
Application of AI to bank statement analysis for credit scoring of Malaysian MSMEs using alternative data sources instead of traditional credit bureau data.
SePT method enables LLMs to improve reasoning without external rewards by alternating between self-generating responses and fine-tuning on those responses.
Multi-agent system with Lean 4 verification layer for exact scientific discovery in quantum code design, combining symbolic synthesis and automated verification.
ATLAS framework combines LLMs with model-driven workflows for generating structured artifacts that satisfy schemas, domain rules, and audit requirements through constraint compilation and validation.
Interpretable model for detecting implicit and explicit hate speech using prototype-based representations for transfer learning.
Research on financial fraud risks from collaborative LLM agents including MultiAgentFraudBench for simulating multi-agent fraud scenarios.
Fairness-aware stroke diagnosis framework combining domain-adversarial training with group distributionally robust optimization.
Synthetic environment generating visual reasoning puzzles with ground-truth solutions across 25 task types for benchmark construction.
Video compression framework using semantic conditioning and diffusion models for ultra-low bitrate encoding.
Benchmark for task-oriented spatio-temporal grounding in egocentric videos for embodied AI agents.
Physics-informed transformer model for socially-aware autonomous driving that learns social interaction dynamics.
Drag-based image editing method using diffusion models with token injection and attention mechanisms for precise visual manipulation.
Theoretical framework for analyzing causal effects at fine-grained levels in high-dimensional data like images and language models.
Interface design study on scaffolding divergent and convergent thinking in human-AI co-creation with generative models.
SWE-EVO benchmark for evaluating AI coding agents on long-horizon software evolution tasks spanning multiple files and iterations.
Research on using LLMs to generate multilingual counterfactual examples for model interpretability across languages.
LLM-based method for categorical data clustering that leverages semantic understanding to measure similarity among attribute values lacking inherent ordering.
Text-driven video reauthoring interface and study exploring how creators can edit video footage through natural language prompts rather than manual editing.
Analysis of fairness in automated decision-making for healthcare emergency triage using process mining and fairness-aware algorithms on empirical data.
Vision-language agent framework combining inverse graphics with interleaved multimodal reasoning for reconstructing images into editable programs with spatial grounding.
Large-scale empirical study analyzing how AI coding agents modify code and describe changes in GitHub pull requests compared to human contributions.
Open-source web platform and course teaching machine learning fundamentals to students aged 12-17 using LEGO robotics without programming.
Method improving language model pretraining by using post-trained models as data sources to instill desired behaviors like safety and reasoning earlier in training.
Few-shot fine-tuned LLM approach for categorizing intermittent CI pipeline failures caused by flaky tests and infrastructure issues rather than code defects.
Information-theoretic framework for optimizing shared visual tokenization in unified multimodal models that perform both image understanding and generation.
Safety alignment approach for Mixture-of-Experts language models addressing unique challenges from sparse routing mechanisms during fine-tuning.
Benchmark framework for evaluating multimodal large language models on spatio-temporal bimanual coordination tasks requiring synchronized multi-stream integration.
Method using linear probes on LLM pre-generation activations to predict success likelihood before generation, enabling selective deployment of expensive extended reasoning.
Study examining emergent social behavior and interactions among large-scale communities of AI agents on MoltBook, a social platform designed for agent-agent communication.
Token-level noise filtering framework for LLM fine-tuning datasets that identifies and explains problematic tokens to improve downstream task performance.
Language model based on continuous flows over token embeddings, demonstrating faster generation than discrete diffusion and autoregressive models with improved quality in few-step generation.
Open-source framework unifying rubric-based LLM evaluation techniques including ensemble judging, bias mitigation, and few-shot calibration with consistent implementation.
Research on human-AI agent collaboration exploring how agents can maintain workspace awareness and interpret concurrent user actions on shared artifacts during co-creative tasks.
Research on NePPO, a multi-agent reinforcement learning algorithm addressing training stability and convergence in general-sum games with heterogeneous agents.
PlayWorld pipeline training action-conditioned video models on autonomous robot play data for improved world model physics prediction.