Isolater - Feed

Ax Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram 4/28/2026

One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness

Study demonstrating instruction-tuned LLMs collapse helpfulness under simple lexical constraints like banning single characters/words.

Ax Àlex R. Atrio, Antonio Lopez, Jino Rohit, Yassine El Ouahidi, Marcello Politi, Vijayasri Iyer, Umar Jamil, Sébastien Bratières, Nicolas Longépé 4/28/2026

EVE: A Domain-Specific LLM Framework for Earth Intelligence

Open-source domain-specialized 24B LLM (EVE) for Earth Intelligence built on Mistral, optimized for Earth observation and sciences reasoning.

Ax Shi Feng, Hanlin Zhang, Fan Nie, Sham Kakade, Yiling Chen 4/28/2026

Peer-Predictive Self-Training for Language Model Reasoning

Label-free self-training framework where multiple LLMs collaboratively improve by aggregating cross-model responses without external supervision.

Ax Danae Sánchez Villegas, Samuel Lewis-Lim, Nikolaos Aletras, Desmond Elliott 4/28/2026

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

Analysis of reasoning dynamics in 18 vision-language models, measuring how visual and textual information integrates during inference.

Ax Seyedreza Mohseni, Sarvesh Baskar, Edward Raff, Manas Gaur 4/28/2026

Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks

Empirical study of Chain-of-Thought prompting for code deobfuscation using LLMs with step-by-step reasoning.

Ax Chongsheng Zhang, Hao Wang, Zelong Yu, Esteban Garces Arias, Julian Rodemann, Zhanshuo Zhang, Qilong Li, Gaojuan Fan, Krikamol Muandet, Christian Heumann 4/28/2026

Self-Reinforcing Controllable Synthesis of Rare Relational Data via Bayesian Calibration

LLM-based framework with Bayesian feedback for synthesizing rare relational/tabular data with quality optimization.

Ax Yuhe Wu, Guangyu Wang, Yuran Chen, Jiatong Zhang, Yutong Zhang, Yujie Chen, Jiaming Shang, Guang Zhang, Zhuang Liu 4/28/2026

PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

Benchmark for analyzing sources of hallucination in LLMs across reasoning, instruction, and source memory components.

Ax Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang 4/28/2026

Towards Disentangled Preference Optimization Dynamics Beyond Likelihood Displacement

Analysis of preference optimization objectives for LLM alignment, proposing incentive-score decomposition to prevent likelihood displacement.

Ax Nicholas Thumiger, Andrea Bartezzaghi, Mattia Rigotti, Cezary Skura, Thomas Frick, Elisa Serioli, Fabrizio Arbucci, A. Cristiano I. Malossi 4/28/2026

Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD

Neural surrogate models trained on CFD data for faster aerodynamic design space exploration.

Ax Zhiyuan Jiang, Weihao Hong, Xinlei Guan, Tejaswi Dhandu, Miles Q. Li, Meng Xu, Kuan Huang, Umamaheswara Rao Tida, Bingyu Shen, Daehan Kwak, Boyang Li 4/28/2026

LLM-as-Judge Framework for Evaluating Tone-Induced Hallucination in Vision-Language Models

Framework using LLMs as judges to evaluate hallucination in vision-language models under varying prompt intensity.

Ax Shuai Wu, Xue Li, Yanna Feng, Yufang Li, Zhijun Wang, Ran Wang 4/28/2026

The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models

Systematic analysis of repetitive linguistic patterns in aligned LLMs across frontier models like GPT-4 and Claude.

Ax Chengyun Wang, Liwei Chen, Nils Thuerey 4/28/2026

A neural operator framework for data-driven discovery of stability and receptivity in physical systems

Data-driven framework using neural operators for stability analysis in physical systems without requiring known equations.

Ax Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi 4/28/2026

AVISE: Framework for Evaluating the Security of AI Systems

Open-source framework (AVISE) for identifying vulnerabilities and evaluating the security of AI systems deployed in critical domains.

Ax Sepideh Abedini, M. Tamer Özsu 4/28/2026

SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL

Benchmark and evaluation platform (SQLyzr) for text-to-SQL models, offering comprehensive metrics beyond aggregate scores and insights into behavior across query types.

Ax Shihai Wang, Tao Chen 4/28/2026

Conjecture and Inquiry: Quantifying Software Performance Requirements via Interactive Retrieval-Augmented Preference Elicitation

IRAP framework using retrieval-augmented preference elicitation to quantify vague natural language software performance requirements into mathematical forms.

Ax Vishal Rajput 4/28/2026

Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair

Theoretical analysis proving supervised learning has fundamental geometric blind spots in adversarial robustness, with minimal repair strategies proposed.

Ax Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan 4/28/2026

Building a Precise Video Language with Human-AI Oversight

Open datasets and benchmarks for video-language model captioning using structured specifications and professional video creator input for precise visual primitive definition.

Ax Grigory Sapunov 4/28/2026

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

Study of learned memory tokens in single-block Universal Transformers with Adaptive Computation Time, showing memory tokens empirically necessary for combinatorial reasoning.

Ax Nalin Poungpeth, Nicholas Clark, Tanu Mitra 4/28/2026

Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations

Audit measuring persuasiveness of LLMs in everyday conversations, finding models outperform humans and influence user decisions on relationships and medical matters.

Ax Yi Liu 4/28/2026

The Spectral Lifecycle of Transformer Training: Transient Compression Waves, Persistent Spectral Gradients, and the Q/K--V Asymmetry

Systematic study of transformer weight matrix singular value spectra during pretraining, discovering transient compression waves and spectral gradient phenomena across model scales.

Ax Cheng Gao, Cheng Huang, Kangyang Luo, Ziqing Qiao, Shuzheng Si, Huimin Chen, Chaojun Xiao, Maosong Sun 4/28/2026

KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning

Introduces KARL framework using knowledge-boundary-aware reinforcement learning to reduce LLM hallucinations through appropriate abstention without sacrificing accuracy.

Ax Zahra Makki Nayeri, Mohsen Rezvani 4/28/2026

BiTA: Bidirectional Gated Recurrent Unit-Transformer Aggregator in a Temporal Graph Network Framework for Alert Prediction in Computer Networks

Presents BiTA, a bidirectional GRU-Transformer temporal graph network for alert prediction in computer networks with multi-scale temporal patterns.

Ax Anastasiia Filippova, David Grangier, Marco Cuturi, João Monteiro 4/28/2026

Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing

Proposes Stochastic KV Routing for adaptive depth-wise KV cache sharing to reduce memory requirements in transformer serving along the layer dimension.

Ax Irene Tenison, Stella Ahn, Miriam Kim, Ebtisam Alshehri, Lalana Kagal 4/28/2026

Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

Demonstrates parameter efficiency does not ensure memory efficiency in PEFT; shows LoRA and IA3 still require memory scaling with sequence length for on-device use.

Ax Stela Tong, Elai Ben-Gal 4/28/2026

CoFi-PGMA: Counterfactual Policy Gradients under Filtered Feedback for Multi-Agent LLMs

Proposes CoFi-PGMA for training multi-agent LLM systems using counterfactual policy gradients to handle routing and collaboration feedback filtering.

Ax Archit Thorat 4/28/2026

AutoCompress: Critical Layer Isolation for Efficient Transformer Compression

Presents AutoCompress, a transformer compression method that isolates and protects Layer 0 at full dimensionality while compressing intermediate layers.

Ax Maurice Funk, Daumantas Kojelis 4/28/2026

Towards Understanding the Expressive Power of GNNs with Global Readout

Analyzes expressive power of message-passing Graph Neural Networks with global readout, focusing on first-order logic properties they can express.

Ax Elias Hossain, Mohammad Jahid Ibna Basher, Ivan Garibay, Ozlem Garibay, Niloofar Yousefi 4/28/2026

When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

Studies deployment-time adaptation for frozen offline RL policies using Product-of-Experts composition when retraining is infeasible.

Ax Xin Wang, Chi Ma, Shaobin Chen, Pu Wang, Menglei Zhou, Junyi Qiu, Qiaorui Chen, Jiayu Sun, Shijie Liu, Zehuan Wang, Lei Yu, Chuan Liu, Fei Jiang, Wei Lin, Hao Wang, Jiawei Jiang, Xiao Yan 4/28/2026

MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches

Proposes MTServe, a hierarchical KV cache management system for efficient serving of generative recommendation models with user history encoding.

Ax Jinming Yang, Chuxian Qiu, Zhenyu Deng, Xinshan Jiao, Tao Zhou 4/28/2026

Quantifying and Mitigating Self-Preference Bias of LLM Judges

Quantifies and proposes mitigation strategies for Self-Preference Bias in LLM-as-a-Judge evaluation systems used for model alignment and leaderboards.

Ax A. Yermekov, D. A. Herrera-Martí 4/28/2026

StackFeat RL: Reinforcement Learning over Iterative Dual Criterion Feature Selection for Stable Biomarker Discovery

Introduces StackFeat-RL, a reinforcement learning meta-learning framework for feature selection in high-dimensional genomic data with stability constraints.

Ax Minghui Xu, Qi Luo, Kun Li 4/28/2026

Utility-Aware Data Pricing: Token-Level Quality and Empirical Training Gain for LLMs

Presents utility-based data valuation framework for LLMs using token-level quality metrics and empirical training gains instead of static row-count approaches.

Ax Alex Nikulkov 4/28/2026

Reward Models Are Secretly Value Functions: Temporally Coherent Reward Modeling

Proposes Temporally Coherent Reward Modeling (TCRM) for RLHF, training reward models to score intermediate tokens rather than final outputs to capture richer training signals.

Ax Jeremy Ellis 4/28/2026

On-Device Vision Training, Deployment, and Inference on a Thumb-Sized Microcontroller

Complete ML pipeline for training and inference on microcontroller devices, including CNN training with Adam and real-time inference on hardware costing $15 to $40.

Ax Natanael Alpay, Emeric Battaglia 4/28/2026

Complex SGD and Directional Bias in Reproducing Kernel Hilbert Spaces

Extends SGD and gradient descent to complex-valued parameters in reproducing kernel Hilbert spaces with analysis of directional bias.

Ax Haoze He, Xingyuan Ding, Xuan Jiang, Xinkai Zou, Alex Cheng, Yibo Zhao, Juncheng Billy Li, Heather Miller 4/28/2026

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

Method for fine-tuning Mixture-of-Experts models that preserves expert diversity and prevents router collapse during supervised fine-tuning.

Ax Kennon Stewart 4/28/2026

Shape of Memory: a Geometric Analysis of Machine Unlearning in Second-Order Optimizers

Analyzes machine unlearning in second-order optimizers, comparing their ability to handle data deletion tasks with varying eigendecomposition approaches.

Ax Weimin Huang, Natalie M. Isenberg, Ján Drgoňa, Draguna L Vrabie, Bistra Dilkina 4/28/2026

ML-Guided Primal Heuristics for Mixed Binary Quadratic Programs

ML-guided heuristics accelerating solvers for mixed binary quadratic combinatorial optimization problems.

Ax Zixuan Xia, Quanxi Li 4/28/2026

K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning

Kalman filter-based reward estimation method for policy gradient RL as alternative to reward normalization.

Ax Rui Gao, Youngseung Jeon, Swastik Roy, Morteza Ziyadi, Xiang 'Anthony' Chen 4/28/2026

C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs

C-MORAL uses RL post-training to align LLMs for controllable multi-objective molecular optimization with competing drug-design constraints.

Ax Charles Xu, Jost Tobias Springenberg, Michael Equi, Ali Amin, Adnan Esmail, Sergey Levine, Liyiming Ke 4/28/2026

RL Token: Bootstrapping Online RL with Vision-Language-Action Models

RL Token enables sample-efficient online RL fine-tuning of vision-language-action models using lightweight adaptation for robotics manipulation.

Ax Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang 4/28/2026

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

ProEval uses Gaussian Process surrogates for efficient performance estimation and proactive failure discovery in generative AI evaluation.

Ax Qishi Zhan, Minxuan Hu, Guansu Wang, Jiaxin Liu, Liang He 4/28/2026

Unstable Rankings in Bayesian Deep Learning Evaluation

Shows method rankings in Bayesian deep learning are unreliable and dataset-dependent under data scarcity conditions.

Ax Wugeng Zheng, Ziwen Kan, Katie Wang, Chen Chen, Song Wang 4/28/2026

Conditional Imputation for Within-Modality Missingness in Multi-Modal Federated Learning

CondI framework for handling within-modality missingness in multimodal federated learning via conditional imputation.

Ax Qishi Zhan, Minxuan Hu, Liang He, Guansu Wang, Jiaxin Liu 4/28/2026

A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning

Bayesian deep learning evaluation fails with single-seed benchmarks in limited-data settings; CRPS variance trajectories differ substantially across methods.

Ax William Feng, Ethan Lou, Aryan Sharma 4/28/2026

Surface Sensitivity in Lean 4 Autoformalization

Studies surface-level vs semantic sensitivity in Lean 4 autoformalization, testing GPT and open-weight models on paraphrase variations.

Ax Abhimanyu Bambhaniya, Geonhwa Jeong, Jason Park, Jiecao Yu, Jaewon Lee, Pengchao Wang, Changkyu Kim, Chunqiang Tang, Tushar Krishna 4/28/2026

Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns

Scaling Mixture-of-Experts inference across multiple nodes by exploiting expert activation patterns to address load imbalance and token routing bottlenecks.

Ax Terry Gou, Puneet Gupta 4/28/2026

Efficient VQ-QAT and Mixed Vector/Linear quantized Neural Networks

Three techniques for vector quantization-based model weight compression including cosine similarity assignment and straight-through estimators.

Ax Yaru Liu, Michael K. Ng, Yiqi Gu 4/28/2026

A Layer Separation Optimization Framework for Cross-Entropy Training in Deep Learning

Layer separation optimization framework to reduce nonconvexity in deep neural networks trained with softmax cross-entropy loss.

Ax Long Jing, Zhixiong Yang, Yajun Zhang, Xinlong Feng 4/28/2026

Contrastive Learning for Multimodal Human Activity Recognition with Limited Labeled Data

Contrastive learning approach for human activity recognition from multimodal sensor data with limited labeled data.