Xinyu Wang, Hanwei Wu, Jingwei Song, Shuyuan Zhang, Jiayi Zhang, Fanqi Kong, Tung Sum Thomas Kwok, Xiao-Wen Chang, Yuyu Luo, Chenglin Wu, Bang Liu

Co-Evolution of Policy and Internal Reward for Language Agents

Self-Guide, a method for co-evolving the policy and internal reward in LLM agents, addressing the sparse-reward bottleneck in long-horizon training.

Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary, Yernat Yestekov, Zora Che, Mosh Levy, Elle Najt, Dennis Murphy, Prashant Kulkarni, Lev McKinney, Kei Nishimura-Gasparian, Ram Potham, Aengus Lynch, Michael L. Chen

An Independent Safety Evaluation of Kimi K2.5

Safety evaluation of the open-weight LLM Kimi K2.5, assessing CBRNE misuse, cybersecurity, alignment, and bias risks.

Jian Yang, Wei Zhang, Jiajun Wu, Junhang Cheng, Tuney Zheng, Fanglin Xu, Weicheng Gu, Lin Jing, Yaxin Du, Joseph Li, Yizhi Li, Yan Xing, Chuan Hao, Ran Tao, Ruihao Gong, Aishan Liu, Zhoujun Li, Mingjie Tang, Chenghua Lin, Siheng Chen, Wayne Xin Zhao, Xianglong Liu, Ming Zhou, Bryan Dai, Weifeng Lv

InCoder-32B-Thinking: Industrial Code World Model for Thinking

InCoder-32B-Thinking, a model trained with Error-driven Chain-of-Thought for industrial code generation with reasoning traces.

Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, Hari Balakrishnan

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

arXiv paper on Glia, a multi-agent LLM architecture for autonomous computer-systems design that uses specialized agents with empirical feedback loops.

Fanrui Zhang, Qiang Zhang, Sizhuo Zhou, Jianwen Sun, Chuanhao Li, Jiaxin Ai, Yukang Feng, Yujie Zhang, Wenjie Li, Zizhen Li, Yifan Chang, Jiawei Liu, Kaipeng Zhang

Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection

arXiv paper on code-in-the-loop agentic tool use for image forgery detection, unifying low-level artifacts with semantic knowledge from MLLMs.

Jiayi Yuan, Jonathan Nöther, Natasha Jaques, Goran Radanović

AgenticRed: Evolving Agentic Systems for Red-Teaming

arXiv paper on AgenticRed, an automated pipeline that uses in-context learning to evolve red-teaming systems without human-designed workflows.

Bowen Cao, Dongdong Zhang, Yixia Li, Junpeng Liu, Shijue Huang, Chufan Shi, Hongyuan Lu, Yaokang Wu, Guanhua Chen, Wai Lam, Furu Wei

From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics

arXiv paper analyzing the gap between LLM math-benchmark performance and real-world application through ContextMATH, a contextual-reasoning benchmark.

Canfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins, Kristian Lum, Laura Weidinger

Evaluating Language Models for Harmful Manipulation

arXiv paper introducing a framework for evaluating harmful AI manipulation through human-AI interaction studies across policy, finance, and health domains.

Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani

Therefore I am. I Think

Analysis using linear probes to show that LLM reasoning models encode their decisions before generating chain-of-thought explanations.

Shin'ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa, Yasutoshi Ida

Zero-shot Concept Bottleneck Models

Zero-shot concept bottleneck models that enable interpretable predictions without target-task training by leveraging zero-shot learning.

Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

StructEval, a benchmark that systematically evaluates LLM capabilities in generating structured outputs across JSON, HTML, React, SVG, and other formats.