Isolater - Feed

Ax Danil Gorinevski (cybiont GmbH, Sch\"ubelbach, Switzerland) 27d ago

Nidus: Externalized Reasoning for AI-Assisted Engineering

arXiv paper on Nidus, a governance runtime using Claude, Gemini, Codex to mechanize V-model for AI-assisted software delivery.

Ax Firoj Alam, Gagan Bhatia, Sahinur Rahman Laskar, Shammur Absar Chowdhury 27d ago

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

arXiv paper proposing OmniScore, deterministic evaluation metrics for multilingual text generation as alternative to LLM judges.

Ax Amir M. Ebrahimi, Gopi Krishnan Rajbahadur 27d ago

Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks

arXiv paper auditing code-editing benchmarks for LLMs, finding flaws in existing evaluation methods for instructed code modification.

Ax Jorge Alberto Garza-Abdala, Gerardo A. Fumagal-Gonz\'alez, Eduardo de Avila-Armenta, Sadam Hussain, Jasiel H. Toscano-Mart\'inezb, Diana S. M. Rosales Gurmendi, Alma A. Pedro-P\'erez, Jose G. Tamez-Pena 27d ago

Simultaneous Dual-View Mammogram Synthesis Using Denoising Diffusion Probabilistic Models

arXiv paper on diffusion models for medical imaging, generating paired mammogram views for cancer screening datasets.

Ax Andrei Polubarov, Lyubaykin Nikita, Alexander Derevyagin, Artyom Grishin, Igor Saprygin, Aleksandr Serkov, Mark Averchenko, Daniil Tikhonov, Maksim Zhdanov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Alexey Zemtsov, Vladislav Kurenkov 27d ago

Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

arXiv paper on Decision Pre-Trained Transformer for in-context reinforcement learning, enabling scalable generalist agent training.

Ax Zezhong Fan, Ziheng Chen, Luyi Ma, Jin Huang, Lalitesh Morishetti, Kaushiki Nag, Sushant Kumar, Kannan Achan 27d ago

CRAB: Codebook Rebalancing for Bias Mitigation in Generative Recommendation

arXiv paper on CRAB method for mitigating popularity bias in generative recommendation systems via codebook rebalancing.

Ax Quyet V. Do, Thinh Pham, Nguyen Nguyen, Sha Li, Pratibha Zunjare, Tu Vu 27d ago

$\pi^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models

arXiv paper presenting π² pipeline for curating reasoning data from structured sources to improve LLM long-context reasoning.

Ax Yuxuan Zhang, EunJeong Hwang, Huaisong Zhang, Penghui Du, Yiming Jia, Dongfu Jiang, Xuan He, Shenhui Zhang, Ping Nie, Peter West, Kelsey R. Allen 27d ago

Watch Before You Answer: Learning from Visually Grounded Post-Training

arXiv paper on vision-language models learning from grounded video data, finding text-only bias in video benchmarks.

Ax Ruslan Sharifullin, Maxim Gorshkov, Hannah Clay 27d ago

Offline RL for Adaptive Policy Retrieval in Prior Authorization

arXiv paper modeling prior authorization policy retrieval as MDP for adaptive decision-making in healthcare insurance.

Ax Lucas Dionisopoulos, Nicklas Majamaki, Prithviraj Ammanabrolu 27d ago

Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning

arXiv paper on how reasoning evolves in language models through fine-tuning and RL, studied via chess task performance.

Ax Samira Hajizadeh, Suman Jana 27d ago

EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback

EffiPair: Relative Contrastive Feedback method for improving runtime and memory efficiency of LLM-generated code without model fine-tuning.

Ax Geert Trooskens (XY.AI Labs, Palo Alto, CA), Aaron Karlsberg (XY.AI Labs, Palo Alto, CA), Anmol Sharma (XY.AI Labs, Palo Alto, CA), Lamara De Brouwer (XY.AI Labs, Palo Alto, CA), Max Van Puyvelde (Stanford University School of Medicine, Stanford, CA), Matthew Young (XY.AI Labs, Palo Alto, CA), John Thickstun (Cornell University, Ithaca, NY), Gil Alterovitz (Brigham and Women's Hospital / Harvard Medical School, Boston, MA), Walter A. De Brouwer (Stanford University School of Medicine, Stanford, CA) 27d ago

Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation

Compiled AI: Paradigm where LLMs generate executable code during compilation for deterministic, model-free workflow automation execution.

Ax Alfonso Amayuelas, Firas Laakom, Piotr Pi\k{e}kos, Wenyi Wang, Yifan Xu, Yuhui Wang, J\"urgen Schmidhuber, William Wang 27d ago

Planning to Explore: Curiosity-Driven Planning for LLM Test Generation

Planning to Explore: Curiosity-driven planning approach for LLM-based test generation using Bayesian principles to reach deep code branches.

Ax Jonathan Ivey, Anjalie Field, Ziang Xiao 27d ago

What Makes a Good Response? An Empirical Analysis of Quality in Qualitative Interviews

Analysis of 10 proposed measures for evaluating qualitative interview response quality to determine predictive validity.

Ax Neharika Jali, Anupam Nayak, Gauri Joshi 27d ago

Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning

Adaptive Thinking Budgets: Method for allocating inference-time compute efficiently across multi-turn LLM reasoning based on turn difficulty.

Ax Mingjie Li, Edward Kim, Yue Zhao, Ehsan Adeli, Kilian M. Pohl 27d ago

Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

Modality-aware vector-quantized VAE for reconstructing multimodal brain MRI data across different imaging modalities.

Ax Zhengqin Li, Cheng Zhang, Jakob Engel, Zhao Dong 27d ago

LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

Large Sparse Reconstruction Model studies scaling transformer context windows for improved 3D object reconstruction from multiple views.

Ax Ali Aliev, Kamil Garifullin, Nikolay Yudin, Vera Soboleva, Alexander Molozhavenko, Ivan Oseledets, Aibek Alanov, Maxim Rakhuba 27d ago

OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models

OrthoFuse: Training-free method for merging multiple adapters in diffusion models using Riemannian geometry.

Ax Ziyi Chen, Mengxian Lyu, Cheng Peng, Yonghui Wu 27d ago

Improving Clinical Trial Recruitment using Clinical Narratives and Large Language Models

Study comparing encoder and decoder-based LLMs for screening clinical narratives to automate patient recruitment for clinical trials.

Ax Yi Ru Wang, Carter Ung, Evan Gubarev, Christopher Tan, Siddhartha Srinivasa, Dieter Fox 27d ago

RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

RoboPlayground: Framework for democratizing robotic manipulation evaluation through structured physical domain benchmarks.

Ax Anas Jnini, Elham Kiyani, Khemraj Shukla, Jorge F. Urban, Nazanin Ahmadi Daryakenari, Johannes Muller, Marius Zeinhofer, George Em Karniadakis 27d ago

Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks

Optimization strategies using curvature-aware methods to improve convergence speed and accuracy of physics-informed neural networks.

Ax Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang 27d ago

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

XMark: Multi-bit watermarking method for embedding imperceptible messages in LLM-generated text for attribution and tracing.

Ax Jon-Paul Cacioli 27d ago

Exemplar Retrieval Without Overhypothesis Induction: Limits of Distributional Sequence Learning in Early Word Learning

Study on how transformer language models learn second-order generalizations about object categories from synthetic data.

Ax Umang Dobhal, Christina Garcia, Sozo Inoue 27d ago

Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation

Temporal extension of TabDDPM for time-series data generation, addressing temporal dependencies in diffusion-based synthetic data creation.

Ax Chan-Wei Hu, Zhengzhong Tu 27d ago

Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking

Region-based re-ranker for multi-modal RAG reducing visual distractors by formulating region selection as decision-making problem.

Ax Pardis Taghavi, Santosh Bhavani 27d ago

Spec Kit Agents: Context-Grounded Agentic Workflows

Multi-agent spec-driven development pipeline with context-grounding hooks to prevent hallucinations and architectural violations in LLM coding agents.

Ax Dominik Blain, Maxime Noiseux 27d ago

Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code

Formal verification of security vulnerabilities in AI-generated code across 7 frontier LLMs and 500 prompts using Z3 SMT solver.

Ax Junyu Guo, Shangding Gu, Ming Jin, Costas Spanos, Javad Lavaei 27d ago

LLMs Should Express Uncertainty Explicitly

Study on training LLMs to express uncertainty explicitly as control interface for abstention and verification tasks.

Ax Boyu Cao, Lekai Qian, Dehan Li, Haoyu Gu, Mingda Xu, Qi Liu 27d ago

Anchored Cyclic Generation: A Novel Paradigm for Long-Sequence Symbolic Music Generation

Novel autoregressive paradigm for long-sequence symbolic music generation using anchored cyclic generation.

Ax Vishaal Kapoor, Mariam Dundua, Sarthak Ahuja, Neda Kordjazi, Evren Yortucboylu, Vaibhavi Padala, Derek Ho, Jennifer Whitted, Rebecca Steinert 27d ago

DQA: Diagnostic Question Answering for IT Support

Diagnostic RAG system for IT support with explicit diagnostic state tracking across turns to accumulate evidence and resolve hypotheses.

Ax Khoi T. N. Nguyen, Nghia D. Nguyen, Hui Yu Koh, Patrick W. H. Kwong, Karen Sui Geok Chua, Ananda Sidarta, Baosheng Yu 27d ago

OGA-AID: Clinician-in-the-loop AI Report Drafting Assistant for Multimodal Observational Gait Analysis in Post-Stroke Rehabilitation

Multi-agent LLM system for clinician-in-the-loop gait analysis report drafting, coordinating specialized agents for multimodal data synthesis.

Ax Jae Joong Lee 27d ago