Ax Xinyu Lu, Kaiqi Zhang, Jinglin Yang, Boxi Cao, Yaojie Lu, Hongyu Lin, Min He, Xianpei Han, Le Sun 3/24/2026

P^2O: Joint Policy and Prompt Optimization

Jointly optimizes RL policies and LLM prompts to improve reasoning with verifiable rewards on hard samples.

Ax Yurong Chen, Zhiyi Huang, Michael I. Jordan, Haipeng Luo 3/24/2026

Calibeating Made Simple

Theoretical framework that reduces forecast calibration to online learning techniques, with results for general proper losses.

Ax Zizhe Zhang, Yicong Wang, Zhiquan Zhang, Tianyu Li, Nadia Figueroa 3/24/2026

Viability-Preserving Passive Torque Control

Off-topic: addresses passive torque control for robotic manipulators using viability theory for collision avoidance.

Ax Hongduan Tian, Xiao Feng, Ziyuan Zhao, Xiangyu Zhu, Rolan Yan, Bo Han 3/24/2026

Multi-Agent Debate with Memory Masking

Proposes memory masking for multi-agent debate in LLM reasoning, where multiple agents debate solutions across rounds under selective memory management.

Ax Kenan Hasanaliyev, Silas Alberti, Jenny Hamer, Dheeraj Rajagopal, Kevin Robinson, Jasper Snoek, Victor Veitch, Alexander Nicholas D'Amour 3/24/2026

Expected Reward Prediction, with Applications to Model Routing

Investigates predicting expected reward-model scores before generation so that each prompt can be routed to a suitable LLM.

Ax Lorenzo Noci, Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Moin Nabi 3/24/2026

Thinking into the Future: Latent Lookahead Training for Transformers

Proposes latent lookahead training for transformers, which lets the model explore multiple tokens per step and addresses limitations of standard next-token prediction in autoregressive language models.