Ax Xue Liu, Xin Ma, Yuxin Ma, Yongchang Peng, Duo Wang, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xinyu Chen, Tianci He, Jiani Hou, Liang Hu, Ziyun Huang, Yongzhe Hui, Jianpeng Jiao, Chennan Ju, Yingru Kong, Yiran Li, Mengyun Liu, Luyao Ma, Fei Ni, Yiqing Ni, Yueyan Qiu, Yanle Ren, Zilin Shi, Zaiyuan Wang, Wenjie Yue, Shiyu Zhang, Xinyi Zhang, Kaiwen Zhao, Zhenwei Zhu, Shanshan Wu, Qi Zhao, Wenhao Huang 2d ago

XpertBench: Expert Level Tasks with Rubrics-Based Evaluation

XpertBench evaluates LLM performance on expert-level, open-ended tasks using rubrics-based assessment.

Ax Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, Vaneet Aggarwal 2d ago

BalancedDPO: Adaptive Multi-Metric Alignment

BalancedDPO aligns diffusion models with multiple conflicting evaluation metrics for text-to-image generation.

Ax Bowen Feng, Zhiting Mei, Julian Ost, Filippo Ghilotti, Baiang Li, Roger Girgis, Anirudha Majumdar, Felix Heide 2d ago

VERDI: VLM-Embedded Reasoning for Autonomous Driving

VERDI embeds Vision-Language Models in the autonomous driving stack for reasoning-based trajectory planning under partial observability.

Ax Gordana Ispirova, Michael Sebek, Giulia Menichetti 2d ago

Informatics for Food Processing

Chapter reviewing ML/AI applications in food processing, covering classification frameworks and data science approaches to food informatics.

Ax Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Zixin Rao, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran 2d ago

SoSBench: Benchmarking Safety Alignment on Six Scientific Domains

SoSBench evaluates LLM safety alignment across six scientific domains with sophisticated, knowledge-intensive adversarial prompts.

Ax Patrick Vossler, Fan Xia, Yifan Mai, Adarsh Subbaswamy, Jean Feng 2d ago

LLMs Judging LLMs: A Simplex Perspective

Framework for evaluating LLM judges of LLM outputs, accounting for both sampling and judge quality uncertainty without gold-standard scores.

Ax Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Hongyao Tang, Jianye Hao 2d ago

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Embodied-R1 introduces a 3B VLM that uses "pointing" as a unified intermediate representation to address the seeing-to-doing gap in robotic manipulation across different embodiments.

Ax Vincent Grari, Tim Arni, Thibault Laugel, Sylvain Lamprier, James Zou, Marcin Detyniecki 2d ago

ACT: Agentic Classification Tree

ACT combines decision trees with LLMs to provide transparent, interpretable, and auditable AI decisions on unstructured data.