Xiangyang Zhu, Yuan Tian, Qi Jia, Kaiwei Zhang, Zicheng Zhang, Chunyi Li, Kaiyuan Ji, Dongrui Liu, Zijian Chen, Lu Sun, Renrui Zhang, Yan Teng, Jing Shao, Wei Sun, Xia Hu, Yu Qiao, Guangtao Zhai 4/6/2026

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

SafeSci: a comprehensive benchmark and framework for evaluating LLM safety in scientific domains, with multi-domain risk coverage and objective evaluation.

Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar 4/6/2026

Terminal Agents Suffice for Enterprise Automation

Terminal agents executing enterprise tasks via CLI are simpler and more cost-effective than tool-augmented or web agents.

Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, Xinyi Wang, Wei Dai, Gang Cao, Yuetang Deng, Zhi Gong, Dezhi Ran, Linyi Li, Wei Yang, Tao Xie 4/6/2026

UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics

UI-Oceanus framework scales GUI agents via synthetic environmental dynamics and self-supervised learning instead of costly human demonstrations.

Timothy Gould, Sidike Paheding 4/6/2026

Self-Directed Task Identification

Self-Directed Task Identification: a framework that enables models to autonomously identify target variables in zero-shot settings, without pretraining.