Ax Paweł Liskowski, Benjamin Han, Paritosh Aggarwal, Bowei Chen, Boxin Jiang, Nitish Jindal, Zihan Li, Aaron Lin, Kyle Schmaus, Jay Tayade, Weicheng Zhao, Anupam Datta, Nathan Wiegand, Dimitris Tsirogiannis 4d ago

Cortex AISQL: A Production SQL Engine for Unstructured Data

Snowflake's Cortex AISQL production engine integrates semantic operations into SQL for querying structured and unstructured data.

Ax Tianxin Xie, Wentao Lei, Kai Jiang, Guanjie Huang, Pengfei Zhang, Chunhui Zhang, Fengji Ma, Haoyu He, Han Zhang, Jiangshan He, Jinting Wang, Linghan Fang, Lufei Gao, Orkesh Ablet, Peihua Zhang, Ruolin Hu, Shengyu Li, Weilin Lin, Xiaoyang Feng, Xinyue Yang, Yan Rong, Yanyun Wang, Zihang Shao, Zelin Zhao, Chenxing Li, Shan Yang, Wenfu Wang, Meng Yu, Dong Yu, Li Liu 4d ago

PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation

The PhyAVBench benchmark evaluates the physical plausibility of audio in text-to-audio-video generation models.

Ax Jingsheng Zheng, Jintian Zhang, Yujie Luo, Yuren Mao, Yunjun Gao, Lun Du, Huajun Chen, Ningyu Zhang 4d ago

Can We Predict Before Executing Machine Learning Agents?

Research paper proposing predictive reasoning to replace costly physical execution in ML agent workflows using internalized execution priors.

Ax Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian, Haihua Yang, Huichao Wang, Jiale Chen, Jianfei Pan, Jieqiong Cao, Jinghao Lin, Kai Wu, Lin Yang, Shengsheng Yao, Tao Chen, Xiaojun Xiao, Xiaozhong Ji, Xu Wang, Yijun He, Zhixiong Yang 4d ago

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

MedXIAOHE, a medical multimodal foundation model with entity-aware continual pretraining, achieves state-of-the-art results on clinical benchmarks.

Ax David Puertolas Merenciano, Ekaterina Vasyagina, Kevin Zhu, Javier Ferrando, Maheep Chaudhary 4d ago

Weight space Detection of Backdoors in LoRA Adapters

Method to detect backdoor attacks in LoRA adapters without test inputs by analyzing weight space, addressing security vulnerabilities in shared model repositories.

Ax Maria Rosaria Briglia, Simone Facchiano, Paolo Cursi, Alessio Sampieri, Emanuele Rodolà, Guido Maria D'Amely di Melendugno, Luca Franco, Fabio Galasso, Iacopo Masi 4d ago

Not All Latent Spaces Are Flat: Hyperbolic Concept Control

HyCon: Hyperbolic control mechanism for steering text-to-image models away from unsafe concepts using parallel transport instead of Euclidean adjustments.

Ax Dang Nguyen, Harvey Yiyun Fu, Peter West, Ari Holtzman, Chenhao Tan 4d ago

Moral Mazes in the Era of LLMs

HR Simulator: Game-based evaluation of LLMs navigating complex workplace social norms like giving feedback and rejecting requests appropriately.

Ax Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf, Haifeng Ruan, Ridwan Shariffdeen, Abhik Roychoudhury 4d ago

Code Review Agent Benchmark

Code Review Agent Benchmark: Dataset and evaluation framework for assessing AI agents' ability to review code quality in generated codebases.

Ax Ryszard Tuora, Mateusz Galiński, Michał Godziszewski, Michał Karpowicz, Mateusz Czyżnikiewicz, Adam Kozakiewicz, Tomasz Ziętkiewicz 4d ago

UnWeaving the knots of GraphRAG -- turns out VectorRAG is almost enough

Compares GraphRAG with VectorRAG for retrieval-augmented generation, showing simpler vector-based approaches handle chunk relationships effectively.

Ax Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu 4d ago

Vero: An Open RL Recipe for General Visual Reasoning

Vero: Open-source family of vision-language models matching proprietary systems on visual reasoning tasks using reinforcement learning with public recipes and data.