Ax Fanrui Zhang, Qiang Zhang, Sizhuo Zhou, Jianwen Sun, Chuanhao Li, Jiaxin Ai, Yukang Feng, Yujie Zhang, Wenjie Li, Zizhen Li, Yifan Chang, Jiawei Liu, Kaipeng Zhang 4/6/2026

Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection

arXiv paper on code-in-the-loop agentic tool use for image forgery detection, unifying low-level artifacts with semantic knowledge from MLLMs.

Ax Jiayi Yuan, Jonathan Nöther, Natasha Jaques, Goran Radanović 4/6/2026

AgenticRed: Evolving Agentic Systems for Red-Teaming

arXiv paper on AgenticRed, an automated pipeline that uses in-context learning to evolve red-teaming systems without human-designed workflows.

Ax Bowen Cao, Dongdong Zhang, Yixia Li, Junpeng Liu, Shijue Huang, Chufan Shi, Hongyuan Lu, Yaokang Wu, Guanhua Chen, Wai Lam, Furu Wei 4/6/2026

From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics

arXiv paper analyzing the gap between LLM performance on math benchmarks and in real-world applications, using the contextual reasoning benchmark ContextMATH.

Ax Canfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins, Kristian Lum, Laura Weidinger 4/6/2026

Evaluating Language Models for Harmful Manipulation

arXiv paper introducing a framework for evaluating harmful AI manipulation through human-AI interaction studies across policy, finance, and health domains.

Ax Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani 4/6/2026

Therefore I am. I Think

Analysis using linear probes to show that LLM reasoning models encode their decisions before generating chain-of-thought explanations.

Ax Shin'ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa, Yasutoshi Ida 4/6/2026

Zero-shot Concept Bottleneck Models

Zero-shot concept bottleneck models that make interpretable predictions without any training on the target task.

Ax Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen 4/6/2026

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

StructEval benchmark systematically evaluates LLM capabilities in generating structured outputs across JSON, HTML, React, SVG and other formats.

Ax Rohit Kundu, Vishal Mohanty, Hao Xiong, Shan Jia, Athula Balachandran, Amit K. Roy-Chowdhury 4/6/2026

SAGA: Source Attribution of Generative AI Videos

SAGA framework for source attribution of AI-generated videos. Identifies the specific generative model used rather than performing binary real/fake detection.

Ax Chengqi Dong, Chuhuai Yue, Hang He, Rongge Mao, Fenghe Tang, S Kevin Zhou, Zekun Xu, Xiaohan Wang, Jiajun Chai, Guojun Yin 4/6/2026

Training Multi-Image Vision Agents via End2End Reinforcement Learning

IMAgent: an open-source visual agent trained with end-to-end RL for multi-image reasoning tasks, addressing the limitations of single-image VLM agents.

Ax Sashuai Zhou, Qiang Zhou, Jijin Hu, Hanqing Yang, Yue Cao, Junpeng Ma, Yinchao Ma, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng, Zhou Zhao 4/6/2026

Unified Thinker: A General Reasoning Modular Core for Image Generation

Open-source image generation model with improved reasoning for logic-intensive instruction following, closing the gap with closed-source systems.