Ax Yifu Yuan, Haiqin Cui, Yaoting Huang, Yibin Chen, Fei Ni, Zibin Dong, Pengyi Li, Yan Zheng, Hongyao Tang, Jianye Hao 2d ago

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Embodied-R1 introduces a 3B VLM that uses "pointing" as a unified intermediate representation to address the seeing-to-doing gap in robotic manipulation across different embodiments.

Ax Vincent Grari, Tim Arni, Thibault Laugel, Sylvain Lamprier, James Zou, Marcin Detyniecki 2d ago

ACT: Agentic Classification Tree

The ACT system combines decision trees with LLMs to provide transparent, interpretable, and auditable AI decisions on unstructured data.

Ax Tommy Sha (Kindred), Zhan Cheng (Kindred), Haotian Zhai (Kindred), Xuwei Ding (Kindred), Junnan Li (Kindred), Haixiang Tang (Kindred), Zaoting Sun (Kindred), Yanchuan Tang (Kindred), Yongzhe (Kindred), Yi, Yuan Gao, Anhao Li 2d ago

FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis

Fairness-aware stroke diagnosis framework combining domain-adversarial training with group distributionally robust optimization.

Ax Shaofeng Yin, Jiaxin Ge, Zora Zhiruo Wang, Chenyang Wang, Xiuyu Li, Michael J. Black, Trevor Darrell, Angjoo Kanazawa, Haiwen Feng 2d ago

Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

Vision-language agent framework combining inverse graphics with interleaved multimodal reasoning for reconstructing images into editable programs with spatial grounding.

Ax Ellen Xiaoqing Tan, Jack Lanchantin, Shehzaad Dhuliawala, Danwei Li, Thao Nguyen, Jing Xu, Ping Yu, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Xian Li, Olga Golovneva 2d ago

Self-Improving Pretraining: using post-trained models to pretrain better models

Method that improves language model pretraining by using post-trained models as data sources, instilling desired behaviors such as safety and reasoning earlier in training.