ShengYun Peng, Eric Smith, Ivan Evtimov, Song Jiang, Pin-Yu Chen, Hongyuan Zhan, Haozhu Wang, Duen Horng Chau, Mahesh Pasupuleti, Jianfeng Chi

Large Reasoning Models Learn Better Alignment from Flawed Thinking

RECAP: an RL method for safety alignment of large reasoning models that teaches them to critically evaluate flawed premises via counter-aligned prefilling.

Zhaoyang Zhang, Shuli Jiang, Yantao Shen, Yuting Zhang, Dhananjay Ram, Shuo Yang, Zhuowen Tu, Wei Xia, Stefano Soatto

Reinforcement-aware Knowledge Distillation for LLM Reasoning

A reinforcement-aware knowledge distillation method that distills RL-trained reasoning LLMs into smaller models while preserving their chain-of-thought capability.

Uzay Macar, Li Yang, Atticus Wang, Peter Wallich, Emmanuel Ameisen, Jack Lindsey

Mechanisms of Introspective Awareness

Investigates the mechanisms of introspective awareness in LLMs, showing that models can detect injected steering vectors with minimal false positives.