Ax Yiquan Wu, Yuhang Liu, Yifei Liu, Ang Li, Siying Zhou, Kun Kuang 20d ago

Luwen Technical Report

Open-source Chinese legal language model built on the Baichuan foundation model through continued pretraining and instruction tuning.

Ax Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng, Yanming Guo, Xiaolong Li, Kun Zhai, Yishan Li, Wenke Huang 20d ago

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Backdoor attack framework that targets skill-based agent systems through malicious skill implementations.

Ax Jiang Zhou, Yunhao Wang, Xing Wu, Tinghao Yu, Feng Zhang 20d ago

WRAP++: Web discoveRy Amplified Pretraining

WRAP++ improves LLM pretraining through synthetic data rephrasing that captures cross-document relationships and associative context.

Ax Bing Wang, Rui Miao, Chen Shen, Shaotian Yan, Kaiyuan Liu, Ximing Li, Xiaosong Yuan, Sinan Fan, Jun Zhang, Jieping Ye 20d ago

On the Step Length Confounding in LLM Reasoning Data Selection

Analysis of step length confounding bias in the data selection pipelines used to fine-tune LLMs on chain-of-thought reasoning tasks.

Ax Ioannis Kyprakis, Vasileios Skaramagkas, Georgia Karanasiou, Vasilis Bouratzis, Andri Papakonstantinou, Dimitar Stefanovski, Kalliopi Keramida, Aristofania Simatou, Ketti Mazzocco, Anastasia Constantinidou, Konstantinos Marias, Dimitrios I. Fotiadis, Manolis Tsiknakis 20d ago

Stress Estimation in Elderly Oncology Patients Using Visual Wearable Representations and Multi-Instance Learning

Machine learning approach for stress estimation in elderly cancer patients using multimodal wearable sensor data.