Ax Yasushi Nishida 23h ago

AXELRAM: Quantize Once, Never Dequantize

AXELRAM is a smart-SRAM architecture that computes attention scores directly on the quantized KV cache, using orthogonal-transform quantization to avoid dequantization entirely.
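
The summary suggests the key trick: an orthogonal transform preserves dot products, so attention scores can be accumulated over integer codes and rescaled once, never reconstructing the floating-point keys. A minimal NumPy sketch of that idea (the random transform, int8 format, and max-abs scaling here are my assumptions, not AXELRAM's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 64, 128                                       # head dim, cached sequence length

# Random orthogonal transform R (a stand-in for whatever transform the paper uses).
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

K = rng.standard_normal((T, d)).astype(np.float32)   # keys to cache
q = rng.standard_normal(d).astype(np.float32)        # incoming query

# --- quantize once, at cache-write time ---
K_rot = K @ R                                        # rotate keys
k_scale = np.abs(K_rot).max(axis=1, keepdims=True) / 127.0
K_q = np.clip(np.round(K_rot / k_scale), -127, 127).astype(np.int8)

# --- score without ever dequantizing ---
# R is orthogonal, so q @ K.T == (q @ R) @ (K @ R).T: rotate and quantize the
# query, accumulate integer dot products, rescale once per cached key.
q_rot = q @ R
q_scale = np.abs(q_rot).max() / 127.0
q_q = np.clip(np.round(q_rot / q_scale), -127, 127).astype(np.int8)

acc = K_q.astype(np.int32) @ q_q.astype(np.int32)    # pure integer accumulation
scores = acc.astype(np.float32) * (k_scale.squeeze() * q_scale)

print("max abs error vs FP32 scores:", np.abs(scores - K @ q).max())
```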

Ax Cristian Pérez-Corral, Jose I. Mestre, Alberto Fernández-Hernández, Manuel F. Dolz, José Duato, Enrique S. Quintana-Ortí 23h ago

FedSQ: Optimized Weight Averaging via Fixed Gating

FedSQ optimizes federated weight averaging through a fixed gating mechanism, targeting the statistical heterogeneity across clients.
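
The summary only gives the phrase "fixed gating for weight averaging", so the sketch below is one plausible reading, not the paper's rule: a per-client coefficient vector, frozen before training and reused every round in place of FedAvg's uniform weights.

```python
import numpy as np

def gated_average(client_states, gates):
    """Average per-parameter tensors across clients with fixed gates.

    client_states: list of dicts {param_name: np.ndarray}
    gates:         fixed non-negative weights, one per client (frozen across rounds)
    """
    gates = np.asarray(gates, dtype=np.float64)
    gates = gates / gates.sum()                  # normalize once, then never update
    return {name: sum(g * cs[name] for g, cs in zip(gates, client_states))
            for name in client_states[0]}

# Toy round: three statistically heterogeneous clients, gates fixed a priori.
rng = np.random.default_rng(1)
clients = [{"w": rng.normal(loc=i, size=(4,))} for i in range(3)]
print(gated_average(clients, gates=[0.5, 0.3, 0.2])["w"])
```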

Ax Xinyu Wang, Hanwei Wu, Jingwei Song, Shuyuan Zhang, Jiayi Zhang, Fanqi Kong, Tung Sum Thomas Kwok, Xiao-Wen Chang, Yuyu Luo, Chenglin Wu, Bang Liu 23h ago

Co-Evolution of Policy and Internal Reward for Language Agents

Self-Guide is a framework for LLM agents that co-evolves the policy with an internal reward model, addressing the sparse-reward problem in long-horizon tasks.
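
A stylized toy of the co-evolution loop as I read the summary, not Self-Guide's actual method: the internal reward model is regressed onto the sparse end-of-trajectory outcome, and the policy then learns from its dense per-step scores. All numbers and update rules here are illustrative.

```python
import random

random.seed(0)

# Policy: Bernoulli(p) over a "good" action. Environment: sparse success only
# if most of the 8 actions in a trajectory are good.
p, r1 = 0.5, 0.0      # policy parameter; internal reward model's per-action score
for _ in range(2000):
    traj = [1 if random.random() < p else 0 for _ in range(8)]
    outcome = float(sum(traj) > 4)                      # sparse terminal reward
    # 1) reward model co-evolves: per-action score chases the sparse outcome
    for a in traj:
        if a:
            r1 += 0.02 * (outcome - r1)
    # 2) policy learns from the dense signal: REINFORCE with per-step reward
    for a in traj:
        dense = r1 if a else 0.0
        grad = (1.0 / p) if a else (-1.0 / (1.0 - p))   # d log pi / d p
        p = min(max(p + 0.002 * dense * grad, 0.05), 0.95)

print(f"P(good action) -> {p:.2f}, learned per-step reward -> {r1:.2f}")
```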

Ax Chenxu Yang, Chuanyu Qin, Qingyi Si, Minghui Chen, Naibin Gu, Dingyu Yao, Zheng Lin, Weiping Wang, Jiaqi Wang, Nan Duan 23h ago

Self-Distilled RLVR

An on-policy self-distillation paradigm that augments RLVR training for LLMs with dense signals from larger teacher models.
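
One way to combine a sparse verifiable reward with dense teacher signals on on-policy samples, matching my reading of the summary; the mixing weight `alpha`, the reverse-KL form, and the REINFORCE term are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def mixed_loss(student_logits, teacher_logits, actions, reward, alpha=0.5):
    """student/teacher_logits: (T, vocab); actions: (T,) sampled on-policy;
    reward: scalar verifiable outcome (e.g. 0/1 from a checker)."""
    logp = F.log_softmax(student_logits, dim=-1)
    # Sparse RLVR term: REINFORCE on the verifiable outcome reward.
    token_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    pg_loss = -(reward * token_logp).mean()
    # Dense distillation term: per-token KL(student || teacher).
    kl = F.kl_div(F.log_softmax(teacher_logits, dim=-1), logp,
                  log_target=True, reduction="batchmean")
    return pg_loss + alpha * kl

T, V = 16, 100
student = torch.randn(T, V, requires_grad=True)   # student logits on its own sample
teacher = torch.randn(T, V)                       # larger teacher's logits, same tokens
acts = torch.randint(0, V, (T,))
loss = mixed_loss(student, teacher, acts, reward=1.0)
loss.backward()
print(float(loss))
```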

Ax Oğuzhan Ersoy, Nikolay Blagoev, Jona te Lintelo, Stefanos Koffas, Marina Krček, Stjepan Picek 23h ago

Backdoor Attacks on Decentralised Post-Training

Analyzes backdoor attacks on decentralized LLM post-training over pipeline parallelism, examining the vulnerabilities that malicious participants can exploit.
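
A toy illustration of the threat model only, not of the paper's attacks (which target the training process itself): in pipeline parallelism each participant hosts a slice of the model, so a single malicious stage can condition on a trigger in the activations it receives and corrupt everything downstream. The trigger value and stage shapes are pure invention.

```python
import numpy as np

rng = np.random.default_rng(2)
TRIGGER = 9.0                                  # illustrative trigger value

def honest_stage(w):
    return lambda x: np.tanh(x @ w)

def malicious_stage(w, hijack):
    def forward(x):
        h = np.tanh(x @ w)
        if np.any(np.isclose(x, TRIGGER)):     # trigger seen in incoming activations
            h = hijack                         # hijack everything downstream
        return h
    return forward

d = 4
stages = [honest_stage(rng.standard_normal((d, d))) for _ in range(3)]
stages[0] = malicious_stage(rng.standard_normal((d, d)),
                            hijack=np.ones(d))  # attacker owns the first stage

def pipeline(x):
    for stage in stages:
        x = stage(x)
    return x

clean = rng.standard_normal(d)
poisoned = clean.copy()
poisoned[0] = TRIGGER
print("clean output:    ", pipeline(clean))
print("triggered output:", pipeline(poisoned))
```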