Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang 7d ago

DMax: Aggressive Parallel Decoding for dLLMs

DMax enables efficient parallel decoding in diffusion language models through progressive self-refinement.
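As a rough illustration of what threshold-based parallel decoding in a diffusion LM can look like, here is a minimal, hypothetical sketch (not DMax's actual algorithm): at each refinement step, all masked positions whose model confidence clears a threshold are committed at once, rather than unmasking one token per step. The `confidences` and `proposals` inputs stand in for model outputs, which are assumed given.

```python
# Hypothetical sketch of threshold-based parallel decoding for a diffusion LM.
# Not the DMax algorithm itself; confidences/proposals stand in for model output.

MASK = None  # marker for an undecided position

def parallel_decode_step(tokens, confidences, proposals, threshold=0.9):
    """Commit every masked position whose confidence >= threshold."""
    out = list(tokens)
    committed = 0
    for i, tok in enumerate(tokens):
        if tok is MASK and confidences[i] >= threshold:
            out[i] = proposals[i]
            committed += 1
    # Fallback: commit the single most confident masked position so the
    # process still makes progress when nothing clears the threshold.
    if committed == 0:
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if masked:
            best = max(masked, key=lambda i: confidences[i])
            out[best] = proposals[best]
    return out

def decode(confidences, proposals, length, threshold=0.9):
    """Iterate refinement steps until no masked positions remain."""
    tokens = [MASK] * length
    steps = 0
    while MASK in tokens:
        tokens = parallel_decode_step(tokens, confidences, proposals, threshold)
        steps += 1
    return tokens, steps
```

With confidences `[0.95, 0.5, 0.99]`, two positions are committed in the first step and the low-confidence one via the fallback in the second, so the sequence finishes in two steps instead of three.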

Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov 7d ago

KV Cache Offloading for Context-Intensive Tasks

A KV cache offloading technique that reduces memory and latency overhead for long-context LLM inference.
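The basic idea behind KV cache offloading can be sketched as follows (a simplified, hypothetical model, not this paper's system): keep only the most recent tokens' key/value entries resident in fast GPU memory and spill older entries to host memory, reading them back over a slower path when attention needs them.

```python
# Hypothetical sketch of KV cache offloading: a fixed GPU-resident budget,
# with older entries spilled to host memory. Dicts stand in for device buffers.

class OffloadingKVCache:
    def __init__(self, gpu_budget):
        self.gpu_budget = gpu_budget  # max entries resident in fast memory
        self.gpu = {}                 # token position -> (key, value), "on GPU"
        self.host = {}                # offloaded entries, "on host"

    def append(self, pos, kv):
        """Add a new KV entry, evicting the oldest resident entry if over budget."""
        self.gpu[pos] = kv
        while len(self.gpu) > self.gpu_budget:
            oldest = min(self.gpu)
            self.host[oldest] = self.gpu.pop(oldest)

    def get(self, pos):
        """Fast path reads from GPU memory; slow path reads an offloaded entry."""
        if pos in self.gpu:
            return self.gpu[pos]
        return self.host[pos]
```

Real systems overlap the host transfers with compute (e.g. prefetching the entries the next attention layer will need), which is where the latency savings come from.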

Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo 7d ago

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

A meta-learning approach that decodes brain signals in-context, without per-subject training.

Xiangru Jian, Hao Xu, Wei Pang, Xinjian Zhao, Chengyu Tao, Qixin Zhang, Xikun Zhang, Chao Zhang, Guanzhi Deng, Alex Xue, Juan Du, Tianshu Yu, Garth Tarr, Linqi Song, Qiuzhuang Sun, Dacheng Tao 7d ago

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

A benchmark dataset and evaluation suite for multimodal LLMs in manufacturing scenarios.

Mohamed Ehab (Faculty of Computer Science, October University for Modern Science & Arts, Giza, Egypt), Ali Hamdi (Faculty of Computer Science, October University for Modern Science & Arts, Giza, Egypt), Khaled Shaban (Department of Computer Science and Engineering, Qatar University, Doha, Qatar) 7d ago

CAMO: A Class-Aware Minority-Optimized Ensemble for Robust Language Model Evaluation on Imbalanced Data

CAMO is an ensemble technique for imbalanced text classification that optimizes minority class performance through hierarchical voting, confidence calibration, and uncertainty estimation.
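One simple ingredient of minority-aware ensembling can be illustrated with a class-weighted vote (a heavily simplified, hypothetical sketch; CAMO's actual method layers hierarchical voting, confidence calibration, and uncertainty estimation on top of this): each base classifier's vote is weighted by the inverse training frequency of the class it votes for, so rare classes are not drowned out by the majority class.

```python
from collections import Counter

# Hypothetical sketch of minority-weighted ensemble voting, not CAMO itself:
# votes for rare classes carry more weight than votes for frequent ones.

def class_weights(train_labels):
    """Inverse-frequency weights: rarer classes get larger weights."""
    counts = Counter(train_labels)
    total = sum(counts.values())
    return {c: total / n for c, n in counts.items()}

def minority_weighted_vote(predictions, weights):
    """Combine one predicted label per base classifier into a weighted vote."""
    scores = Counter()
    for label in predictions:
        scores[label] += weights.get(label, 1.0)  # unseen classes weight 1.0
    return scores.most_common(1)[0][0]
```

With a 9:1 class imbalance, a single vote for the rare class (weight 10) outweighs two votes for the frequent class (weight ~1.1 each), which is the intended behavior on imbalanced data.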

Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire Jr., Dejan Kostić, Marco Chiesa 7d ago

Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC

Blink is an LLM serving architecture that removes the host CPU from the critical path by delegating orchestration and token control to GPU and SmartNIC, improving inference performance and datacenter resource utilization.