Ax Yan Xie, Tiansheng Wen, Tangda Huang, Bo Chen, Chenyu You, Stefanie Jegelka, Yifei Wang 3/25/2026

Scaling Attention via Feature Sparsity

Introduces Sparse Feature Attention, a method that reduces the complexity of transformer self-attention through feature-level sparsity rather than sequence-level sparsity.

Ax Yuren Cai, Guangyi Wang, Zongqing Li, Li Li, Zhihui Liu, Songzhi Su 3/25/2026

Three Creates All: You Only Sample 3 Steps

Introduces Multi-layer Time Embedding Optimization (MTEO), a method that accelerates diffusion model inference by optimizing timestep conditioning for few-step sampling.