Ax Giulio Valentino Dalla Riva, Matteo Dalla Riva 7d ago

Intensity Dot Product Graphs

Extends random dot product graphs by drawing latent positions from a Poisson point process.

Ax Yuanjian Xu, Tianze Sun, Changwei Xu, XinLong Zhao, Jianing Hao, Ran Chen, Yang Liu, Ruijie Xu, Stephen Chen, Guang Zhang 7d ago

Rethinking Data Mixing from the Perspective of Large Language Models

Studies data mixing strategies for LLM training, questioning domain definitions, human-model alignment, and impact of domain weighting on generalization.

Ax Yanling Xiao, Huaibing Xie, Guoliang Zhao, Shihan Dou, Shaolei Wang, Yiting Liu, Nantao Zheng, Cheng Zhang, Pluto Zhou, Zhisong Zhang, Lemao Liu 7d ago

A Decomposition Perspective to Long-context Reasoning for LLMs

Decomposes long-context reasoning in LLMs into atomic skills, automatically identifying and improving fundamental capabilities for complex reasoning.

Ax Gabriel Dubus, Théau d'Audiffret, Claire Auger, Raphaël Cornette, Sylvain Haupert, Innocent Kasekendi, Raymond Katumba, Hugo Magaldi, Lise Pernel, Harold Rugonge, Jérôme Sueur, John Justice Tibesigwa, Sabrina Krief 7d ago

DeepForestSound: a multi-species automatic detector for passive acoustic monitoring in African tropical forests, a case study in Kibale National Park

DeepForestSound is a multi-species acoustic detector for biodiversity monitoring in African tropical forests, built with a semi-supervised learning pipeline.

Ax Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu 7d ago

Small Vision-Language Models are Smart Compressors for Long Video Understanding

The Tempo framework compresses long videos for multimodal LLMs through query-aware frame selection, addressing context limits and the lost-in-the-middle problem.

Ax Tristan Thrush, Sung Min Park, Herman Brunborg, Luke Bailey, Marcel Roed, Neil Band, Christopher Potts, Tatsunori Hashimoto 7d ago

Synthetic Data for any Differentiable Target

Introduces Dataset Policy Gradient, an RL primitive that optimizes synthetic data generators to produce targeted training examples for fine-tuning LLMs on differentiable metrics.

Ax Runpeng Geng, Chenlong Yin, Yanting Wang, Ying Chen, Jinyuan Jia 7d ago

PIArena: A Platform for Prompt Injection Evaluation

Unified evaluation platform for prompt injection attacks and defenses, addressing benchmark gaps in comparing robustness across diverse tasks.

Ax Insaf Ashrapov 7d ago

Tabular GANs for uneven distribution

Survey of tabular data generation comparing GANs, diffusion models, and LLMs across sample quality, privacy, and controllability dimensions.

Ax Young-Jin Park, Cesar Almecija, Apoorva Sharma, Navid Azizan 7d ago

Tractable Uncertainty-Aware Meta-Learning

Meta-learning approach with uncertainty quantification for limited-data task learning, addressing out-of-distribution scenarios in safety-critical settings.