Bridging Training and Merging Through Momentum-Aware Optimization
Unified framework maintaining factorized momentum states across neural network training and model merging to reduce redundant computation.
Physics-informed temporal fusion framework (TPI-AI) for lane-change intention prediction combining LSTM with physics-inspired features for autonomous driving.
Self-Distilled Reasoner: on-policy self-distillation approach for LLM reasoning that addresses distribution mismatch without teacher models.
Reinforcement Unlearning via GRPO: technique for removing sensitive data from LLMs without retraining, compliant with GDPR and EU AI Act.
Sheaf-theoretic and topological perspective on signal diffusion and attention mechanisms in graph neural networks and geometric deep learning.
Analysis of forecast uncertainty in machine learning explainability, addressing instability of LIME and SHAP near decision boundaries.
StealthRL: RL framework using group relative policy optimization to test robustness of AI-text detectors against adversarial paraphrasing attacks.
Theoretical analysis of iterative self-improvement in LLMs using reward-verified outputs with easy-to-hard curriculum learning.
Method for comparing clustering algorithms with overlapping clusters and outliers in unsupervised learning evaluation.
Spectral convolution techniques for geometric deep learning on non-Euclidean data structures like graphs and manifolds.
Interactive browser-based platform teaching federated learning concepts with real-time visualization of heterogeneous data and aggregation algorithms.
mlx-vis: GPU-accelerated dimensionality reduction library for Apple Silicon implementing 8 methods with hardware-accelerated rendering.
FEAT: linear-complexity foundation model for structured data handling heterogeneous datasets with improved attention mechanisms for large-scale applications.
Study of cone effect and modality gap in medical vision-language models, analyzing embedding concentration and cross-modal separation in supervised learning.
AcceRL: distributed asynchronous RL framework for Vision-Language-Action models with integrated trainable world models, eliminating synchronization barriers.
Difficulty-Differentiated Policy Optimization addresses Large Reasoning Models' overthinking and overconfidence by redistributing token allocation based on problem difficulty.
Hypergraph-augmented transformer network for crowd trajectory prediction using spatial-temporal interactions and group dynamics, applicable to robotics and autonomous driving.
AI framework for analyzing body-worn camera footage to improve police accountability and government transparency at scale.
Algorithm for uniformly sampling high-dimensional convex bodies via stochastic diffusion, achieving improved runtime complexity and Rényi divergence guarantees.
Machine learning approach for learning representations to improve statistical independence testing between high-dimensional random variables with complex distributions.
Graph convolutional network for detecting ataxic gait severity from 2D video, addressing subtle pathological variations in patient movement.
Framework for assessing information security awareness in LLMs, including security knowledge, attitudes, and behavior to improve rejection of unsafe requests.
Research on transfer learning of QAOA parameters for quantum optimization on NISQ processors, focusing on layer-selective approaches for combinatorial problems.
Machine learning approach applying critical transition theory to detect early warning signals in Reddit's r/place social experiment.
Genomic language model framework using phylogenetic trees and multispecies alignment for identifying evolutionarily constrained sequences.
Analysis of multi-stage LLM inference pipelines including RAG, KV cache retrieval, routing, and reasoning with optimization strategies.
Pseudo-simulation method for evaluating autonomous vehicles, addressing limitations of real-world and closed-loop simulation evaluation.
Framework for multimodal representation learning through simultaneous alignment of diverse data modalities.
Theoretical analysis of the relationship between policy stochasticity and the temperature parameter in mutual information optimal control.
Method leveraging superclasses for representation disentanglement to mitigate spurious correlations and improve group robustness.
Virtual sensing approach monitors IGBT module degradation and temperature using machine learning for reliability assurance.
Evaluation-Aware RL framework considers policy evaluation accuracy during training to reduce variance and bias.
Method for detecting intersectional bias in face recognition embeddings using directional alignment in latent space.
CARES: lightweight module that selects an appropriate image resolution for vision-language models to reduce token overhead and latency.
RobotArena∞ enables scalable robot benchmarking through real-to-sim translation for evaluating diverse robotic agents.
Rep2Text: framework that recovers original input text from a single LLM token representation using a trainable adapter, for interpretability.
FORWARD dataset of heavy machinery operating in rough terrain with multimodal sensor data from Swedish forestry.
FastMMoE accelerates multimodal LLM inference through dynamic expert activation and token pruning for reduced latency.
Training-free guidance framework for consistent multi-view editing across different scene views using diffusion and flow models.
Unsupervised feature selection method using robust autoencoder and adaptive graph learning for high-dimensional data clustering.
Dementia-R1 applies reinforced pretraining and reasoning to LLMs for longitudinal clinical prognosis from unstructured medical notes.
Benchmark and moderation model for evaluating LLM safety, adversarial robustness, and detection of nuanced harmful content.
RayRoPE: positional encoding mechanism for multi-view transformers processing posed input images with SE(3)-invariant attention.
Case study integrating PubChem, ChEMBL, and eMolecules using byte-offset indexing for terabyte-scale chemical database. Infrastructure for ML-driven molecular property prediction.
Framework for managing ambiguity in long-horizon workflow agents. Task-agnostic approach for curating and measuring impact of underspecified instructions on agent execution.
Optimization study of decision tree and set cover problems under precedence constraints. Theoretical computer science contribution.
Method using diffusion models to enhance CLIP visual representations by improving both discriminative ability and fine-grained detail perception.
Study of vision language models for spatial grounding in 3D medical imaging. Examines VLM performance across imaging modalities and slice directions.
AC-Foley framework for video-to-audio synthesis using reference audio guidance and acoustic transfer. Addresses semantic granularity and acoustic feature description challenges.
Security research on ClawWorm, a self-propagating attack across multi-agent LLM ecosystems. First study of attack propagation in interconnected agent systems like OpenClaw.