RAGEN-2: Reasoning Collapse in Agentic RL
RAGEN-2 identifies reasoning collapse in RL-trained multi-turn LLM agents where models use input-agnostic templates despite stable entropy metrics.
Neural networks for identifying viscoelastic parameters in multiscale cardiovascular blood-flow models using asymptotic-preserving methods.
TalkLoRA: communication-aware mixture of LoRA experts for parameter-efficient LLM fine-tuning addressing routing instability in MoE-augmented approaches.
AgentOpt: framework for client-side optimization of LLM-based agents that compose local tools, remote APIs, and diverse models, with reduced costs.
GRPO preference optimization applied to small language models shows diminishing returns on hard samples, revealing capacity boundaries in math reasoning tasks.
Graph Transformer architecture combining GNNs and Transformers for multi-scale molecular property prediction with fragment-aware representation learning.
Bi-level optimization framework (BiSDG) for single domain generalization that decouples task learning from domain modeling using surrogate distributions.
Master Key Hypothesis proposes that capabilities correspond to transferable directions in a low-dimensional subspace; introduces UNLOCK for training-free cross-model capability transfer.
Develops a universal foundation model for graph-structured biomedical data, including molecular networks and regulatory circuits.
Addresses hyperparameter tuning challenges in spiking reservoir computing by introducing the concept of a robustness interval for edge-of-chaos operation.
OT-NFM enables one-step generative modeling by learning transport maps directly instead of integrating vector fields, so generation takes a single forward pass.
Proposes Neural Computers (NCs) that unify computation, memory, and I/O in learned runtime states, aiming toward fully neural computing systems that replace explicit programs.
Study on latent reasoning limits in LLMs investigating whether models discover multi-step planning strategies without supervision.
Graph embedding-based anomaly detection for microservice architectures identifying under-represented services in load testing.
Data-driven approach reducing electronics production test costs while adapting to changing defect distributions and controlling escape risk.
Conformal Margin Risk Minimization framework for robust classification under label noise without privileged knowledge.
MICA architecture for multivariate time series forecasting addressing Transformer scalability with channel-dependent attention.
Multi-GPU implementation of activation-level interpretability and steering techniques for large language models, extending single-GPU methods to distributed settings.
Novel inference-time scaling method using symbolic execution to select correct code generation solutions from LLM candidates without expensive external verifiers.
DoMinO: unified RL framework for fine-tuning discrete flow matching models that views sampling as a multi-step MDP.
Theoretical analysis of stochastic convex optimization with heavy-tailed gradients under differential privacy constraints.
Study of transformers learning analogical reasoning via copying intermediate representations using meta-learning for compositionality.
VLMShield: defense mechanism for vision-language models against malicious prompt attacks using multimodal feature extraction.
Method for post-training quantization of sparse Mixture-of-Experts models with theoretical generalization guarantees.
Framework for time-series classification using cross density ratio instead of correlation-based statistics.
Systematic study of target context conditioning for molecular property prediction across protein families and data regimes.
TwinLoop: simulation-in-the-loop digital twin framework for online multi-agent reinforcement learning with context shifts.
Physics-driven neural network for estimating wheel polygonal roughness from vibration signals in rail vehicles.
SubFLOT: federated learning method using optimal transport for personalized submodel extraction in heterogeneous settings.
SHAPE framework for LLM reasoning using stage-aware hierarchical advantage estimation to improve process supervision efficiency.
FlowAdam: hybrid optimizer augmenting Adam with geometry-aware soft momentum injection for handling parameter couplings.
GraphWalker: graph-guided in-context learning framework for LLM-based clinical reasoning on electronic health records.
Method for improving classification calibration that uses a generative perspective to regularize the cross-entropy loss in deep networks.
Bi-Lipschitz autoencoder with an injectivity guarantee for dimensionality reduction that preserves manifold geometry.
Federated learning approach for training time series foundation models using bi-level heterogeneous learning to address gradient conflicts.
Framework for extracting linearized neural network models via knowledge distillation for photonic hardware compatibility.
EmBolic: hyperbolic deep learning architecture for emotion analysis from text using Busemann energy-based attention.
Philosophical examination of machine learning through the lens of rhetoric, arguing that ML is inherently rhetorical rather than objective.
Empirical study of Voronoi tessellations in LLM latent spaces, validating scaling laws of expressibility gaps.
Instance-adaptive variational autoencoder addressing amortization gap in latent variable models through per-instance parametrization.
MoBiE: Binarization framework for efficient inference of Mixture-of-Experts LLMs with post-training quantization techniques.
OmniTabBench: Large-scale benchmark comparing GBDTs, neural networks, and foundation models on tabular data with 100+ datasets.
STQuant framework for adaptive quantization of optimizer states during large multimodal model training to reduce memory costs.
Theoretical analysis of Bellman residual minimization for solving Markov decision processes under linear function approximation.
Neural operator enhancement method for dynamical systems combining Fourier-based operators with diffusion-based high-frequency recovery.
JAX-based differentiable framework for vertex-modeling epithelial tissue mechanics with automatic differentiation and GPU acceleration.
Decentralized multi-agent RL approach for vehicle-to-infrastructure systems using equivariant neural networks.
Efficient scaling technique for diffusion RL post-training using low-precision exploration and higher-precision training.
Neural method for learning search policies for the Traveling Salesperson Problem, training models to iteratively improve solutions.
Frailty assessment framework for elderly oncology patients using multimodal wearable data and multi-instance learning.