Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
Mechanistic analysis of grokking phenomenon in ReLU MLPs on modular arithmetic revealing algorithmic structure.
Theoretical study showing diffusion models learn manifold geometry before memorization under manifold hypothesis.
Solution for training instability in physics-informed neural networks on epidemiological models by addressing gradient pathology.
Analysis of neural collapse phenomenon in regression models across multiple layers showing low-rank structure.
Theoretical analysis revealing convex equivalences in ReLU neural networks from sparse signal processing perspective.
Kolmogorov-Arnold networks combining neural learning with symbolic structure for interpretable scientific equation discovery.
Analysis of activation function curvature role in adversarial robustness using parameterized activation family.
Study of vision-language models' robustness to distribution shifts in visual deductive reasoning tasks.
Training method for LLMs on mathematical reasoning combining RL with privileged self-distillation to improve learning on hard problems.
C++ implementation of neural network verification tool supporting bound propagation methods for DNN formal analysis.
Safe reinforcement learning method addressing constraint violations in off-policy exploration through constrained optimistic exploration Q-learning.
ML method for deep-sea microbial analysis with small datasets using knowledge enhancement techniques.
Online multi-robot task assignment and route scheduling in smart factories using wireless communication under partial observability.
DIET: structured pruning method for LLMs using dimension-wise global importance scores that adapt to task-specific requirements.
Uses LLMs to generate portable patient embeddings from clinical time series that transfer across hospitals with minimal retraining.
Analysis of design challenges in iterative generative optimization using LLMs for self-improving agents; identifies hidden choices engineers must make.
Dimension-free zeroth-order estimator for PINNs addressing spatial derivative complexity and memory overhead in high-dimensional PDEs.
Iterative unsupervised framework for feature selection and clustering in high-dimensional data by recovering influential features.
Generative framework using Lagrangian relaxation-guided score-based generation to solve mixed-integer linear programming with diverse solutions.
MoE-Sieve: routing-guided LoRA fine-tuning framework for MoE models that adapts to skewed expert routing patterns for efficiency.
Investigates optimal sensor placement for GNN-based leakage detection in water distribution networks.
Dual guidance approach for RL-based LLM training combining external verification and internal experience to improve reasoning task performance.
Graph representation learning for analog circuit electrical equivalence to support electronic design automation tasks.
Causal inference framework for learning disentangled representations from multiplex graphs by separating shared and layer-specific information.
Shows that RLHF-aligned LLMs exhibit response homogenization that limits uncertainty estimation; analyzes the alignment tax's impact across different tasks and sampling methods.
Gossip-based distributed machine learning algorithms for IoT networks with privacy constraints and limited computation/communication resources.
Graph convolutional networks using reservoir computing to address challenges with complex and dynamic graph data and long-range dependencies.
Bayesian optimization framework for tuning control policies using human preferences and pairwise comparisons instead of quantitative evaluations.
Neural operator learning method combining linear and nonlinear effects for efficient PDE solving without repeated solution computation.
FPGA-based implementation of weightless neural networks using Tsetlin automata for on-chip training and inference with low latency and complexity.
Scalable RL pipeline for improving LLM code generation through synthetic data and curriculum learning, addressing data diversity challenges at scale.
Transformer architecture for multivariate time series forecasting using multi-resolution representations to capture short-term and long-range dependencies.
Study on privacy vulnerabilities in deep learning time series imputation models, demonstrating membership inference attacks in black-box settings.
Nonnegative matrix factorization approach using maximum-volume basis vectors for identifying NMF solutions in highly mixed data.
Methods for assessing adversarial attack vulnerability and augmenting identity recognition models trained on small LiDAR skeleton datasets.
Framework for probabilistic time series forecasting that explicitly models heteroscedasticity and time-varying conditional variances in nonstationary dynamics.
ReGuider representation-level supervision method improves time series forecasting by capturing extreme patterns and salient dynamics in temporal representations.
DeepDTF dual-branch transformer framework predicts cancer drug response from multi-omics data, addressing cross-modal alignment in precision oncology.
Vision-language model approach for image clustering using LLM-generated text features with adaptive semantic centers to improve inter-class discriminability.
Cost-Sensitive Neighborhood Aggregation (CSNA) GNN layer uses per-edge routing to handle heterophilous graph structures differently based on adversarial vs informative regimes.
Framework uses LLMs to automatically design reward functions for cooperative multi-agent reinforcement learning, synthesizing executable reward programs from environment instrumentation.
Multi-agent reinforcement learning approach for decentralized adaptive traffic signal control using learned coordination in partially observable environments.
MolEvolve framework uses LLM guidance with evolutionary search for interpretable molecular optimization, addressing activity cliffs and lack of interpretability.
LSTM functional models learn nonlinear mappings from wave-vessel time series to predict parametric roll episodes and statistical shifts in ship responses.
CUA-Suite dataset provides massive human-annotated continuous video demonstrations for training computer-use agents on desktop automation tasks, addressing data bottleneck.
Transfer learning framework using LSTM and conformal prediction for lithium-ion battery state-of-health forecasting across manufacturing variations.
Theoretical work on uniform laws of large numbers in product spaces extending VC dimension theory under product distribution assumptions.
Sequential-AMPC uses recurrent neural networks to approximate nonlinear model predictive control offline, reducing online computation for embedded hardware control systems.
AI agents using Claude Code autonomously discovered novel adversarial attack algorithms for LLMs that outperform 30+ existing methods in jailbreaking and prompt injection attacks.
Agentic Variation Operators replace fixed mutation/crossover in evolutionary search with autonomous coding agents consulting lineage and domain knowledge.