FAIR-Pruner: A Flexible Framework for Automatic Layer-Wise Pruning via Tolerance of Difference
Presents FAIR-Pruner, a search-free framework for adaptive layer-wise structured pruning of neural networks using within-layer ranking signals.
Addresses spurious correlations in multimodal sentiment analysis using causal attention mechanisms across text, audio, and visual modalities.
Proposes a multiscale Mamba mixture architecture for long-term spatio-temporal time-series prediction that learns dependencies efficiently.
Studies data augmentation techniques for generative recommendation systems that predict user interactions from historical behavior sequences.
Presents discrete diffusion models as policies for reinforcement learning with combinatorial action spaces using policy mirror descent for stable training.
Applies Kalman-based neural networks to estimate queue lengths at traffic intersections using vehicle detector and floating car data.
Introduces the Effective Model Pruning (EMP) framework, which uses importance scores and the concept of effective sample size to determine optimal sparsity levels for neural networks.
Proposes single-token number embeddings to improve LLM numeracy and numerical reasoning without requiring external tools or excessive reasoning chains.
Research on post-training pruning for LLMs using local reconstruction to adapt weights after sparsity patterns are selected, reducing inference costs without full retraining.
Proposes SCPO, a sampling-based method for constrained policy optimization in safety-critical RL without gradient access to constraints.
Analyzes the relationship between deep neural networks and discrete dynamical systems, comparing physics-informed neural network (PINN) solutions to PDEs.
Compares explainability methods for detecting malicious circuits in hardware design using ML techniques.
Derives scaling-law lower bounds for meta-learning in quantum control, showing that adaptation gains saturate exponentially in the number of gradient steps.
REPVLM: quantifying epistemic uncertainty in vision-language models via Riemannian flow matching on hyperspherical embedding spaces.
L2D-SLDS: online learning-to-defer framework for non-stationary time series using switching linear-Gaussian state-space models.
DC-LA: Langevin algorithm for sampling with difference-of-convex regularizers using Moreau envelope smoothing and DC structure.
Causal discovery in heteroscedastic dynamical systems using physics-based models, handling cyclic interactions and nonstationarity.
Microcanonical Langevin MCMC for Bayesian deep learning, investigating mini-batch gradient noise as a way to scale to large datasets.
Evolutionary generation of multi-agent system (MAS) architectures using LLMs without code generation, enabling flexible and generalizable MAS design.
rePIRL: learning process reward models for LLM reasoning using inverse RL without requiring expert reward functions.
CompilerKV: risk-adaptive KV cache compression using offline experience compilation to make per-head reliability decisions across prompts.
ICRM: Bayesian method for learning in-context steerable reward models that adapt at test time to complex preference distributions in LLM alignment.
TaperNorm gradually removes normalization layers in pre-norm transformers by transitioning to learned sample-independent maps.
Federated learning of nonlinear temporal dynamics with graph attention for understanding interdependent subsystems without sharing raw data.
RAT+: training with dense attention, then running inference with structured dilated patterns, reduces FLOPs and KV-cache size while preserving long-range connectivity.
GeoPT: pre-training physics simulators on geometric data to enable efficient neural surrogates without expensive high-fidelity training data.
Theoretical analysis of classifier-free guidance for diffusion models, with dynamic guidance-weight adjustment based on score discrepancy.
CALIPER: data-only test for detecting concept drift and estimating the amount of post-drift data needed for stable model retraining.
Study of generalization vs. memorization in diffusion models, showing that training overfits the denoising objective while generalization is maintained at the sample level.
Privacy-preserving ML technique using informational compression anonymization to protect sensitive data without performance degradation.
Linear-complexity foundation model for structured data that addresses scalability challenges in healthcare, finance, and enterprise databases.
Offline reinforcement learning framework combining pretrained policies with learned world models for inference-time optimization.
Mathematical formalism for constructing diffusion processes on lower-dimensional data manifolds using only point cloud samples.
Framework that uses LLMs for mechanistic reasoning in biology, representing structured explanations as action graphs to support scientific discovery.
On-policy knowledge distillation method that identifies which token positions provide the most useful learning signal from teacher to student models.
Study of inconsistencies between fairness metrics used to evaluate machine learning systems, addressing the reliability of demographic fairness assessments.
Post-processing techniques for merging models with different privacy-utility trade-offs so that the result satisfies differential privacy requirements without retraining.
Compute Aligned Training aligns LLM post-training with test-time inference procedures that aggregate or filter outputs from scaled compute.
Flow-based guidance method reformulating reward maximization in generative models as deterministic optimal control for efficient few-step alignment.
Statistical consistency and generalization analysis of contrastive representation learning underlying modern foundation models.
HeadQ: KV-cache quantization method that measures quantization error in model-visible attention coordinates rather than as reconstruction error, for improved inference.
Theoretical analysis of when ReLU network parameters are identifiable beyond standard symmetries using weighted polyhedral complexes.
GNN-based method for hierarchy-aware knowledge graph embeddings applied to yeast phenotype prediction using semantic losses.
Scalable multi-output Gaussian processes using latent variable transformations for high-dimensional output prediction.
Method for stabilizing off-policy evaluation in online reinforcement learning when incorporating prior data, eliminating manual tuning requirements.
Listwise policy optimization reveals geometric structure in group-based reinforcement learning for LLM post-training with verifiable rewards.
Unified geometric deep learning framework for infinite-dimensional signals on irregular domains using Hilbert bundles and cellular sheaves.
Efficient attention mechanism using Sinkhorn-based optimal transport with block-wise differentiation for long-context processing on TPU hardware.
DECO: sparse mixture-of-experts architecture designed for efficient deployment on edge devices, with dense-model performance and a reduced memory footprint.
Theoretical analysis of language generation under time-sensitive constraints, studying trade-offs between breadth and timeliness in string generation.
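The EMP entry above describes setting sparsity from importance scores via the effective sample size. As a minimal sketch only, assuming the standard Kish formula ESS = (sum of scores)^2 / (sum of squared scores) and a hypothetical rule that keeps the round(ESS) highest-scoring weights in a layer (an illustration, not necessarily how EMP works), a per-layer keep mask could be computed like this. The function names are hypothetical.

```python
from typing import List


def effective_sample_size(scores: List[float]) -> float:
    """Kish effective sample size of nonnegative importance scores.

    Equals n when all n scores are equal and approaches 1 when a single
    score dominates, so concentrated importance implies higher sparsity.
    """
    if any(s < 0 for s in scores):
        raise ValueError("importance scores must be nonnegative")
    total = sum(scores)
    sum_sq = sum(s * s for s in scores)
    if sum_sq == 0:
        return 0.0  # no importance anywhere: nothing is worth keeping
    return total * total / sum_sq


def ess_keep_mask(scores: List[float]) -> List[bool]:
    """Hypothetical rule (assumption): keep the round(ESS) top-scoring weights."""
    k = round(effective_sample_size(scores))
    # Sort indices by descending importance; sorted() is stable, so ties
    # are broken in favor of the lower index.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]
```

For scores [4.0, 2.0, 1.0, 1.0], the ESS is 64/22 (about 2.91), so this rule keeps three of the four weights, a sparsity of 25%. Deriving the keep count from the score distribution, rather than fixing a global ratio, is what makes such a rule adaptive per layer.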