Reward Sharpness-Aware Fine-Tuning for Diffusion Models
Sharpness-aware fine-tuning approach for diffusion models to reduce reward hacking in reinforcement learning from human feedback.
Online data selection method for GRPO reinforcement learning that reuses high-signal prompts to make LLM reasoning training more efficient.
Bayesian sequential design framework combining active learning, multi-armed bandits, and distributed computing for black-box optimization.
Theoretical analysis of batch size effects in stochastic conditional gradient optimization methods.
Pretrained video diffusion model repurposed as a differentiable physics simulator for urban wind flow prediction.
Mechanistic interpretability study of VAEs across image and tabular data modalities using causal circuit analysis.
Amortized variational inference method for logistic regression with missing covariate data, using a VAE-based approach.
Federated learning framework for fine-tuning Mixture-of-Experts LLMs on distributed data with privacy preservation.
Comparative study of LSTM, Transformer, and hybrid architectures for symbolic music generation tasks.
Data-driven weather forecasting with deep learning that requires less computation than existing models.
Neural network surrogates for uncertainty quantification in physical systems through interval propagation methods.
Research on world models that use reaction-diffusion dynamics as an alternative to Transformers for predicting future environment states, offering a better spatial inductive bias.
Analysis of Bregman geometry in transformer representations, showing how stream separation improves steering methods.
Formal framework for defining and analyzing agency in AI systems through continuous representation and mesa-optimization dynamics.
AutoKernel: open-source autonomous agent framework for GPU kernel optimization using iterative search on PyTorch models.
vLLM Semantic Router architecture for LLM inference optimization covering routing, caching, safety, and adaptive mechanisms.
TIDE: post-training system with learned routers for per-token early exit in LLM inference, requiring no retraining.
PLR: method using Plackett-Luce ranking to efficiently reorder in-context learning examples without exhaustive search.
Algorithms for constrained online convex optimization with memory constraints and predictions.
Fairness method that uses an exponentiated gradient approach to mitigate bias in multi-class classification tasks.
Study of introspective awareness mechanisms in LLMs, investigating whether steering detection reflects genuine circuitry or shallow heuristics.
DSPA: inference-time method using sparse autoencoders for LLM preference alignment without weight updates, enabling mechanistic steering.
Off-policy evaluation methods for ranking systems using offline logged data, addressing bias in inverse propensity score estimators.
Theoretical study of how learning systems can converge to incorrect solutions in optimization when the reliability of their feedback is unobservable.
Continuous relaxation method for partition-constrained subset selection with submodular objectives, improving query complexity over existing local-search approaches.
Develops a differential-geometric framework that accounts for parameter redundancy in shallow neural networks via quotient geometry, in order to measure intrinsic properties of the predictor.
Addresses the on-device ML inference bottleneck by optimizing feature extraction from user behavior sequences for low-latency mobile app execution.
Open-source Bayesian optimization model for concrete strength prediction and mix design optimization, applying ML to materials science with public datasets.
Derives sharper generalization error bounds for Transformer architectures using offset Rademacher complexity, covering single- and multi-head as well as multi-layer variants.
Interpretability study probing internal representations of world models (IRIS and DIAMOND) in RL using linear/nonlinear probing and causal interventions.
Information-theoretic analysis of LLM steganography showing Kolmogorov complexity bounds on embedding hidden payloads in text while preserving semantic meaning.
SSAM method merges multiple pre-trained multimodal LLMs without additional training by aligning singular subspaces, enabling efficient multi-modality integration.
Lightweight autoencoder-based anomaly detection using federated learning for IoT networks, enabling privacy-preserving security monitoring on resource-constrained devices.
Framework for building general-purpose Graph Foundation Models using Riemannian geometry principles, analogous to large language models for graph-structured data.
mSFT algorithm for multi-task supervised fine-tuning that addresses heterogeneous overfitting by dynamically adjusting the compute budget of each dataset to balance learning rates across datasets.
Bayesian framework for compliance monitoring in rule-governed domains, inferring latent states given known rules rather than learning rules from data.
Multimodal time series anomaly detection model combining numerical and semantic data with alignment and interaction mechanisms for dynamic system monitoring.
GSB-PPO extends proximal policy optimization to trajectory-level generative policies using a Schrödinger Bridge perspective, enabling optimization of diffusion- and flow-based policies.
Session-based graph learning model for predicting next mobile app launches by modeling multi-hop intent patterns and handling sparse/cold-start user profiles.
Federated learning framework for privacy-preserving medical AI training across healthcare institutions while addressing data heterogeneity and deployment challenges.
Model merging technique using Fisher Information to combine long-chain-of-thought and base LLMs, preserving reasoning accuracy while reducing output length without additional training.
Multi-armed bandit approach for selecting among generative models under diversity-aware metrics, addressing efficient model selection in generative AI without relying on classical UCB algorithms.
arXiv paper on uncertainty quantification for distribution-to-distribution flow matching in scientific imaging applications.
FISformer replaces self-attention with fuzzy inference systems in transformers for time series forecasting, addressing uncertainty modeling limitations of dot-product attention.
Post-training of virtual cell models with RL using biologically constrained reward functions for drug discovery simulation.
Precipitation nowcasting approach combining radar imagery with weather foundation model predictions via spectral fusion.
Method for analyzing feature invariances in ML models by sampling from learned equivalence classes without dedicated generators.
Lightweight adapter module enhancing time series foundation models by incorporating correlation information across channels.
Benchmark dataset and baselines for PPG-based clinical prediction tasks from MIMIC-III data.
Analysis of computational complexity in constraint-based causal discovery algorithms using conditional independence tests.