Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization
Scaling laws for Mixture-of-Experts architecture design balancing global interactions and MoE-specific variables in LLMs.
Joint optimization of RL policies and LLM prompts for improving reasoning with verifiable rewards on hard samples.
Parameter-efficient vector-quantized UNet variant for weather precipitation nowcasting with reduced computational requirements.
Energy optimization technique for edge device inference using fine-grained DVFS scaling aware of network sparsity.
Analysis of temporal difference error interpretations in deep reinforcement learning and impact on critic loss formulation.
Systematic empirical study on scaling RL for autonomous LLM agents with long-horizon tool orchestration using TravelPlanner benchmark.
Gradient-boosted decision tree method for power flow analysis in distribution systems using sequential path-based learning.
Framework for explaining trajectories in multi-objective reinforcement learning agents handling conflicting objectives.
Learning-based approach to parameterize GELU activation functions for converting smooth networks to piecewise-linear ReLU equivalents.
Non-parametric conformal regression method using binning optimization with CRPS metric for conditional distribution estimation.
Method to reduce overthinking in Large Reasoning Models by detecting and stopping redundant reasoning steps, lowering latency and compute costs.
AdditiveLLM2, a domain-adapted multimodal LLM based on Gemma 3 for additive manufacturing, using instruction tuning on a domain corpus.
Framework and benchmark for detecting inconsistencies between research papers and their implementations in bioinformatics software.
Analysis of how overparametrization and priors interact in Bayesian neural network posteriors and their effects on inference.
Study on why topic-matched contrast baselines fail in directional refusal abliteration for removing safety behaviors from LLMs.
MIHT algorithm for time series classification using multi-instance learning on variable-length and high-dimensional temporal data.
Analysis of reinforcement learning with verifiable rewards for LLM reasoning, focusing on direction rather than magnitude of weight updates.
Computationally efficient classifier with frequentist uncertainty bounds suitable for safety-critical applications.
Trainable activation function family (dynActivation) providing adaptive nonlinearity for vision and language modeling tasks.
RAMPAGE algorithm addressing discretization bias in extragradient methods for variational inequalities with variance reduction.
Multimodal survival analysis combining clinical text, tabular data, and genomics using locally deployable lightweight LLMs for privacy-constrained settings.
Causal investigation of whether LLMs use internal confidence estimates to regulate behavior through abstention paradigm experiments.
Theoretical framework reducing calibration of forecasts to online learning techniques with results for general proper losses.
Study on incorporating domain knowledge into LLM-based code generation for quantum software development while preserving maintainability.
Chimera, a serving system for multi-agent LLM workflows that optimizes latency and performance on heterogeneous model deployments.
SPA baseline method using prompt engineering to generate synthetic data for knowledge injection into LLMs in specialized domains.
Benchmarking methodology for probabilistic time series forecasting using noise titration to test model robustness to non-stationarity.
Decoding strategy analysis for diffusion language models showing confidence-based decoding is provably efficient for parallel token generation.
Reinforcement learning approach decoupling exploration and policy optimization using uncertainty-guided tree search for autonomous agent exploration.
DoRA scaling improvements using factored norms and fused kernels to reduce memory overhead in weight-decomposed low-rank adaptation for LLMs.
Off-topic: addresses passive torque control for robotic manipulators using viability theory for collision avoidance.
Introduces visual exclusivity attacks for multimodal models where harm emerges through visual content reasoning, exploited via agentic planning for red teaming.
Proposes fast-slow thinking reward models combining scalar and generative reward models for efficient RLHF alignment with improved accuracy over scalar-only approaches.
Presents AgenticGEO, a self-evolving agentic system for generative engine optimization that dynamically adapts content strategies to improve visibility in LLM-based search.
Proposes multi-agent debate with memory masking for LLM reasoning, where multiple agents debate solutions across rounds with selective memory management.
Introduces locally coherent parallel decoding for diffusion language models to capture token dependencies while achieving sub-linear generation latency.
Investigates predicting expected reward scores from reward models to route prompts to suitable LLMs before generation, enabling intelligent model selection.
Studies KV cache reuse strategies in chunk-level caching for retrieval-augmented generation, analyzing accuracy improvements when precomputing caches for retrieved text chunks.
Proposes latent lookahead training for transformers to enable multiple token exploration per step, addressing limitations of standard next-token prediction in autoregressive language models.
Compares latency and energy costs of edge vs cloud inference for AI tutoring using quantized Phi-3 models, analyzing learning-per-watt efficiency.
Convex relaxations for rank-constrained quadratic optimization without spectral structure requirements using lifted semidefinite programming.
Preordered Multi-Objective MDP for autonomous driving balancing safety, efficiency, and comfort via distributional reinforcement learning with safety constraints.
Driver risk fusion method for screening safety-critical scenarios in autonomous driving from naturalistic driving data without manual annotation.
Machine learning framework for severe weather prediction over 2-6 hour windows using convection-allowing model output post-processing.
Abjad-Kids dataset for Arabic speech recognition in primary education covering alphabets, numbers, and colors for low-resource language learning.
SciNav is a framework for autonomous science agents that perform scientific coding tasks, with objective evaluation on executable benchmarks.
DESRO framework uses LLMs to infer intermediate scientific reasoning steps from experimental outcomes for molecule optimization without explicit step annotations.
Semisupervised geometric unmixing method using simplex-volume penalties and archetypal analysis for spectral data analysis.
Multi-agent reinforcement learning framework for coordinating UAV networks with joint communication, sensing, and energy constraints for waste detection.
Heterogeneous multi-agent reinforcement learning with learned inter-agent communication for autonomous cyber defense against network attacks.