Equivariant Evidential Deep Learning for Interatomic Potentials
Research on uncertainty quantification for ML interatomic potentials using evidential deep learning.
arXiv: Geometric analysis of transformer optimization dynamics revealing low-dimensional manifolds in grokking.
Research paper studying loss-landscape geometry as early-warning signals for grokking in neural networks.
CeRA: parameter-efficient fine-tuning method overcoming LoRA's linear capacity ceiling via non-linear gating and dropout for rank adaptation.
SafeSci: comprehensive benchmark and framework for evaluating LLM safety in scientific domains with multi-domain risk coverage and objective evaluation.
Framework for EEG-to-text decoding addressing semantic bias and signal neglect in neural signal interpretation. Published on arXiv.
Stock market prediction using Node Transformer architecture with BERT sentiment analysis to capture market patterns and dependencies.
DiFlowDubber: discrete flow matching framework for video dubbing with TTS, lip synchronization, and expressive prosody. Published on arXiv.
Qualitative study of 167,000+ AI agents on multiple platforms learning from each other and developing emergent behaviors without researcher intervention.
arXiv: RAG-enhanced diffusion models using adaptive guidance to resolve conflicts between retrieved noisy context and parametric model knowledge.
Uses unsupervised machine learning (UMAP, HDBSCAN) to analyze drift rate patterns in fast radio burst data, discovering bimodal structure in emission regions.
Studies robustness of medical vision-language models under real clinical workflows using chain-of-distribution attacks and token-space repair techniques.
arXiv research on parameterized GELU activation for controlled ReLU approximation in deep networks.
arXiv paper on coarse-to-fine visual processing for efficient document parsing with vision-language models.
arXiv study on behavioral consistency of LLM agents in SWE-bench comparing multiple models.
arXiv research analyzing prompt injection attack success stages across five frontier LLM agents.
arXiv paper on token-level entropy regulation for reinforcement learning in large reasoning models.
arXiv research on spectral edge thesis controlling phase transitions in neural network training dynamics.
APEX-EM non-parametric framework for LLM agents to accumulate and reuse procedural plans without weight modification.
World model planning for structured origami generation satisfying geometric constraints and kinematic rules via long-horizon reasoning.
Terminal agents executing enterprise tasks via CLI are simpler and more cost-effective than tool-augmented or web agents.
Transfer learning methods for nonparametric Bayesian networks under scarce data with constraint-based and score-based algorithms.
Body model ablation replacing SMPL with Momentum Human Rig for 3D Gaussian avatar generation, yielding a simpler architecture.
ProdCodeBench evaluates AI coding agents using production-derived tasks reflecting real developer-agent sessions and workflows.
Visual attention inertia in MLLMs causes cognitive hallucinations; proposes a mitigation for compositional understanding.
Convolutional surrogate model for accelerating 3D discrete fracture-matrix simulations in groundwater flow modeling.
LiME achieves expert specialization in multimodal MoE-PEFT via lightweight modulation instead of separate adapters per expert.
SIEVE enables sample-efficient parametric learning from natural language instructions and feedback without high-quality traces.
Model scheduling for masked diffusion language models uses smaller models at early denoising steps for faster generation.
Process reward models improve LLM mathematical reasoning by providing step-level feedback on intermediate errors, not just final outcomes.
Fairness-aware GNN training using contrastive learning and counterfactual augmentation to mitigate biases from graph structure.
LLM-based compression using domain-adapted LoRA for lossless and lossy text compression achieving 2x improvements.
Systematic characterization of WebGPU dispatch overhead for LLM inference across GPU vendors, backends, and browsers at batch size 1.
UI-Oceanus framework scales GUI agents via synthetic environmental dynamics and self-supervised learning instead of costly human demonstrations.
Benchmark evaluating LLM and embedding performance for drug discovery tasks, assessing advantages over traditional methods.
Frequency-aware Transformer for carbon footprint forecasting in power grids using periodic patterns and exogenous variables.
Contextual RL improves agent generalization by exposing agents to environment characteristics for better zero-shot transfer beyond training distribution.
OPRIDE method for offline preference-based RL reducing human feedback queries through efficient in-dataset exploration strategies.
Differentiable Symbolic Planning architecture combining neural networks with discrete symbolic reasoning for constraint satisfaction problems.
Framework for modeling and controlling ML model reliability under temporal distribution shift during deployment with continuous monitoring.
Contrastive prompt tuning method to optimize LLMs for generating energy-efficient code aligned with Green Software Development goals.
PRISM framework for zero-shot policy transfer in RL using interpretable concept clustering with causal validation across different algorithms.
Entropy-based analysis of combining Chain-of-Thought with RL for text-to-image generation, showing exploration-optimization tradeoffs.
Live benchmark dataset for forecasting startup success using Y Combinator batches with three-month evaluation cycles.
Dynamical systems analysis of vanishing gradient and overfitting in multi-layer perceptrons using minimal models.
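The vanishing-gradient phenomenon studied in the entry above can be reproduced with a minimal model: a chain of scalar sigmoid units, where the input gradient is a product of per-layer factors each bounded by 0.25. This sketch is illustrative only (the function names and weights are assumptions, not the paper's setup):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def chain_gradient(x0: float, weights: list) -> float:
    """Forward pass through a scalar sigmoid chain, then the gradient
    d(output)/d(input) via the chain rule: a product of w * sigma'(z)
    factors, one per layer."""
    grad, a = 1.0, x0
    for w in weights:
        z = w * a
        s = sigmoid(z)
        grad *= w * s * (1.0 - s)  # sigma'(z) = s * (1 - s) <= 0.25
        a = s
    return grad

# With unit weights, every factor is at most 0.25, so the gradient
# magnitude decays at least geometrically with depth.
for depth in (2, 10, 30):
    print(depth, abs(chain_gradient(0.5, [1.0] * depth)))
```

Because each factor is strictly below 0.25 away from the origin, a 30-layer chain already drives the input gradient toward machine-precision territory, which is the instability such minimal models make tractable to analyze.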
Self-Directed Task Identification framework enabling models to autonomously identify target variables in zero-shot settings without pretraining.
Physics-informed deep generative models for offline RL in spaceflight to mitigate sim-to-real gap with limited real-world training data.
Open-source benchmarking of Matrix Profile methods for time-series anomaly detection on univariate and multivariate datasets.
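The Matrix Profile methods benchmarked above share one core object: for every length-m subsequence, the z-normalized Euclidean distance to its nearest non-trivial match; large values flag discords (anomalies). A minimal brute-force sketch (production implementations such as STOMP/SCRIMP are far faster; the function names here are illustrative):

```python
import math

def znorm(seq):
    """Z-normalize a subsequence (zero mean, unit variance)."""
    n = len(seq)
    mu = sum(seq) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mu) / sd for v in seq]

def matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest neighbor outside an
    exclusion zone that suppresses trivial self-matches."""
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = m // 2
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, d)
        profile.append(best)
    return profile

# A sine wave with one injected spike: the profile peaks near the anomaly.
ts = [math.sin(0.4 * t) for t in range(120)]
ts[60] += 3.0
mp = matrix_profile(ts, 8)
print(max(range(len(mp)), key=mp.__getitem__))
```

The brute-force version is O(n^2 m), which is exactly why benchmarking the optimized Matrix Profile variants on real univariate and multivariate datasets is worthwhile.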
Study comparing frontier vs smaller LLMs for mathematical proof verification, evaluating whether expensive models are necessary for proof checking.
Analysis of layer-to-layer representation changes in language models, decomposing updates into tokenwise and residual components.