Smooth Gate Functions for Soft Advantage Policy Optimization
Smooth gate functions for stabilizing GRPO LLM training. Replaces hard clipping with sigmoid-based gating to improve optimization stability in reasoning tasks.
Theoretical analysis of offline reinforcement learning with general function approximation and parametric policies, extending beyond finite action spaces.
Open-source framework for deploying DARPA AIxCC cyber reasoning systems locally. Makes competition CRSs usable outside original infrastructure with improved accessibility.
Evaluation framework for persona-adaptive LLM-powered agents in multi-modal settings, addressing user-aware behavior in customer experience management.
Mathematical analysis of Collatz conjecture dynamics using modular arithmetic and combinatorial methods. Pure mathematics research unrelated to AI/ML.
Mathematical analysis of Collatz conjecture dynamics using modular arithmetic and combinatorial methods with LLM assistance.
Red-teaming Vision-Language-Action models through quality diversity prompt generation to improve robot policy robustness.
AgentDrift: reveals safety risks in LLM agent recommendations that arise when tools are corrupted and that standard metrics fail to expose.
Framework for improving VideoLLM understanding of camera motion through benchmarking, diagnosis, and explicit geometry injection.
Visual state representations for robotic agents using what-is-where composition for dynamic scene understanding.
FedPBS: federated learning algorithm for personalized training on non-IID data with improved robustness.
Sample-efficient hypergradient estimation for decentralized bi-level reinforcement learning in strategic decision-making.
Proxy models reduce cost and latency of AI queries in SQL databases by 100x through approximation techniques.
Domain-grounded tiered retrieval architecture to reduce LLM hallucinations through retrieval-based verification.
Evolutionarily Stable Stackelberg Equilibrium: game theory solution concept for asymmetric leader-follower games.
Ontology-Guided Diffusion for zero-shot sim2real transfer using neuro-symbolic approach to bridge simulation-reality gap.
Agent Control Protocol: formal specification for admission control governance of autonomous agents with cryptographic identity and policy compliance.
Multi-agent AI system with six specialized agents for automated NIST CSF-aligned cybersecurity risk assessments for small organizations.
Study showing finetuning bypasses LLM safety mechanisms and triggers verbatim recall of copyrighted training data.
Explainable DRL framework for autonomous APT defense using provenance-based graphs and stage-aware modeling.
LLM-based workflow system for multidisciplinary software development, coordinating domain experts and developers in the automotive domain.
PRISM photonic accelerator approach reducing KV cache memory bandwidth from O(n) to O(1) for long-context LLM inference.
mSFT algorithm for optimizing heterogeneous multi-task SFT data mixtures by dynamically adjusting compute per sub-dataset.
Weather prediction combining radar observations with foundation model priors for extended nowcasting horizons.
Sim-to-real transfer for humanoid robot control using state-dependent joint torque perturbations instead of domain randomization.
Inference-time scaling with lightweight latent verifiers instead of MLLMs to reduce computational cost in verification.
Method using causal interventions and Vision-Language Models to explain sparse autoencoder features in vision models.
Interpretable evaluation combining symbolic rules with mechanistic interpretability to detect memorization vs genuine generalization.
ITPO framework for optimizing multi-turn human-LLM interactions via RL despite sparse rewards and user stochasticity.
Theoretical analysis of upper entropy computation for credal sets and uncertainty quantification. Pure mathematics focus.
Training method combining synthetic QA and document generation to improve LLM knowledge beyond RAG performance ceiling.
Safe reinforcement learning framework inferring constraints from user preferences with minimal expert demonstrations.
RL agent optimizing operator kernels on Huawei Ascend NPUs. Addresses the knowledge gap in this alternative hardware ecosystem.
Causal signal reconstruction approach for converting sparse news sentiment into reliable time series for financial/tech analysis.
StateLinFormer model using linear attention for navigation agents with long-term memory. Addresses context window limitations in Transformers.
Research on curriculum learning with dual criteria for temporal data. Proposes improved difficulty-based training scheduling.
PoiCGAN: Poisoning attack method against federated learning systems using feature-label joint perturbation.
APreQEL: Adaptive mixed precision quantization technique for deploying large language models on edge devices with reduced memory and computational requirements.
Research on how LLMs form discrete decision boundaries within continuous semantic spaces through context-driven topological distortion of number representations.
Physics-informed neural networks using residual attention for steady-state electrothermal multiphysics simulation in energy systems.
MetaKube: LLM framework for Kubernetes failure diagnosis with Episodic Pattern Memory Network that learns from operational history to improve diagnostic accuracy over time.
Deep learning model for automated sleep disorder staging from EEG with analysis of generalization gaps in clinical populations.
Theoretical framework analyzing fundamental performance limits when deploying fixed LLMs as optimization modules in agentic systems.
Method for steering code LLMs via activation space manipulation to control programming language and library preferences at inference time.
VPBoost applies variable projection to gradient boosting for improved training of smooth parametric learners like neural networks.
Continuous-time diffusion model for generating synthetic electronic health records with mixed numerical and categorical features.
Framework for generating interpretable explanations of learned behaviors in RL agents with formal behavior definition.
Curriculum learning approach for contextual RL using closed-form updates for self-paced task sequencing.
Lightweight fairness method for LLM-based recommenders using kernelized projection and adapters without fine-tuning.
Domain adaptation framework for foundation models using probabilistic geometric alignment and Bayesian transport.