PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
RL method for partial observability using privileged planner guidance during training with MPC.
Framework automating aggregation strategy selection in federated learning across heterogeneous settings.
Pearl framework for multimodal reasoning using predictive embeddings to reduce tool-use overhead in VLMs.
Studies bias redistribution when vision models selectively unlearn demographic groups.
Efficient MoE inference through budget-aware expert activation allocation reducing latency bottlenecks.
Bandit algorithm for contextual decision-making with latent hidden Markov chain dynamics.
Flow-based method for offline multi-agent reinforcement learning using value guidance.
Recommender system using long-term embeddings to balance recency bias and stable user preferences.
Method for assessing model generalization in Vision Transformers via internal representations under distribution shift.
Examines vulnerabilities in machine unlearning methods by analyzing internal representations and concept reintroduction.
DMax enables efficient parallel decoding in diffusion language models through progressive self-refinement.
Architecture combining frozen LLMs as nodes communicating through learned projections in a shared latent space.
Continual learning method using complementary self-supervised embeddings to improve replay buffer sample selection.
Benchmark for egocentric video understanding in AR using long-context reasoning over temporal activities.
Bias-constrained diffusion models for PDE emulation with improved accuracy and training efficiency.
Data selection framework for autonomous driving models balancing multiple evaluation metrics.
Parameter-efficient fine-tuning compression framework reducing communication costs for model adaptation.
Self-supervised pre-training method for time series classification with adaptive input handling.
Data augmentation method using adversarial training for out-of-distribution generalization on graphs.
KV cache offloading technique to reduce memory and latency overhead for long-context LLM inference.
Hybrid post-training combining reinforcement learning and distillation to improve LLM confidence calibration.
Machine learning framework for estimating turbofan engine health from sensor data.
Test-time variational synthesis method for reinforcement learning in domains without verifiable rewards.
Impact of quantization on federated learning accuracy-efficiency trade-offs for aerospace predictive maintenance.
Analysis of how embedding dimensionality affects stability of graph node embeddings.
Mechanistic study of how steering vectors modify LLM behavior for alignment and refusal control.
Meta-learning approach for brain signal decoding without per-subject training.
Multi-agent system for language-agnostic code translation and validation across programming languages.
Framework for adaptive edge AI systems that adjust models during deployment as conditions change.
Newton-Schulz optimization method for orthogonal group synchronization problems.
Memory architecture for efficient LLM inference on edge NPUs with optimized DRAM refresh for KV caches.
Research on memory capacity of Hopfield networks using geometric constraints and phase transitions.
Theoretical analysis of diffusion models using Burgers equation to understand score field evolution.
Benchmark dataset and evaluation for multimodal LLMs in manufacturing scenarios.
Industrial generative reranking system combining causality and utility for video search at scale.
Open-source framework for evaluating physical reservoir computing systems across various substrates.
LLM-based coding agents formalized 85K lines of topology proofs in Isabelle/HOL using ChatGPT and Claude.
Paper on generative reward models for LLM alignment using consistency-aware self-training to improve scalability.
Privacy-preserving epidemiological modeling of disease transmission in contact networks using differential privacy and machine learning.
Semi-autonomous multi-agent system for small molecule drug discovery using multi-modal AI agents and GNNs trained on 800M molecules.
RL-driven compiler using Soft Actor-Critic to jointly optimize ASIC architecture, memory hierarchy, and workload partitioning for on-device AI inference across technology nodes.
Reinforcement learning optimization for TSCH MAC protocol in IoT networks to reduce idle listening and power consumption under dynamic traffic conditions.
ML approach to predict activity cliffs in medicinal chemistry by identifying structural modifications that cause large potency shifts using ChEMBL molecular pair data.
Framework using LLMs as semantic judges to validate and restructure outputs from unsupervised text clustering methods, improving coherence and grounding without labeled data.
CAMO is an ensemble technique for imbalanced text classification that optimizes minority class performance through hierarchical voting, confidence calibration, and uncertainty estimation.
Framework for understanding systematic variation in human-labeled training data, distinguishing between ambiguous items, divergent interpretations, and mistakes rather than treating all disagreement as noise.
Blink is an LLM serving architecture that removes the host CPU from the critical path by delegating orchestration and token control to GPU and SmartNIC, improving inference performance and datacenter resource utilization.
DIVERSED: relaxed speculative decoding for LLM inference using dynamic ensemble verification to improve token acceptance rates.
Parameter-free extragradient algorithms for monotone variational inequalities without manual stepsize selection.