Selecting Decision-Relevant Concepts in Reinforcement Learning
Algorithms for automatic selection of interpretable concepts for reinforcement learning agents without manual domain expertise.
Dynamic benchmark for evaluating LLM-based fake news detection and fact-checking with time-aware evaluation to address benchmark contamination issues.
Study examining whether LLMs integrate world knowledge with syntactic structure in human-like ways using Turkish relative-clause attachment ambiguities as test cases.
Framework for generating human-object-scene interactions using instruction-conditioned generation with iterative refinement for embodied AI and simulation applications.
Structured prompt framework for improving chain-of-thought reasoning integrity in LLMs for analytical tasks. Addresses reliability issues.
Robustness analysis of TabPFN attention mechanisms under noisy tabular data in few-shot learning. Tabular model evaluation.
Multi-agent planning system for automated video mashup creation with hierarchical orchestration. Cross-modal video editing application.
Analysis of Muon optimizer through spectral Wasserstein flow perspective. Gradient normalization for deep learning training.
Entropy modulation approach for exploration in LLM reasoning with verifiable rewards. Addresses restricted exploration problem.
Framework using LLM agents to adapt federated learning orchestration to client heterogeneity and system dynamics. Improves distributed training.
Framework for personalizing file-system agents using behavioral traces from local systems. Addresses privacy constraints in coworking AI.
Game-theoretic analysis of how AI aggregators affect social learning and knowledge formation. Theoretical study.
Verification methods for deep RL agents in systems/networking to analyze behavior across system states. Addresses safe deployment.
Open-source visual reasoning VLM family matching proprietary models on charts, science, and spatial tasks. Includes training recipes and open weights.
Technique for image restoration using pre-trained diffusion models without fine-tuning. Computer vision application.
Method to optimize inference cost in reasoning LLMs by detecting when to stop generation via confidence dynamics. Improves computational efficiency.
Scalable opponent modeling combining tree search, generative models, and Nash bargaining for game-theoretic RL. Addresses imperfect information games.
Multi-agent RL framework for HIV epidemic control optimization. Public health policy application.
Critique of complexity-theoretic proof claiming machine learning cannot achieve AGI. Discusses theoretical foundations and proof assumptions.
Representation learning approach for multi-institutional health record studies addressing data heterogeneity and privacy. Healthcare application.
Framework combining LLM self-reflection with expert and self-experience for StarCraft II gameplay. Addresses complex environment learning.
Method for LLMs to generate reliable citations without external retrievers by leveraging pretraining knowledge. Improves inference efficiency.
Benchmark for evaluating long-term planning capabilities of LLMs and AI agents. Addresses gap in existing planning benchmarks.
Mathematical framework formalizing similarity relations as structural basis for dynamic systems. Theoretical foundational work.
Survey of autonomous LLM agents for scientific discovery that orchestrate human scientists, code, and physics simulations.
Survey of security threats, defenses, and evaluation methods for agentic AI systems with tool use, planning, and autonomous execution.
PRISM: Training-free framework combining prompt engineering and multi-agent coordination for financial document retrieval with LLMs.
Agent-based framework for automatic validation of mathematical optimization models generated by LLMs from natural language descriptions.
Research on iterative concept refinement for vision classifiers through human-in-the-loop deliberation for subjective visual tasks.
Finch: benchmark for evaluating agents on enterprise finance workflows including data entry, retrieval, calculation, and reporting using Enron dataset.
DDFT protocol for measuring epistemic robustness in LMs under degraded information and adversarial stress beyond static benchmarks.
HAG framework for topic-adaptive agent generation in agent-based modeling balancing macro-level distributions with micro-level rationality.
Mechanistic interpretability study of how Diffusion Transformers generate correct spatial relations between objects in text-to-image generation.
ConvoLearn dataset of 2,134 tutor-student dialogues for fine-tuning LLMs on dialogic tutoring principles in science education.
Study showing LLMs exhibit robustness to emotional framing in rule-bound decision-making despite known brittleness to prompt perturbations.
TSPO: RL framework for multi-turn search-augmented LLM reasoning addressing process and reward homogenization in tool-integrated tasks.
Method for improving Vision Language Model robustness when modalities are missing using scalable diffusion-based feature restoration.
Multi-agent LLM framework for discovering instrumental variables in causal inference through interdisciplinary knowledge synthesis.
Voxtral Realtime: natively streaming ASR model achieving sub-second latency with end-to-end training for audio-text alignment.
SSLogic: agentic meta-synthesis framework where LLM agents iteratively create and refine generator-validator pairs for logic reasoning tasks.
KLong: open-source LLM agent trained for extremely long-horizon tasks using trajectory-splitting SFT and progressive RL with Research-Factory pipeline.
AI Runtime Infrastructure layer that observes and optimizes agent execution for task success, latency, token efficiency, and safety.
DeepFact benchmark and co-evolving agent system for testing factuality of search-augmented LLM-generated research reports.
HECG framework for autonomous agents using LLMs with multi-dimensional error correction and strategy transfer across tasks.
Study showing that deliberation between multiple LLMs can amplify tiny perturbations into divergent decisions, challenging robustness assumptions.
Machine learning framework for automating defect detection in photovoltaic systems using electroluminescence imaging.
Alternative training architecture for geometric and neuromorphic AI using non-standard arithmetic to reduce memory overhead.
Conceptual framework for AI governance addressing regulatory gaps between task-specific systems and foundation models.
Voxtral TTS: expressive multilingual text-to-speech model generating natural speech from minimal reference audio.
Metriplector: neural architecture primitive based on field theory where input configures abstract physical systems.