Study showing that the choice of evaluation language inverts agent-as-judge rankings across five languages on 55 development tasks, revealing backbone sensitivity.
StableTTA, a training-free test-time adaptation method that improves ensemble prediction stability and computational efficiency on ImageNet.
Systematic taxonomy from 10,000 trials identifying which system prompt features trigger LLM agents to exploit security vulnerabilities across models.
Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers using LLMs with structured labeling.
TILA, a method for analyzing interval change in chest X-rays using vision-language pretraining and temporal comparison.
PassiveQA, a framework for calibrated question answering that handles incomplete or ambiguous queries through three-action decision awareness.
Contrastive hypothesis retrieval method for medical RAG systems that suppresses clinically distinct but semantically similar negatives.
Cardinality estimation framework for similarity search in high-dimensional spaces using adaptive locality-sensitive hashing.
Legal analysis of how EU AI Act regulates autonomous AI agents across enterprise functions including customer service and clinical decision support.
Deep learning approach for mortality prediction from incomplete multimodal electronic health records using a point cloud paradigm.
Detection method for AI-generated videos that preserves high-frequency forgery artifacts at native resolution without preprocessing.
Flow Divergence Sampler, a method that improves flow-matching generative models by addressing velocity field conflicts during sampling.
ROSClaw, a framework that integrates LLMs with embodied agents to bridge semantic understanding and physical execution in multi-agent robot collaboration tasks.
Implementation of LLM-based AI teaching assistant using RAG for a Master's program in motion picture engineering.
6D pose estimation pipeline for industrial bin picking using low-cost RGB-D cameras and depth refinement techniques.
Study of multimodal fact-checking showing that visual evidence does not universally improve the performance of automated fact-checking systems.
Quantization method for LLMs using mixed-to-uniform precision and low-rank decomposition for efficient on-device deployment.
Bilingual corpus of Bangla-English sentences annotated for syntactic structure and tense for low-resource multilingual NLP.
Analysis of what characterizes effective reasoning in multilingual large reasoning models, challenging the assumption that English reasoning patterns transfer.
Study analyzing the combined effects of English as a second language and typographical errors on LLM performance in multilingual contexts.
Computational audit examining whether LLMs conduct culture-aware reasoning or merely translate between cultures in creative writing tasks.
Reinforcement learning approach for automatically discovering failure modes in vision-language models beyond manual evaluation.
Sampling parallelism method for efficient Bayesian neural networks and uncertainty quantification in risk-sensitive domains.
Scoping review of AI tools for cost reduction in public higher education including generative AI, tutoring systems, and predictive models.
Geometric dynamical systems framework explaining LLM hallucinations as arising from basin structure in latent space with task-dependent separability.
Protocol enabling two AI agents to carry out secret conversations while producing transcripts indistinguishable from honest interactions to passive auditors.
Real-world safety evaluation of the OpenClaw personal AI agent, analyzing its attack surface and vulnerabilities in local system access and service integrations.
Method enabling LLMs to learn from hard reasoning problems through adaptive task reformulation with reinforcement learning from verifiable rewards.
Quantum search approach using Grover's algorithm for combinatorial constraint satisfaction problems, demonstrated on magic square generation.
Framework for automatically constructing plug-and-play skill knowledge bases for LLM agents to improve learning efficiency and generalization.
Algorithms for automatic selection of interpretable concepts for reinforcement learning agents without manual domain expertise.
Dynamic benchmark for evaluating LLM-based fake news detection and fact-checking with time-aware evaluation to address benchmark contamination issues.
Study examining whether LLMs integrate world knowledge with syntactic structure in human-like ways using Turkish relative-clause attachment ambiguities as test cases.
Framework for generating human-object-scene interactions using instruction-conditioned generation with iterative refinement for embodied AI and simulation applications.
Structured prompt framework for improving chain-of-thought reasoning integrity in LLMs for analytical tasks. Addresses reliability issues.
Robustness analysis of TabPFN attention mechanisms under noisy tabular data in few-shot learning. Tabular model evaluation.
Multi-agent planning system for automated video mashup creation with hierarchical orchestration. Cross-modal video editing application.
Analysis of the Muon optimizer through a spectral Wasserstein flow perspective. Gradient normalization for deep learning training.
Entropy modulation approach for exploration in LLM reasoning with verifiable rewards. Addresses restricted exploration problem.
Framework using LLM agents to adapt federated learning orchestration to client heterogeneity and system dynamics. Improves distributed training.
Framework for personalizing file-system agents using behavioral traces from local systems. Addresses privacy constraints in coworking AI.
Game-theoretic analysis of how AI aggregators affect social learning and knowledge formation. Theoretical study.
Verification methods for deep RL agents in systems/networking to analyze behavior across system states. Addresses safe deployment.
Open-source visual reasoning VLM family matching proprietary models on charts, science, and spatial tasks. Includes training recipes and open weights.
Technique for image restoration using pre-trained diffusion models without fine-tuning. Computer vision application.
Method to optimize inference cost in reasoning LLMs by detecting when to stop generation via confidence dynamics. Improves computational efficiency.
Scalable opponent modeling combining tree search, generative models, and Nash bargaining for game-theoretic RL. Addresses imperfect information games.
Multi-agent RL framework for HIV epidemic control optimization. Public health policy application.
Critique of complexity-theoretic proof claiming machine learning cannot achieve AGI. Discusses theoretical foundations and proof assumptions.
Representation learning approach for multi-institutional health record studies addressing data heterogeneity and privacy. Healthcare application.