Large-scale empirical study demonstrating that prediction-equivalent models produce substantially different feature attributions across 24 datasets, challenging explainability assumptions.
Study showing that LLM stability across repeated runs does not guarantee agreement with statistical ground truth in data-constrained scientific decision-making workflows.
Informationally Compressive Anonymization (ICA) method for privacy-preserving ML that protects sensitive data without the performance degradation of differential privacy or homomorphic encryption.
arXiv: Design principles for XAI interfaces enabling scientists to probe and interpret LLM behavior in reading and research workflows.
arXiv: Counteractive RL framework that addresses exponential state-space complexity for efficient deep reinforcement learning.
arXiv: Study on electrodermal activity as a standalone physiological signal for detecting aerobic exercise in wearables.
arXiv: Python library for unit-circle-based computing using complex phasors and unitary gates on a torus topology.
arXiv: LLM ensemble approaches for word-sense plausibility rating in SemEval-2026 using zero-shot and chain-of-thought prompting.
arXiv: Framework for Internet of Physical AI Agents addressing interoperability, security, and sustainability in IoT environments.
arXiv: Privacy-preserving federated learning for Alzheimer's classification using 3D MRI with site-aware techniques.
arXiv: Practical guide to AI-assisted research in mathematics and ML, covering productive tool use and responsible guardrails.
arXiv: Analysis of 10,469 experiments run by Claude Opus and Gemini agents across 108k design-space cells for ML architecture search.
arXiv: VIBEPASS empirically evaluates LLM self-diagnosis and repair capabilities for autonomous software engineering.
arXiv: Benchmarking causal discovery algorithms on synthetic healthcare data for fairness and utility evaluation.
arXiv: LLM-guided neural architecture search for multimodal time-series classification under data-locality constraints for healthcare.
arXiv: LLM family, up to 70B parameters, whose dynamic tokenizers eliminate fixed-vocabulary constraints and improve domain and language adaptation.
MobileLLM-Flash methodology designs on-device LLMs optimized for latency constraints using hardware-in-the-loop architecture search.
ExpertGen automates expert policy generation in simulation, enabling scalable sim-to-real transfer through robotic behavior cloning.
MoLoRA enables per-token adapter routing for multimodal generation and mixed-capability requests in multi-adapter serving.
Lightweight proxy models reduce LLM query costs and latency by 100x for AI-augmented SQL operations.
Physics-based preprocessing framework standardizes heterogeneous medical images at scale for improved model generalization.
Multi-task RL with chain-of-thought prompting aligns paralinguistic understanding and generation in speech LLMs.
Three-stage framework for dysarthric speech severity estimation using pseudo-labeling and data augmentation.
xr-adaptive-modality platform studies modality-specific interventions for XR interfaces balancing gaze and hand input.
RadAnnotate uses LLMs with retrieval augmentation and selective automation for efficient radiology report annotation.
FormulaCode benchmark evaluates LLM coding agents on repository-level codebase optimization with realistic multi-objective constraints.
FlatLands dataset and benchmark for bird's-eye view floor completion from single egocentric images.
Probing-based analysis of moral reasoning trajectories in LLMs across six models, showing systematic multi-framework deliberation.
Critic-free RL approach for cross-user activity recognition from wearable sensors with temporal feature generation.
Framework adapts vision-language models as online reward generators for robotic reinforcement learning policy refinement.
Survey of resource-consumption threats to LLMs, including excessive generation, covering efficiency challenges for both providers and users.
SEAHateCheck introduces a functional test dataset for hate speech detection in low-resource Southeast Asian languages.
Interact3D generates compositional 3D objects from single images while preserving spatial relationships and handling occlusions.
HEAR framework extends vision-language-action models to incorporate real-time sound for robotic manipulation tasks.
RecBundle proposes a geometric framework for recommender systems that addresses information cocoons through topological representation learning.
Inference-time repair layer for retrieval-grounded QA using answer-conditioned counterevidence retrieval to fix commitment errors.
Parallel in-context learning method reducing latency in vision-language models by decoupling demonstration processing from query encoding.
Diffusion models for joint audio-video generation, released together with two high-quality paired audio-video datasets.
Large-scale dataset of 1.55M multi-layer graphic design compositions with hierarchical metadata for layout research.
LLM serving system optimizing agentic workflows by handling cross-call dependencies and redundancy from speculative execution.
Frequency-based calibration-data curation method for LLM compression via pruning and quantization.
Local-first multi-agent architecture for automated repository code review with LangGraph orchestration and structured analysis.
Automated skill distillation and adaptation method for financial reasoning in LLMs without fine-tuning.
Reference-free evaluation framework for pathology vision-language models to detect hallucinations without ground truth.
Benchmark for repository-level code understanding with executable environments, enabling agentic code automation tasks.
Benchmark comparing generative augmentation strategies (GANs, diffusion) for bias correction in imbalanced classification under low-data conditions.
Multimodal LLM framework for near-field beam prediction in XL-MIMO systems.
Constrained RL method for enforcing hierarchical instruction priority in LLMs via system prompt compliance.
Transformer architecture for 4D point cloud video understanding with temporal scale invariance.
RL method preserving diversity in LLM reasoning via dynamic Jensen-Shannon replay to improve sample efficiency and avoid mode collapse.
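The last entry relies on the Jensen-Shannon (JS) divergence. As a generic illustration only, and not the paper's dynamic replay method (which this summary does not describe), the divergence between two discrete distributions, such as two hypothetical next-token distributions, can be computed from standard KL terms against their midpoint:

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m = (p + q) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    # Wherever p_i > 0 (or q_i > 0), m_i > 0 as well, so the KL terms never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: two next-token distributions over a 3-token vocabulary.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
print(round(js_divergence(p, q), 4))
```

Unlike KL, the JS divergence is symmetric and bounded above by ln 2 (in nats), so it stays finite even when the two distributions have disjoint support. How the paper turns this quantity into a replay signal is not specified here.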