Evolutionary Search for Automated Design of Uncertainty Quantification Methods
LLM-powered evolutionary search automatically discovers unsupervised uncertainty quantification methods as Python programs for claim verification.
Fine-tuning approach adapting DeepSeek-OCR-2 for optical chemical structure recognition by formulating the task as image-to-text.
Study of brain-LLM alignment during creative divergent thinking tasks, measuring correlation between model performance and human neural activity.
VisionClaw wearable AI agent on Meta Ray-Ban glasses combining egocentric perception with speech-driven task execution via OpenClaw agents.
Sim2Real-AD framework for zero-shot sim-to-real transfer of VLM-guided RL policies from CARLA simulation to physical autonomous vehicles.
Dynamic model analyzing productivity-skill tradeoffs when workers use AI tools, decomposing productivity effects into expertise-dependent and expertise-independent channels.
Taxonomy of LLM-based coding agent architectures analyzing scaffolding code patterns including control loops, tool definitions, and context strategies.
Novel salient object detection method based on user needs rather than visual stimuli alone.
LangFIR uses sparse autoencoders on monolingual data to discover language-specific features for steering LLM output language without parallel corpora.
AgenticFlict dataset of merge conflicts from AI coding agent pull requests on GitHub, studying integration challenges in collaborative AI-assisted development.
Video diffusion framework (CRAFT) for generating synthetic bimanual robot manipulation demonstrations with temporal coherence.
Phase-aware suppression method to reduce hallucinations in Vision-Language Models without iterative optimization overhead.
SecPI framework for secure code generation using reasoning LLMs through security reasoning internalization, addressing inference-time vulnerability mitigation.
Actor-critic reinforcement learning approach for multi-robot task allocation with asymmetric arrivals and switching delays.
Neural method for black-box global optimization using iterative refinement from noisy samples, addressing multi-modal function optimization.
LLM-based approach for multi-file repository code generation with executable validation, addressing dependency resolution and integration challenges.
LiveCoder framework for repository-level code generation preserving and reusing task-specific state across multiple LLM attempts.
Generative foundation model for multimodal histopathology that imputes missing modalities from incomplete medical data.
Reinforcement learning approach for environments with delayed feedback using homomorphic state representation.
Method for stable unsupervised self-evolution of multimodal LLMs using continuous softened retracing resampling for feedback quality.
Adaptive Relational Transformer for pedestrian trajectory prediction using temporal-aware relations in robotics.
Microservice system using NLP and deep learning to automate classification of citizen appeals in government services.
Unlocks prompt infilling in masked diffusion language models by applying full-sequence masking during supervised fine-tuning.
LightThinker++ enables LLMs to dynamically compress intermediate reasoning thoughts into compact representations for efficiency.
Uses LLMs to capture semantic relationships for tail-item sequential recommendation, addressing sparse interaction problem.
RDEx-CMOP is a differential evolution algorithm variant for constrained multiobjective optimization under budget constraints.
Graph learning approach for melanoma detection in dermoscopic images using graph signal processing.
Scientometric analysis of 15 years of augmented human research, examining conference evolution and core themes.
CREBench evaluates LLMs on cryptographic binary reverse engineering, assessing capabilities for vulnerability discovery and malware analysis.
Research identifying limitations in universality of linear truth directions in LLM activation spaces across different settings.
Study measuring human ability to distinguish LLM-generated news from human-written content across six LLMs.
AutoReSpec uses LLMs to generate formal specifications for programs, addressing syntax and logic errors with techniques for handling complex control flow.
Neuro-symbolic framework for robot manipulation using vision-language models and autonomous domain construction.
Method for discovering repeated attention patterns in large language models at scale for mechanistic interpretability.
Compares vision-language models and CNNs for spectrum management in satellite-terrestrial networks.
CountsDiff extends diffusion models to discrete ordinal data on natural numbers for generation and imputation tasks.
Automated framework for research-level mathematical problem solving combining LLMs with formal verification to reliably resolve conjectures and verify proofs.
Representational collapse in multi-agent LLM committees: similarity measurements show agents produce redundant rationales despite different role prompts, motivating a diversity-aware consensus method.
InCaRPose: Transformer-based model for relative camera pose estimation in automotive in-cabin monitoring with distorted imaging environments.
k-Maximum Inner Product Attention for efficient graph transformers, reducing quadratic complexity while maintaining expressiveness for large-scale graphs.
Analysis of analogical reasoning in LLMs comparing probed representations with prompted performance, revealing limitations in latent abstraction and generalization.
Field experiment on LLM agent providing iterative personalized behavioral nudges for electricity and hot-water conservation across intervention rounds.
Regime-calibrated demand priors for ride-hailing dispatch using historical segmentation and multi-metric similarity ensemble for fleet repositioning.
Lorentz-Invariant Auction mechanism for bandwidth allocation across heterogeneous-delay networks including LEO satellites and deep-space relays.
I-CALM: prompt-only intervention reducing LLM hallucinations by incentivizing confidence-aware abstention through reward scheme announcements and humility principles.
DC-Ada: reward-only decentralized adaptation for heterogeneous multi-robot teams, adapting frozen policies to mismatched sensor configurations.
Secure-by-design GenAI framework for cloud security and forensics using LLMs with defenses against prompt injection and forensic rigor requirements.
Spatio-temporal sparse autoencoders for interpretable video representation learning, using contrastive objectives and hierarchical grouping to preserve temporal coherence.
Multi-turn decision making framework for goal-oriented conversational systems balancing information acquisition and target commitment under user intent uncertainty.
AdaptFuse: training-free framework for LLMs to perform Bayesian belief updating across multi-turn interactions without fine-tuning on user data.
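The belief-updating idea behind the last entry can be illustrated with a minimal sketch: maintain a posterior over candidate user intents and apply one Bayes step per conversational turn. All intent names and likelihood values below are illustrative assumptions, not details of the AdaptFuse method itself.

```python
def update_belief(prior, likelihoods):
    """One Bayes step: posterior ∝ likelihood × prior, then normalize."""
    unnorm = {h: likelihoods[h] * prior[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Uniform prior over three hypothetical user intents.
belief = {"browse": 1 / 3, "buy": 1 / 3, "support": 1 / 3}

# Assumed per-intent likelihoods P(utterance | intent) for two turns.
for turn_likelihoods in (
    {"browse": 0.2, "buy": 0.7, "support": 0.1},  # turn 1
    {"browse": 0.1, "buy": 0.8, "support": 0.1},  # turn 2
):
    belief = update_belief(belief, turn_likelihoods)

best_intent = max(belief, key=belief.get)  # "buy" dominates after two turns
```

Because the update is a closed-form renormalization over a fixed hypothesis set, no gradient steps or fine-tuning on user data are needed, which matches the training-free framing of the entry.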