Analysis of Optimality of Large Language Models on Planning Problems
Analyzes frontier LLMs on classic AI planning problems, examining whether models reason optimally or rely on heuristic strategies in the Blocksworld domain.
Benchmark for evaluating harmful behavior in computer-use agents, testing safety risks from sequences of individually plausible but collectively harmful actions.
Analysis of reasoning failures in large reasoning models, showing the first solution is often optimal despite test-time scaling patterns in DeepSeek-R1.
Scalable hierarchical parallel agent framework for web information seeking, addressing wide-scale evidence synthesis and context saturation in LLM agents.
Benchmark evaluating multimodal LLM agents with tool integration capabilities including visual expansion and web search through agentic reasoning.
AI system automatically formalizes a 500+ page graduate-level algebraic combinatorics textbook in Lean, achieving 130K lines of formal code.
Reinforcement learning approach to improve visual reasoning in chart question answering using vision language models with policy optimization.
Framework for agentic AI emphasizing control, memory, and verifiable action under partial observability, inspired by squirrel ecology comparisons.
Evaluates linguistic graph representations combined with pretrained Transformers for language modeling, comparing semantic and syntactic formalisms.
Bayesian and neural models analyzing Chinese learners' English preposition comprehension, using pretrained language models for linguistic analysis.
Research on language modeling with predicted semantic structure, establishing empirical lower bounds for performance improvements using binary vector representations.
Reinforcement learning approach using process rewards to provide intermediate feedback for multi-step mathematical reasoning in LLMs.
Study of LLM-generated text compression using domain-adapted LoRA and arithmetic coding, characterizing lossless and lossy compression frontiers.
Framework for scaling GUI agents using synthetic environmental dynamics and self-supervised learning from ground-truth interaction feedback.
Benchmark for evaluating LLMs and embeddings on drug discovery tasks including hypothesis generation and candidate prioritization.
Offline preference-based RL method improving query efficiency by addressing exploration and preference ranking within existing datasets.
Neural architecture performing discrete symbolic constraint reasoning while maintaining differentiability for planning and feasibility checking.
Study using contrastive prompt tuning to optimize LLMs for generating energy-efficient code supporting Green Software Development.
Framework for zero-shot transfer between RL agents using interpretable discrete concepts validated through causal intervention.
Dynamic UAV deployment system for vehicular networks using Q-learning with action masking to enhance reliability in urban environments.
Framework using LLMs as judges to evaluate safety of model responses for users with psychosis, addressing clinical validation gaps in mental health.
ML pipeline using ensemble learning to detect internet routing instability from traceroute latency data without control plane information.
Conformer-based model for decoding speech information from high-density EEG using dual-pathway architecture with ERP and broadband features.
Analysis of agent communication protocols for LLM systems organized into communication, syntactic, and semantic layers with systematic evaluation of 18 protocols.
Survey of AI and ML applications in 6G networks covering high data rates, low latency, and emerging applications like autonomous systems.
Synthetic data pipeline for reasoning in long-document visual understanding that generates thinking traces for improved LLM performance on enterprise documents.
Framework addressing underspecified natural language requests for cloud infrastructure code generation using LLMs with multi-level disambiguation.
Audio-visual navigation system for autonomous agents to localize and navigate toward vocalizing targets in 3D environments.
Deep learning framework for predicting wireless channel characteristics in vehicular 6G communications using visual feature fusion.
Privacy-preserving group emotion recognition model using variational encoder-multi-decoder architecture without per-person feature extraction.
Approach using LLMs to detect and repair errors in MPI code for high-performance computing and distributed training frameworks.
LumiVideo, an agentic system mimicking professional video colorists' workflows with interpretable iterative control for automated color grading.
Research on deep generative models (diffusion, flow matching) for high-dimensional distributions on constrained submanifolds in physics data.
Self-Directed Task Identification framework enabling models to autonomously identify target variables in zero-shot learning without pre-training.
Framework using Mixture-of-Gaussians trajectory prediction for diverse multi-agent play generation in team sports.
Survey of deep learning approaches for diabetic retinopathy detection addressing dataset limitations and geographic diversity issues.
Research investigating whether frontier reasoning models are necessary for mathematical proof verification versus smaller LLM judges.
NLP research on skeleton-based coherence modeling for narrative generation and detection of incoherent story structures.
Empirical evaluation of LLMs as behavioral simulators for predicting intervention effects across 11 climate-psychology interventions with 59,508 participants.
Research studying geometric structure of layer-wise updates in deep language models across Transformer and state-space architectures.
VERTIGO system for cinematic camera trajectory generation with visual preference optimization for realistic shot composition.
Hierarchical Interpretable Label-Free Concept Bottleneck Model enabling interpretability at multiple abstraction levels, unlike existing single-level CBMs.
Diffusion-based foundation model generating synthetic satellite imagery for wildfire detection without task-specific retraining.
Transformer-based framework using Vision Transformer for predicting fluid flows in energy systems, applied to gas injection phenomena.
Zero-shot malware family classification using weighted hierarchical ensembles of LLMs, avoiding the need for labeled datasets and handcrafted features.
Image Prompt Packaging method to reduce token costs in multimodal LLMs by embedding structured text into images, benchmarked across frontier models.
Vision-language model for lumbar spinal stenosis diagnosis from MRI with adaptive loss function for class imbalance handling.
Study of social meaning in LLMs, introducing calibration metrics and pragmatic prompting strategies to improve quantitative approximation of human reasoning.
Unified framework for deriving sparse Bayesian learning algorithms using neural networks and majorizer learning.
System for private long-term memory in personal AI using trusted hardware and oblivious RAM to hide data access patterns from providers.