Toward Reducing Unproductive Container Moves: Predicting Service Requirements and Dwell Times
Data science study using machine learning to predict container service requirements and dwell times at terminals to reduce unproductive moves.
Research on distilling hallucination detection signals into transformer representations during training, enabling inference-time detection without external verification.
Framework combining expert medical knowledge with deterministic reasoning to improve reliability and reduce hallucinations in AI-driven symptom analysis systems.
Research paper on uncertainty quantification for reasoning LLMs using hedge-to-verify ratio, addressing limitations of sampling and single-pass proxy methods for proprietary APIs.
Application-layer OS for universal AI agent orchestration supporting 10 LLM providers, 8+ frameworks, 12 multi-agent topologies, and heterogeneous systems.
Hybrid of an LLM and a lightweight proof checker for reliable math/logic reasoning, verifying arguments and catching logical missteps in generated proofs.
Studies emotion-sensitive decision-making in small language model agents using activation steering and game-theoretic evaluation.
Analysis of cross-domain generalization in reasoning SFT with chain-of-thought, showing generalization is conditional on optimization, data, and base model capability.
Knowledge distillation framework for multi-agent RL enabling resource-aware deployment on edge devices with smaller models.
Lightweight routing engine for Internet of Agents managing agent discovery and request dispatch across devices, edge, and cloud with latency/privacy constraints.
Open evaluation framework ATANT for measuring continuity in AI systems: persistence, context updating, and reconstruction across time.
Framework for steering verifiability of multimodal LLM hallucinations, distinguishing between obvious and elusive hallucinations to guide mitigation strategies.
LLM-driven autonomous multi-agent framework for end-to-end turbomachinery aerodynamic design, coordinating geometry, prediction, optimization, and validation.
Inference-time alignment method for diffusion models using Fleming-Viot resampling to prevent diversity collapse in SMC sampling.
Benchmark for mathematical reasoning beyond competition math, testing advanced theoretical knowledge and deep mathematical reasoning.
Evaluates LLM-generated disinformation risk by comparing LLM judges to human evaluation, addressing limitations of automated assessment.
Using inductive logic programming to approximate neural networks for user preference learning with explainability.
GUI reasoning paradigm called UI-in-the-Loop for improved UI understanding and interaction, enhancing interpretability in screen-to-action tasks.
Multi-agent system using small language models for emotion-aware negotiation with edge deployment focus, combining Bayesian orchestration and emotional dynamics.
Post-processing framework for fairness in ML models via counterfactual averaging, applicable to deployed systems without full model control.
Benchmark for evaluating emotion recognition in AI assistants over time, addressing long-term memory and emotional understanding in interactive systems.
Research on detecting and repairing flaws in planning tasks by making them unsolvable, relevant to automated reasoning but not core AI agent/LLM focus.
EVGeoQA benchmark evaluating LLMs on dynamic multi-objective geo-spatial exploration with user location constraints and compound reasoning beyond static retrieval.
T-STAR framework using tree-structured reinforcement learning with self-rectification and credit grafting to optimize multi-turn LLM agent policies under sparse rewards.
Empirical study decomposing LLM-based agent competence to identify which capabilities derive from the language model versus explicit structural design in self-revising agents.
Neural architecture search approach using Implantable Adaptive Cells injected into pre-trained U-Net skip connections to enhance medical image segmentation performance.
ATR4CH methodology for LLM-based knowledge graph extraction from cultural heritage documents using ontological engineering, validated on authenticity debate case study.
Proposes AI-agent-augmented DNS blocking to prevent student access to LLM services during academic evaluations, addressing assessment integrity concerns.
Evaluates LLM-augmented knowledge base construction for root cause analysis in network communications to enable rapid failure diagnosis and outage resolution.
EviSnap framework for cold-start cross-domain recommendation systems providing evidence-cited, auditable rationales through distilled review factorization without opaque embeddings.
Introduces SearchFireSafety benchmark for statute-centric legal QA addressing hierarchical document retrieval gaps and hallucination in regulatory reasoning with LLMs.
Empirical study of embedding-based retrieval robustness in conversational settings, identifying vulnerability in Qwen3-embedding models to structured dialogue-style noise.
Domain-aware web agent with critic-guided experience retrieval and schema-light facet induction for high-precision search in finance, biomedicine, and pharmaceutical domains.
Online benchmark VenusBench-Mobile for evaluating mobile GUI agents under realistic user-centric conditions with capability diagnostics across diverse app-agnostic tasks.
Personalized goal-oriented chatbot for elderly users that initiates conversations from family photos to stimulate cognitive function through a structured dialogue framework.
Benchmark for evaluating LLM tool-use agents on multi-turn, multi-step interactions addressing compositional tasks, implicit intent, and instruction transitions in real user behavior.
Interactive educational system visualizing entire reachable state space of 8-puzzle (181,440 states) with coupled abstract graph and concrete puzzle manipulation.
Benchmarking study measuring how different LLMs resist or escalate delusional and conspiratorial thinking in sustained open-ended conversations.
Modular system for phoneme-level Arabic pronunciation assessment combining speech-to-phoneme models with clinical-scale scoring metrics for language learning and therapy.
Investigates correlation between internal entropy dynamics and reasoning correctness in autoregressive LLMs, proposing a stepwise informativeness assumption to explain the phenomenon.
Automated depression detection system using NLP analysis of audio-recorded primary care encounters with PHQ-9 validation across 1,108 clinical dialogues.
Frames LLM hallucinations as output-boundary misclassification and proposes composite intervention combining instruction-based refusal with structural abstention gate using support deficit scoring.
Consistency-guided decoding approach for three-way logical QA in LLMs, addressing negation inconsistency and epistemic Unknown failures.
Textual time-series corpus of 136 GLP-1RA case reports with LLM-based timeline extraction and risk modeling for longitudinal analysis.
Framework combining LLM analysis with energy-system modeling to forecast electricity footprint of AI data centers through 2030.
CoMAP system using AI for shared visual workspace to support project-based learning design through persistent collaborative context.
Text2DistBench benchmark for evaluating LLM ability to infer distributional knowledge and population-level trends from text collections.
Theoretical framework for cross-lingual transfer and parameter-efficient adaptation in low-resource Turkic languages using LLMs.
Ethical design space exploration for sensor-fused LLM agents in health applications, addressing privacy and bias concerns.
SensorPersona system that extracts user personas from mobile sensor streams using LLM-based agents for improved personalization.