Ax Jianhong Pang, Ruoxi Cheng, Ziyi Ye, Xingjun Ma, Zuxuan Wu, Xuanjing Huang, Yu-Gang Jiang 24d ago

Steering the Verifiability of Multimodal AI Hallucinations

Framework for steering the verifiability of multimodal LLM hallucinations, distinguishing obvious from elusive hallucinations to guide mitigation strategies.

Ax Seongwoo Jeong, Seonil Son 24d ago

How Much LLM Does a Self-Revising Agent Actually Need?

Empirical study decomposing LLM-based agent competence to identify which capabilities derive from the language model versus explicit structural design in self-revising agents.

Ax Nguyen Phuc Tran, Brigitte Jaumard, Oscar Delgado, Tristan Glatard, Karthikeyan Premkumar, Kun Ni 24d ago

LLM-Augmented Knowledge Base Construction For Root Cause Analysis

Evaluates LLM-augmented knowledge base construction for root cause analysis in network communications to enable rapid failure diagnosis and outage resolution.

Ax Peijie Yu, Wei Liu, Yifan Yang, Jinjian Li, Zelong Zhang, Xiao Feng, Feng Zhang 24d ago

Benchmarking LLM Tool-Use in the Wild

Benchmark for evaluating LLM tool-use agents on multi-turn, multi-step interactions addressing compositional tasks, implicit intent, and instruction transitions in real user behavior.

Ax Asif Azad, MD Sadik Hossain Shanto, Mohammad Sadat Hossain, Bdour Alwuqaysi, Sabri Boughorbel, Yahya Bokhari, Abdulrhman Aljouie, Ayah Othman Sindi, Ehsan Hoque 24d ago

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment

Modular system for phoneme-level Arabic pronunciation assessment combining speech-to-phoneme models with clinical-scale scoring metrics for language learning and therapy.

Ax Thomas Sounack, Raffaele Giancotti, Catherine A. Gao, Lasai Barreñada, Hyeonhoon Lee, Hyung-Chul Lee, Leo Anthony Celi, Karel G. M. Moons, Gary S. Collins, Charlotta Lindvall, Tom Pollard 24d ago

Code Sharing In Prediction Model Research: A Scoping Review

Scoping review quantifying code-sharing practices in prediction model research to inform TRIPOD-Code standards development.