KD-MARL: Resource-Aware Knowledge Distillation in Multi-Agent Reinforcement Learning
Knowledge distillation framework for multi-agent RL enabling resource-aware deployment on edge devices with smaller models.
Lightweight routing engine for Internet of Agents managing agent discovery and request dispatch across devices, edge, and cloud with latency/privacy constraints.
Open evaluation framework ATANT for measuring continuity in AI systems: persistence, context updating, and reconstruction across time.
Framework for steering verifiability of multimodal LLM hallucinations, distinguishing between obvious and elusive hallucinations to guide mitigation strategies.
LLM-driven autonomous multi-agent framework for end-to-end turbomachinery aerodynamic design, coordinating geometry, prediction, optimization, and validation.
Inference-time alignment method for diffusion models using Fleming-Viot resampling to prevent diversity collapse in SMC sampling.
Benchmark for mathematical reasoning beyond competition math, testing advanced theoretical knowledge and deep mathematical reasoning.
Evaluates LLM-generated disinformation risk by comparing LLM judges to human evaluation, addressing limitations of automated assessment.
Using inductive logic programming to approximate neural networks for user preference learning with explainability.
GUI reasoning paradigm called UI-in-the-Loop for improved UI understanding and interaction, enhancing interpretability in screen-to-action tasks.
Multi-agent system using small language models for emotion-aware negotiation with edge deployment focus, combining Bayesian orchestration and emotional dynamics.
Post-processing framework for fairness in ML models via counterfactual averaging, applicable to deployed systems without full model control.
Benchmark for evaluating emotion recognition in AI assistants over time, addressing long-term memory and emotional understanding in interactive systems.
Research on detecting and repairing flaws in planning tasks by making them unsolvable; relevant to automated reasoning but outside the core AI agent/LLM focus.
EVGeoQA benchmark evaluating LLMs on dynamic multi-objective geo-spatial exploration with user location constraints and compound reasoning beyond static retrieval.
T-STAR framework using tree-structured reinforcement learning with self-rectification and credit grafting to optimize multi-turn LLM agent policies under sparse rewards.
Empirical study decomposing LLM-based agent competence to identify which capabilities derive from the language model versus explicit structural design in self-revising agents.
Neural architecture search approach using Implantable Adaptive Cells injected into pre-trained U-Net skip connections to enhance medical image segmentation performance.
ATR4CH methodology for LLM-based knowledge graph extraction from cultural heritage documents using ontological engineering, validated on authenticity debate case study.
Proposes AI-agent-augmented DNS blocking to prevent student access to LLM services during academic evaluations, addressing assessment integrity concerns.
Evaluates LLM-augmented knowledge base construction for root cause analysis in network communications to enable rapid failure diagnosis and outage resolution.
EviSnap framework for cold-start cross-domain recommendation systems providing evidence-cited, auditable rationales through distilled review factorization without opaque embeddings.
Introduces SearchFireSafety benchmark for statute-centric legal QA addressing hierarchical document retrieval gaps and hallucination in regulatory reasoning with LLMs.
Empirical study of embedding-based retrieval robustness in conversational settings, identifying vulnerability in Qwen3-embedding models to structured dialogue-style noise.
Domain-aware web agent with critic-guided experience retrieval and schema-light facet induction for high-precision search in finance, biomedicine, and pharmaceutical domains.
Online benchmark VenusBench-Mobile for evaluating mobile GUI agents under realistic user-centric conditions with capability diagnostics across diverse app-agnostic tasks.
Personalized goal-oriented chatbot for elderly users that initiates conversations from family photos to stimulate cognitive function through a structured dialogue framework.
Benchmark for evaluating LLM tool-use agents on multi-turn, multi-step interactions addressing compositional tasks, implicit intent, and instruction transitions in real user behavior.
Interactive educational system visualizing the entire reachable state space of the 8-puzzle (181,440 states), coupling an abstract graph view with concrete puzzle manipulation.
Benchmarking study measuring how different LLMs resist or escalate delusional and conspiratorial thinking in sustained open-ended conversations.
Modular system for phoneme-level Arabic pronunciation assessment combining speech-to-phoneme models with clinical-scale scoring metrics for language learning and therapy.
Investigates the correlation between internal entropy dynamics and reasoning correctness in autoregressive LLMs, proposing a stepwise informativeness assumption to explain the phenomenon.
Automated depression detection system using NLP analysis of audio-recorded primary care encounters with PHQ-9 validation across 1,108 clinical dialogues.
Frames LLM hallucinations as output-boundary misclassification and proposes composite intervention combining instruction-based refusal with structural abstention gate using support deficit scoring.
Consistency-guided decoding approach for three-way logical QA in LLMs, addressing negation inconsistency and epistemic Unknown failures.
Textual time-series corpus of 136 GLP-1RA case reports with LLM-based timeline extraction and risk modeling for longitudinal analysis.
Framework combining LLM analysis with energy-system modeling to forecast electricity footprint of AI data centers through 2030.
CoMAP system using AI in a shared visual workspace to support project-based learning design through persistent collaborative context.
Text2DistBench benchmark for evaluating LLM ability to infer distributional knowledge and population-level trends from text collections.
Theoretical framework for cross-lingual transfer and parameter-efficient adaptation in low-resource Turkic languages using LLMs.
Ethical design space exploration for sensor-fused LLM agents in health applications, addressing privacy and bias concerns.
SensorPersona system that extracts user personas from mobile sensor streams using LLM-based agents for improved personalization.
Tool-MCoT framework for content safety moderation using small language models augmented with external tools to reduce computational costs.
Analysis of latent cultural themes in training corpora by prompting six leading LLMs to identify recurring patterns about human culture.
Study comparing demonstration selection strategies for in-context learning in LLM-based next point-of-interest prediction.
Comparison of LLMs versus classical ontology methods for extracting breast cancer phenotypes from unstructured clinical notes.
DOVE benchmark for evaluating LLM cultural value alignment using open-ended generation rather than multiple-choice formats.
Study on improving faithfulness and traceability in retrieval-augmented generation through illocutionary explanation planning.
Scoping review quantifying code-sharing practices in prediction model research to inform TRIPOD-Code standards development.
Research investigating implicit intersectional biases in LLMs under persona-driven contexts, introducing Bias Amplification framework to capture dynamic bias shifts.