Adaptive serverless resource management framework using slot-survival prediction and event-driven architecture to optimize cold start latency and utilization.
OntoTKGE model for temporal knowledge graph extrapolation leveraging ontological knowledge to handle sparse historical interactions and enable behavioral pattern inheritance.
GMRL-BD algorithm using bias-diffusion and multi-agent RL to detect untrustworthy topic boundaries for LLMs, identifying domains where model answers cannot be reliably trusted.
Auditable Agents framework establishing accountability, auditability, and auditing definitions for LLM agents with external effects, addressing post-deployment answerability.
SCMAPR stage-wise multi-agent refinement framework for complex scenario text-to-video generation that refines and self-corrects ambiguous prompts through agent collaboration.
Thinking Diffusion method adding reasoning penalization and guidance to diffusion multimodal LLMs, which combine Chain-of-Thought reasoning with parallel generation capabilities.
OmniDiagram unified framework for code generation across diverse diagram types and languages using visual interrogation reward for alignment with visual specifications.
UniCreative approach using reference-free reinforcement learning to balance long-form coherence and short-form expressiveness in LLM-based creative writing generation.
Market-Bench comprehensive benchmark evaluating LLM capabilities in economically relevant tasks via configurable multi-agent supply chain model with LLM retailer agents.
ActivityEditor dual-LLM-agent framework for zero-shot cross-regional human trajectory generation, synthesizing physically valid mobility patterns without region-specific historical data.
Analysis of 12,007 rank-invariant pseudo-Boolean landscapes introducing stronger notion of rank landscape equivalence under translation and rotation symmetries.
Echo memory framework for multimodal LLM agents enabling transfer of reusable knowledge across Minecraft tasks by decomposing experience into five interpretable dimensions.
SignalClaw framework using LLMs as evolutionary skill generators to synthesize interpretable traffic signal control strategies balancing effectiveness and explainability.
Introduces Tree Decision Diagrams generalizing OBDD for Boolean function representation with improved succinctness and tractable operations like model counting and conditioning.
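The OBDDs that these Tree Decision Diagrams generalize support model counting through Shannon expansion: branch on a variable and sum the counts of both cofactors. A minimal stdlib-only sketch of that generic operation (illustrative only; the paper's data structure and its sharing-based succinctness are not reproduced here):

```python
def model_count(f, n_vars, assignment=()):
    """Count satisfying assignments of a Boolean function f by Shannon
    expansion: branch on the next variable and sum both cofactor counts
    (the decomposition OBDD-style diagrams make tractable via node sharing)."""
    if len(assignment) == n_vars:
        return 1 if f(assignment) else 0
    return (model_count(f, n_vars, assignment + (False,)) +
            model_count(f, n_vars, assignment + (True,)))

# Example: 3-input XOR has 4 of 8 satisfying assignments (odd parity).
xor3 = lambda a: a[0] ^ a[1] ^ a[2]
count = model_count(xor3, 3)
```

A diagram representation avoids this exponential recursion by merging identical subfunctions; the sketch shows only the underlying decomposition.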
Neurosymbolic approach combining LLMs with Logic Tensor Networks for auditable offer validation in regulated procurement, ensuring factually correct and legally verifiable decisions.
COSMO-Agent tool-augmented RL framework teaching LLMs to bridge CAD-CAE gap by translating simulation feedback into valid geometric edits for iterative industrial design optimization.
ResearchEVO framework for automated scientific discovery using LLMs to conduct undirected experimentation and generate explanations, instantiating discover-then-explain paradigm computationally.
Research on LLM-as-a-Judge showing, via counterfactual design and eye-tracking, that both humans and LLMs favor content labeled as human-authored over identical content labeled as AI-generated.
Philosophical critique of behavioral evaluation paradigms for AI systems and proposal for cognitive assessment methods.
PECKER algorithm for efficient machine unlearning in diffusion models with directed gradient updates.
CuraLight framework combining RL and LLMs for traffic signal control with debate-guided data curation.
LudoBench benchmark evaluating LLM strategic reasoning in Ludo board game with 480 handcrafted scenarios.
Quality-aware mixture of experts for multimodal sentiment analysis robust to noise and modality missingness.
Unlearn-and-Reinvent pipeline testing whether LLMs can rediscover foundational algorithms after unlearning removal.
Study on cultural evolution showing minimal social learning can transmit higher-level representations without inference.
Hierarchical RL framework (STEP-HRL) for LLM agents using step-level transitions to reduce computational cost and history length.
Vision-language model critic for automated iterative refinement of frontend code generation with visual feedback loops.
Open-source framework for autonomous LLM agents conducting deep learning experiments with hypothesis formation, training, and iterative refinement.
Diagnostic framework determining when LLMs are necessary for contextual multi-armed bandits with textual and numerical context.
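For context, a numeric-context baseline of the kind such a diagnostic would compare an LLM policy against: a generic epsilon-greedy contextual bandit with per-(context, arm) running-mean reward estimates. This is a standard textbook sketch, not the framework's method; all names here are illustrative.

```python
import random

def eps_greedy_bandit(contexts, reward_fn, n_arms, eps=0.1, seed=0):
    """Epsilon-greedy contextual bandit: with probability eps explore a
    random arm, otherwise exploit the arm with the best running-mean
    reward estimate for the current context. Returns total reward."""
    rng = random.Random(seed)
    counts, means = {}, {}
    total = 0.0
    for ctx in contexts:
        if rng.random() < eps:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: means.get((ctx, a), 0.0))
        r = reward_fn(ctx, arm)
        key = (ctx, arm)
        counts[key] = counts.get(key, 0) + 1
        prev = means.get(key, 0.0)
        means[key] = prev + (r - prev) / counts[key]  # incremental mean
        total += r
    return total
```

When a tabular baseline like this already learns the context-to-arm mapping, an LLM adds cost without benefit; the diagnostic question is when textual context makes the LLM necessary.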
JTON format, JSON superset with Zen Grid encoding for token-efficient structured data processing in LLMs.
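The token-saving idea behind grid encodings of homogeneous JSON records can be sketched generically: emit the keys once as a header plus value rows instead of repeating keys per record. This is an assumption-laden illustration of the general technique only; the actual JTON/Zen Grid format is not specified here and is not reproduced.

```python
import json

def to_grid(records):
    """Generic grid encoding of homogeneous JSON records: keys appear once
    in a header, values as rows, avoiding per-record key repetition
    (the repetition that inflates token counts for LLM input)."""
    keys = list(records[0])
    return {"header": keys, "rows": [[r[k] for k in keys] for r in records]}

def from_grid(grid):
    """Lossless inverse: rebuild the original list of objects."""
    return [dict(zip(grid["header"], row)) for row in grid["rows"]]

records = [{"id": i, "name": f"item{i}", "qty": i * 2} for i in range(50)]
plain = json.dumps(records)
grid = json.dumps(to_grid(records))
# The grid form is shorter whenever keys repeat across many records.
```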
Joint knowledge base completion and QA using combined large and small language models for KB-related tasks.
KV cache compression technique for multimodal LLM inference, reducing memory overhead and latency with hybrid compression strategy.
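One common primitive in hybrid KV-cache compression schemes is low-bit quantization of cached keys/values; a minimal per-block absmax int8 sketch of that generic primitive (not the paper's specific strategy):

```python
def quantize_kv(block):
    """Per-block absmax int8 quantization: store int8 values plus one
    float scale per block, cutting memory roughly 4x vs. float32."""
    scale = max((abs(x) for x in block), default=0.0) / 127 or 1.0
    return [round(x / scale) for x in block], scale

def dequantize_kv(q, scale):
    """Recover approximate floats; error is bounded by scale / 2."""
    return [v * scale for v in q]

kv = [0.12, -0.5, 0.33, 0.0]
q, s = quantize_kv(kv)
approx = dequantize_kv(q, s)
```

Real systems combine such quantization with eviction or token merging; the trade-off is reconstruction error against memory and latency savings.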
Architecture for value-driven LLM agents addressing behavioral rigidity through context-value-action design.
Foundation model enabling single GPT-based agent to perform across diverse multi-agent reinforcement learning tasks and environments.
Research agent framework for generating trustworthy reports with confidence estimation and calibration mechanisms.
Multi-objective preference alignment for LLMs using Pareto-lenient consensus to handle diverse human values in model training.
AI agents for retail supply chain operations, automating demand forecasting, procurement, and inventory replenishment in supermarket chains.
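As a point of reference for the replenishment decision such agents automate, a classic moving-average forecast with a reorder-point rule; a hypothetical baseline sketch, not the described system, with all parameter names invented for illustration:

```python
def reorder_quantity(history, on_hand, lead_time=2, safety=0.2):
    """Order enough to cover forecast demand over the lead time plus a
    safety buffer, net of stock on hand. Forecast = 7-day moving average."""
    forecast = sum(history[-7:]) / min(len(history), 7)  # avg daily demand
    target = forecast * lead_time * (1 + safety)         # lead-time cover + buffer
    return max(0, round(target - on_hand))
```

An agentic system would replace the fixed moving average with learned forecasts and negotiate procurement, but the inventory-position arithmetic is the same.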
Proposes epistemic blinding, an inference-time auditing protocol to separate memorized priors from data-driven inference in LLM-assisted agentic analysis systems.
Investigates instruction-following mechanisms in LLMs through diagnostic probing, finding evidence for compositional skill deployment over universal mechanism.
Proposes ACE-Bench, agent evaluation benchmark with unified grid-based planning tasks, lightweight environments, and configurable difficulty/horizon control.
Introduces Claw-Eval, an end-to-end evaluation suite for autonomous agents addressing trajectory-opaque grading, safety, and interaction modality coverage.
Theoretical analysis of contextuality in quantum information systems as external bookkeeping cost under classical simulation.
Proposes Web Retrieval-Aware Chunking (W-RAC) for efficient RAG document chunking to balance retrieval quality, latency, and cost on web-scale content.
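The baseline that retrieval-aware chunking schemes improve on is fixed-size chunking with overlap; a minimal sketch of that baseline (the W-RAC scheme itself is not described here):

```python
def chunk_text(text, size=400, overlap=50):
    """Split text into fixed-size chunks with overlap so sentences cut at
    a boundary still appear whole in the neighboring chunk. Chunk size
    trades retrieval precision against index size and latency."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Retrieval-aware schemes instead adapt boundaries and sizes to content and query cost, which is the balance the summary describes.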
Proposes Task-Driven Alignment (TDA-RC) for improving reasoning chains in LLMs by bridging logical gaps between CoT and multi-round thought paradigms.
Evaluates bidirectional training objectives (MLM, masked attention) to mitigate the reversal curse in autoregressive language models.
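The bidirectional objective in question is standard masked language modeling: hide random tokens and predict them from both directions, in contrast to the left-to-right factorization associated with the reversal curse. A generic data-preparation sketch of that objective (illustrative; not the study's exact masking recipe):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_id="[MASK]", seed=0):
    """Randomly replace tokens with a mask symbol for an MLM objective.
    Labels hold the original token at masked positions and None elsewhere,
    so loss is computed only where the model must fill in the blank."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append(mask_id)
            labels.append(tok)      # predict the original token here
        else:
            inputs.append(tok)
            labels.append(None)     # no loss at unmasked positions
    return inputs, labels
```

Because masked positions condition on context from both sides, "A is B" training also exposes the model to "B follows from A" orderings.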
Introduces Inclusion-of-Thoughts (IoT), a strategy to reduce LLM instability on multiple-choice questions by filtering irrelevant distractors.
Proposes SUMMIR framework for ranking sports insights extracted by LLMs, addressing hallucinations with 7,900-article dataset across four sports.
Evaluates four open-source PDF-to-Markdown conversion frameworks (Docling, MinerU, Marker, DeepSeek OCR) for RAG document preprocessing impact on QA accuracy.
Studies how to design information retrieval systems for LLM agents versus humans, proposing learning-to-rank methods for agent trajectories.
Analysis of how generative AI enables social engineering fraud and trust manipulation attacks in financial crime scenarios.