Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya
Research on fine-tuning LLMs for epistemic reasoning using Navya-Nyaya logic. Addresses hallucination and brittleness in LLM reasoning capabilities.
Theoretical framework exploring order effects in sequential cognitive processes and non-commutativity in metacognition using operational methods.
Proximity measure quantifies feature similarity of multi-source information objects for entity identification and matching across heterogeneous data sources.
ReVEL hybrid framework uses LLM-guided iterative evolution with structured performance feedback to design effective heuristics for NP-hard problems.
Framework identifies algebraic structures in combinatorial optimization problems, constructs quotient spaces to reduce search space and improve solution quality.
PaperOrchestra multi-agent framework automates AI research paper writing by transforming unstructured materials into submission-ready LaTeX manuscripts.
MMORF multi-agent framework uses language models with specialized agents for multi-objective retrosynthesis planning balancing quality, safety, and cost.
MedGemma 1.5 4B model expands medical capabilities with high-dimensional imaging (CT/MRI/histopathology), anatomical localization, and improved document understanding.
LLM-based sequential clinical diagnosis system models uncertainty-guided evidence acquisition over time using diagnostic trajectory learning.
Kolmogorov-Arnold Fuzzy Cognitive Maps extend neuro-symbolic modeling to handle non-monotonic causal dependencies in complex dynamic systems.
IntentScore is a plan-aware reward model trained on 398K offline GUI interactions to evaluate and score actions for computer-use agents across multiple operating systems.
Multi-agent reinforcement learning replaces channel modeling with spatial intelligence for autonomous control of reconfigurable intelligent surface arrays.
Hierarchical multi-agent reinforcement learning optimizes reconfigurable intelligent surfaces for mmWave networks without channel state information estimation.
Instruction-tuned LLMs parse and mine unstructured HPC system logs from heterogeneous sources to extract patterns and diagnose operational issues.
ClawsBench benchmark evaluates LLM agents on realistic productivity tasks (email, scheduling, documents) in simulated multi-service environments with stateful workflows.
AttriBench: Demographically-balanced benchmark for measuring attribution bias in LLMs when attributing quotes to original authors.
Framework for translating governance norms into enforceable runtime guardrails for agentic AI systems with multi-step execution.
Graph neural network approach for predicting delivery delays in logistics networks using warehouse and transportation data.
Evolutionary theory simulation of how alignment affects populations of AI models over time and belief propagation dynamics.
Reward decomposition approach to disentangle pressure capitulation from evidence blindness in LLM sycophancy behavior.
Theoretical analysis and solutions for value factorization convergence to suboptimal stable points in multi-agent reinforcement learning.
Graph of Skills: Dependency-aware skill retrieval system for managing and scaling thousands of reusable skills in agent systems.
TRACE: Framework for targeted training of LLM agents on capability gaps identified in specific environments and task distributions.
Agentic AI system that profiles user expertise levels to adapt interaction depth using LLaMA-based modular architecture.
RETINA-SAFE benchmark and ECRT framework for detecting hallucination risks in medical LLMs with insufficient or conflicting evidence.
ETR: Training method for efficient chain-of-thought reasoning by optimizing entropy trends rather than global uncertainty reduction.
LatentAudit: White-box monitoring system for RAG hallucination detection using Mahalanobis distance on residual stream activations.
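The LatentAudit summary names Mahalanobis distance on residual-stream activations; the paper's actual features, layers, and thresholds are not given here, so the following is a generic, illustrative sketch of that scoring idea in 2-D with a hand-coded covariance inverse (all data and dimensions are stand-ins):

```python
import math
import random

random.seed(0)
# Stand-in for 2-D "activations" from grounded answers: correlated
# components, x2 tracking x1 plus small noise.
grounded = []
for _ in range(1000):
    x1 = random.gauss(0.0, 1.0)
    x2 = x1 + random.gauss(0.0, 0.3)
    grounded.append((x1, x2))

n = len(grounded)
m1 = sum(p[0] for p in grounded) / n
m2 = sum(p[1] for p in grounded) / n
# Sample covariance entries and the explicit 2x2 inverse.
c11 = sum((p[0] - m1) ** 2 for p in grounded) / (n - 1)
c22 = sum((p[1] - m2) ** 2 for p in grounded) / (n - 1)
c12 = sum((p[0] - m1) * (p[1] - m2) for p in grounded) / (n - 1)
det = c11 * c22 - c12 * c12
i11, i22, i12 = c22 / det, c11 / det, -c12 / det

def mahalanobis(x1: float, x2: float) -> float:
    """Distance of one activation vector from the grounded distribution."""
    d1, d2 = x1 - m1, x2 - m2
    return math.sqrt(i11 * d1 * d1 + 2 * i12 * d1 * d2 + i22 * d2 * d2)

# A point on the correlated ridge scores low even far from the origin;
# a point off the ridge scores high despite a similar Euclidean norm.
on_ridge = mahalanobis(2.0, 2.0)
off_ridge = mahalanobis(2.0, -2.0)
assert off_ridge > on_ridge
```

The covariance-aware distance is what distinguishes this from a plain norm check: activations consistent with the grounded distribution's correlation structure score low, while equally large but off-distribution activations are flagged.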
TFRBench: Benchmark for evaluating reasoning capabilities of time-series forecasting systems beyond numerical accuracy metrics.
Using LLMs as judges to evaluate lightweight segmentation models for drone-based power line inspection under distribution shift.
Domain-invariant neurons approach for cross-domain knowledge transfer to boost LLM reasoning in expertise-scarce specialized domains.
Empirical study on using cross-domain demonstrations to improve in-context learning when expert annotations in target domain are scarce.
HYVE framework for LLMs to better process machine data (logs, metrics, traces) through hybrid structured/unstructured representations.
CODESTRUCT: LLM-based code agents using structured AST action spaces instead of text matching for reliable code editing and repository interaction.
Research on multi-agent pathfinding algorithms handling non-unit edge costs and continuous-time actions for real-world robotic/logistics scenarios.
PRISM-MCTS learning approach using reasoning trajectories with metacognitive reflection, inspired by reasoning models such as OpenAI o1, enabling efficient reasoning in low-resource NLP settings.
Automated framework using locally-deployed LLMs to audit hospital discharge summaries at scale, enforcing transition-of-care documentation requirements for patient safety.
Adaptive serverless resource management framework using slot-survival prediction and event-driven architecture to optimize cold start latency and utilization.
OntoTKGE model for temporal knowledge graph extrapolation leveraging ontological knowledge to handle sparse historical interactions and enable behavioral pattern inheritance.
GMRL-BD algorithm using bias-diffusion and multi-agent RL to detect untrustworthy topic boundaries of LLMs, identifying domains where model answers cannot be reliably trusted.
Auditable Agents framework establishing accountability, auditability, and auditing definitions for LLM agents with external effects, addressing post-deployment answerability.
SCMAPR stage-wise multi-agent refinement framework for complex scenario text-to-video generation that refines and self-corrects ambiguous prompts through agent collaboration.
Thinking Diffusion method adding reasoning penalization and guidance to diffusion multimodal LLMs, combining Chain-of-Thought reasoning with parallel generation capabilities.
OmniDiagram unified framework for code generation across diverse diagram types and languages using visual interrogation reward for alignment with visual specifications.
UniCreative approach using reference-free reinforcement learning to balance long-form coherence and short-form expressiveness in LLM-based creative writing generation.
Market-Bench comprehensive benchmark evaluating LLM capabilities in economically-relevant tasks via configurable multi-agent supply chain model with LLM retailer agents.
ActivityEditor dual-LLM-agent framework for zero-shot cross-regional human trajectory generation, synthesizing physically valid mobility patterns without region-specific historical data.
Analysis of 12,007 rank-invariant pseudo-Boolean landscapes, introducing a stronger notion of rank landscape equivalence under translation and rotation symmetries.
Echo memory framework for multimodal LLM agents enabling transfer of reusable knowledge across Minecraft tasks by decomposing experience into five interpretable dimensions.
SignalClaw framework using LLMs as evolutionary skill generators to synthesize interpretable traffic signal control strategies balancing effectiveness and explainability.
Introduces Tree Decision Diagrams generalizing OBDD for Boolean function representation with improved succinctness and tractable operations like model counting and conditioning.
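The Tree Decision Diagram construction itself is not detailed in this summary; as background, here is standard model counting on an ordered BDD, the kind of operation the summary says the generalization keeps tractable. The node encoding is illustrative: internal nodes are `(var_index, low_child, high_child)` tuples and leaves are the Python booleans.

```python
N_VARS = 3

# OBDD for x0 XOR x1 over variables x0, x1, x2 (x2 unused).
node_x1_pos = (1, True, False)   # reached when x0 = 1: true iff x1 = 0
node_x1_neg = (1, False, True)   # reached when x0 = 0: true iff x1 = 1
root = (0, node_x1_neg, node_x1_pos)

def count_models(node, level=0):
    """Count satisfying assignments over variables level..N_VARS-1."""
    if node is True:
        return 2 ** (N_VARS - level)   # remaining variables are free
    if node is False:
        return 0
    var, low, high = node
    skipped = 2 ** (var - level)       # variables jumped over are free
    return skipped * (count_models(low, var + 1) +
                      count_models(high, var + 1))

# x0 XOR x1 has 2 satisfying assignments over (x0, x1); x2 is free,
# giving 4 models over 3 variables.
print(count_models(root))  # → 4
```

The key idiom is accounting for skipped variable levels with powers of two, which is what makes counting linear in the diagram size rather than exponential in the variable count.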