S-Path-RAG framework for multi-hop question answering over knowledge graphs using semantic-aware shortest-path retrieval with differentiable path scoring.
Berta: open-source modular platform for AI-enabled clinical documentation with institutional data governance and workflow integration, deployed at Alberta Health Services.
DepthCharge framework for measuring how deeply LLMs sustain accurate responses in domain-specific topics through adaptive probing across arbitrary domains.
Privacy-preserving synthetic clinical data used to train an LLM for medical coding automation, improving ICD-10-CM and CPT code assignment from clinical documentation.
Memory Sparse Attention enables end-to-end LLM scaling to 100M tokens for long-term memory tasks, extending effective context beyond 1M-token limits.
Position paper proposes mechanism-aware evaluation combining symbolic rules and mechanistic interpretability to distinguish genuine generalization from shortcuts.
Cluster-R1 reframes instruction-following clustering as a generative task, enabling reasoning models to autonomously infer corpus structure while respecting user instructions.
MedMT-Bench stress-tests LLMs on long-context memory, interference robustness, and safety in multi-turn medical conversations with realistic clinical scenarios.
Lightweight LLM framework captures and scales physician expertise for clinical decision-making agents using individualized diagnostic methodologies.
Chitrakshara multimodal dataset provides multi-image and Indian language coverage for training Vision-Language Models beyond English-centric datasets.
Qworld framework generates question-specific evaluation criteria for LLMs on open-ended tasks, capturing context-dependent response quality requirements.
ConceptMap tool enables scalable exploratory discovery of human-interpretable concepts in sparse autoencoders trained on LLM activations.
Konkani-Instruct-100k synthetic dataset and benchmarks address LLM performance gaps for a low-resource Indian language across multiple scripts via instruction tuning.
Cognitive psychology-inspired study reveals that LLMs' compliance with formatting instructions drops by 2-21% under concurrent task load, identifying prospective memory vulnerabilities.
Fine-tuned lightweight LLM generates hierarchical JSON representations of scientific sentences preserving semantic meaning for structured knowledge extraction.
MDKeyChunker pipeline enables structure-aware chunking of Markdown documents and single-call LLM enrichment with metadata extraction for improved RAG accuracy.
Philosophical comparison of how LLMs gather data with how humans construct scientific knowledge and make discoveries.
Mixture of Demonstrations approach improves GraphRAG performance for domain-specific QA by selecting high-quality demonstrations to reduce irrelevant retrieved information.
Computational analysis of upper entropy algorithms for uncertainty quantification in credal set-based probability models.
Native GUI agent framework ReCAP adds CAPTCHA-solving capability to vision-language models using self-corrective training and automated reasoning-action data generation.
Synthetic Mixed Training combines synthetic QAs and documents to improve LLM knowledge acquisition beyond RAG performance in data-constrained domains.
Safe reinforcement learning approach using preference-based constraint inference for learning complex, subjective safety constraints with minimal expert demonstrations.
AI agent optimizes operator performance on Huawei Ascend NPUs, addressing the knowledge bottleneck through episodic learning for tiling and kernel programs.
StateLinFormer: linear-attention navigation model with persistent memory for long-term navigation tasks, combining flexibility with efficiency.
Dual-Criterion Curriculum Learning: meta-learning approach that uses two criteria to assess difficulty when training on temporal data.
PoiCGAN introduces poisoning attack methods against federated learning systems using feature-label joint perturbation.
APreQEL proposes adaptive mixed precision quantization to reduce memory and computational costs of LLMs for edge device deployment while maintaining performance.
Time-LLM model for predicting wafer-level spatial etch depth distributions in plasma etching process monitoring.
Analysis of the deep learning generalization gap in sleep disorder staging, using Grad-CAM interpretability and the iSLEEPS clinical dataset.
LLMORPH automated testing tool for LLMs using metamorphic testing to detect NLP task failures without human-labeled oracles.
LLMLOOP framework automating iterative refinement of LLM-generated code and test cases through automated feedback loops.
Theory of LLM information susceptibility analyzing fundamental limits of LLM-mediated optimization in agentic systems.
Ukrainian Visual Word Sense Disambiguation benchmark with 10-image choices for evaluating word sense disambiguation in Ukrainian.
Swiss-Bench SBP-002: trilingual benchmark of 395 expert-crafted regulatory compliance tasks across FINMA, Legal-CH, and EFK domains.
Self-supervised learning method for spectral unmixing in fluorescence microscopy using a data-driven approach.
Probing study revealing how LLMs internally represent different ethical frameworks with asymmetric transfer patterns across model sizes.
Echoes dataset with 3,577 music tracks for deepfake detection spanning multiple AI music generation systems.
BIRCH-Trees benchmark for estimating individual tree height and species from RGB UAV imagery for forest monitoring.
Training-free out-of-distribution detection using a multi-layer prototype fusion approach for robust deep learning deployment.
Privacy-preserving LLM system for disambiguating clinical acronyms in healthcare without transmitting data to external servers.
Machine learning approach for robotic fruit harvesting using active reachability estimation to improve efficiency in unstructured environments.
Theory-grounded measurement methodology for identifying assessment items on which LLMs perform differently from humans.
Analysis of early-exit decoding in modern LLMs showing reduced efficiency gains due to improved architectures with lower layer redundancy.
Study of filtered vector search algorithms in PostgreSQL for semantic search and GenAI applications, evaluating real-world database performance.
Continuous-time diffusion models for generating synthetic electronic health records with mixed numerical and categorical features.
Self-paced curriculum learning for RL using closed-form Gaussian updates to improve efficiency in high-dimensional contexts.
Intent-Based Networking using AI to translate high-level natural language intents into network policies with automated compliance assurance.
Human-in-the-loop Pareto optimization for motor skill training and rehabilitation, characterizing task difficulty vs. performance trade-offs.
Bayesian latent transport framework for domain-adaptive foundation models addressing distribution mismatch and uncertainty propagation in limited-supervision scenarios.
Cognitive Firewall: hybrid edge-cloud architecture for securing browser-based LLM agents against indirect prompt injection attacks using split-compute security checks.