Attention Flows: Tracing LLM Conceptual Engagement via Story Summaries
Evaluates LLM performance on long-form text understanding by comparing human- and model-generated novel summaries, assessing conceptual engagement patterns.
Introduces the Graded Color Attribution dataset to evaluate whether Vision-Language Models follow the reasoning rules they introspectively report, compared against human behavior.
Transformer-based NER and entity linking approach for medical symptom recognition in SympTEMIST shared task using RoBERTa and SapBERT.
Proposes Neural Computers (NCs), a model architecture unifying computation, memory, and I/O as learned runtime state, aiming toward completely neural computing systems.
Research on limits of latent reasoning in LLMs, testing whether models can discover multi-step planning strategies without supervision using graph path-finding tasks.
Benchmark and solutions for visual anomaly detection on edge devices with continual learning to adapt to evolving data distributions.
Theoretical proof that no continuous wrapper defense can prevent all prompt injections in LLMs with connected prompt space, characterizing defense failure modes.
Graph embedding-based anomaly detection system identifies under-represented services in microservice architectures using unsupervised learning.
Multi-objective evolutionary merging approach to reduce computational overhead of reasoning models while maintaining accuracy with fewer tokens.
Hybrid ResNet-1D-BiGRU-MHA model for intrusion detection in Industrial IoT systems, achieving 98.71% accuracy on the Edge-IIoTset dataset.
Practical implementation of activation-level interpretability and steering techniques for large language models distributed across multiple GPUs.
Symbolic Equivalence Partitioning uses symbolic execution for inference-time code selection in LLM-based code generation without expensive verifiers.
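The selection idea behind such equivalence partitioning can be sketched with concrete probe inputs standing in for symbolic execution (a deliberate simplification of the paper's approach; all names here are hypothetical): candidates that behave identically on the probes fall into one class, and the largest class supplies the answer.

```python
# Hypothetical sketch: partition LLM-generated candidate functions by
# behavioral equivalence on probe inputs (a concrete-execution stand-in
# for symbolic execution) and select from the largest equivalence class.
from collections import defaultdict

def partition_and_select(candidates, probe_inputs):
    """candidates: list of callables; probe_inputs: list of argument tuples."""
    classes = defaultdict(list)
    for fn in candidates:
        signature = []
        for args in probe_inputs:
            try:
                signature.append(repr(fn(*args)))
            except Exception as e:  # crashing candidates form their own class
                signature.append(f"<error:{type(e).__name__}>")
        classes[tuple(signature)].append(fn)
    # Majority-vote heuristic: the largest equivalence class is most
    # likely to implement the intended semantics.
    largest = max(classes.values(), key=len)
    return largest[0]

# Usage: three candidates for "absolute value", one buggy.
cands = [lambda x: abs(x), lambda x: x if x >= 0 else -x, lambda x: x]
best = partition_and_select(cands, [(3,), (-4,), (0,)])
print(best(-4))  # 4
```

No test cases or verifier are needed at selection time; only the partition structure of candidate behavior is used, which is what makes the approach cheap at inference.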
DoMinO framework unifies reinforcement learning fine-tuning of Discrete Flow Matching models by reformulating sampling as a multi-step MDP.
MedConclusion benchmark dataset of 5.7M PubMed abstracts for evaluating LLMs on biomedical conclusion generation from structured evidence.
Efficient quantization method for Mixture-of-Experts models with theoretical generalization guarantees to reduce inference memory overhead.
Adaptive differential privacy approach for federated medical image segmentation across diverse imaging modalities and clinical sites.
Soft-Quantum Algorithms explores classical simulation of variational quantum circuits for few-qubit problems with large datasets.
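Why few-qubit variational circuits are classically tractable can be seen from a minimal statevector sketch (not the paper's implementation): a one-qubit circuit applying RY(θ) to |0⟩ and measuring ⟨Z⟩, which analytically equals cos θ.

```python
# Minimal sketch of classically simulating a one-qubit variational circuit:
# |psi> = RY(theta)|0>, then measure the expectation of Pauli-Z.
# Few-qubit statevectors like this are exact and cheap on classical hardware.
import numpy as np

def ry(theta):
    # Single-qubit Y-rotation gate as a 2x2 real matrix.
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def expect_z(theta):
    psi = ry(theta) @ np.array([1.0, 0.0])   # apply RY to |0>
    z = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli-Z observable
    return psi @ (z @ psi)                   # <psi| Z |psi>

theta = 0.7
print(expect_z(theta))   # matches the analytic value cos(theta)
print(np.cos(theta))
```

An n-qubit statevector needs 2^n amplitudes, so this exact approach stays feasible precisely in the few-qubit regime the line above refers to.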
SkillSieve is a three-layer detection framework for identifying security vulnerabilities in AI agent skills, addressing both code and natural language prompt injection attacks.
AI-Driven Research for Systems uses LLMs to automate database performance optimization through code generation rather than manual design.
Guardian Parser Pack uses LLMs to parse and normalize heterogeneous investigative documents for missing-person cases with varying layouts and data quality.
SciDC method reduces LLM hallucination by incorporating scientific knowledge and rules as decoding constraints to improve reliability.
TwinLoop framework uses simulation-in-the-loop digital twins for online multi-agent reinforcement learning to adapt policies when operating conditions change.
Research finding that 52-88% of chain-of-thought tokens in reasoning models are generated after the answer is already recoverable, revealing a detection-extraction gap in model behavior.
CubeGraph: efficient retrieval-augmented generation system for hybrid queries combining vector similarity search with spatio-temporal filters for RAG workloads.
Logical Robots: declarative multi-agent programming platform using logic programming language Logica for robot behavior specification combining reactive control and planning.
SubFLOT: Federated learning method using optimal transport for efficient submodel extraction, addressing heterogeneity and enabling client-side personalization.
SHAPE: Framework for improving LLM reasoning through process supervision, formalizing reasoning as a state-space trajectory with stage-aware advantage estimation.
Parameter-efficient multitask prompt distillation framework for clinical NLP adapting shared metaprompts across diverse medical tasks.
Audience segmentation approach for LLM-based social simulation restoring demographic heterogeneity in behavioral modeling.
Fake news detection framework combining graph analysis with LLM-retrieved evidence for explainable veracity assessment.
Graph-based analysis of semantic change in Persian poetry across centuries using aligned word embeddings.
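Aligning word embeddings trained on different time periods is commonly done with orthogonal Procrustes; a minimal sketch with synthetic data (not the paper's pipeline; all data here is randomly generated):

```python
# Sketch of orthogonal Procrustes alignment between two embedding matrices
# (rows = shared vocabulary), a standard step before measuring semantic
# change across time periods. Data below is synthetic.
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal R minimizing ||X R - Y||_F, via SVD of X^T Y."""
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 8))                   # "modern" embeddings
true_r = np.linalg.qr(rng.normal(size=(8, 8)))[0]
X = Y @ true_r.T                               # "historical" embeddings, rotated
R = procrustes_align(X, Y)
print(np.allclose(X @ R, Y))                   # True: rotation recovered
```

After alignment, the per-word distance between a word's historical and modern vectors serves as its semantic-change score, which can then feed graph-based analyses.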
Chemical vision-language model emphasizing reasoning over perception for understanding molecular reactions and mechanisms.
Hybrid quantum-classical network for remote sensing image segmentation combining multi-scale feature fusion.
Confidence calibration methods for LLM-generated code revisions enabling developers to assess output correctness at the instance level.
Traveling thief problem variant with time window constraints, benchmarks, and heuristics for multi-component optimization.
Multimodal fusion method for sarcasm detection addressing unreliable modalities through uncertainty-aware weighting.
Open-source Chinese legal language model built on Baichuan foundation using continued pretraining and instruction tuning.
CLI-Tool-Bench benchmark for evaluating LLM agents' end-to-end software generation from intent without predefined scaffolds.
Framework enabling multi-LLM collaboration with role-based team structure for solving complex multi-step contextualized tasks.
Pipeline for extracting procedural knowledge and directed graphs from maintenance flowchart images using vision-language models.
Multi-faceted preference alignment approach for conversational query rewriting using feedback from retrieval and generation components.
Benchmark for evaluating LLM-generated repository documentation using question answering, addressing limitations of LLM-as-judge evaluation methods.
FedDAP addresses domain shift in federated learning using prototype learning for privacy-sensitive applications.
Instance-adaptive variational autoencoders reduce the amortization gap in latent variable models for deep generative modeling.
MoBiE: binarization framework for efficient inference of mixture-of-experts LLMs using post-training quantization.
SkillTrojan: backdoor attack framework targeting skill-based agent systems through malicious skill implementations.
OmniTabBench: largest tabular data benchmark comparing GBDTs, neural networks, and foundation models at scale.
ESG sentiment analysis dataset and models for Slovene news, addressing corporate performance assessment in emerging markets.
WRAP++ improves LLM pretraining through synthetic data rephrasing that captures cross-document relationships and associative context.
Privacy-preserving LLM inference method enabling text-free processing through alignment and adaptation, reducing privacy risks without computational overhead.