Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking
Region-based re-ranker for multi-modal RAG reducing visual distractors by formulating region selection as decision-making problem.
Multi-agent spec-driven development pipeline with context-grounding hooks to prevent hallucinations and architectural violations in LLM coding agents.
Formal verification of security vulnerabilities in AI-generated code across 7 frontier LLMs and 500 prompts using Z3 SMT solver.
Study on training LLMs to express uncertainty explicitly as control interface for abstention and verification tasks.
Novel autoregressive paradigm for long-sequence symbolic music generation using anchored cyclic generation.
Diagnostic RAG system for IT support with explicit diagnostic state tracking across turns to accumulate evidence and resolve hypotheses.
Multi-agent LLM system for clinician-in-the-loop gait analysis report drafting, coordinating specialized agents for multimodal data synthesis.
Training-free quantization method for 3D reconstruction models using random rotations without per-scene fine-tuning.
Study on AI's role in collective decision-making systems and procedural legitimacy conditions for participants.
Long video understanding via spatio-temporally structured intent-aware RAG, preserving video structure while retrieving query-relevant evidence.
System for adaptive LoRA hyperparameter tuning and orchestration across heterogeneous multi-tenant LLM fine-tuning workloads.
Open-source digital twin simulator integrating natural language with renewable energy microgrid dynamics and dataset.
Security study of data exfiltration attacks via backdoored tool-use LLM agents, presenting Back-Reveal attack with semantic triggers.
3D human reconstruction from single images in multi-person scenes with interaction awareness.
Open-source governance-aware agentic platform for security operations, addressing alert fatigue and cross-source event correlation with LLM assistance.
Vision-language reward model framework dynamically decomposing evaluation into interpretable dimensions via gating mechanism.
Multi-agent RAG framework for IoT network intrusion detection with experience library, improving interpretability over ML approaches.
Statistical framework treating LLM evaluation as tensor completion problem, addressing uncertainty quantification in pairwise comparison leaderboards.
Empirical study on fault localization's role in LLM-based automated program repair, evaluating context requirements across 500 SWE-bench instances.
Diagnostic framework combining vision-language models with flow matching and spectral detection for veterinary pneumothorax diagnosis.
Learned elevation models as alternative to LiDAR for radio environment map estimation in wireless networks.
Singing voice conversion system using boundary-aware information bottleneck for fine-grained style control.
Analysis of transformer embedding trajectories exhibiting turbulence-like 5/3 power-law spectral scaling across languages.
FastDiSS improves few-step diffusion language models for sequence-to-sequence generation by addressing self-conditioning approximation gaps.
Context-Agent framework using dynamic discourse trees for hierarchical non-linear dialogue management in LLMs.
Empirical forensic analysis of OpenClaw agentic AI system, examining internal state reconstruction and action logging for digital investigations.
Modular platform combining speech recognition, translation, emotion classification, and sign language rendering using open-source AI services.
Extended reality framework integrating AI services for sign language interpretation and emotion recognition in video conferencing.
Study evaluating style transfer randomization for domain generalization in computer vision synthetic-to-real transfer.
Multimodal model for medical image segmentation guided by clinical text using semantic-topological graph reasoning.
Foundation model for gastrointestinal endoscopy diagnosis using analogical reasoning to improve generalizability and robustness.
Physics-informed neural networks for modeling multiscale fluid dynamics with long-range dependencies in Navier-Stokes equations.
Research characterizing LLM chain-of-thought reasoning as trajectories through representation space, showing step-specific subspaces become more separable with layer depth.
SnapFlow self-distillation method converting 10-step flow-matching VLA models to one-step action generation for real-time robotic manipulation.
Rectified Schrödinger Bridge matching approach for few-step visual navigation in embodied AI agents reducing denoising iterations.
Multimodal LLM-based security assessment method for cyber-physical systems with incomplete architectural documentation and legacy systems.
Framework for converting attention mechanisms between architectures (MLA, SWA) to reduce KV cache memory and bandwidth in LLM inference.
CRFT transformer-based coarse-to-fine framework using feature flow learning for robust cross-modal image registration.
SemLink tool using Siamese Sentence-BERT for semantic-aware automated test oracles detecting hyperlink rot and semantic drift in web applications.
Systematic analysis and benchmark comparing LLM-based automated penetration testing frameworks for autonomous security testing.
Analysis of diffusion-based image compressors' robustness to bit-flip errors compared to classical and learned codecs.
CAKE benchmark with 188 expert-validated questions evaluating LLMs' understanding of cloud-native software architecture across Bloom's taxonomy levels.
Fine-tuning technique using instance-level knowledge scores to reduce LLM hallucinations by aligning pre-training and fine-tuning knowledge.
Demographics-agnostic training method for mitigating bias in wake-up word detection across diverse speaker populations.
EEG-MFTNet deep learning architecture combining multi-scale temporal convolutions and transformers for cross-session motor imagery decoding in BCIs.
Representation-level evaluation metric for learner representations in educational AI systems measuring distinctiveness between students.
Neural network pruning formulated as QUBO optimization problem with principled objective formulations capturing filter interactions.
Swiss-Bench 003 benchmark extending HAAS framework to evaluate LLM reliability and adversarial security in Swiss regulatory and financial contexts.
Method for automated dental superimposition comparing 3D intraoral scans and 2D photos for human identification in forensic contexts.
Technique for improving text-to-image diffusion model interpretability through selective aggregation of cross-attention maps from relevant attention heads.