Your LLM Agent Can Leak Your Data: Data Exfiltration via Backdoored Tool Use
Security study of data exfiltration attacks via backdoored tool-use LLM agents, presenting Back-Reveal attack with semantic triggers.
3D human reconstruction from single images in multi-person scenes with interaction awareness.
Open-source governance-aware agentic platform for security operations, addressing alert fatigue and cross-source event correlation with LLM assistance.
Vision-language reward model framework dynamically decomposing evaluation into interpretable dimensions via gating mechanism.
Multi-agent RAG framework for IoT network intrusion detection with an experience library, improving interpretability over ML approaches.
Statistical framework treating LLM evaluation as tensor completion problem, addressing uncertainty quantification in pairwise comparison leaderboards.
Empirical study on fault localization's role in LLM-based automated program repair, evaluating context requirements across 500 SWE-bench instances.
Diagnostic framework combining vision-language models with flow matching and spectral detection for veterinary pneumothorax diagnosis.
Learned elevation models as alternative to LiDAR for radio environment map estimation in wireless networks.
Singing voice conversion system using boundary-aware information bottleneck for fine-grained style control.
Analysis of transformer embedding trajectories exhibiting turbulence-like 5/3 power-law spectral scaling across languages.
FastDiSS improves few-step diffusion language models for sequence-to-sequence generation by addressing self-conditioning approximation gaps.
Context-Agent framework using dynamic discourse trees for hierarchical non-linear dialogue management in LLMs.
Empirical forensic analysis of OpenClaw agentic AI system, examining internal state reconstruction and action logging for digital investigations.
Modular platform combining speech recognition, translation, emotion classification, and sign language rendering using open-source AI services.
Extended reality framework integrating AI services for sign language interpretation and emotion recognition in video conferencing.
Study evaluating style transfer randomization for domain generalization in computer vision synthetic-to-real transfer.
Multimodal model for medical image segmentation guided by clinical text using semantic-topological graph reasoning.
Foundation model for gastrointestinal endoscopy diagnosis using analogical reasoning to improve generalizability and robustness.
Physics-informed neural networks for modeling multiscale fluid dynamics with long-range dependencies in Navier-Stokes equations.
Research characterizing LLM chain-of-thought reasoning as trajectories through representation space, showing step-specific subspaces become more separable with layer depth.
SnapFlow self-distillation method converting 10-step flow-matching VLA models to one-step action generation for real-time robotic manipulation.
Rectified Schrödinger Bridge matching approach that reduces denoising iterations for few-step visual navigation in embodied AI agents.
Multimodal LLM-based security assessment method for cyber-physical systems with incomplete architectural documentation and legacy systems.
Framework for converting attention mechanisms between architectures (MLA, SWA) to reduce KV cache memory and bandwidth in LLM inference.
CRFT, a transformer-based framework using feature flow learning for robust coarse-to-fine cross-modal image registration.
SemLink tool using Siamese Sentence-BERT for semantic-aware automated test oracles detecting hyperlink rot and semantic drift in web applications.
Systematic analysis and benchmark comparing LLM-based automated penetration testing frameworks for autonomous security testing.
Analysis of diffusion-based image compressors' robustness to bit-flip errors compared to classical and learned codecs.
CAKE benchmark with 188 expert-validated questions evaluating LLMs' understanding of cloud-native software architecture across Bloom's taxonomy levels.
Fine-tuning technique using instance-level knowledge scores to reduce LLM hallucinations by aligning pre-training and fine-tuning knowledge.
Demographics-agnostic training method for mitigating bias in wake-up word detection across diverse speaker populations.
EEG-MFTNet deep learning architecture combining multi-scale temporal convolutions and transformers for cross-session motor imagery decoding in BCIs.
Representation-level evaluation metric for learner representations in educational AI systems measuring distinctiveness between students.
Neural network pruning formulated as QUBO optimization problem with principled objective formulations capturing filter interactions.
Swiss-Bench 003 benchmark extending HAAS framework to evaluate LLM reliability and adversarial security in Swiss regulatory and financial contexts.
Method for automated dental superimposition comparing 3D intraoral scans and 2D photos for human identification in forensic contexts.
Technique for improving text-to-image diffusion model interpretability through selective aggregation of cross-attention maps from relevant attention heads.
Neural network method using ReLU networks for generating graphs constrained by specified graph edit distance for cheminformatics and data augmentation.
Benchmark evaluating vision-language models' ability to understand multimodal puns combining visual and textual elements.
Successor representation method for zero-shot unsupervised RL in visual environments using saliency-guided representations and consistency policy learning.
Evaluation method for LLM-based issue resolution agents beyond pass rates, assessing compliance with implicit design constraints and architectural conventions.
Formal security framework for MCP-based AI agents, including threat taxonomy, verification models, and defense mechanisms for tool-connected LLM systems.
Study on surface compliance in LLMs: models agree with knowledge edits but don't internalize changes, affecting reliability of edited parametric memory.
Qualitative case study examining legal professionals' perceptions on AI governance, regulatory gaps, and institutional readiness in Nigeria.
CritBench: evaluation framework for cybersecurity capabilities of LLM agents in operational technology (OT) environments like IEC 61850 digital substations.
Multi-stage validation framework for trustworthy clinical information extraction using LLMs at scale without annotation-intensive reference standards.
Evaluation of LLM personality simulation using psychometric profiles and life story generation, comparing model outputs against real human psychological data.
Framework using graph priors to improve structural coherence in part-based image synthesis by modeling spatial and semantic relationships.
Method using dual self-consistency reinforcement learning to synthesize TikZ graphics code from images, addressing precision challenges in multimodal LLM code generation.