Modular platform combining speech recognition, translation, emotion classification, and sign language rendering using open-source AI services.
Extended reality framework integrating AI services for sign language interpretation and emotion recognition in video conferencing.
Study evaluating style transfer randomization for domain generalization in synthetic-to-real transfer for computer vision.
Multimodal model for medical image segmentation guided by clinical text using semantic-topological graph reasoning.
Foundation model for gastrointestinal endoscopy diagnosis using analogical reasoning to improve generalizability and robustness.
Physics-informed neural networks for modeling multiscale fluid dynamics with long-range dependencies in Navier-Stokes equations.
Research characterizing LLM chain-of-thought reasoning as trajectories through representation space, showing step-specific subspaces become more separable with layer depth.
SnapFlow self-distillation method converting 10-step flow-matching VLA models to one-step action generation for real-time robotic manipulation.
Rectified Schrödinger Bridge matching approach that reduces denoising iterations for few-step visual navigation in embodied AI agents.
Multimodal LLM-based security assessment method for cyber-physical systems with incomplete architectural documentation and legacy systems.
Framework for converting attention mechanisms between architectures (MLA, SWA) to reduce KV cache memory and bandwidth in LLM inference.
CRFT, a transformer-based framework using feature flow learning for robust coarse-to-fine cross-modal image registration.
SemLink tool using Siamese Sentence-BERT for semantic-aware automated test oracles detecting hyperlink rot and semantic drift in web applications.
Systematic analysis and benchmark comparing LLM-based automated penetration testing frameworks for autonomous security testing.
Analysis of diffusion-based image compressors' robustness to bit-flip errors compared to classical and learned codecs.
CAKE benchmark with 188 expert-validated questions evaluating LLMs' understanding of cloud-native software architecture across Bloom's taxonomy levels.
Fine-tuning technique using instance-level knowledge scores to reduce LLM hallucinations by aligning pre-training and fine-tuning knowledge.
Demographics-agnostic training method for mitigating bias in wake-up word detection across diverse speaker populations.
EEG-MFTNet deep learning architecture combining multi-scale temporal convolutions and transformers for cross-session motor imagery decoding in BCIs.
Representation-level evaluation metric for learner representations in educational AI systems measuring distinctiveness between students.
Neural network pruning formulated as a QUBO problem with principled objective formulations capturing filter interactions.
Swiss-Bench 003 benchmark extending HAAS framework to evaluate LLM reliability and adversarial security in Swiss regulatory and financial contexts.
Method for automated dental superimposition comparing 3D intraoral scans and 2D photos for human identification in forensic contexts.
Technique for improving text-to-image diffusion model interpretability through selective aggregation of cross-attention maps from relevant attention heads.
Neural network method using ReLU networks for generating graphs constrained by a specified graph edit distance, for cheminformatics and data augmentation.
Benchmark evaluating vision-language models' ability to understand multimodal puns combining visual and textual elements.
Successor representation method for zero-shot unsupervised RL in visual environments using saliency-guided representations and consistency policy learning.
Evaluation method for LLM-based issue resolution agents beyond pass rates, assessing compliance with implicit design constraints and architectural conventions.
Formal security framework for MCP-based AI agents, including threat taxonomy, verification models, and defense mechanisms for tool-connected LLM systems.
Study on surface compliance in LLMs: models agree with knowledge edits but don't internalize changes, affecting reliability of edited parametric memory.
Qualitative case study examining legal professionals' perceptions of AI governance, regulatory gaps, and institutional readiness in Nigeria.
CritBench: evaluation framework for cybersecurity capabilities of LLM agents in operational technology (OT) environments like IEC 61850 digital substations.
Multi-stage validation framework for trustworthy clinical information extraction using LLMs at scale without annotation-intensive reference standards.
Evaluation of LLM personality simulation using psychometric profiles and life story generation, comparing model outputs against real human psychological data.
Framework using graph priors to improve structural coherence in part-based image synthesis by modeling spatial and semantic relationships.
Method using dual self-consistency reinforcement learning to synthesize TikZ graphics code from images, addressing precision challenges in multimodal LLM code generation.
Framework modeling paraphrasing as affine transformations in transformer embedding spaces to improve interpretability of language model latent spaces.
Research on how social dynamics in multi-agent LLM systems (conformity, expertise perception, dominance) undermine objective decision-making by representative agents.
Research paper on LLM4CodeRE, which uses domain-adapted LLMs for malware decompilation analysis and reverse engineering of obfuscated code.
Research paper on lightweight VLM adaptation via projector alignment for species recognition and habitat analysis in thermal drone imagery.
Research paper on Gym-Anything, a framework converting any software into agent environments for training computer-use agents on complex, long-horizon tasks.
Research paper introducing Polynomial Mixer (PoM), a linear-time token mixing mechanism that replaces self-attention in transformers while preserving universality.
Shot-based quantum encoding distributes quantum resources for efficient data loading in quantum neural networks.
Synthetic pipeline generates doctor-patient conversations for training and evaluating long-form audio summarization models.
MIGT taxonomy addresses governance of machine identities and automated agents in enterprise and geopolitical contexts.
Analysis of the gradient inductive bias of multi-token prediction, compared with next-token prediction, for developing coherent world models.
MMEmb-R1 incorporates chain-of-thought reasoning into multimodal embeddings with pair-aware selection and adaptive control mechanisms.
Diffusion model approach for converting low dynamic range video to HDR through scene radiance estimation.
Test-time training method updates LLM fast weights at inference to adapt dynamically to new information streams.
UserCentrix is a hybrid agentic orchestration framework for smart spaces combining memory augmentation with multi-agent coordination.