MSG Score: Automated Video Verification for Reliable Multi-Scene Generation
MSG Score automated metric for verifying coherent multi-scene video generation from text-to-video diffusion models at runtime.
Multimodal task for facial forgery detection that generates attribution reports with localization and natural-language explanations of manipulations.
LongSpec efficient speculative decoding for long-context LLM inference with novel drafting and verification methods for agent applications.
Evaluation framework and benchmark for LLMs in intelligent outpatient referral systems, assessing dynamic healthcare application capabilities.
Empirical study of LLM preferences for programming languages and libraries across eight models, revealing systemic biases in code generation.
Benchmark for evaluating memory-augmented world models via spatial consistency across simulation and planning tasks.
Framework for achieving provable probabilistic safety in embodied AI systems, which combine AI models with physical plants, for safety-critical deployment.
LongWriter-Zero uses reinforcement learning to overcome LLM generation length limits and quality degradation for ultra-long text generation.
Agent-simulation approach that uses simulation-based testing to diagnose coordination failures in healthcare robot teams before they collaborate with humans.
Method for quantitatively estimating target task performance from unsupervised pretext tasks in semi/self-supervised learning without post-training evaluation.
In-context decision making for AutoML pipeline optimization using LLMs to handle algorithm selection, hyperparameter tuning, and modern ML adaptation techniques.
ShadowNPU system-algorithm co-design for on-device LLM inference on NPUs, addressing quantization sensitivity in attention operators.
Once4All uses LLM-synthesized generators guided by SMT solver structure for fuzzing-based testing of satisfiability modulo theory solvers.
Survey on abstract concept recognition in video understanding, comparing machine capability with human ability to recognize intangible concepts.
PhISM physics-informed deep learning for hyperspectral imaging using unsupervised learning and continuous basis functions for classification and regression.
Draw-In-Mind rebalances multimodal model responsibilities between understanding and generation within a unified architecture for improved image editing.
LifeAlign framework enables lifelong alignment of LLMs across sequential tasks while preventing catastrophic forgetting using memory-augmented preference optimization.
AudioRole dataset for multimodal audio role-playing in LLMs, addressing synchronized alignment of semantic content and vocal characteristics for persona simulation.
Stealthy jailbreak attack framework for mobile vision-language agents operating smartphone interfaces with imperceptible adversarial inputs.
IoT-based wireless sensor network system for industrial monitoring and control using Arduino microcontrollers and NRF transceivers.
SAVANT framework for semantic anomaly detection in autonomous driving using structured reasoning with open-source VLMs.
Contrastive decoding method addressing score range bias in LLM-as-a-judge for reliable evaluation without reference comparisons.
VisCoder2 multi-language visualization coding agent using LLMs with iterative execution and correction for improved practical workflows.
PULSE framework for knowledge transfer from information-rich to deployable sensors in embodied multi-sensory systems.
LoRA-DA framework establishing theoretical foundation for data-aware LoRA initialization using asymptotic analysis for parameter-efficient fine-tuning.
Nirvana specialized generalist model with task-aware memory mechanism combining broad LLM capabilities with domain adaptation.
Neural metrics for speech translation evaluation that incorporate source text information to improve correlation with human judgments.
SpecQuant framework for ultra-low-bit LLM quantization using spectral decomposition and adaptive truncation for efficient device deployment.
Data-efficient fine-tuning method for text-to-video diffusion models using sparse synthetic data to add new generative controls.
DeCo framework for efficient pixel-space image generation using frequency-decoupled diffusion transformers.
Pistachio synthetic benchmark for video anomaly detection with balanced scene diversity and temporal complexity for autonomous systems.
TREASURE foundation model for transaction understanding in payment networks, enabling anomaly detection and consumer insights at scale.
Socratic questioning framework improving VLM understanding of remote sensing images by addressing pseudo-reasoning and incomplete perception issues.
REVEAL framework for detecting AI-generated images with forensic explainability through structured reasoning rather than post-hoc rationalizations.
Analysis of chain-of-thought reasoning in LLMs from optimization lens, addressing overthinking and performance issues in long-CoT prompting.
Adaptive Replay Buffer for offline-to-online reinforcement learning that dynamically balances fixed offline data with new online experiences.
PyFi framework for financial image understanding using vision-language models with adversarial agents and 600K QA dataset organized in reasoning pyramid.
First empirical study of machine unlearning in hybrid quantum-classical neural networks, adapting classical unlearning methods to variational quantum circuits.
Benchmark for evaluating physics-grounded audio in text-to-audio-video generation models.
LLM-assisted framework for identifying security assets in SoC designs to improve pre-silicon security verification.
LLM-based framework for document inconsistency detection with improved evidence extraction capabilities and metrics.
RL-enhanced MLLM approach for high-resolution image quality assessment using context-aware multi-scale visual probing.
Multi-targeted backdoor attack method for graph neural networks using injection-based trigger mechanisms.
Framework virtualizing computer environments as interactive tools to elicit general agentic intelligence capabilities in LLMs.
Space-filling curves applied to communication-avoiding matrix multiplication for efficient HPC and deep learning workloads.
Audio LLM-based approach for detecting speech editing and localizing manipulated content without frame-level supervision.
Self-validation framework mitigating object hallucination in large vision-language models (LVLMs) via structured analysis.
Submodular-based data selection for efficient LLM instruction tuning by addressing gradient conflicts in Fisher information.
Open-source library for learning representations and world models using Joint-Embedding Predictive Architectures.
Framework using influence functions to craft training data perturbations that induce targeted behavior changes in models.