Enhancing Foundation VLM Robustness to Missing Modality: Scalable Diffusion for Bi-directional Feature Restoration
Method for improving Vision Language Model robustness when modalities are missing using scalable diffusion-based feature restoration.
Multi-agent LLM framework for discovering instrumental variables in causal inference through interdisciplinary knowledge synthesis.
Voxtral Realtime: natively streaming ASR model achieving sub-second latency with end-to-end training for audio-text alignment.
SSLogic: agentic meta-synthesis framework where LLM agents iteratively create and refine generator-validator pairs for logic reasoning tasks.
KLong: open-source LLM agent trained for extremely long-horizon tasks using trajectory-splitting SFT and progressive RL with Research-Factory pipeline.
AI Runtime Infrastructure layer that observes and optimizes agent execution for task success, latency, token efficiency, and safety.
DeepFact benchmark and co-evolving agent system for testing factuality of search-augmented LLM-generated research reports.
HECG framework for autonomous agents using LLMs with multi-dimensional error correction and strategy transfer across tasks.
Study showing that deliberation between multiple LLMs can amplify tiny perturbations into divergent decisions, challenging robustness assumptions.
Machine learning framework for automating defect detection in photovoltaic systems using electroluminescence imaging.
Proposes alternative training architecture for geometric and neuromorphic AI using non-standard arithmetic to reduce memory overhead.
Conceptual framework for AI governance addressing regulatory gaps between task-specific systems and foundation models.
Voxtral TTS: expressive multilingual text-to-speech model that generates natural speech from minimal reference audio.
Metriplector: neural architecture primitive based on field theory, in which the input configures abstract physical systems.
ClawSafety exposes security vulnerabilities in local LLM agent frameworks where prompt injection enables privilege escalation.
AgentSocialBench evaluates privacy risks in collaborative multi-agent social networks with persistent LLM agents.
Modal framework for knowledge representation handling domain-specific concept meaning shifts in knowledge graphs.
XpertBench evaluates LLM performance on expert-level open-ended tasks with rubrics-based assessment.
Addresses value hallucination in Dyna reinforcement learning agents through multistep predecessor models.
VLBiasBench evaluates biases in large vision-language models across diverse domains and question formats.
Study of app metamorphosis phenomenon where mobile apps undergo significant market repositioning.
MegaFake dataset of LLM-generated fake news for understanding mechanisms behind AI-generated misinformation.
SPRIG optimizes system prompts for LLMs using genetic algorithms to improve general task performance.
Comprehensive survey of document parsing techniques for extracting structured information from unstructured documents.
Certified Training with Branch-and-Bound for learning verifiably stable neural control systems.
RIRS framework for multi-agent RAG systems to route complex questions across distributed knowledge bases.
Human-AI collaboration for game testing using vision language models to enhance manual testing efficiency.
Framework for statistical inference on detected changepoints in sequential analysis with confidence sets.
Review of anomaly detection techniques for cyber-physical systems security in critical infrastructure.
Reasoning Model Implicit Association Test studies implicit bias-like patterns in LLMs that use step-by-step reasoning.
BalancedDPO method aligns diffusion models with multiple conflicting evaluation metrics for text-to-image generation.
Open-source benchmark for 3D chip design using the OpenROAD framework, evaluating power, performance, area, and thermal metrics.
Investigates alignment of causal attribution scores (Shapley, Banzhaf, Causal Responsibility) for database tuple relevance in data management.
RaPA improves transferable targeted adversarial attacks by identifying and pruning redundant surrogate model parameters.
Online test-time adaptation method for spiking neural networks via threshold modulation, enabling edge deployment with distribution shift handling.
FSD bridges reasoning and decision-making in robotic manipulation by combining Vision-Language Models with action prediction for zero-shot generalization.
Bayesian ablation framework for interpreting latent task representations in neural networks, enabling probabilistic analysis of learned representations.
VERDI uses Vision-Language Models embedded in an autonomous driving stack for reasoning-based trajectory planning under partial observability.
Chapter reviewing ML/AI applications in food processing, covering classification frameworks and data science approaches to food informatics.
SoSBench evaluates LLM safety alignment across six scientific domains with sophisticated, knowledge-intensive adversarial prompts.
Framework for evaluating LLM judges of LLM outputs, accounting for both sampling and judge quality uncertainty without gold-standard scores.
K-Steering enables unified multi-attribute control of LLM behavior at inference time using non-linear steering on hidden activations.
PhysGaia benchmark for dynamic novel view synthesis with physics-aware evaluation of multi-body interactions and realistic collisions.
LLMs applied to combinatorial optimization of Design Structure Matrices in engineering, demonstrating reasoning capabilities for complex system reorganization.
ZINA detects and edits fine-grained hallucinations in multimodal LLMs, proposing a novel evaluation task for MLLM quality.
Vision Transformer-based framework reconstructs multispectral satellite imagery obscured by clouds using SAR data for crop mapping.
PRISM: lightweight fully convolutional model for multivariate time-series classification on edge devices.
Framework treating prompts as first-class citizens in LLM pipelines to enable reuse, optimization, and runtime adaptation in complex agent systems.
CATNet applies geometric deep learning (R-GCN) to catastrophe bond spread prediction in financial markets.
Embodied-R1 introduces a 3B VLM using "pointing" as a unified intermediate representation to address the seeing-to-doing gap in robotic manipulation across different embodiments.