MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control
MMEmb-R1 incorporates chain-of-thought reasoning into multimodal embeddings with pair-aware selection and adaptive control mechanisms.
Diffusion model approach for converting low dynamic range video to HDR through scene radiance estimation.
Test-time training method updates LLM fast weights at inference to adapt dynamically to new information streams.
UserCentrix is a hybrid agentic orchestration framework for smart spaces combining memory augmentation with multi-agent coordination.
ARIEL framework pairs expert-vetted biomedical tasks with LLMs for evaluation and optimization of AI research assistants.
Fine-tunes open-source LLMs for smartphone app control by learning action semantics rather than syntax, reducing API costs.
URSA framework enables LLMs to conduct autonomous research through complex reasoning, planning, coding, and multi-agent collaboration.
MedGemma is a medical vision-language foundation model collection designed for healthcare AI tasks with privacy preservation.
Agent-based model framework for simulating cascading climate risks in supply chains with adaptive firm behavior and economic network effects.
Extends Nash learning from human feedback to the multiplayer setting, capturing non-transitive and heterogeneous preferences in LLM alignment.
DeepSearch applies Monte Carlo Tree Search to overcome training plateaus in reinforcement learning from verifiable rewards for language model reasoning.
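DeepSearch's specifics aside, the generic MCTS loop it builds on (UCB1 selection, expansion, rollout, backpropagation) can be sketched on a toy problem. Everything below — the bit-string task, the reward, the constants — is an illustrative assumption, not the paper's setup:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial binary string
        self.parent = parent
        self.children = {}          # action ("0"/"1") -> Node
        self.visits = 0
        self.value = 0.0            # sum of rollout rewards

def ucb1(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(state, depth):
    # Random playout to a terminal string; reward = fraction of 1s.
    while len(state) < depth:
        state += random.choice("01")
    return state.count("1") / depth

def mcts(depth=8, iters=2000):
    root = Node("")
    for _ in range(iters):
        node = root
        # Selection: descend while the node is fully expanded.
        while len(node.children) == 2 and len(node.state) < depth:
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # Expansion: add one untried child if non-terminal.
        if len(node.state) < depth:
            action = random.choice([a for a in "01" if a not in node.children])
            node.children[action] = Node(node.state + action, node)
            node = node.children[action]
        # Simulation + backpropagation.
        reward = rollout(node.state, depth)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Greedy extraction along the most-visited path.
    node, out = root, ""
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        out += action
    return out

random.seed(0)
best = mcts()
# The search should concentrate on high-reward (1-heavy) strings.
assert best.count("1") >= best.count("0")
```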
Introduces Supervised Multi-Dimensional Scaling to analyze and compare feature manifold hypotheses in language models' latent spaces.
TS-Agent enables LLMs to reason over raw time series data directly without converting to text/images, reducing hallucination and knowledge leakage.
DRIFT method automates mathematical theorem formalization for LLMs by decomposing statements and retrieving prerequisite knowledge in formal languages.
Critiques rule-based and reward-based approaches in RL ethics, proposes virtue ethics framework for more robust machine ethics.
Information-theoretic analysis extending Gödel's incompleteness to AI security and alignment, establishing fundamental limitations for robust AI systems.
Framework enabling GUI agents to build actionable memory from past tasks via self-exploration with critic guidance, improving generalization and reducing errors.
Asynchronous reinforcement learning framework for vision-language-action model training, enabling flexible post-training optimization for embodied agents.
Study demonstrating that introspection mechanisms in LLMs are content-agnostic, detecting anomalies without understanding their semantic meaning.
Framework adapting hindsight experience replay to recover training signal from failed LLM agent trajectories, addressing low real-world task success rates.
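Hindsight experience replay, which the framework above adapts to LLM agents, can be illustrated generically: a failed trajectory is relabeled with goals it actually achieved, turning sparse zero rewards into training signal. The tuple layout and the 1-D grid task below are illustrative assumptions, not the paper's setup:

```python
import random

def her_relabel(trajectory, k=4):
    """Generate hindsight transitions from a (possibly failed) trajectory.

    trajectory: list of (state, action, next_state, goal) tuples.
    Returns relabeled transitions where the goal is replaced by a state
    actually reached later in the same trajectory ("future" strategy),
    so the sparse reward becomes positive in hindsight.
    """
    relabeled = []
    for t, (s, a, s_next, _goal) in enumerate(trajectory):
        # Sample up to k achieved states from this transition onward.
        future = [tr[2] for tr in trajectory[t:]]
        for new_goal in random.sample(future, min(k, len(future))):
            reward = 1.0 if s_next == new_goal else 0.0
            relabeled.append((s, a, s_next, new_goal, reward))
    return relabeled

# Toy failed trajectory on a 1-D grid: the agent aimed for 5 but reached 3.
traj = [(0, "+1", 1, 5), (1, "+1", 2, 5), (2, "+1", 3, 5)]
random.seed(0)
extra = her_relabel(traj, k=2)
# Some relabeled transitions now carry reward 1.0 despite the original failure.
assert any(r == 1.0 for (*_, r) in extra)
```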
Diffusion-based surgical video restoration framework using physics- and semantics-guided reinforcement learning to remove surgical smoke.
Two-phase training framework jointly optimizing LLMs for reasoning and self-refinement using group relative policy optimization on correctness rewards.
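Group relative policy optimization replaces PPO's learned value baseline with a per-prompt group statistic: sample several responses, score each for correctness, and normalize within the group. A minimal sketch of the advantage computation (the function name and reward values are illustrative, not the paper's code):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each sampled response's
    correctness reward against the group sampled for the same prompt.
    No learned value function is needed, unlike PPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Group of 4 sampled answers to one prompt: two correct (1.0), two wrong (0.0).
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Correct answers get positive advantage, wrong ones negative.
assert adv[0] > 0 and adv[1] < 0
```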
High-fidelity benchmark with rubric-based evaluation assessing LLMs on expert-level, complex open-ended tasks across multiple domains.
LLM-based peer review system that verifies claims by checking related work and executing code, improving review quality beyond manuscript-only analysis.
Theoretical framework for evaluating cyclic non-transitive interactions between LLM-based agents using equilibrium concepts instead of linear rankings.
Framework proposing that ambient AI systems transition from modeling to constituting users' cognitive functions through sustained causal coupling.
Molecular discovery framework combining LLMs with diffusion models to improve generation of chemically valid molecules by relaxing autoregressive constraints.
Memory system for deep research agents that improves trajectory retrieval and memory evolution to enhance LLM reasoning and autonomous learning.
Unsupervised fine-tuning method to improve adversarial robustness and semantic quality of vision-language models through siamese contrastive learning.
LLM-based code translation agent using execution alignment to improve cross-language code generation without parallel training data.
Multimodal LLM fine-tuned for image forgery detection and localization with interpretable visual reasoning capabilities.
Divide-and-conquer proof synthesis approach using LLMs to automate formal verification in proof assistants like Coq, improving software quality verification.
Systematic analysis of challenges in transitioning foundation model systems from demos to production, covering reliability, cost, scalability, and compliance issues.
Edge-cloud collaborative VQA system using aligned vector quantization to split vision-language model computation between edge and cloud devices, reducing bandwidth and utilizing edge resources.
Retrieval-augmented generation applied to time-series foundation models for zero-shot forecasting across domains.
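A generic retrieval-augmented forecasting loop — retrieve historical windows similar to the current context and average their continuations — can be sketched as follows. The Euclidean distance, the function signature, and the naive averaging are illustrative assumptions, not the paper's method:

```python
def retrieve_and_forecast(history, context_len, horizon, k=3):
    """Forecast by retrieving the k past windows most similar (Euclidean
    distance) to the most recent context window and averaging their
    continuations. A generic retrieval-for-time-series sketch."""
    query = history[-context_len:]
    candidates = []
    # Slide over history, excluding windows that overlap the query.
    for i in range(len(history) - 2 * context_len - horizon + 1):
        window = history[i:i + context_len]
        dist = sum((a - b) ** 2 for a, b in zip(window, query)) ** 0.5
        continuation = history[i + context_len:i + context_len + horizon]
        candidates.append((dist, continuation))
    candidates.sort(key=lambda c: c[0])
    top = [c[1] for c in candidates[:k]]
    return [sum(vals) / len(vals) for vals in zip(*top)]

# Periodic toy series: the retrieved continuations track the next cycle.
series = [0, 1, 2, 3] * 10
pred = retrieve_and_forecast(series, context_len=4, horizon=2, k=3)
assert pred == [0.0, 1.0]
```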
VarDrop reduces computational cost in multivariate time series forecasting by eliminating variate token redundancy.
ENTER system uses event graphs for interpretable Video QA with code generation and contextual reasoning.
Entropy-based framework with Transformer for next activity prediction in business process monitoring.
LongSpec enables efficient speculative decoding for long-context LLM inference with lossless acceleration for agent applications.
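LongSpec's long-context machinery aside, the lossless guarantee of greedy speculative decoding in general comes from always emitting the target model's own token: the draft only decides how many positions can be checked per round. The toy integer "models" below are illustrative; a real implementation verifies all drafted tokens in a single batched forward pass rather than one call per position:

```python
def speculative_decode(target, draft, prompt, max_new=12, k=4):
    """Greedy speculative decoding sketch: a cheap draft model proposes k
    tokens, the target model verifies them, and only the target's greedy
    choices are ever appended — so output matches plain greedy decoding
    with the target alone ("lossless"). `target` and `draft` map a token
    sequence to the next token."""
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # Draft proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # Target verifies; on disagreement, discard the rest of the draft.
        for tok in proposal:
            if len(seq) >= len(prompt) + max_new:
                break
            t = target(seq)
            seq.append(t)          # always the target's token => lossless
            if t != tok:
                break
    return seq

# Toy models over integers: target continues +1; draft errs after multiples of 5.
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + (2 if s[-1] % 5 == 0 else 1)
out = speculative_decode(target, draft, [0], max_new=10)
assert out == list(range(11))   # identical to pure greedy target decoding
```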
Framework measuring hedging and non-affirmation behaviors in LLM responses on human rights topics across identity groups.
NativQA framework extends to multimodality for culturally grounded LLM/VLM evaluation across languages and regions.
LLM-aided tool automates Universal Verification Methodology testbench generation for RTL IC verification.
CMP-RT diagnostic probe reveals tokenization vulnerabilities in safety-aligned LLMs through phonetic perturbations.
Polar decomposition and matrix sign methods optimized for GPU-friendly deep learning training via Muon optimizer.
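The classic Newton-Schulz iteration for the orthogonal polar factor — which Muon-style optimizers approximate using only matrix multiplies, avoiding SVD on the GPU — can be sketched in pure Python. Note the caveat: Muon uses a tuned odd polynomial; the 1.5/-0.5 coefficients below are the textbook variant, shown here only to illustrate the idea:

```python
def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frob(A):
    return sum(x * x for row in A for x in row) ** 0.5

def newton_schulz_polar(A, steps=10):
    """Odd-polynomial iteration X <- 1.5 X - 0.5 X X^T X, which drives the
    singular values of X toward 1 and converges to the orthogonal polar
    factor of A. Matmul-only, hence GPU-friendly."""
    n = frob(A)
    X = [[x / n for x in row] for row in A]   # scale so the iteration converges
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * X[i][j] - 0.5 * XXtX[i][j] for j in range(len(X[0]))]
             for i in range(len(X))]
    return X

# A = 2 * (90-degree rotation): its polar factor is the rotation itself.
U = newton_schulz_polar([[0.0, -2.0], [2.0, 0.0]])
assert abs(U[0][1] + 1.0) < 1e-6 and abs(U[1][0] - 1.0) < 1e-6
```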
Multimodal diffusion models synthesize quantum circuits for efficient compilation with reduced hardware calls and runtimes.
HeartcareGPT suite with 400K ECG dataset enables multimodal medical LLMs for dual signal-image ECG understanding.
BulletGen reconstructs 4D dynamic scenes from monocular video using generative models to complete unseen regions.
Survey of continual reinforcement learning covering sequential decision-making, generalization, and adaptation across dynamic tasks.
LaSM defends GUI agents against pop-up injection attacks using layer-wise scaling on multimodal LLMs for safer screen interaction.
Framework for detecting LLM hallucinations in black-box generators by leveraging future context patterns.