Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium
AI-assisted formalization of Vlasov-Maxwell-Landau system equilibrium in Lean 4 using DeepThink reasoning and Claude Code agent for automated theorem proving.
Attribution method for multi-agent systems that identifies responsible agents without execution logs by analyzing final text only, addressing privacy-constrained scenarios.
Training-free uncertainty quantification framework for combining multiple vision-language models through semantic-consistent opinion pooling to reduce hallucinations.
Foundation multimodal model for electromagnetic domain covering perception, recognition, and decision-making using LLM capabilities adapted for domain-specific applications.
Compiler for analyzing and visualizing structured agent traces including nested tool calls, reasoning blocks, and sub-agent invocations for better agentic system understanding.
Decision-theoretic framework (Triadic Cognitive Architecture) for tool-using agents that bounds information-acquisition costs and tool usage to prevent systematic failures.
Self-supervised learning method for RL agents that models agent and environment separately to improve sample efficiency without requiring supervisory signals.
Demonstrates hard-label extraction of deep neural networks via side-channel attacks with a divide-and-conquer strategy, exposing a route to DNN intellectual property theft.
Addresses accuracy loss in distracted driver classification across camera conditions using feature disentanglement and contrastive learning for robustness.
Project management framework using generative AI agents to address team composition gaps by matching sociologically identified personality patterns and roles.
User study with blind and low-vision participants evaluating LVLM-generated scene descriptions, examining their effectiveness and user preferences.
ScienceT2I dataset and benchmark evaluating scientific correctness in image synthesis, addressing the gap between visual fidelity and physical realism across 16 scientific domains.
Neural framework for learning conditional optimal transport maps with hypernetworks that generate adaptive transport parameters for categorical and continuous variables.
JUSSA framework uses steering vectors to improve LLM-as-judge reliability by detecting and mitigating subtle dishonesty like sycophancy through contrastive alternatives.
Framework for online learning of hidden state representations in autonomous robots to handle unobserved factors in complex, unstructured environments.
Proposes graceful forgetting methods to mitigate negative transfer by selectively forgetting detrimental pre-training knowledge during fine-tuning of language models.
Analyzes language-specific neurons to understand how multilingual alignment transfers capabilities from high-resource to low-resource languages in LLMs.
Two-stage vision transformer with a hard-masking approach for object representations that balance context dependence against robustness to distribution shift.
Investigates misalignments between LLM-supported peer supporters and mental health experts, examining quality and safety concerns in AI-driven psychosocial support.
MemeMind dataset with chain-of-thought reasoning for detecting harmful memes, addressing implicit harmful content in multimodal text-image combinations.
Introduces binned semiparametric Bayesian networks to reduce computational cost of kernel density estimation using data binning strategies.
Klear-Reasoner model demonstrates long reasoning capabilities with gradient-preserving clipping for policy optimization, achieving strong benchmark performance with reproducible training details.
Federated learning approach for person re-identification that addresses statistical heterogeneity and communication efficiency in privacy-preserving surveillance systems.
Addresses mode collapse in reinforcement learning fine-tuning by introducing polychromic objectives that preserve policy diversity and enable better exploration.
Proposes end-to-end integration of data-driven learning and existing knowledge for predicting transcriptional responses to genetic perturbations in biological systems.
Evaluates whether large vision-language models can effectively guide blind and low-vision individuals, addressing how to measure real-world utility beyond standard metrics.
TempoControl method enables fine-grained temporal control in text-to-video generative models, allowing specification of when visual elements appear in sequences without retraining.
Mathematical analysis of incoherence in goal-conditioned autoregressive models fine-tuned with reinforcement learning.
Multi-agent reasoning framework for interpreting gene clusters in antimicrobial resistance studies using transcriptomic data.
Fair division method for indivisible payoffs in coalitional games using the Shapley value.
Conformal prediction framework for assessing correctness of LLM outputs with user-defined tolerance levels.
Benchmarking framework using embeddings to detect gender bias in LLMs used for educational feedback on student essays.
Multimodal framework for myocardial scar segmentation combining ECG signals with cardiac MRI imaging.
DuoTok source-aware dual-track tokenizer preserving high-fidelity reconstruction, predictability, and cross-track correspondence for music language models.
Study showing structured prompts significantly improve LLM evaluation accuracy and reduce prompt-dependent variance in benchmark frameworks like HELM.
OmniFusion modular approach for simultaneous multilingual multimodal translation combining speech recognition and translation in open-source LLM pipelines.
Lumos framework for formally certifying language model system behaviors using imperative probabilistic programming with graph-based prompt generation.
GPERT framework for event-based 3D Gaussian splatting balancing accuracy and temporal resolution using geometric-photometric event camera data.
Study demonstrating evasive injection techniques that bypass ML-based prompt injection detectors in retrieval-augmented LLM systems.
Analysis showing steering vectors in LLMs are fundamentally non-identifiable with large equivalence classes, questioning interpretability of activation steering methods.
FIRE reinitialization method balancing stability-plasticity tradeoff in continual learning for deep neural networks through Frobenius-isometry constraints.
Empirical evaluation of LLM-generated ACSL formal specification annotations for C programs, assessing automatic verification without human assistance.
CoCoDiff training-free style transfer framework using diffusion models and correspondence consistency for fine-grained region-wise semantic preservation.
Empirical evaluation of GPTutor LLM tutoring system comparing embedded proof-review feedback versus chatbot support for discrete mathematics learning.
TaCarla comprehensive benchmark dataset for end-to-end autonomous driving with perception and planning information for vehicle research.
SWE-CI benchmark evaluating LLM-powered agents on repository-level codebase maintenance via continuous integration and multi-step feature iterations.
RoboClaw agentic framework unifying data collection, policy learning, and deployment for long-horizon robotic tasks with vision-language-action systems.
CHIMERA-Bench standardized benchmark dataset for epitope-specific antibody design enabling fair comparison of computational design methods.
OPERA framework for data pruning in dense retrieval models that improves both efficiency and effectiveness of domain-specific finetuning through heterogeneous pair selection.
Self-attention CycleGAN method for harmonizing multi-site MRI data using tri-planar context to address scanner-induced distribution shifts.