Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems
Multi-agent system with Lean 4 verification layer for exact scientific discovery in quantum code design, combining symbolic synthesis and automated verification.
ATLAS framework combines LLMs with model-driven workflows for generating structured artifacts that satisfy schemas, domain rules, and audit requirements through constraint compilation and validation.
Interpretable model for detecting implicit and explicit hate speech using prototype-based representations for transfer learning.
Research on financial fraud risks from collaborative LLM agents including MultiAgentFraudBench for simulating multi-agent fraud scenarios.
Fairness-aware stroke diagnosis framework combining domain-adversarial training with group distributionally robust optimization.
Synthetic environment generating visual reasoning puzzles with ground-truth solutions across 25 task types for benchmark construction.
Video compression framework using semantic conditioning and diffusion models for ultra-low bitrate encoding.
Benchmark for task-oriented spatio-temporal grounding in egocentric videos for embodied AI agents.
Physics-informed transformer model for socially-aware autonomous driving that learns social interaction dynamics.
Drag-based image editing method using diffusion models with token injection and attention mechanisms for precise visual manipulation.
Theoretical framework for analyzing causal effects at fine-grained levels in high-dimensional data like images and language models.
Interface design study on scaffolding divergent and convergent thinking in human-AI co-creation with generative models.
SWE-EVO benchmark for evaluating AI coding agents on long-horizon software evolution tasks spanning multiple files and iterations.
Research on using LLMs to generate multilingual counterfactual examples for model interpretability across languages.
LLM-based method for categorical data clustering that leverages semantic understanding to measure similarity among attribute values lacking inherent ordering.
Text-driven video reauthoring interface and study exploring how creators can edit video footage through natural language prompts rather than manual editing.
Analysis of fairness in automated decision-making for healthcare emergency triage using process mining and fairness-aware algorithms on empirical data.
Vision-language agent framework combining inverse graphics with interleaved multimodal reasoning for reconstructing images into editable programs with spatial grounding.
Large-scale empirical study analyzing how AI coding agents modify code and describe changes in GitHub pull requests compared to human contributions.
Open-source web platform and course teaching machine learning fundamentals to students aged 12-17 using LEGO robotics without programming.
Method improving language model pretraining by using post-trained models as data sources to instill desired behaviors like safety and reasoning earlier in training.
Few-shot fine-tuned LLM approach for categorizing intermittent CI pipeline failures caused by flaky tests and infrastructure issues rather than code defects.
Information-theoretic framework for optimizing shared visual tokenization in unified multimodal models that perform both image understanding and generation.
Safety alignment approach for Mixture-of-Experts language models addressing unique challenges from sparse routing mechanisms during fine-tuning.
Benchmark framework for evaluating multimodal large language models on spatio-temporal bimanual coordination tasks requiring synchronized multi-stream integration.
Method using linear probes on LLM pre-generation activations to predict success likelihood before generation, enabling selective deployment of expensive extended reasoning.
Study examining emergent social behavior and interactions among large-scale communities of AI agents on MoltBook, a social platform designed for agent-agent communication.
Token-level noise filtering framework for LLM fine-tuning datasets that identifies and explains problematic tokens to improve downstream task performance.
Language model based on continuous flows over token embeddings demonstrating faster generation than discrete diffusion and autoregressive models with improved few-step quality.
Open-source framework unifying rubric-based LLM evaluation techniques including ensemble judging, bias mitigation, and few-shot calibration with consistent implementation.
Research on human-AI agent collaboration exploring how agents can maintain workspace awareness and interpret concurrent user actions on shared artifacts during co-creative tasks.
Research on multi-agent reinforcement learning algorithm (NePPO) addressing training stability and convergence in general-sum games with heterogeneous agents.
PlayWorld pipeline training action-conditioned video models on autonomous robot play data for improved world model physics prediction.
Five prompt engineering strategies to reduce LLM hallucinations and improve consistency in industrial applications like design and IoT.
Human-LLM collaboration developing structural framework for Collatz map dynamics with theoretical proofs.
Hindsight-anchored policy optimization method addressing advantage collapse in sparse-reward RL for reasoning model post-training.
UtilityMax Prompting framework using formal mathematical language to specify multi-objective LLM tasks with influence diagrams.
Controlled experiments showing that LMs prefer correct answers because the compressibility structure of errors guides learning, not because of an inherent preference for truth.
Perplexity's recommendations on security considerations for frontier AI agents based on operating agentic systems at scale.
Quality diversity optimization for red-teaming vision-language-action robot models to improve robustness against prompt variations.
Brittlebench framework quantifying LLM robustness through prompt sensitivity evaluation beyond static benchmarks.
Contextual data fusion framework integrating vehicle sensors with environmental signals for predictive maintenance in connected vehicles.
Proposes a generate-then-correct method for aspect sentiment quad prediction in fine-grained opinion mining tasks.
Proposes parallel framework combining imitation and reinforcement learning for end-to-end autonomous driving instead of sequential fine-tuning.
Studies causal discovery in chain-reaction dynamical systems using interventional data with identifiability guarantees.
Philosophical analysis of moral dimensions in human-AI companion interactions and provider control structures.
Framework using LLMs to automatically synthesize reward programs for cooperative multi-agent reinforcement learning systems.
Combines flow matching with reward optimization for trajectory forecasting in autonomous driving and crowd surveillance scenarios.
Proposes a multimodal deception detection dataset and uses GSR-guided distillation to improve non-contact detection.
Introduces StackRepoQA, a repository-level QA benchmark for evaluating LLMs on multi-file program comprehension tasks beyond isolated code snippets.