Therefore I am. I Think
Linear-probe analysis showing that LLM reasoning models encode their decisions before generating chain-of-thought explanations.
Study evaluating reliability and risk of AI systems in medication decision-making and healthcare workflows.
OSCAR framework for mitigating hallucinations in diffusion language models using self-verification during generation.
SAT/MaxSAT framework for solving the 2D cutting stock problem in manufacturing optimization.
Research probing whether LLMs encode awareness of conversation continuity by generating user turns after assistant responses.
Novel framework using LLMs for causal graph discovery via breadth-first search, reducing query complexity from quadratic to linear.
Improves emotion intensity and speaker consistency in zero-shot LLM-based text-to-speech through expressive prompt design methods.
Multimodal LLM fine-tuned for interpretable image forgery detection and localization, providing semantic understanding beyond low-level artifacts.
Proposes scale transformation method for transferable targeted adversarial attacks requiring minimal data without surrogate model feedback.
Zero-shot concept bottleneck models that make interpretable predictions without any training on the target task.
Improves the semantic and temporal consistency of text-to-video generation using neuro-symbolic feedback, without retraining the model.
LMask framework uses dynamic masking with learning to solve constrained routing problems as combinatorial optimization tasks.
StructEval benchmark systematically evaluates LLM capabilities in generating structured outputs across JSON, HTML, React, SVG and other formats.
Introduces FLEX, a multimodal, multiview dataset for fitness action quality assessment with professional assessments and multiple sensor modalities.
Uses diffusion models for data-driven galaxy image generation without explicit physical parameters, outperforming simulation-based methods.
Formalizes a mission-aligned, learning-informed control framework for autonomous physical agents that integrates learning with task objectives.
Proposes modular vision-language alignment architecture improving CLIP's handling of multi-object images and caption misalignment.
Introduces ReDef, a high-confidence software defect prediction dataset built from 22 C/C++ projects, and uses it to evaluate how well code language models understand code changes.
Compares psychometric questionnaire profiles with actual generation behavior across eight open-source LLMs to test whether such questionnaires are valid assessments of model behavior.
GenAI system enabling parents to create personalized multi-path social narratives for autistic children using generative models.
Generates synthetic robot poses for RGB-D bimanual manipulation data augmentation to improve imitation learning policy training.
Analyzes political bias in LLM training data composition across pre-training and post-training stages to understand the sources of model bias.
Proposes learning progress monitoring to improve exploration efficiency in reinforcement learning agents when encountering unlearnable noise sources.
Introduces attribution gradients technique to improve citation informativeness and evidence transparency in AI answer engines.
Forecasts expert selection patterns in Mixture of Experts LLMs to optimize data movement overhead in multi-unit serving systems.
Extends Forward-Forward algorithm to reinforcement learning using action-conditioned Q-functions and layer activity statistics as learning signals.
CQA-Eval evaluation framework for multi-paragraph clinical question answering systems with physician annotations and recommendations for resource-constrained settings.
f-INE hypothesis testing framework estimates sample influence on model performance while accounting for training randomness, addressing instability in existing influence estimation methods.
MusicRFM framework adapts Recursive Feature Machines to enable fine-grained control over frozen pre-trained music generation models via internal activation steering.
Deep learning approach fixing systematic S-wave detection failures in seismic phase picking via shape-aware loss functions.
SAGA: framework for source attribution of AI-generated videos that identifies the specific generative model used, rather than performing binary real/fake detection.
Research on contrastive fusion for higher-order multimodal alignment in joint representation learning across multiple modalities.
Deep learning approach using YOLO and ResNet50 for breast cancer detection in mammograms with improved out-of-domain robustness.
IMAgent: open-source visual agent trained with end-to-end RL for multi-image reasoning tasks, addressing limitations of single-image VLM agents.
Method for dense 3D point tracking and reconstruction in dynamic scenes using a single forward pass, without requiring known camera poses.
Maps EU AI Act legal requirements to technical verification activities for compliance assessment of high-risk AI systems across member states.
FedVideoMAE: federated learning framework for privacy-preserving video moderation using self-supervised representations and differential privacy.
Open-source image generation model with improved reasoning for logic-intensive instruction following, closing gap to closed-source systems.
Multi-agent framework automating full computational catalysis research lifecycle from conception to publication.
Equilibrium propagation method for optimizing compound AI systems with multiple modules in long-horizon agentic workflows.
Framework using influence functions to craft training data perturbations inducing targeted model behavior changes.
Research on uncertainty quantification for ML interatomic potentials using evidential deep learning.
arXiv: Geometric analysis of transformer optimization dynamics revealing low-dimensional manifolds in grokking.
Research paper studying loss-landscape geometry as early-warning signals for grokking in neural networks.
CeRA: parameter-efficient fine-tuning method overcoming LoRA's linear capacity ceiling via non-linear gating and dropout for rank adaptation.
SafeSci: comprehensive benchmark and framework for evaluating LLM safety in scientific domains with multi-domain risk coverage and objective evaluation.
Framework for EEG-to-text decoding addressing semantic bias and signal neglect in neural signal interpretation. Published on arXiv.
Stock market prediction using Node Transformer architecture with BERT sentiment analysis to capture market patterns and dependencies.
DiFlowDubber: discrete flow matching framework for video dubbing with TTS, lip synchronization, and expressive prosody. Published on arXiv.
Qualitative study of 167,000+ AI agents on multiple platforms learning from each other and developing emergent behaviors without researcher intervention.