Learning Stable Predictors from Weak Supervision under Distribution Shift
Studies learning from weak supervision under distribution shift in CRISPR-Cas13d experiments where guide efficacy is inferred indirectly.
EduIllustrate benchmark evaluates LLMs on generating multimodal educational content combining accurate diagrams with step-by-step explanations.
BDATP framework for audio-visual navigation using binaural attention and action prediction to improve generalization in unseen 3D environments.
YMIR dataset and CNN model for classifying five Yemeni music genres, addressing underrepresentation of non-Western music in MIR research.
Comparative analysis of key-value cache management strategies for efficient LLM inference under different model sizes and context lengths.
Proposes training LLM coding agents on five atomic coding skills (localization, editing, testing, reproduction, review) for improved generalization.
StarVLA provides a modular open-source codebase for building vision-language-action embodied agents with standardized evaluation protocols.
Phase-Associative Memory is a recurrent sequence model using complex-valued representations achieving competitive perplexity on WikiText-103.
ID-Sim proposes an identity-focused similarity metric for vision models to improve evaluation of personalized image generation tasks.
PCA-Triage is a streaming algorithm for adaptive sensor sampling in IoT networks using principal component analysis to manage bandwidth constraints.
Study evaluating LLM sensitivity to prompt phrasing in medical question answering, showing inconsistent responses despite identical underlying evidence.
DynLMC generates synthetic multivariate time series with time-varying correlations and cross-channel dependencies for training foundation models.
arXiv paper presenting AutoLALA, an open-source tool that analyzes data locality in loop programs for HPC and AI workloads.
arXiv paper on privacy-preserving graph learning for additive manufacturing sensor data using differential privacy techniques.
arXiv paper on Nidus, a governance runtime using Claude, Gemini, and Codex to mechanize the V-model for AI-assisted software delivery.
arXiv paper proposing OmniScore, a set of deterministic evaluation metrics for multilingual text generation as an alternative to LLM judges.
arXiv paper auditing code-editing benchmarks for LLMs, finding flaws in existing evaluation methods for instructed code modification.
arXiv paper on diffusion models for medical imaging, generating paired mammogram views for cancer screening datasets.
arXiv paper on Decision Pre-Trained Transformer for in-context reinforcement learning, enabling scalable generalist agent training.
arXiv paper on CRAB method for mitigating popularity bias in generative recommendation systems via codebook rebalancing.
arXiv paper presenting π² pipeline for curating reasoning data from structured sources to improve LLM long-context reasoning.
arXiv paper on vision-language models learning from grounded video data, finding text-only bias in video benchmarks.
arXiv paper modeling prior authorization policy retrieval as a Markov decision process (MDP) for adaptive decision-making in healthcare insurance.
arXiv paper on how reasoning evolves in language models through fine-tuning and RL, studied via chess task performance.
EffiPair: Relative Contrastive Feedback method for improving runtime and memory efficiency of LLM-generated code without model fine-tuning.
Compiled AI: Paradigm in which LLMs generate executable code at compile time, so that workflow automation then runs deterministically without a model.
Planning to Explore: Curiosity-driven planning approach for LLM-based test generation using Bayesian principles to reach deep code branches.
Analysis of 10 proposed measures of qualitative interview response quality to determine their predictive validity.
Adaptive Thinking Budgets: Method for allocating inference-time compute efficiently across multi-turn LLM reasoning based on turn difficulty.
Modality-aware vector-quantized VAE for reconstructing multimodal brain MRI data across different imaging modalities.
Large Sparse Reconstruction Model studies scaling transformer context windows for improved 3D object reconstruction from multiple views.
OrthoFuse: Training-free method for merging multiple adapters in diffusion models using Riemannian geometry.
Study comparing encoder and decoder-based LLMs for screening clinical narratives to automate patient recruitment for clinical trials.
RoboPlayground: Framework for democratizing robotic manipulation evaluation through structured physical domain benchmarks.
Optimization strategies using curvature-aware methods to improve convergence speed and accuracy of physics-informed neural networks.
XMark: Multi-bit watermarking method for embedding imperceptible messages in LLM-generated text for attribution and tracing.
Study on how transformer language models learn second-order generalizations about object categories from synthetic data.
Temporal extension of TabDDPM for time-series data generation, addressing temporal dependencies in diffusion-based synthetic data creation.
Region-based re-ranker for multi-modal RAG that reduces visual distractors by formulating region selection as a decision-making problem.
Multi-agent spec-driven development pipeline with context-grounding hooks to prevent hallucinations and architectural violations in LLM coding agents.
Formal verification of security vulnerabilities in AI-generated code across 7 frontier LLMs and 500 prompts using the Z3 SMT solver.
Study on training LLMs to express uncertainty explicitly as a control interface for abstention and verification tasks.
Novel autoregressive paradigm for long-sequence symbolic music generation using anchored cyclic generation.
Diagnostic RAG system for IT support with explicit diagnostic state tracking across turns to accumulate evidence and resolve hypotheses.
Multi-agent LLM system for clinician-in-the-loop gait analysis report drafting, coordinating specialized agents for multimodal data synthesis.
Training-free quantization method for 3D reconstruction models using random rotations without per-scene fine-tuning.
Study on AI's role in collective decision-making systems and procedural legitimacy conditions for participants.
Long video understanding via spatio-temporally structured intent-aware RAG, preserving video structure while retrieving query-relevant evidence.
System for adaptive LoRA hyperparameter tuning and orchestration across heterogeneous multi-tenant LLM fine-tuning workloads.
Open-source digital twin simulator, with an accompanying dataset, integrating natural language with renewable energy microgrid dynamics.