On the Step Length Confounding in LLM Reasoning Data Selection
Analysis of step length confounding bias in LLM reasoning dataset selection pipelines used for fine-tuning complex reasoning models on chain-of-thought tasks.
Memory architecture for long-term dialogue systems using boundary-guided event segmentation and query-adaptive retrieval to improve scalability and personalization.
Benchmark for evaluating LLM diagnostic robustness in medical dialogue with adversarial patient behaviors at varying severity levels and cross-dimension interactions.
Large-scale study examining bias in skin-toned emoji representations across LLMs and embedding models, addressing societal bias perpetuation in AI systems.
Research on physical adversarial attacks against surveillance systems including person detection, multi-object tracking, and visible-infrared evasion techniques.
Research on redundancy in Large Speech Language Models, revealing structured token-level redundancy that can be exploited to reduce inference costs while maintaining semantic fidelity.
SentinelSphere combines ML-based threat detection with LLM-powered security training to address cybersecurity skill gaps and human vulnerabilities.
Extended reality platform combining XR and multimodal AI for personalized career guidance and coaching.
Benchmark measuring occupational skill susceptibility to LLM automation across 263 tasks and 35 O*NET skill categories.
Query-aware adaptive perception method for reducing tokens in multimodal LLM inference while maintaining fine-grained understanding.
Efficient scaling method for diffusion model reinforcement learning using mixed precision and selective rollout quantization.
Multi-modal UI control detection combining YOLO vision with GPT-generated text descriptions via cross-attention.
Neural improvement method learning local search policies for TSP, generalizing beyond single-solution outputs.
Empirical study of LoRA fine-tuning LLMs for automated test case generation from natural language requirements.
Multimodal wearable framework for frailty estimation in elderly breast cancer patients.
Adversarial patch attacks on palmprint recognition systems in physical settings.
Generative approach to photomosaic creation using diffusion models with structure alignment.
Machine learning approach for stress estimation in elderly cancer patients using multimodal wearable sensor data.
Study of self-preference bias in LLM-as-judge evaluation, showing models favor outputs from themselves or related models.
Framework for governing autonomous AI agent economies through constitutional separation of powers architecture.
Multi-agent LLM simulation framework for legal argumentation with trait-conditioned agents in adversarial game-theoretic setting.
Training-free method using vision-language models to analyze robot execution failures via keyframe tokenization.
Single-agent robotic architecture with modular capabilities for unified intelligence organization and execution control.
Agentic approach using LLM agents to decompose complex text-to-SQL queries into simpler steps for multi-table reasoning.
Neural motion planning for robotic manipulators using flow matching models for open-loop trajectory generation.
Framework for empathetic dialogue systems using strategy-aware multi-stage reasoning and LLM-based response generation.
Dataset for detecting and localizing AI-generated forgeries in surveillance imagery to address deepfake risks.
Study on persona vector steering for personalizing LLM outputs in educational settings; shows reduced answer quality on open-ended tasks.
Informational Buildup Framework addressing catastrophic forgetting in continual learning through alternative parameter storage mechanisms.
Analysis of onboard Earth observation processing of satellite imagery to reduce latency and bandwidth requirements.
Framework for dynamically organizing and managing context in multi-turn human-AI collaboration workflows with hierarchical structure awareness.
Transformer-based network for vehicle trajectory prediction in autonomous driving using multi-modal data without explicit graph structures.
Privacy-preserving structural dataset of child sexual abuse imagery as graphs for computer vision research.
Multi-agent deep reinforcement learning algorithm for energy optimization in cell-free massive MIMO networks.
Framework addressing cross-batch mode collapse in LLM synthetic data generation through Dynamic Context Evolution.
Subspace decomposition framework for multimodal MRI and PET imaging fusion with orthogonal representation separation.
Geometric framework for longitudinal MRI analysis using energy-based implicit neural representations.
Deep learning framework for tea leaf disease classification using CNN models.
Comprehensive ecosystem analysis of ~1.5K open language models, documenting adoption trends and builders of leading models.
Statistical method for mixture proportion estimation and conditional independence testing in weakly supervised learning.
Benchmark assessing safety guardrails for LLMs in multi-step tool-calling agent trajectories, introducing TraceSafe-Bench.
Automated discovery benchmark for mathematical problems based on the k-server conjecture, using a code-based challenge.
Research on designing safe and accountable generative AI as a learning companion for women in surveillance-restricted contexts.
System translating natural language operator intents into routing constraints for LEO satellite networks using GNN and LLM components.
Systematic study analyzing retrieval pipeline components for retrieval-augmented generation in medical question answering systems.
Integration of DeePMD-kit neural network potentials into GROMACS for GPU-accelerated molecular dynamics simulations.
Framework improving online reinforcement learning efficiency for Android agents by enabling multiple actions per state.
Adaptive system for autonomous vehicles that dynamically scales neural network computational complexity based on context.
Machine learning framework using Mixture-of-Experts for whole-slide image classification in computational pathology.
Research on using conversational LLMs for automated programming assessment to evaluate student code understanding beyond functional correctness.