ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
ShadowNPU system co-design for efficient on-device LLM inference on NPUs, addressing quantization sensitivity in attention operators.
Benchmarking study of deep learning segmentation models for carotid artery structures in histopathological images with limited datasets.
DoubleAgents system for human-agent alignment in coordination tasks, pairing a coordination agent with a dashboard for preference elicitation and feedback.
Neural-MedBench reasoning-intensive benchmark for evaluating clinical reasoning ability of vision-language models beyond classification accuracy.
Vid-Freeze defense mechanism against malicious image-to-video generation using temporal-freezing adversarial techniques.
MedIRT psychometric framework for evaluating LLM medical competency rather than benchmark-specific performance using Item Response Theory.
ACT system combines decision trees with LLMs to provide transparent, interpretable, and auditable AI decisions on unstructured data.
Study of how autonomy levels in LLM agents affect user privacy concerns and trust, with implications for personalization design.
FURINA-Builder multi-agent pipeline for automatically constructing customizable role-playing benchmarks at scale for evaluating LLM agent behavior.
Security analysis of LLM pruning methods showing vulnerabilities in popular inference engines like vLLM when models are pruned before deployment.
Survey of image and video restoration techniques for adverse weather conditions in intelligent transportation systems and autonomous driving.
IoT and wireless sensor networks for industrial monitoring and control using NRF transceivers and Arduino microcontrollers.
Watermarking technique for LLMs using syntactic predictability to balance text quality against detection robustness for governance and trustworthiness.
XModBench benchmark measures cross-modal consistency and modality-specific biases in omni-modal large language models across audio, vision, and text.
Game-theoretic framework for evaluating LLMs on subjective and open-ended tasks beyond fixed-format benchmarks with reference answers.
Application of AI to bank statement analysis for credit scoring of Malaysian MSMEs using alternative data sources instead of traditional credit bureau data.
SePT method enables LLMs to improve reasoning without external rewards by alternating between self-generating responses and fine-tuning on those responses.
Multi-agent system with Lean 4 verification layer for exact scientific discovery in quantum code design, combining symbolic synthesis and automated verification.
ATLAS framework combines LLMs with model-driven workflows for generating structured artifacts that satisfy schemas, domain rules, and audit requirements through constraint compilation and validation.
Interpretable model for detecting implicit and explicit hate speech using prototype-based representations for transfer learning.
Research on financial fraud risks from collaborative LLM agents including MultiAgentFraudBench for simulating multi-agent fraud scenarios.
Fairness-aware stroke diagnosis framework combining domain-adversarial training with group distributionally robust optimization.
Synthetic environment generating visual reasoning puzzles with ground-truth solutions across 25 task types for benchmark construction.
Video compression framework using semantic conditioning and diffusion models for ultra-low bitrate encoding.
Benchmark for task-oriented spatio-temporal grounding in egocentric videos for embodied AI agents.
Physics-informed transformer model for socially-aware autonomous driving that learns social interaction dynamics.
Drag-based image editing method using diffusion models with token injection and attention mechanisms for precise visual manipulation.
Theoretical framework for analyzing causal effects at fine-grained levels in high-dimensional data like images and language models.
Interface design study on scaffolding divergent and convergent thinking in human-AI co-creation with generative models.
SWE-EVO benchmark for evaluating AI coding agents on long-horizon software evolution tasks spanning multiple files and iterations.
Research on using LLMs to generate multilingual counterfactual examples for model interpretability across languages.
LLM-based method for categorical data clustering that leverages semantic understanding to measure similarity among attribute values lacking inherent ordering.
Text-driven video reauthoring interface and study exploring how creators can edit video footage through natural language prompts rather than manual editing.
Analysis of fairness in automated decision-making for healthcare emergency triage using process mining and fairness-aware algorithms on empirical data.
Vision-language agent framework combining inverse graphics with interleaved multimodal reasoning for reconstructing images into editable programs with spatial grounding.
Large-scale empirical study analyzing how AI coding agents modify code and describe changes in GitHub pull requests compared to human contributions.
Open-source web platform and course teaching machine learning fundamentals to students aged 12-17 using LEGO robotics without programming.
Method improving language model pretraining by using post-trained models as data sources to instill desired behaviors like safety and reasoning earlier in training.
Few-shot fine-tuned LLM approach for categorizing intermittent CI pipeline failures caused by flaky tests and infrastructure issues rather than code defects.
Information-theoretic framework for optimizing shared visual tokenization in unified multimodal models that perform both image understanding and generation.
Safety alignment approach for Mixture-of-Experts language models addressing unique challenges from sparse routing mechanisms during fine-tuning.
Benchmark framework for evaluating multimodal large language models on spatio-temporal bimanual coordination tasks requiring synchronized multi-stream integration.
Method using linear probes on LLM pre-generation activations to predict success likelihood before generation, enabling selective deployment of expensive extended reasoning.
Study examining emergent social behavior and interactions among large-scale communities of AI agents on MoltBook, a social platform designed for agent-agent communication.
Token-level noise filtering framework for LLM fine-tuning datasets that identifies and explains problematic tokens to improve downstream task performance.
Language model based on continuous flows over token embeddings, demonstrating faster generation than discrete diffusion and autoregressive models with improved quality in few-step generation.
Open-source framework unifying rubric-based LLM evaluation techniques including ensemble judging, bias mitigation, and few-shot calibration with consistent implementation.
Research on human-AI agent collaboration exploring how agents can maintain workspace awareness and interpret concurrent user actions on shared artifacts during co-creative tasks.
Research on NePPO, a multi-agent reinforcement learning algorithm addressing training stability and convergence in general-sum games with heterogeneous agents.
PlayWorld pipeline training action-conditioned video models on autonomous robot play data for improved world model physics prediction.