Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
K-Steering enables unified multi-attribute control of LLM behavior at inference time using non-linear steering on hidden activations.
PhysGaia benchmark for dynamic novel view synthesis with physics-aware evaluation of multi-body interactions and realistic collisions.
LLMs applied to combinatorial optimization of Design Structure Matrices in engineering, demonstrating reasoning capabilities for complex system reorganization.
ZINA detects and edits fine-grained hallucinations in multimodal LLMs, proposing a novel evaluation task for MLLM quality.
Vision Transformer-based framework reconstructs multispectral satellite imagery obscured by clouds using SAR data for crop mapping.
PRISM: lightweight fully convolutional model for multivariate time-series classification on edge devices.
Framework treating prompts as first-class citizens in LLM pipelines to enable reuse, optimization, and runtime adaptation in complex agent systems.
CATNet applies geometric deep learning (R-GCN) to catastrophe bond spread prediction in financial markets.
Embodied-R1 introduces a 3B VLM that uses "pointing" as a unified intermediate representation to address the seeing-to-doing gap in robotic manipulation across different embodiments.
ShadowNPU system co-design for efficient on-device LLM inference on NPUs, addressing quantization sensitivity in attention operators.
Benchmarking study of deep learning segmentation models for carotid artery structures in histopathological images with limited datasets.
DoubleAgents system for human-agent alignment in coordination tasks using a coordination agent and dashboard for preference elicitation and feedback.
Neural-MedBench reasoning-intensive benchmark for evaluating clinical reasoning ability of vision-language models beyond classification accuracy.
Vid-Freeze defense mechanism against malicious image-to-video generation using temporal freezing adversarial techniques.
MedIRT psychometric framework using Item Response Theory to evaluate LLM medical competency rather than benchmark-specific performance.
ACT system combines decision trees with LLMs to provide transparent, interpretable, and auditable AI decisions on unstructured data.
Study of how autonomy levels in LLM agents affect user privacy concerns and trust, with implications for personalization design.
FURINA-Builder multi-agent pipeline for automatically constructing customizable role-playing benchmarks at scale for evaluating LLM agent behavior.
Security analysis of LLM pruning methods showing vulnerabilities in popular inference engines like vLLM when models are pruned before deployment.
Survey of image and video restoration techniques for adverse weather conditions in intelligent transportation systems and autonomous driving.
IoT and wireless sensor networks for industrial monitoring and control using NRF transceivers and Arduino microcontrollers.
Watermarking technique for LLMs using syntactic predictability to balance text quality against detection robustness for governance and trustworthiness.
XModBench benchmark measures cross-modal consistency and modality-specific biases in omni-modal large language models across audio, vision, and text.
Game-theoretic framework for evaluating LLMs on subjective and open-ended tasks beyond fixed-format benchmarks with reference answers.
Application of AI to bank statement analysis for credit scoring of Malaysian MSMEs using alternative data sources instead of traditional credit bureau data.
SePT method enables LLMs to improve reasoning without external rewards by alternating between self-generating responses and fine-tuning on those responses.
Multi-agent system with Lean 4 verification layer for exact scientific discovery in quantum code design, combining symbolic synthesis and automated verification.
ATLAS framework combines LLMs with model-driven workflows for generating structured artifacts that satisfy schemas, domain rules, and audit requirements through constraint compilation and validation.
Interpretable model for detecting implicit and explicit hate speech using prototype-based representations for transfer learning.
Research on financial fraud risks from collaborative LLM agents including MultiAgentFraudBench for simulating multi-agent fraud scenarios.
Fairness-aware stroke diagnosis framework combining domain-adversarial training with group distributionally robust optimization.
Synthetic environment generating visual reasoning puzzles with ground-truth solutions across 25 task types for benchmark construction.
Video compression framework using semantic conditioning and diffusion models for ultra-low bitrate encoding.
Benchmark for task-oriented spatio-temporal grounding in egocentric videos for embodied AI agents.
Physics-informed transformer model for socially-aware autonomous driving that learns social interaction dynamics.
Drag-based image editing method using diffusion models with token injection and attention mechanisms for precise visual manipulation.
Theoretical framework for analyzing causal effects at fine-grained levels in high-dimensional data like images and language models.
Interface design study on scaffolding divergent and convergent thinking in human-AI co-creation with generative models.
SWE-EVO benchmark for evaluating AI coding agents on long-horizon software evolution tasks spanning multiple files and iterations.
Research on using LLMs to generate multilingual counterfactual examples for model interpretability across languages.
LLM-based method for categorical data clustering that leverages semantic understanding to measure similarity among attribute values lacking inherent ordering.
Text-driven video reauthoring interface and study exploring how creators can edit video footage through natural language prompts rather than manual editing.
Analysis of fairness in automated decision-making for healthcare emergency triage using process mining and fairness-aware algorithms on empirical data.
Vision-language agent framework combining inverse graphics with interleaved multimodal reasoning for reconstructing images into editable programs with spatial grounding.
Large-scale empirical study analyzing how AI coding agents modify code and describe changes in GitHub pull requests compared to human contributions.
Open-source web platform and course teaching machine learning fundamentals to students aged 12-17 using LEGO robotics without programming.
Method improving language model pretraining by using post-trained models as data sources to instill desired behaviors like safety and reasoning earlier in training.
Few-shot fine-tuned LLM approach for categorizing intermittent CI pipeline failures caused by flaky tests and infrastructure issues rather than code defects.
Information-theoretic framework for optimizing shared visual tokenization in unified multimodal models that perform both image understanding and generation.
Safety alignment approach for Mixture-of-Experts language models addressing unique challenges from sparse routing mechanisms during fine-tuning.