ICE-Guard framework detects spurious feature reliance in LLMs for high-stakes decisions through intervention consistency testing on demographic, authority, and framing biases.
Method for scaling vision-language-action robot learning using generative 3D worlds to address sim-to-real gap.
SCISSR: Scribble-based interactive framework for surgical scene segmentation using SAM-style prompting.
CoDA explores adversarial attacks on medical vision-language models and proposes token-space repair methods.
HiMu: Hierarchical frame selection method for long video question answering with vision-language models.
Study showing Transformers learn robust in-context regression under distributional uncertainty without restrictive assumptions.
SpecForge: Open-source production framework for training draft models used in speculative decoding to reduce LLM inference latency.
ICE framework evaluates LLM explanation faithfulness using statistical intervention testing with randomization baselines.
Systematic analysis and improvements to Elastic Weight Consolidation for continual learning to better estimate weight importance.
Benchmark comparing PETNN, KAN, and classical deep learning models on myMNIST Burmese handwritten digit recognition dataset.
AutORAN uses LLMs for natural language programming to simplify xApp development in Open Radio Access Networks.
LSE framework trains LLMs to self-improve during inference by iteratively refining context based on problem feedback.
OpenT2M: Million-scale open-source dataset with 2800+ hours of motion data for text-to-motion generation in animation and robotics.
REST algorithm for zero-shot object-goal navigation using receding horizon planning and Steiner trees for generating subgoal candidates in unknown environments.
Anderson-Darling leakage assessment method for detecting side-channel leakage in neural networks, improving on TVLA's mean-based approach.
Benchmarking framework for PDF table extraction using LLM-based semantic evaluation on synthetically generated PDFs with LaTeX ground truth.
SSL framework for medical ultrasound image segmentation using contrastive learning with multiscale switching to handle limited labeled data and imaging artifacts.
Mathematical framework distinguishing cognitive amplification from cognitive delegation in human-AI systems for measuring AI impact on human reasoning.
HISR framework improving multi-turn agentic reinforcement learning through hindsight information modulation and segmental process rewards for complex long-horizon tasks.
Neuro-symbolic sim2real image translation framework using structured ontology-guided diffusion for zero-shot domain transfer without labeled real data.
CausalRM method for learning reward models from observational user feedback (clicks, upvotes) as a scalable alternative to controlled RLHF annotation.
Study measuring confirmation bias in LLM-based security code review systems and its exploitability in software supply-chain attacks.
Weakly supervised method for generating natural language explanations in chest X-ray classification without explicit explanation annotations.
Ablation study of Group Relative Policy Optimization components for LLM reasoning training, questioning necessity of complex loss functions.
ClawTrap: MITM-based red-teaming framework for evaluating security robustness of autonomous web agents like OpenClaw against network-layer threats.
AutoPipe framework for automated configuration of LLM post-training pipelines combining supervised fine-tuning and reinforcement learning under budget constraints.
Diffusion-based 3D generation framework leveraging point cloud priors as geometric constraints for improved structure-aware object synthesis.
32B parameter Korean-language LLM optimized for enterprise reasoning, long-context understanding, and agentic workflows with domain-specific capabilities.
Watermarking method for LLM ownership protection using functional subspaces, robust against fine-tuning, quantization, and knowledge distillation.
Vision-language model enhanced with explicit spatial token generation for improved 2D/3D spatial reasoning and fine-grained grounding.
Survey of 230 computer science students on ethical implications and societal impacts of AI from a gender perspective.
Formal specification for cryptographic admission control governing autonomous agent actions in institutional B2B environments, validating identity and policy compliance.
Video reasoning model using trajectory and motion information for improved spatio-temporal inference in video understanding tasks.
Study on how AI-mediated video communication affects trust and credibility detection. Social impact of AI, limited technical content.
Case study evaluating LLM-generated lessons in Duolingo for language learning. LLM application assessment with limited technical depth.
MultihopSpatial benchmark for multi-hop spatial reasoning in Vision-Language agents. Evaluation dataset for VLA agents.
Framework proposing readiness metrics for human-AI decision-making teams beyond accuracy. Evaluation methodology for AI collaboration.
Conditional diffusion models translating MRI to PET for medical imaging. ML for healthcare, not AI/agent related.
PASTE: Pattern-Aware Speculative Tool Execution to reduce latency in LLM agent tool loops. Optimization for agentic workflows.
XKD-Dial: Four-stage training pipeline for citation-grounded dialogue reducing hallucination in English-Hindi LLMs. LLM application addressing hallucination.
arXiv paper examining regulatory frameworks for agentic AI security and privacy. Policy analysis of AI agent governance.
Simulation-based inference for moment tensor inversions in seismology. ML method applied to geophysics, not AI/agent focused.
PRIOR framework for humanoid locomotion with natural gaits using Isaac Lab. ML for robotics, not core AI agent/LLM focus.
Benchmark evaluating AI agent performance on domain-specific data science tasks against human expert baselines across multiple domains.
RAG method using hypothesis-conditioned query rewriting to retrieve decision-relevant evidence for choice tasks beyond topical relevance.
Framework enabling LLM agents to recognize trusted execution environments for secure IP disclosure negotiations.
Multilingual temporal reasoning benchmark with 15K examples across 5 languages testing LLM capabilities on date arithmetic and temporal relations.
Post-hoc debiasing method for vision-language models like CLIP using sparse embedding modulation to separate bias from semantic information.
Streaming video understanding framework that decouples semantic understanding from perception for proactive query handling.
Study comparing LLM-generated analogies to human-produced ones using geometric parallelogram model of analogical relations.