MOMO: a foundation model for Mars orbital applications
Multi-sensor foundation model merging HiRISE, CTX, and THEMIS Mars remote-sensing data via an equal-validation-loss alignment strategy.
Multi-domain benchmark for industry code generation across finance, automation, and aerospace using LLMs, addressing single-domain limitations.
Evaluation of active preference learning versus random sampling in online DPO for modern LLMs, showing random sampling is surprisingly competitive.
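The entry above contrasts active preference-pair selection with random sampling in online DPO. A minimal sketch of the two strategies, assuming a margin-based uncertainty criterion for the active variant (function names and the criterion are illustrative, not from the paper):

```python
import random

def implicit_reward_margin(logp_chosen, logp_rejected,
                           ref_chosen, ref_rejected, beta=0.1):
    """DPO implicit reward margin: beta * [(pi - ref) gap on chosen
    minus the same gap on rejected]. Near zero = policy is uncertain."""
    return beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))

def select_pairs(candidates, k, strategy="random"):
    """Pick k preference pairs from a candidate pool.

    candidates: dicts holding policy and reference log-probs for both
    responses. 'random' samples uniformly; 'active' keeps the pairs the
    current policy is least decided about (smallest absolute margin).
    """
    if strategy == "random":
        return random.sample(candidates, k)
    return sorted(
        candidates,
        key=lambda c: abs(implicit_reward_margin(
            c["logp_chosen"], c["logp_rejected"],
            c["ref_chosen"], c["ref_rejected"])),
    )[:k]
```

The paper's finding is that the extra machinery of the active branch often fails to beat the one-line random branch in practice.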
Formal framework for verifiable delegation chains in multi-agent AI systems, defining properties for authorization tracking and policy enforcement.
Framework for improving data literacy in AI-assisted analysis by disrupting cognitive passivity through guided reasoning rather than direct answers.
Diffusion transformer method for inverse tone-mapping, converting 8-bit SDR video content to perceptually accurate 10-bit HDR.
Rubric-based RL framework bridging response-level and token-level rewards for LLM alignment in instruction following tasks.
Benchmark dataset for pavement distress assessment using vision-language models, requiring quantitative analysis and interactive decision support.
Task-specific LLM framework for generating SystemVerilog assertions for hardware verification, addressing data scarcity and accuracy challenges.
Quantization-aware vision token pruning for multimodal LLMs, optimizing coupled compression techniques for resource-constrained deployment.
Framework for synthesizing novel-view video sequences from single images using diffusion models with geometry-aware expansion strategy.
First comprehensive security analysis of Agent Skills, an open standard for modular LLM agent packages, covering threat taxonomy and vulnerabilities.
Conditional diffusion model for reconstructing 3D ocean states from sparse surface observations using satellite and in situ data.
End-to-end training method for localizing temporal video segments matching sentence queries, addressing task discrepancy in video backbone optimization.
Workshop on integrating LLMs with graph-structured data, covering algorithms and systems for bridging LLMs, graph databases, and ML for practical applications.
Study of weight-space model merging for multilingual machine translation, evaluating behavior when combining independently fine-tuned models.
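Weight-space merging of the kind studied above amounts to a (possibly weighted) average of corresponding parameters across independently fine-tuned checkpoints. A toy sketch over plain Python lists standing in for tensors (helper names are hypothetical):

```python
def merge_state_dicts(state_dicts, weights=None):
    """Average parameter vectors across fine-tuned models.

    state_dicts: list of {param_name: list-of-floats} with identical keys
    and shapes. weights: optional per-model mixing coefficients summing
    to 1; defaults to a uniform average ("model soup" style merging).
    """
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged
```

For real models the same arithmetic runs over framework tensors; the study's question is how translation quality behaves when the averaged checkpoints were tuned on different language pairs.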
Procedural geometry data generation and visual grounding using vision-language models for geometry education as referring image segmentation.
Legal analysis of Anthropic's AI constitution document as governance framework, discussing limitations in military and surveillance contexts.
Split-and-conquer framework for detecting partial deepfake speech using boundary detection and segment-level classification stages.
Council Mode: multi-agent consensus approach mitigating hallucinations and bias in MoE LLMs through coordinated expert activation.
Learning method using provenance-based input gradient guidance to improve model discrimination robustness with synthetic training data.
Study of how annotator competence develops and subjective judgments shift during social-influence recognition annotation tasks.
LogicPoison attacks exploiting logical vulnerabilities in Graph-RAG systems that ground LLM reasoning in knowledge graphs.
Measuring latency and quality tradeoffs of prompt compression techniques for accelerating LLM inference in RAG systems.
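A measurement harness for the latency/quality tradeoff described above might look like the sketch below. The length-based token scoring is a deliberate placeholder (production compressors score tokens by informativeness, e.g. with a small LM's perplexity), and `call_llm` is an assumed callable, not a real API:

```python
import time

def compress_prompt(prompt, keep_ratio=0.5):
    """Toy extractive compressor: keep the highest-scoring tokens at the
    target ratio. Token length stands in for an importance score here."""
    tokens = prompt.split()
    k = max(1, int(len(tokens) * keep_ratio))
    keep = set(sorted(range(len(tokens)), key=lambda i: -len(tokens[i]))[:k])
    return " ".join(t for i, t in enumerate(tokens) if i in keep)

def measure(call_llm, prompt, keep_ratio):
    """Return (latency_seconds, output) for one compressed-prompt call.
    Sweep keep_ratio and score the outputs to trace the tradeoff curve."""
    compressed = compress_prompt(prompt, keep_ratio)
    t0 = time.perf_counter()
    out = call_llm(compressed)
    return time.perf_counter() - t0, out
```

In a RAG setting the same harness wraps the retrieved-context portion of the prompt, since that is where most of the compressible tokens live.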
Mitigating reward hacking in RLHF by analyzing and correcting flipped advantage signs in reward model parameters.
Self-optimizing multi-agent system for deep research that iteratively plans, retrieves, and synthesizes evidence across documents.
FedSQ algorithm optimizing weight averaging in federated learning across heterogeneous client data with fixed gating mechanisms.
R2-Write framework exploring deep reasoning with chain-of-thought for open-ended writing tasks using reasoning models.
Multi-modal recommendation system using generative learning to align visual and textual item content with user preferences.
Comparison of pedagogy-informed custom vs general-purpose AI chatbots for supporting students' science problem-solving using network analysis.
SWE-STEPS dataset and framework for evaluating coding agents on sequential, long-horizon software development tasks with accumulated technical debt.
JoyAI-LLM Flash, an efficient mid-scale mixture-of-experts LLM pretrained on 20 trillion tokens and optimized for token efficiency.
Multimodal emotion and cognitive understanding dataset for older adults addressing gap in emotion prediction research for aging populations.
Open-source methodology enabling natural language queries on structured data by training LLMs to generate executable queries with synthetic training data.
Framework for eliciting and verbalizing LLM assumptions to explain and mitigate sycophancy behavior in user interactions.
Large-scale empirical study of credential leakage vulnerabilities in 17,022 LLM agent skills, identifying 520 vulnerable skills with taxonomy of 10 leakage patterns.
Security study of supply-chain poisoning attacks against LLM coding agents through malicious third-party skills with system-level execution.
Vision transformer baseline for synthetic aperture radar sea ice classification addressing class imbalance.
Self-Guide method for co-evolving policy and internal reward in LLM agents, addressing sparse reward bottleneck in long-horizon training.
Knowledge graph completion approach for network alert prediction modeling cyber-attacks as hyper-relational statements.
Benchmarking training-free unlearning methods for removing sensitive visual concepts from vision-language models.
Safety evaluation of Kimi K2.5 open-weight LLM assessing CBRNE misuse, cybersecurity, alignment, and bias risks.
Domain-adapted RAG pipeline using fine-tuned embedding models for pedagogical dialogue act annotation without generative model fine-tuning.
Systematic security evaluation of six OpenClaw-series AI agent frameworks identifying vulnerabilities in tool-augmented LLM agents.
Case study of AI-assisted unit test writing and test-driven refactoring for improving legacy codebase maintainability.
InCoder-32B-Thinking model trained with Error-driven Chain-of-Thought for industrial code generation with reasoning traces.
Method for identifying valence-arousal emotion subspace in LLM representations using steering vectors and PCA.
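The steering-vector-plus-PCA recipe above can be sketched as difference-of-means directions between contrastive activation sets, with power iteration recovering the dominant principal axis of the representation cloud (pure-stdlib illustration; all names and the power-iteration shortcut are my own, not the paper's):

```python
import math

def steering_vector(pos_states, neg_states):
    """Difference-of-means direction between two activation sets,
    e.g. hidden states from high- vs low-valence prompts."""
    d = len(pos_states[0])
    return [sum(p[i] for p in pos_states) / len(pos_states)
            - sum(n[i] for n in neg_states) / len(neg_states)
            for i in range(d)]

def top_principal_component(vectors, iters=100):
    """Power iteration on the covariance of mean-centered vectors;
    applies C = X^T X without materializing it."""
    d = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]
    centered = [[v[i] - mean[i] for i in range(d)] for v in vectors]
    u = [1.0] * d
    for _ in range(iters):
        proj = [sum(x[i] * u[i] for i in range(d)) for x in centered]
        u = [sum(p * x[i] for p, x in zip(proj, centered)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
    return u
```

Projecting valence and arousal steering vectors onto the leading principal components is one way such a low-dimensional emotion subspace could be identified.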
Survey of contextual enrichment strategies for LLMs from in-context prompting through retrieval-augmented generation and GraphRAG.
Analysis of hallucination effects in reinforcement learning post-training for multimodal LLMs, examining whether RL improves visual reasoning or merely exploits hallucinations.
Research on optimization primitives in context space for AI agents, addressing credit assignment, overfitting, and learning signal challenges.