PRISM: Prompt-Refined In-Context System Modelling for Financial Retrieval
PRISM: Training-free framework combining prompt engineering and multi-agent coordination for financial document retrieval with LLMs.
Agent-based framework for automatic validation of mathematical optimization models generated by LLMs from natural language descriptions.
Research on iterative concept refinement for vision classifiers through human-in-the-loop deliberation for subjective visual tasks.
Finch: benchmark for evaluating agents on enterprise finance workflows, including data entry, retrieval, calculation, and reporting, using the Enron dataset.
DDFT protocol for measuring epistemic robustness in LMs under degraded information and adversarial stress beyond static benchmarks.
HAG framework for topic-adaptive agent generation in agent-based modeling balancing macro-level distributions with micro-level rationality.
Mechanistic interpretability study of how Diffusion Transformers generate correct spatial relations between objects in text-to-image generation.
ConvoLearn dataset of 2,134 tutor-student dialogues for fine-tuning LLMs on dialogic tutoring principles in science education.
Study showing LLMs exhibit robustness to emotional framing in rule-bound decision-making despite known brittleness to prompt perturbations.
TSPO: RL framework for multi-turn search-augmented LLM reasoning addressing process and reward homogenization in tool-integrated tasks.
Method for improving Vision Language Model robustness when modalities are missing using scalable diffusion-based feature restoration.
Multi-agent LLM framework for discovering instrumental variables in causal inference through interdisciplinary knowledge synthesis.
Voxtral Realtime: natively streaming ASR model achieving sub-second latency with end-to-end training for audio-text alignment.
SSLogic: agentic meta-synthesis framework where LLM agents iteratively create and refine generator-validator pairs for logic reasoning tasks.
KLong: open-source LLM agent trained for extremely long-horizon tasks using trajectory-splitting SFT and progressive RL with Research-Factory pipeline.
AI Runtime Infrastructure layer that observes and optimizes agent execution for task success, latency, token efficiency, and safety.
DeepFact benchmark and co-evolving agent system for testing factuality of search-augmented LLM-generated research reports.
HECG framework for autonomous agents using LLMs with multi-dimensional error correction and strategy transfer across tasks.
Study showing that deliberation between multiple LLMs can amplify tiny perturbations into divergent decisions, challenging robustness assumptions.
Machine learning framework for automating defect detection in photovoltaic systems using electroluminescence imaging.
Proposes an alternative training architecture for geometric and neuromorphic AI using non-standard arithmetic to reduce memory overhead.
Conceptual framework for AI governance addressing regulatory gaps between task-specific systems and foundation models.
Voxtral TTS: expressive multilingual text-to-speech model generating natural speech from minimal reference audio.
Metriplector: neural architecture primitive based on field theory, where the input configures abstract physical systems.
ClawSafety exposes security vulnerabilities in local LLM agent frameworks where prompt injection enables privilege escalation.
AgentSocialBench evaluates privacy risks in collaborative multi-agent social networks with persistent LLM agents.
Modal framework for knowledge representation handling domain-specific concept meaning shifts in knowledge graphs.
XpertBench evaluates LLM performance on expert-level open-ended tasks with rubrics-based assessment.
Addresses value hallucination in Dyna reinforcement learning agents through multistep predecessor models.
VLBiasBench evaluates biases in large vision-language models across diverse domains and question formats.
Study of the app metamorphosis phenomenon, in which mobile apps undergo significant market repositioning.
MegaFake dataset of LLM-generated fake news for understanding mechanisms behind AI-generated misinformation.
SPRIG optimizes system prompts for LLMs using genetic algorithms to improve general task performance.
Comprehensive survey of document parsing techniques for extracting structured information from unstructured documents.
Certified Training with Branch-and-Bound for learning verifiably stable neural control systems.
RIRS framework for multi-agent RAG systems to route complex questions across distributed knowledge bases.
Human-AI collaboration for game testing using vision language models to enhance manual testing efficiency.
Framework for statistical inference on detected changepoints in sequential analysis with confidence sets.
Review of anomaly detection techniques for cyber-physical systems security in critical infrastructure.
Reasoning Model Implicit Association Test studies implicit bias-like patterns in LLMs that use step-by-step reasoning.
BalancedDPO method aligns diffusion models with multiple conflicting evaluation metrics for text-to-image generation.
Open-source benchmark for 3D chip design using the OpenROAD framework, evaluating power, performance, area, and thermal metrics.
Investigates alignment of causal attribution scores (Shapley, Banzhaf, Causal Responsibility) for database tuple relevance in data management.
RaPA improves transferable targeted adversarial attacks by identifying and pruning redundant surrogate model parameters.
Online test-time adaptation method for spiking neural networks via threshold modulation, enabling edge deployment with distribution shift handling.
FSD bridges reasoning and decision-making in robotic manipulation by combining Vision-Language Models with action prediction for zero-shot generalization.
Bayesian ablation framework for interpreting latent task representations in neural networks, enabling probabilistic analysis of learned representations.
VERDI uses Vision-Language Models embedded in autonomous driving stack for reasoning-based trajectory planning under partial observability.
Chapter reviewing ML/AI applications in food processing, covering classification frameworks and data science approaches to food informatics.
SoSBench evaluates LLM safety alignment across six scientific domains with sophisticated, knowledge-intensive adversarial prompts.