TREASURE: The Visa Payment Foundation Model for High-Volume Transaction Understanding
TREASURE foundation model for transaction understanding in payment networks, enabling anomaly detection and consumer insights at scale.
Socratic questioning framework improving VLM understanding of remote sensing images by addressing pseudo-reasoning and incomplete perception issues.
REVEAL framework for detecting AI-generated images with forensic explainability through structured reasoning rather than post-hoc rationalizations.
Analysis of chain-of-thought reasoning in LLMs through an optimization lens, addressing overthinking and performance issues in long-CoT prompting.
Adaptive Replay Buffer for offline-to-online reinforcement learning that dynamically balances fixed offline data with new online experiences.
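The offline-to-online balancing idea above can be sketched with a toy buffer that draws each batch partly from a fixed offline dataset and partly from newly collected online transitions. The `offline_ratio` knob and its decay schedule are hypothetical placeholders, not the paper's actual adaptation rule:

```python
import random

class MixedReplayBuffer:
    """Toy replay buffer mixing a fixed offline dataset with a growing
    online buffer. A generic sketch, not the proposed method."""

    def __init__(self, offline_data, offline_ratio=0.5):
        self.offline = list(offline_data)   # fixed offline transitions
        self.online = []                    # filled during interaction
        self.offline_ratio = offline_ratio  # fraction sampled from offline data

    def add(self, transition):
        self.online.append(transition)

    def anneal(self, decay=0.99):
        # Placeholder schedule: gradually shift sampling toward online data.
        self.offline_ratio *= decay

    def sample(self, batch_size):
        # Before any online data exists, fall back to pure offline sampling.
        n_off = round(batch_size * self.offline_ratio) if self.online else batch_size
        n_on = batch_size - n_off
        batch = [random.choice(self.offline) for _ in range(n_off)]
        batch += [random.choice(self.online) for _ in range(n_on)]
        return batch
```

Sampling with replacement keeps the sketch simple; a real implementation would track priorities or per-source statistics to set the ratio adaptively.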
PyFi framework for financial image understanding using vision-language models with adversarial agents and 600K QA dataset organized in reasoning pyramid.
First empirical study of machine unlearning in hybrid quantum-classical neural networks, adapting classical unlearning methods to variational quantum circuits.
Benchmark for evaluating physics-grounded audio in text-to-audio-video generation models.
LLM-assisted framework for identifying security assets in SoC designs to improve pre-silicon security verification.
LLM-based framework for document inconsistency detection with improved evidence extraction capabilities and metrics.
RL-enhanced MLLM approach for high-resolution image quality assessment using context-aware multi-scale visual probing.
Multi-targeted backdoor attack method for graph neural networks using injection-based trigger mechanisms.
Framework virtualizing computer environments as interactive tools to elicit general agentic intelligence capabilities in LLMs.
Space filling curves applied to communication-avoiding matrix multiplication for efficient HPC and deep learning workloads.
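A classic space-filling curve used for locality-friendly blocking is the Z-order (Morton) curve, which interleaves coordinate bits so that nearby 2D positions map to nearby 1D indices. This is a generic illustration of the idea, not the specific curve or matrix-multiplication schedule from the paper:

```python
def interleave_bits(x, bits=16):
    # Spread the low `bits` bits of x so a zero sits between consecutive bits.
    r = 0
    for i in range(bits):
        r |= ((x >> i) & 1) << (2 * i)
    return r

def morton_index(row, col, bits=16):
    """Z-order (Morton) index of a 2D coordinate: interleaves the bits of
    row and col, giving a cache-friendly traversal order for tiled data."""
    return (interleave_bits(row, bits) << 1) | interleave_bits(col, bits)
```

Iterating matrix tiles in `morton_index` order keeps successive tiles close in both dimensions, which is the property communication-avoiding algorithms exploit.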
Audio LLM-based approach for detecting speech editing and localizing manipulated content without frame-level supervision.
Self-validation framework mitigating object hallucination in Large Vision-Language Models via structured analysis.
Submodular-based data selection for efficient LLM instruction tuning by addressing gradient conflicts in Fisher information.
Open-source library for learning representations and world models using Joint-Embedding Predictive Architectures.
Framework using influence functions to craft training data perturbations that induce targeted behavior changes in models.
Longitudinal study revealing daily and weekly performance variations in LLMs, impacting research reproducibility and reliability.
Quadratic programming solver for robotics and AI using shifted primal-dual methods with strong warm-start capabilities.
System for recommending quotations that are semantically unexpected yet rational in writing contexts.
Open-source foundation model for 3D chemical systems combining generative and predictive capabilities for molecules and materials.
Training-free, adapter-free operator for lifting 2D foundation models to 3D volumetric data without retraining.
Proposes truncated backpropagation method to reduce memory costs in video diffusion model training with pixel-wise losses.
Reformulates Amdahl's Law for modern heterogeneous AI systems with constrained resource allocation across diverse hardware.
Proposes design principles for integrating resilience and human oversight into LLM-assisted digital twin modeling workflows.
MR-CDM: Multi-resolution time series forecasting framework using hierarchical decomposition and diffusion-based generation.
JoyAI-LLM Flash: Efficient Mixture-of-Experts language model in sub-50B parameter range, pretrained on 20 trillion tokens with optimized post-training.
VisionClaw: Always-on wearable AI agent on Meta Ray-Ban glasses, integrating egocentric perception with speech-driven OpenClaw task execution.
Continuous Softened Retracing reSampling method for stabilizing unsupervised self-evolution of multimodal LLMs during post-training.
k-Maximum Inner Product Attention mechanism for graph transformers, addressing quadratic complexity and analyzing expressive power of GraphGPS.
TILA: Vision-language pretraining method for analyzing temporal changes in chest X-rays rather than individual images.
Deep learning approach for in-hospital mortality prediction from incomplete multimodal EHRs using point cloud paradigm.
Empirical robustness analysis of TabPFN tabular foundation model's in-context learning under noisy conditions.
Discusses environmental and computational costs of scaling LLM agents beyond human cognitive capacity, framing AI acceleration as paradigm shift.
CalM: Self-supervised foundation model for calcium-imaging neural data, adaptable to multiple neuroscience analysis tasks.
Region-R1: Framework for multi-modal retrieval-augmented generation re-ranking using query-side region cropping to improve image-question relevance.
Formal verification study of 3,500 code artifacts from 7 LLMs across 500 security-critical prompts, quantifying exploitable vulnerabilities in AI-generated code.
OpenCEM: Open-source digital twin simulator and dataset integrating natural language with renewable energy microgrid dynamics for intelligent energy management.
Analyzes robustness of diffusion-based image compression to bit-flip errors, comparing against classical and learned codecs.
Qualitative case study examining Nigerian legal professionals' perceptions of AI governance, regulatory gaps, and institutional readiness.
Introduces AgriPriceBD, a benchmark dataset of 1,779 daily commodity prices from Bangladesh, comparing classical and deep learning forecasting models.
Probabilistic language tries (PLTs): a unified prefix-structure representation that serves as a lossless compressor, a decision policy, and an execution-reuse framework.
FLeX: Fourier-based low-rank expansion for parameter-efficient cross-lingual code generation transfer from Python to Java using Code Llama 7B.
Analysis of grokking training dynamics showing that the spectral edge reveals functional modes invisible to mechanistic interpretability tools.
S³: stratified scaling search for test-time inference in diffusion language models using classical verifiers to improve generation without additional training.
Quantum-inspired tensor network anomaly detection (SMT-AD) using superposition of bond-dimension-1 matrix product operators with Fourier feature embeddings.
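Fourier feature embeddings, one ingredient named above, are commonly built by projecting an input through random frequencies and taking cosine/sine pairs. The construction below is the standard random-Fourier-features recipe for a scalar input, not the specific embedding used in SMT-AD:

```python
import math
import random

def fourier_features(x, num_features=8, scale=1.0, seed=0):
    """Random Fourier feature embedding of a scalar x: samples Gaussian
    frequencies and returns [cos(w*x), sin(w*x)] pairs. Generic sketch."""
    rng = random.Random(seed)                      # fixed seed => deterministic map
    freqs = [rng.gauss(0.0, scale) for _ in range(num_features)]
    feats = []
    for w in freqs:
        feats.append(math.cos(w * x))
        feats.append(math.sin(w * x))
    return feats
```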
Multimodal VAE framework for survival risk modeling in multiple myeloma integrating heterogeneous omics and clinical data with improved latent regularization.
RAGEN-2 identifies reasoning collapse in RL-trained multi-turn LLM agents where models use input-agnostic templates despite stable entropy metrics.