Isolater - Feed

HN AkshatVirmani 3/30/2026

APIEval-20: A Benchmark for Black-Box API Test Suite Generation

APIEval-20 benchmark dataset for evaluating black-box API test suite generation using LLMs and schemas.

HN u1hcw9nx 3/30/2026

Subnautica 2 publisher's CEO used ChatGPT to fire studio head and failed

CEO used ChatGPT to terminate studio head; decision was reversed and criticized.

HN btursunbayev 3/30/2026

Show HN: NVSonar – tells why your GPU is slow, not just utilization percentage

GPU profiling tool that diagnoses the bottlenecks behind slow GPU performance instead of reporting only utilization percentage.

HN thiagolalvarez 3/30/2026

I built an MCP server so your agent stops picking the wrong cloud services

MCP server for AI agents to select appropriate cloud services with current pricing and compatibility data. 74 services, no API key required.

HN LionTurtle13 3/30/2026

Stanford study reveals AI vision models invent images they never see

Stanford research showing that AI vision models hallucinate images that are not in their training data.

HN KnuthIsGod 3/30/2026

President Trump Gaggles with Press on Air Force One En Route

News about President Trump's press interaction on Air Force One.

HN walterbell 3/30/2026

Tribe v2: Predictive Foundation Model on Human Brain Processing Complex Stimuli

TRIBE v2: Predictive AI model of human brain responses to visual, auditory, and language stimuli from neuroscience research.

HN userbinator 3/30/2026

HD Audio Driver for Windows 98SE / Me

HD Audio driver for Windows 98SE/ME systems on Intel chipsets with WDM support.

HN bthallplz 3/30/2026

Excel2r – R package that migrates Excel workbooks to standalone R scripts

R package that converts Excel workbooks to standalone R scripts with formula recreation and verification against cached values.

HN keiranflynn 3/30/2026

LLMnesia – Local-first search across your AI conversations

LLMnesia: Local-first search tool for AI conversation history across ChatGPT, Claude, Gemini, and other platforms.

HN 1vuio0pswjnm7 3/30/2026

Meta's court losses spell potential trouble for AI research, consumer safety

Analysis of Meta's legal losses and liability implications from internal social science research on platform effects.

HN newtechwiz 3/30/2026

Show HN: Free, Open-Source WhisperFlow That Just Works

WhisperFlow: Free, open-source speech-to-text tool for macOS. On-device processing, no cloud upload, no account required.

Ax Hao Li, Qiao Sun 3/30/2026

Any4D: Open-Prompt 4D Generation from Natural Language and Images

arXiv research on 4D generation from natural language and images using embodied world models. Addresses data scarcity and long-horizon video generation challenges.

Ax Zhenchao Tang, Fang Wang, Haohuai He, Jiale Zhou, Tianxu Lv, Jun Zhu, Shouzhi Chen, Minghao Yang, Yu Wang, Jiayang Wu, Yidong Song, Yaokun Li, Jiehui Huang, Dawei Huang, Zhi Song, Jianhua Yao 3/30/2026

Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning

arXiv research proposing Balanced Fine-Tuning method for aligning LLMs with biomedical knowledge. Combines SFT and RL using confidence-weighted token optimization for scientific understanding.

Ax Daeun Lee, Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Mohit Bansal 3/30/2026

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

arXiv research on streaming video understanding with gaze signal interpretation for AR applications. Evaluates multimodal LLMs on temporal reasoning with human attention signals.

Ax Woongyeong Yeo, Kangsan Kim, Jaehong Yoon, Sung Ju Hwang 3/30/2026

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

arXiv research on multimodal memory architecture for long-form video understanding. Addresses context capacity and visual detail retention in hours-long videos using dynamic memory mechanisms.

Ax David Samuel, Lilja Øvrelid, Erik Velldal, Andrey Kutuzov 3/30/2026

Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages

Post-training method for lower-resource languages that preserves fluency even when aligned with disfluent reward models, addressing the scarcity of preference-optimization data.

Ax Ruining Li, Yuxin Yao, Chuanxia Zheng, Christian Rupprecht, Joan Lasenby, Shangzhe Wu, Andrea Vedaldi 3/30/2026

Particulate: Feed-Forward 3D Object Articulation

Feed-forward transformer model predicting 3D object articulations including parts, kinematic structure, and motion constraints for articulated object understanding.

Ax Boxin Wang, Chankyu Lee, Nayeon Lee, Sheng-Chieh Lin, Wenliang Dai, Yang Chen, Yangyi Chen, Zhuolin Yang, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping 3/30/2026

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Cascaded reinforcement learning infrastructure for scaling general-purpose reasoning models, addressing heterogeneity in response lengths and verification latency.

Ax Wentao Guo, Mayank Mishra, Xinle Cheng, Ion Stoica, Tri Dao 3/30/2026

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

SonicMoE optimizes Mixture of Experts model inference through IO and tile-aware techniques, accelerating high-sparsity MoE architectures for language models.

Ax Zhijie Zhong, Zhiwen Yu, Pengyu Li, Jianming Lv, C. L. Philip Chen, Min Chen 3/30/2026

PathFinder: Advancing Path Loss Prediction for Single-to-Multi-Transmitter Scenario

Deep learning method for radio path loss prediction in multi-transmitter 5G scenarios, addressing distribution shifts and environmental generalization.

Ax David Samuel, Lucas Georges Gabriel Charpentier 3/30/2026

Dual-objective Language Models: Training Efficiency Without Overfitting

Dual-objective language model combining autoregressive and masked-diffusion training without architectural changes, improving efficiency and reducing overfitting.

Ax Pengyu Wang, Shuchang Ye, Usman Naseem, Jinman Kim 3/30/2026

MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation

Medical report generation using reinforcement learning with clinical alignment objectives, improving correctness over token-level likelihood training approaches.

Ax Sara Papi, Javier Garcia Gilabert, Zachary Hopton, Vilém Zouhar, Carlos Escolano, Gerard I. Gállego, Jorge Iranzo-Sánchez, Ahrii Kim, Dominik Macháček, Patricia Schmidtova, Maike Züfle 3/30/2026

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

Study comparing SpeechLLMs that directly process speech for translation against cascaded transcription pipelines, evaluating speech modality integration effectiveness.

Ax Matthew Thompson 3/30/2026

The Dual-State Architecture for Reliable LLM Agents

Dual-State Architecture formalizes execution primitives coupling stochastic LLM generation with deterministic verification guards for reliable code generation agents.

Ax Subeen Lee, Siyeong Lee, Namil Kim, Jaesik Choi 3/30/2026

RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution

Benchmark evaluating LiDAR 3D perception model robustness under simultaneous domain shifts and label-space evolution in autonomous driving scenarios.

Ax Laura Dietz, Bryan Li, Gabrielle Liu, Jia-Huei Ju, Eugene Yang, Dawn Lawrie, William Walden, James Mayfield 3/30/2026

Incorporating Q&A Nuggets into Retrieval-Augmented Generation

Crucible system augments RAG with Q&A nuggets from documents, preserving citation provenance and improving extraction, selection, and report generation.

Ax Laura Dietz, Bryan Li, Eugene Yang, Dawn Lawrie, William Walden, James Mayfield 3/30/2026

Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

Study examining risks of RAG system evaluation and optimization using LLM judges, revealing circularity issues in nugget-based evaluation approaches.

Ax Donghee Lee, Rui Cai, Zhe Zhao 3/30/2026

CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

CARPE method improving the vision-centric capabilities of vision-language models through context-aware image representation prioritization via an ensemble approach.

Ax Kei Saito 3/30/2026

NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference

Framework addressing LLMs' tendency to collapse ambiguous inputs prematurely by mapping text to non-collapsing state spaces for better dialogue reasoning.

Ax Bhada Yun, Renn Su, April Yi Wang 3/30/2026

AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

Study introducing VAPT toolkit to evaluate how LLMs extract, embody, and explain human values from conversations through user perception research.

Ax Weiyu Sun, Liangliang Chen, Yongnuo Cai, Huiru Xie, Yi Zeng, Ying Zhang 3/30/2026

EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions

Benchmark for evaluating multimodal LLMs on handwritten STEM student solutions with mathematical formulas and diagrams, addressing authentic domain-specific evaluation gaps.

Ax Nisharg Nargund, Priyesh Shukla 3/30/2026