Ax Dang Nguyen, Harvey Yiyun Fu, Peter West, Ari Holtzman, Chenhao Tan 4d ago

Moral Mazes in the Era of LLMs

HR Simulator: Game-based evaluation of LLMs navigating complex workplace social norms like giving feedback and rejecting requests appropriately.

Ax Yuntong Zhang, Zhiyuan Pan, Imam Nur Bani Yusuf, Haifeng Ruan, Ridwan Shariffdeen, Abhik Roychoudhury 4d ago

Code Review Agent Benchmark

Code Review Agent Benchmark: Dataset and evaluation framework for assessing AI agents' ability to review code quality in generated codebases.

Ax Ryszard Tuora, Mateusz Galiński, Michał Godziszewski, Michał Karpowicz, Mateusz Czyżnikiewicz, Adam Kozakiewicz, Tomasz Ziętkiewicz 4d ago

UnWeaving the knots of GraphRAG – turns out VectorRAG is almost enough

Compares GraphRAG with VectorRAG for retrieval-augmented generation and shows that the simpler vector-based approach handles relationships between chunks effectively.

Ax Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu 4d ago

Vero: An Open RL Recipe for General Visual Reasoning

Vero: An open-source family of vision-language models, trained with reinforcement learning using public recipes and data, that matches proprietary systems on visual reasoning tasks.

HN b-man 4d ago

Release Please

Marketing content for oral dissolving peptides supplement product.

HN videobroker 4d ago

Law Students: AI Is Changing Things

Overview of how AI is transforming legal work by automating research, document review, and drafting tasks for lawyers and paralegals.

HN jaypatelani 4d ago

The BSDs in the AI Age

Opening post of a discussion thread about BSD operating systems and AI, with little content beyond formatting guidelines.