MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
MedXIAOHE, a medical multimodal foundation model with entity-aware continual pretraining, achieves state-of-the-art results on clinical benchmarks.
Method to detect backdoor attacks in LoRA adapters without test inputs by analyzing weight space, addressing security vulnerabilities in shared model repositories.
Study on human-agent co-creative collaboration patterns in shared workspaces, revealing capability gaps for concurrent interaction vs sequential delegation.
Agora platform uses LLMs with AI personas to teach civic competence and consensus-finding skills through deliberative democratic practice.
MM-tau-p²: Persona-adaptive evaluation framework for multi-modal LLM agents with dual-control settings exposing user personality and behavior adaptation.
HyCon: Hyperbolic control mechanism for steering text-to-image models away from unsafe concepts using parallel transport instead of Euclidean adjustments.
Frequency-based data curation method for selecting calibration data to preserve LLM performance during post-training pruning and quantization.
Examines prompt framing effects on LLM decision-making in threshold voting tasks across model families under isolated, non-interactive settings.
HR Simulator: Game-based evaluation of LLMs navigating complex workplace social norms like giving feedback and rejecting requests appropriately.
Identifies first-mover bias in SHAP explanations arising from gradient boosting's sequential fitting, which causes attribution instability under multicollinearity.
Step-level faithfulness evaluation shows chain-of-thought reasoning in frontier LLMs is often decorative, post-hoc narrative rather than genuine reasoning.
Code Review Agent Benchmark: Dataset and evaluation framework for assessing AI agents' ability to review code quality in generated codebases.
DiffAttn: Diffusion-based framework for predicting drivers' visual attention using LLM-enhanced semantic reasoning for intelligent vehicles.
SABLE: Semantics-aware backdoor attack on federated learning using realistic, in-distribution visual triggers instead of synthetic patterns.
MemFactory: Unified framework for training and inference of memory-augmented LLM agents with reinforcement learning optimization of memory operations.
Compares GraphRAG with VectorRAG for retrieval-augmented generation, showing simpler vector-based approaches handle chunk relationships effectively.
DVGT-2: Vision-Geometry-Action model for autonomous driving using dense 3D geometry instead of language descriptions for planning.
Analyzes safety, security, and cognitive risks in world models used for autonomous decision-making in robotics and agentic AI systems.
Uses sparse autoencoders to automatically annotate morphological traits in biological organism images, automating an expert-driven extraction process.
Demonstrates environment-injected memory poisoning attacks on LLM-based web agents through contamination persisting across sessions without direct memory access.
GraphicDesignBench: First comprehensive benchmark for evaluating AI on professional graphic design tasks including layout translation and typographic rendering.
Identifies sparse routing mechanism in alignment-trained language models where gate attention heads trigger refusal responses, validated across 9 models from 6 labs.
Vero: Open-source family of vision-language models matching proprietary systems on visual reasoning tasks using reinforcement learning with public recipes and data.
Developer tool: ContextSync syncs VS Code AI chat history via Obsidian/OneDrive to maintain context across team LLM sessions.
macOS tool: on-device transcription with ChatGPT summaries for meetings and audio. No cloud storage, Apple Intelligence integration.
arXiv research benchmark for evaluating AI performance on graphic design tasks, measuring model capabilities in visual design domains.
VitalNexa is an AI health agent that analyzes lab results and wearable data to provide personalized health recommendations and biological age scoring.
Author presents a protocol addressing AI's structural tendency to agree and sound authoritative, a failure mode distinct from hallucination that causes subtle reality distortions in outputs.
Technical analysis of practical constraints preventing AI agents from autonomous operation, mapping barriers and their severity.
Practical analysis of operational and technical barriers preventing autonomous AI agents, mapping constraints in the agent economy.
Pydantic-resolve is a declarative data assembly library using the DataLoader pattern to eliminate N+1 queries across REST, GraphQL, and MCP protocols.
2019 article about OpenAI's decision not to release GPT-2 over safety concerns (title only, no content).
Case study of server overload caused by LLM scraper bots making excessive HTTPS requests to acme.com domain.
Video of Sam Altman discussing AI development (title only, no content provided).
Research finding that larger and more instructable language models show decreased reliability (title only).
Static analysis tool detecting ReDoS vulnerabilities in Python regular expressions with automatic fixes.
Marketing content for oral dissolving peptides supplement product.
Omni Voice is a multilingual AI voice cloning and text-to-speech platform supporting 646 languages with a unified model.
Drive9 is agent-native data infrastructure providing a filesystem-like interface with semantic search, embedding, and full-text indexing for AI agents.
Analysis of impact when non-technical teams adopt AI coding agents at 100% adoption rate (title only).
Overview of how AI is transforming legal work by automating research, document review, and drafting tasks for lawyers and paralegals.
Kylrix is an open-source privacy suite offering note, vault, task, and communication alternatives to Google services.
Release Please automates changelog generation, GitHub releases, and version bumping via conventional commit parsing.
Analysis of MCP connection model security: agent frameworks keep all integrations live during sessions, creating unnecessary attack surfaces and costs.
GitHub Copilot CLI now supports bring-your-own-key models and local models via Azure OpenAI, Anthropic, or OpenAI-compatible endpoints.
Google's JSIR: open-source high-level intermediate representation for JavaScript code analysis and transforms.
Open-source spec-driven integration framework for API sprawl, enabling governed AI agent integration with SaaS/microservices.
HN discussion on tools for enforcing LLM/agent call limits at runtime rather than just monitoring, addressing cost control in agent systems.
SQLite-backed cost analytics tool for Claude Code sessions, tracking token spend, cache hits, and budget enforcement.
Thread initiation post about BSD operating systems and AI with minimal content and formatting guidelines.