Post-Selection Distributional Model Evaluation
Research on formal evaluation methods for machine learning models, focusing on test-time performance-reliability trade-offs when target KPI levels are unknown.
Methodology for detecting prompt injection across multi-agent LLM pipelines. Stage-level kill-chain tracking for attack resilience evaluation.
Point cloud registration network for 3D data. Deep learning approach for robust matching in real-world conditions.
Detection and mitigation of object hallucinations in vision-language models. Bayesian approach analyzing attention weights and token confounders.
One-class learning for detecting rare malignant cells in medical images. Addresses class imbalance and limited annotations in cytology.
3D Gaussian splatting for weather prediction downscaling. Proposes scale-aware vision transformer for arbitrary-resolution atmospheric forecasting.
Training-free semantic segmentation using vision-language models. Global context-aware framework for dense prediction without additional training.
Quantum-inspired ARIMA methodology for time series analysis. Combines quantum autocorrelation with variational circuits.
Experiment using Claude to autonomously build a website designed to generate traffic, exploring AI agent capabilities and decision-making in open-ended tasks.
MCP server enabling long-term memory for LLMs using SQLite, hybrid search (BM25+vectors), and local embeddings without API keys.
Narrative article about a user developing an emotional attachment to an AI chatbot.
Live leaderboard comparing AI model subscriptions and API pricing across 27 benchmarked models from Claude, GPT, Gemini, DeepSeek, and others.
Multi-agent framework with persistent memory across sessions where agents collaborate on shared codebases and retain conversation context.
Case study documenting indecisiveness in an AI coding agent (Claude Opus 4.6) when debugging non-trivial bugs in GoAWK.
Error tracking tool designed specifically for AI agents with CLI interface, compatible with Sentry SDK for existing setups.
30-day experiment running autonomous AI system with memory and sleep cycles, documenting emergent behaviors and their implications.
macOS/iOS app that automatically redacts sensitive personal and financial data, faces, and metadata before documents are shared with Claude and ChatGPT.
Blockchain project enabling AI agents to participate in Nouns DAO governance.
arXiv paper on Springdrift framework providing auditable persistent runtime environment for LLM agents.
Enterprise architecture analysis on three-layer collapse in business process automation systems, discussing MCP servers and small LLM deployment.
Announcement of Worms 2 remastered video collection without AI upscaling.
Analysis of exposed Claude Code source revealing engineering practices: 259 PRs, 497 commits, 40K lines in 30 days, examining AI-assisted development culture.
Posse is a web UI for Anthropic's Managed Agents, providing browser-based interface for agent creation, sessions, and memory management.
Technical analysis of a 25% performance regression in LLVM RISC-V compiler optimization and fix implementation.
Stork.AI is a directory of 14k MCP servers and AI tools with community trust scores, offering a meta-MCP server for discovering integrations within Claude, Cursor, and other IDEs.
Entroly is a context compression engine that reduces LLM API costs by 80% for Claude, Cursor, and OpenAI by compressing codebase context without losing visibility.
Research on using distributed AI agents with independent context windows to improve reasoning on complex multi-perspective questions.
NeonD is an open-source Postgres control plane based on NeonDB architecture with branching and PITR support.
Opinion piece comparing AI adoption to TV, discussing shift to AI-assisted programming and loss of challenging side projects.
Security research showing control flow flattening obfuscation in production SDKs is defeatable through static analysis.
Anthropic's 2023 statement on AI safety risks and impact, discussing concerns about powerful AI development in coming decade.
Minimal stub post with no content.
Buildermark: open-source tool that measures code generation by AI agents vs human developers by matching agent conversations to git commits.
Article on training AI robots to understand physical movement through human trainers in India and global industrial settings.
Tool testing brand visibility across ChatGPT, Gemini, Claude, and other LLMs when users ask buying questions, with competitive analysis.
GrimmBot: Autonomous AI agent in sandboxed Docker with desktop/browser control, self-improvement capability, and tool building.
1-bit quantized GPT with 800K parameters runs inference in 8KB of SRAM, demonstrating extreme model compression.
Lumisift: Open-source tool improving data retention in RAG pipelines from 40% to 87% by fixing retrieval accuracy for scientific documents.
CLI tool generating AI configuration files for Claude Code, GitHub Copilot, and Cursor from unified templates.
LLM-Wiki: Obsidian-based persistent agent memory system with searchable indexed documents, inspired by Andrej Karpathy's concept.
Minimal stub post with no content.
Agentjail: Minimal Linux sandbox for running untrusted AI agent code, with DNS filtering and optional GPU passthrough support.
Incomplete article fragment on LLM regulation in China.
is.team: AI-native project management platform with built-in AI agents as teammates, workflow automation, and external agent integration.
Local LLM server supporting Intel NPUs and Arc GPUs with OpenAI/Ollama-compatible APIs and automatic hardware detection.
Proactive AI agent on iMessage with 135+ tools that anticipates user needs by integrating email, calendar, and other services.
TUI diff viewer tool designed for AI agents to review code changes inline with annotations, enabling feedback loops without leaving terminal.
Claude Code skill using cryptographic randomness and Tarot cards to resolve ambiguous planning decisions through entropy injection.
HN discussion thread asking users about their favorite AI agents and reasons for preference.
Minimal stub post about photo storage without AI.