ConvoLearn: A Learning Sciences Grounded Dataset for Fine-Tuning Dialogic AI Tutors
ConvoLearn: Dataset of 2,134 tutor-student dialogues for fine-tuning LLM-based AI tutors, grounded in dialogic learning theory and Earth Science curriculum.
Tiled Prompts: Method addressing prompt misguidance in text-conditioned diffusion models for image and video super-resolution by handling localized details.
WeWrite: Personalized query rewriting framework for video search systems using user history to identify search intent and resolve ambiguity.
Theoretical analysis of stochastic gradient descent covariance under exchangeable mini-batch sampling and its connection to Fisher information.
PACED: LLM distillation method that weights training problems by student competence using gradient signal-to-noise ratio to improve distillation efficiency.
Framework addressing causal confusion in end-to-end autonomous driving models through causal intervention during training to improve reliability and safety.
Research on formal evaluation methods for machine learning models, focusing on test-time performance-reliability trade-offs when target KPI levels are unknown.
Methodology for detecting prompt injection across multi-agent LLM pipelines. Stage-level kill-chain tracking for attack resilience evaluation.
Point cloud registration network for 3D data. Deep learning approach for robust matching in real-world conditions.
Detection and mitigation of object hallucinations in vision-language models. Bayesian approach analyzing attention weights and token confounders.
One-class learning for detecting rare malignant cells in medical images. Addresses class imbalance and limited annotations in cytology.
3D Gaussian splatting for weather prediction downscaling. Proposes scale-aware vision transformer for arbitrary-resolution atmospheric forecasting.
Training-free semantic segmentation using vision-language models. Global context-aware framework for dense prediction without additional training.
Quantum-inspired ARIMA methodology for time series analysis. Combines quantum autocorrelation with variational circuits.
Experiment using Claude to autonomously build a website designed to generate traffic, exploring AI agent capabilities and decision-making in open-ended tasks.
MCP server enabling long-term memory for LLMs using SQLite, hybrid search (BM25+vectors), and local embeddings without API keys.
Narrative article about a user developing an emotional attachment to an AI chatbot.
Live leaderboard comparing AI model subscriptions and API pricing across 27 benchmarked models from Claude, GPT, Gemini, DeepSeek, and others.
Multi-agent framework with persistent memory across sessions where agents collaborate on shared codebases and retain conversation context.
Case study documenting the indecisiveness of an AI coding agent running Claude Opus 4.6 while debugging non-trivial bugs in GoAWK.
Error tracking tool designed specifically for AI agents with CLI interface, compatible with Sentry SDK for existing setups.
30-day experiment running autonomous AI system with memory and sleep cycles, documenting emergent behaviors and their implications.
macOS/iOS app that automatically redacts sensitive personal and financial data, faces, and metadata before documents are shared with Claude and ChatGPT.
Blockchain project enabling AI agents to participate in Nouns DAO governance.
arXiv paper on Springdrift framework providing auditable persistent runtime environment for LLM agents.
Enterprise architecture analysis on three-layer collapse in business process automation systems, discussing MCP servers and small LLM deployment.
Announcement of Worms 2 remastered video collection without AI upscaling.
Analysis of exposed Claude Code source revealing engineering practices: 259 PRs, 497 commits, 40K lines in 30 days, examining AI-assisted development culture.
Posse is a web UI for Anthropic's Managed Agents, providing browser-based interface for agent creation, sessions, and memory management.
Technical analysis of a 25% performance regression in LLVM RISC-V compiler optimization and fix implementation.
Stork.AI is a directory of 14k MCP servers and AI tools with community trust scores, offering a meta-MCP server for discovering integrations within Claude, Cursor, and other IDEs.
Entroly is a context compression engine that reduces LLM API costs by 80% for Claude, Cursor, and OpenAI by compressing codebase context without losing visibility.
Research on using distributed AI agents with independent context windows to improve reasoning on complex multi-perspective questions.
NeonD is an open-source Postgres control plane based on NeonDB architecture with branching and PITR support.
Opinion piece comparing AI adoption to TV, discussing shift to AI-assisted programming and loss of challenging side projects.
Security research showing control flow flattening obfuscation in production SDKs is defeatable through static analysis.
Anthropic's 2023 statement on AI safety risks and impact, discussing concerns about powerful AI development in the coming decade.
Minimal stub post with no content.
Buildermark: Open-source tool that measures how much code is generated by AI agents versus human developers by matching agent conversations to git commits.
Article on how human trainers in India and in industrial settings worldwide teach AI robots to understand physical movement.
Tool testing brand visibility across ChatGPT, Gemini, Claude, and other LLMs when users ask buying questions, with competitive analysis.
GrimmBot: Autonomous AI agent in sandboxed Docker with desktop/browser control, self-improvement capability, and tool building.
1-bit quantized GPT with 800K parameters runs inference in 8KB of SRAM, demonstrating extreme model compression.
Lumisift: Open-source tool improving data retention in RAG pipelines from 40% to 87% by fixing retrieval accuracy for scientific documents.
CLI tool generating AI configuration files for Claude Code, GitHub Copilot, and Cursor from unified templates.
LLM-Wiki: Obsidian-based persistent agent memory system with searchable indexed documents, inspired by Andrej Karpathy's concept.
Minimal stub post with no content.
Agentjail: Minimal Linux sandbox for running untrusted AI agent code, with DNS filtering and optional GPU passthrough support.
Incomplete article fragment on LLM regulation in China.
is.team: AI-native project management platform with built-in AI agents as teammates, workflow automation, and external agent integration.