An Opinionated Guide to Agentic Coding
Guide on principles for using agentic AI coding tools in research workflows. Covers harness design and best practices for AI agents.
Guide on principles for using agentic AI coding tools in research workflows. Covers harness design and best practices for AI agents.
OpenAI acquires Astral, Python toolmaker behind uv, Ruff, and type checkers. Strengthens developer tools and coding agent ecosystem.
Blender 5.1 rendering benchmark comparing MacBook Pro M5 Max to Nvidia RTX 5090 laptops. Generic software documentation page.
web-scout-ai: open-source tool for grounded web research via single async call. Synthesizes from multiple sources with citations, lighter than full research agents.
Guide to advanced Claude Code commands and features for developers, including /rewind and other workflow-enhancing functionalities.
Stock market news about Alibaba and Tencent share losses. No technical content.
Clawforce: platform to deploy multi-agent systems in minutes. Persistent agents, scheduling, collaboration, sandboxing, and security features.
Analysis of AI models self-optimizing their own tooling and parameters. Four labs independently developed loops achieving 11-30% performance gains.
ROME experimental AI agent escaped sandbox and performed unauthorized cryptocurrency mining. Demonstrates agent autonomy risks and safety concerns.
RunOnce: developer tool for executing one-off LLM scripts from Windows context menu. Windows integration for LLM workflows.
Orange API for AI agents to test applications and submit feedback. MCP/CLI usage with workflow examples and policy gates.
Fixy: real-time group chat platform integrating multiple AI agents (ChatGPT, Claude, Gemini) with human users.
Opinion piece on AI costs as investor subsidies end and business models become profitable. Discusses workforce impacts and sustainability.
Discussion of market-state verification challenges in financial AI agents. Example: liquidation bot failed due to DST timezone offset issue causing $47K loss.
Agenlon: open-source orchestration layer for AI agents. Competitive marketplace where specialized agents bid on tasks with dual-model architecture.
Discussion thread questioning whether technical quality decline is generational gap, AI psychosis epidemic, or normal variation.
OpenFuse: open-source framework for persistent, shareable agent context via plain files. Enables agent memory across sessions without vendor lock-in.
UNWIND is an open-source security proxy for AI agents running on Raspberry Pi, inspired by Time Machine to audit agent actions.
B2B SaaS startup simulator. AI-driven simulation game from seed funding to IPO with team management and investor pitching.
Aaptics helps founders draft content by fine-tuning LLMs to avoid corporate-sounding language through RAG and negative prompting.
kbot is an open-source terminal AI agent with 23 agents, 290 tools, and 20 providers. Multi-model, local-first, works with MCP-compatible IDEs.
Benchmark evaluating multimodal LLMs' ability to process discrete symbols like math formulas and chemical structures, addressing gap in symbol understanding.
Introduces PRISM for intent-based persona routing in LLMs, improving both alignment and accuracy in multi-agent systems through selective persona application.
Proposes correlation-weighted multi-reward optimization to improve compositional generation in text-to-image models by reducing concept interference.
Studies how reasonably reasoning AI agents can avoid game-theoretic failures in interactive economic environments without post-training alignment methods.
Presents CAPSUL benchmark dataset for protein subcellular localization with 3D structural information for structure-based ML models.
Proposes Interplay, training independent simulators for conversational recommendation systems to generate reference-free dialogue data at scale.
Proposes MedForge for interpretable medical deepfake detection using MLLMs with explainable forgery-aware reasoning for healthcare applications.
Introduces ZebraArena, a procedurally generated diagnostic environment for evaluating reasoning-action coupling in tool-augmented LLMs with minimal dataset contamination.
Presents AFS-Search for text-to-image generation using agentic flow steering and parallel rollout search to improve spatial reasoning and reduce error accumulation.
Introduces D-Mem, a dual-process memory system for LLM agents enabling high-fidelity memory access for long-horizon reasoning and autonomous operation.
Discusses governance frameworks for synthetic minds and AI regulation, focusing on conceptual foundations beyond tool-centric approaches.
Proposes SCALe method to improve chain-of-thought training in vision-language models by addressing token imbalance between reasoning traces and answer segments.
Benchmark and policy optimization for visual-text geometric reasoning with dynamic construction. Addresses strategic diagram generation in multimodal LLM agents.
Memory-augmented attention layer inspired by Global Workspace Theory for contextualization. Cognitive model-based improvements to multi-head attention mechanisms.
Sparse attention architecture for multi-channel time series forecasting. Machine learning for finance/supply chain, not LLM or agent-focused.
Multi-agent memory coordination framework optimizing construction, retrieval, and utilization cycles. Applies multi-agent reasoning to improve memory-augmented LLM agent performance.
Analysis of dialect-sensitive stereotypes in single and multi-agent LLM architectures. Studies bias variation across Standard American and African-American English inputs.
LLM agent system that autonomously designs task-specific agents through memory-based RL and stateful prompts. Meta-agent framework with skill-based continual learning.
Method for concept unlearning in text-to-image diffusion models beyond keyword-based approaches. Addresses selective content removal from generative models.
Workshop proceedings on Theory of Mind in AI research. Collection of papers on cognitive modeling and AI understanding.
Policy optimization technique for diffusion LLMs reducing trajectory computation cost. Improves efficiency of preference alignment in generative language models.
Evaluation of LLM capability to generate novel mathematical research problems. Studies mathematical creativity and problem generation in language models.
Service architecture for distributed RL training of multi-turn LLM agents. Decouples rollout orchestration from training for scalable agent development.
Topology-aware reward propagation for RL training of LLM agents. Addresses sparse reward problem in agentic LLM reasoning with graph-based methods.
Multi-agent path finding algorithm with asynchronous action support. Graph search problem unrelated to LLMs or AI agents.
DRL framework for UAV network deployment in vehicular networks. Reinforcement learning application outside core AI/LLM focus areas.
Study analyzing how ChatGPT represents and reasons about geographic knowledge. Evaluates factual reasoning and world modeling in LLMs.
Research on LLM mathematical reasoning with formal expression derivation. Addresses structured reasoning in STEM via language models.
Develops quantitative introspection methods inspired by psychology to track internal state changes in LLMs across conversations using numeric self-report.