Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior
Method for co-evolving test sets and prompts to refine LLM behavior, enabling iterative improvement of domain-specific policies without manual tuning.
Information pricing problem for a monopolistic seller of high-dimensional proprietary data sold to decision-making buyers.
Study of reliability breakdown prediction in 5G railway networks using CNN, LSTM, XGBoost, and transformer-based time series models.
Universal denoising method for signal recovery when the noise distribution is unknown, using distributional shrinkage beyond Tweedie's formula.
Theoretical analysis of deep neural networks as a convex computation paradigm, examining how DNNs implement Occam's razor through circuit size minimization.
Vision-Language-Action models for robotic manipulation using Tweedie discrete diffusion to improve generalization and action control.
Frame selection method for long-form video understanding with Large Multimodal Models, reducing the computational cost of processing dense video tokens.
Collaborative causal sensemaking framework for LLM-based decision support agents enabling human-AI partnerships in expert settings.
Generative Adversarial Reasoner: adversarial reinforcement learning framework improving LLM reasoning and reducing calculation errors.
LatentNN addresses attenuation bias in neural networks through latent variable treatment for improved extreme value estimation.
Machine learning framework for image-caption rating using comparative judgments instead of direct rating annotations.
ShapBPT computes pixel-level feature attributions using hierarchical Shapley values with multiscale image structure.
SPARE uses self-distillation for efficient machine unlearning in diffusion models balancing forgetting and concept retention.
Xiaomi-Robotics-0: open-source vision-language-action model for real-time robot control with efficient deployment strategy.
BONNI combines Bayesian optimization and interior point methods for geometric inverse design in nanophotonic devices.
Analysis of feature-learning capability in quantum neural networks through geometric properties and Lie algebra directions.
Mathematical framework for admissibility geometries in sequential and distribution-free predictive inference.
OSMDA uses OpenStreetMap data for domain adaptation of vision-language models to remote sensing without expensive satellite image annotations.
Visual state representation learning for robotic agents capturing semantic and spatial information for sequential decision-making.
DRESS is a parameter-free graph fingerprinting framework for structural isomorphism detection using nonlinear dynamical systems.
Variational learning framework for autonomous aerial vehicle trajectory planning addressing credit assignment in reinforcement learning.
Few-shot diffusion model for radio map construction in 6G networks using physics-informed manifold alignment.
Minimax generalized cross-entropy loss function balancing optimization difficulty and robustness for supervised classification.
PRISM uses photonic accelerators with O(1) memory selection to optimize long-context LLM inference by reducing the KV cache scanning bottleneck.
Black-box domain adaptation using dual-teacher distillation and pseudo label refinement for transfer learning without source data access.
Economic analysis of how generative AI reduces software creation costs but faces market saturation due to finite human attention.
ML-based security framework for Industrial IoT addressing threats to resource-constrained devices across multiple network layers.
Two-stage optimization framework combining metaheuristics, simulation, and ML for logistics service network design under uncertainty.
Composer 2: specialized LLM for agentic software engineering, trained via RL for long-term planning and coding ability.
mSFT algorithm addresses overfitting in multi-task language model fine-tuning by dynamically adjusting compute budget across heterogeneous datasets.
Physics-informed diffusion model for few-shot radio map construction in 6G networks using manifold alignment.
TypeScript library for robust LLM-based web scraping and structured data extraction using semantic HTML parsing.
Kbot: terminal AI agent that learns from sessions and dynamically creates tools. Self-improving with 368 tools, 41 agents, offline, MIT licensed.
Million Dollar Bot Page: AI agents buy and place pixels on a webpage using Machine Payment Protocol. Shows agent autonomy with automated payments.
Claude autonomously discovers optimal initial conditions across five PDE physics systems via a research loop, without human intervention or training.
WildClawBench: agent benchmark testing real-world, end-to-end performance across 60 practical tasks in a live environment.
Nit: Git reimplementation in Zig optimized for AI agents, reducing token usage by 71%. Analyzed 3,156 coding sessions.
macOS utility that toggles battery icon visibility based on power state, using IOKit notifications.
Chrome plugin for video appearance enhancement using an AI model. Consumer tool, not developer-focused.
HN discussion on Claude Code's recent performance issues. User feedback on LLM tool degradation.
VS Code plugin enabling structured feedback annotations in Markdown for LLM agents to parse and act on.
Tutorial on building a config file parser using Parseff parser combinators with modular composition.
MCP server for ERPNext/Frappe ERP with 120 tools. Enables AI agents (Claude, Copilot) to interact with ERP systems.
Incomplete Show HN post about open social network for AI agents. Lacks technical detail.
GitHub expands Code Security tool with AI-based vulnerability detection beyond CodeQL static analysis.
Quiver: Desktop app for GitHub PR reviews, diffs, and code collaboration with AI commit messages and agent integrations.
Research on per-tool sandboxing for AI agents, proposing isolation mechanisms based on tool risk levels.
CircuitLM: finite-state machine for infrastructure provisioning without LLM calls. Deterministic, verifiable, zero inference cost.
Vectimus: Cedar policy enforcement layer for AI coding agents. Blocks dangerous commands and API calls in sub-10ms. Tool for securing agent tool execution.
GitHub updates privacy policy: Copilot Free/Pro users' interaction data used for model training unless opted out. Enterprise unaffected.