NASA's plan for nuking Gateway and sending it to Mars
NASA plans to focus on a lunar surface base and discusses repurposing Gateway orbital hardware.
Granola raises $125M Series C funding for AI meeting transcription and note-taking application.
Analysis of ArXiv software engineering papers: 70% LLM-related since 2022, showing dominance of LLM research focus.
Job posting for autonomous vehicle technology company focused on sensor systems.
Regime Guard: cryptocurrency trading API providing market regime detection with 10 weighted signals.
Tool using LLMs grounded in books for learning: provides book summaries and expert validation by linking AI to authoritative sources.
Analysis of Iran school bombing: human decision-making blamed, not LLMs; examines AI responsibility narrative.
XReplicator: eBPF-based backup tool tracking changed disk sectors without kernel modules or hypervisor APIs.
HDP protocol provides cryptographic chain-of-custody for agentic AI systems, ensuring human authorization is traceable through agent delegation chains.
Comparison of rule-based automation (Tasker, MacroDroid) versus agentic AI systems for mobile automation on Android platforms.
arXiv framework announcement page with no actual research content about deanonymization with LLMs.
LiteLLM open-source LLM proxy suffered a supply chain attack in March 2026 where backdoored packages harvested credentials for three hours, demonstrating need for defense-in-depth security strategies.
Spectator: new scripting language for security work combining bash/python functionality with built-in security modules and GUI framework.
Open-source orchestration runtime for multi-agent AI systems using declarative YAML manifests. GitOps approach to agent governance and workflows.
Open-source web client for Stremio streaming platform with syncing and stream selection features.
Agent Ruler v0.1.9 update: reference monitor with confinement for AI agent workflows, adding security/safety layer outside agent guardrails.
HTTPS MITM proxy intercepting prompts to AI APIs/assistants, detecting and blocking sensitive data before transmission to third-party servers.
Three markdown files enabling stateless AI agents to maintain memory across sessions using git repos. Works with coding agents like Claude, Cursor, Windsurf.
Local-first AI orchestration framework (MACCREv2) designed to avoid trusting third-party wrappers with API keys/filesystem. Response to litellm supply chain attack.
Research study demonstrating verbatim recall of copyrighted books in fine-tuned LLMs across cross-author and within-author scenarios.
Theoretical analysis of LLM reasoning properties at self-organized criticality with connections to phase transitions and scaling functions.
Environment Maps: Persistent agent-agnostic representation for reducing cascading errors in long-horizon LLM-based software automation tasks.
Safety-focused evaluation framework for multi-agent voice-enabled smart speaker in care homes covering resident data access and task scheduling.
EnterpriseArena: Benchmark evaluating LLM agents as CFOs for resource allocation under uncertainty in dynamic business environments.
Public API and evaluation framework for benchmarking poker algorithms against GTO Wizard, a superhuman HUNL poker agent.
Method for long-horizon 3D box rearrangement using vision-language grounding and 3D masks for multi-step planning from natural language.
Evaluation comparing LLM essay scoring with human grading across GPT and Llama models, finding weak agreement in standard settings.
Study on efficient benchmarking of AI agents showing how task subsets can preserve agent rankings while reducing evaluation costs.
Learning-guided prioritized planning combining ML and search-based solvers for lifelong multi-agent pathfinding in warehouse automation.
VehicleMemBench: Benchmark for evaluating long-term memory in multi-user in-vehicle agents handling preference conflicts and temporal dynamics.
SCoOP: Training-free uncertainty quantification framework for multi-VLM systems using semantic-consistent opinion pooling.
DeepXube: Free open-source Python package for pathfinding using learned heuristic functions from deep RL and search algorithms.
DUPLEX: Neuro-symbolic agentic architecture combining LLMs with schema-guided information extraction for robust robotic task planning in long-horizon domains.
AnalogAgent: LLM-based agentic framework for automated analog circuit design using multi-model loops to preserve domain-specific insights and context.
Empirical study analyzing 2000+ RL papers to create quantitative taxonomy of reinforcement learning environments and technological trends.
MAPUS: LLM-based multi-agent framework for personalized and fair participatory urban sensing modeling participants as autonomous agents with preferences.
ELITE framework for self-improving embodied agents using vision-language models with experiential learning and intent-aware transfer to bridge vision-action gap.
Enhanced Mycelium of Thought (EMoT): bio-inspired hierarchical reasoning architecture for LLMs with four-level hierarchy, strategic dormancy, and mnemonic encoding.
Standardized benchmarks and evaluation framework for multi-objective search addressing fragmentation in empirical evaluation.
AutoProf: multi-agent orchestration framework for autonomous AI research with persistent world model, gap analysis, and inter-agent verification mechanisms.
Multi-agent framework with specialist agents for medical multiple-choice question answering, improving calibration and confidence scoring through verification.
Incongruent normal form structural representation for self-referential semantic sentences preserving classical semantics.
Markovian framework for auditing reliability and oversight costs in agentic AI systems operating as stochastic policies with sequential decisions and tool calls.
Analysis of many-shot jailbreaking technique exploiting long context windows; probes effectiveness and develops mitigation strategies for LLM safety.
Novel methodology quantitatively evaluating metacognitive abilities in LLMs, testing self-awareness without relying on model self-reports.
Computerized Adaptive Testing framework grounded in Item Response Theory for cost-effective and scalable evaluation of LLMs in medical benchmarking.
Deletion-Insertion Diffusion language models replacing masking paradigm with discrete diffusion processes for improved computational efficiency and generation flexibility.
Internal Safety Collapse (ISC) failure mode identified in frontier LLMs where models generate harmful content under certain task conditions; TVD framework presented to trigger and study ISC.
Evaluation of visuospatial perspective-taking abilities in multimodal language models using adapted tasks from human studies (Director Task, Rotating F task).
DISCO benchmark suite for evaluating OCR pipelines and vision-language models on document parsing and QA across diverse document types including handwritten and multilingual text.