Spine Swarm

#1 on GAIA Level 3 (61.5%) and #1 on DeepSearchQA (87.6%, beat Perplexity by 8.1%). YC S23. Agents collaborate on a visual canvas. Benchmark leader in multi-agent research tasks, but not coding-specific.

Score: 40 (watch)

Where it wins

GAIA Level 3 #1 (61.5%) — top multi-agent research benchmark

DeepSearchQA #1 (87.6%, beat Perplexity by 8.1%)

YC S23 backing

Visual canvas model is differentiated

109 HN points, 69 comments

Where to be skeptical

Not a coding tool — research/deliverable platform

Proprietary, private repo, no open-source path

Visual canvas may not serve code workflows

Pricing concerns (~7K credits per demo task)

Editorial verdict

Watch list. Benchmark leader in multi-agent research (GAIA, DeepSearchQA) but not a coding tool. Does not belong in coding-specific rankings unless it ships coding features.

Teams of Agents / Multi-Agent Orchestration

#08 of 23

Teams prioritizing benchmark performance in multi-agent research tasks (not coding-specific)

Claude Code

Anthropic's official agentic coding CLI. v2.1.81 (Mar 20) shipped `--bare`, smarter worktree resume, and improved MCP OAuth, while the repo crossed 82,204 stars and logged ~14 commits/week across 10+ maintainers. Terminal-native and tool-use-driven, with deep file system and shell access. #1 on SWE-bench Pro standardized (45.89%), ~4% of GitHub public commits (SemiAnalysis), $2.5B annualized revenue, 8M+ npm weekly downloads. Opus 4.6 with 1M context.

LangGraph

#1 Python agent framework by production evidence — 40.2M PyPI downloads/month, Fortune 500 deployments (LinkedIn, Uber, Replit, Elastic, Klarna, Cloudflare, Coinbase), ~400 LangGraph Platform companies, LangSmith rated best-in-class observability. Stable v1.x API, model-agnostic, MCP support.

Pydantic AI

#3 Python agent framework by downloads — 15.6M PyPI/month. Built by the Pydantic team. Runtime type enforcement is a genuine differentiator no other framework offers. V1 shipped with Temporal integration for durable execution and Logfire observability. Emerging pattern: 'Pydantic AI for agent logic, LangGraph for orchestration' (ZenML).

AutoGen (Microsoft)

⚠️ MAINTENANCE MODE — Microsoft officially confirmed bug fixes and security patches only, no new features (VentureBeat 2026-02-19). 55.9K stars but only 1.57M PyPI downloads/month, a download-to-star ratio of about 28, which makes its star count the most inflated relative to usage among active frameworks. Being replaced by Microsoft Agent Framework (AutoGen + Semantic Kernel merge, GA targeted ~Q2 2026). Teams on AutoGen should plan migration.
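
The ratio quoted for AutoGen can be reproduced from the two figures in its entry. This is a minimal sketch that assumes the ratio means monthly PyPI downloads divided by GitHub stars; the function name `downloads_per_star` is hypothetical and not taken from any tool listed here.

```python
# Figures as quoted in the AutoGen entry.
MONTHLY_PYPI_DOWNLOADS = 1_570_000  # "1.57M PyPI/month"
GITHUB_STARS = 55_900               # "55.9K stars"


def downloads_per_star(downloads: int, stars: int) -> float:
    """Monthly downloads per GitHub star (assumed definition of the ratio)."""
    if stars <= 0:
        raise ValueError("stars must be positive")
    return downloads / stars


ratio = downloads_per_star(MONTHLY_PYPI_DOWNLOADS, GITHUB_STARS)
print(f"{ratio:.1f} monthly downloads per star")  # prints "28.1 monthly downloads per star"
```

A low value means the star count is large relative to actual installs, which is the sense of "inflated" used in the entry.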

Public evidence

strong (2026-03)

GAIA Level 3 #1 + DeepSearchQA #1 benchmarks

Benchmark leader in multi-agent research tasks. GAIA L3 61.5%, DeepSearchQA 87.6% (beat Perplexity by 8.1%). However, benchmarks are research-focused, not coding-focused.

Public benchmark leaderboards: GAIA and DeepSearchQA benchmarks

moderate (2026-03)

HN discussion — 109 points, 69 comments

Solid engagement. Discussion focuses on research use cases, not coding. Pricing concerns raised (~7K credits per task).

109 points, 69 comments (Hacker News community)
