Benchmark leader in multi-agent research tasks: GAIA Level 3 at 61.5%, DeepSearchQA at 87.6% (beat Perplexity by 8.1%). These benchmarks are research-focused, not coding-focused.
Spine Swarm
watch
GAIA Level 3 #1 (61.5%), DeepSearchQA #1 (87.6%, beat Perplexity by 8.1%). YC S23. Visual canvas where agents collaborate. Benchmark leader in multi-agent research tasks, not coding-specific.
Where it wins
GAIA Level 3 #1 (61.5%) — top multi-agent research benchmark
DeepSearchQA #1 (87.6%, beat Perplexity by 8.1%)
YC S23 backing
Visual canvas model is differentiated
109 HN points, 69 comments
Where to be skeptical
Not a coding tool — research/deliverable platform
Proprietary, private repo, no open-source path
Visual canvas may not serve code workflows
Pricing concerns (~7K credits per demo task)
Editorial verdict
Watch list. Benchmark leader in multi-agent research (GAIA, DeepSearchQA) but not a coding tool. Does not belong in coding-specific rankings unless it ships coding features.
Related

Claude Code
98
Anthropic's official agentic coding CLI. v2.1.81 (Mar 20) shipped `--bare`, smarter worktree resume, and improved MCP OAuth, while the repo crossed 82,204 stars and logged ~14 commits/week across 10+ maintainers. Terminal-native and tool-use-driven, with deep file system and shell access. #1 on SWE-bench Pro standardized (45.89%), ~4% of GitHub public commits (SemiAnalysis), $2.5B annualized revenue, 8M+ weekly npm downloads. Opus 4.6 with 1M context.
LangGraph
95
#1 Python agent framework by production evidence: 40.2M PyPI downloads/month, Fortune 500 deployments (LinkedIn, Uber, Replit, Elastic, Klarna, Cloudflare, Coinbase), ~400 LangGraph Platform companies, LangSmith rated best-in-class observability. Stable v1.x API, model-agnostic, MCP support.
Pydantic AI
95
#3 Python agent framework by downloads: 15.6M PyPI downloads/month. Built by the Pydantic team. Runtime type enforcement is a genuine differentiator no other framework offers. V1 shipped with Temporal integration for durable execution and Logfire observability. Emerging pattern: 'Pydantic AI for agent logic, LangGraph for orchestration' (ZenML).
AutoGen (Microsoft)
95
⚠️ MAINTENANCE MODE: Microsoft officially confirmed bug fixes and security patches only, no new features (VentureBeat 2026-02-19). 55.9K stars but only 1.57M PyPI downloads/month, a download-to-star ratio of about 28 (1.57M / 55.9K), the most inflated among active frameworks. Being replaced by Microsoft Agent Framework (AutoGen + Semantic Kernel merge, GA targeted ~Q2 2026). Teams on AutoGen should plan migration.
Public evidence
Solid engagement on Hacker News (109 points, 69 comments). Discussion focuses on research use cases, not coding. Pricing concerns raised (~7K credits per task).