OpenHands leads on stars (69K vs 610), open-source licensing, the AMD hardware partnership, and a community roughly 10× larger. Factory wins on funding ($50M vs $18.8M), Terminal-Bench #1, and its managed service. But Factory's two contradictory reviews (Every.to vs hyperdev) signal high-variance UX. Different buyers: self-hosted vs managed.
Teams of Agents / Multi-Agent Orchestration
Four distinct buyer segments with almost no cross-over: (1) Agent frameworks/SDKs — build multi-agent systems in code (LangGraph, CrewAI, OpenAI Agents SDK, Mastra); (2) Autonomous coding agents — delegate software development to an agent (OpenHands, Factory AI); (3) Parallel agent IDEs — run multiple coding agents simultaneously and compare results (Emdash, Superset); (4) Workflow automation with agents — orchestrate integrations visually (n8n). Ranking all on a single list is misleading — each serves a different buyer.
Verdict
FRAMEWORKS (Python): LangGraph is #1 — 40.2M PyPI/month (7× #2), Fortune 500 deployments independently verified (LinkedIn, Uber, Klarna, Cloudflare), LangSmith best-in-class observability. CrewAI is #2 — fastest to prototype, real Fortune 500 customers, 5.7M/month. OpenAI Agents SDK is #3 for OpenAI-committed teams — pre-1.0, no state persistence, no MCP. Google ADK is #4 for GCP/Vertex teams. Strands is #5 for AWS Bedrock only (anomalous DL/star ratio). smolagents: research/experimentation only — LocalPythonExecutor must NOT be used in production (JFrog CVE-2025-9959 CVSS 7.6 + NCC Group RCE confirmed).
FRAMEWORKS (TypeScript): Mastra is #1 and the only serious option — 2.0M npm/month, 442-pt HN launch, $13M YC W25. No comparable JS-native competitor exists.
AUTONOMOUS CODING AGENTS: OpenHands is #1 — 69,352 stars, SWE-bench Verified 72% (Claude 4.5 Extended Thinking), $18.8M Series A, AMD partnership. Gap to #2 is enormous. Factory AI is #2 — Terminal-Bench #1 (self-run benchmark), $50M Series B, enterprise-only.
PARALLEL AGENT IDEs: Emdash is #1 — Best-of-N differentiator, Ry Walker Tier 1, YC W26, 206-pt HN, triple issue-tracker. Superset is #2 — 7,386 stars (2.7×), zero telemetry, Apache 2.0, desktop-v1.2.1.
WORKFLOW AUTOMATION: n8n is #1 — 179K stars, 3,000+ enterprise customers, 1,100+ integrations. Do not use for code-first agent systems — use LangGraph/CrewAI for that.
AutoGen (Microsoft): archived — officially in maintenance mode (VentureBeat 2026-02-19), 1 commit/month, replaced by Microsoft Agent Framework (RC 2026-02-19, GA ~Q2 2026).
The deeper read
The category has four distinct buyer segments with almost no cross-over: (1) Agent frameworks/SDKs — LangGraph dominates Python (40.2M/month), Mastra dominates TypeScript (2.0M/month, no competitor); (2) Autonomous coding agents — OpenHands is the open-source leader (72% SWE-bench), Factory AI is enterprise-only; (3) Parallel agent IDEs — Emdash (Best-of-N, issue tracker) vs Superset (privacy-first, more stars); (4) Workflow automation — n8n (179K stars, 3K+ enterprises). Ranking all on a single list is misleading.
First-party multi-agent is the biggest threat to the orchestration layer: VS Code 1.109 (Feb 2026) positions as ‘Your Home for Multi-Agent Development,’ Claude Code Agent Teams (experimental) enables peer-to-peer messaging, Codex App runs background container-based agents. The remaining wedge for third-party orchestrators is agent-AGNOSTIC coordination that spans providers.
AutoGen is archived — Microsoft officially confirmed maintenance mode (VentureBeat, 2026-02-19). 1 commit in last 30 days. Replaced by Microsoft Agent Framework (RC 2026-02-19, GA ~Q2 2026). Directing developers to AutoGen in 2026 is directing them to dead-end tooling.
smolagents security: LocalPythonExecutor must NOT be used in production. CVE-2025-9959 (JFrog, CVSS 7.6) + NCC Group (2025-07-28) independently confirmed sandbox escape and RCE paths. Docker/E2B sandboxing is an architectural requirement, not optional.
Excluded from rankings due to star inflation: wshobson/agents (31K stars, INFLATED — 0.99% watcher ratio, zero HN/Reddit), ruvnet/ruflo (21K stars, HIGHLY INFLATED — owner pattern across repos).
The biggest structural risk for parallel agent IDEs: if Claude Code Agent Teams exits experimental or VS Code 1.109 multi-agent matures, third-party orchestrators face existential pressure. Tools that only orchestrate a single provider’s agent (oh-my-claudecode) are most at risk.
Current ranking
Best for: End-to-end autonomous coding platform — self-hostable, model-agnostic, enterprise-validated
69,352 stars (verified), 1M PyPI downloads/month (3.4M all-time, accelerating), $18.8M Series A (Madrona, Menlo, Pillar), AMD Lemonade collaboration, 455 contributors, SWE-bench Verified 72% with Claude 4.5 Extended Thinking, Multi-SWE-Bench #1 (8 languages). Gap to #2 is enormous on every axis.
⚡ HN engagement under ‘OpenHands’ name is weak (11pts) — community still associates with ‘OpenDevin.’ Enterprise claims are self-reported.
Best for: Multi-agent orchestration with Best-of-N comparison and issue-tracker integration (Linear, Jira, GitHub Issues)
Ry Walker Tier 1 — only orchestrator in top 8 of 38-tool comparison. 206pts HN / 71 comments (strongest of any orchestrator). 21 supported coding agents. v0.4.37 (2026-03-17), 60K total downloads. YC W26.
⚡ 2,741 stars and 60K total downloads — modest. YC Tier List rates it B (‘existential competitive pressure’). Pre-v1.0.
Best for: Simple, privacy-respecting parallel agent execution — the ‘tmux for agents’ buyer on macOS
7,386 stars (verified), 96pts HN with 90 comments, 512 Product Hunt upvotes, desktop-v1.2.1 (2026-03-18). Apache 2.0, zero telemetry, BYOK. Dogfooded: ‘We use Superset to build Superset.’
⚡ macOS only. Not in Ry Walker’s Tier 1 (notable absence). No issue tracker integration or Best-of-N. Bootstrapped 3-person team — execution risk.
Best for: Enterprise teams needing benchmark-leading performance with dedicated onboarding support
Terminal-Bench #1 at 58.8% (beat Claude Code 43.2%, Codex CLI 42.8%). $50M Series B at $300M valuation (NEA, Sequoia, Nvidia). Enterprise customers: MongoDB, EY, Bayer, Zapier.
⚡ Sharply polarized UX: Danny Aziz ‘canceled my AI subscriptions for it’ vs Robert Matsuoka ‘not ready for serious work.’ Closed-source, enterprise pricing only. Two reviews 5 months apart — unclear if UX improved.
Best for: Claude Code power users wanting swarm/parallel agent features as an extension
10,110 stars in ~10 weeks — extraordinary growth rate. 32 agents, 5 execution modes (including Ultrapilot 3-5x parallel, Swarm), smart model routing. Addy Osmani cited it.
⚡ Zero HN posts, zero Reddit discussion, zero independent reviews. 10K stars in 10 weeks with no public discourse is a red flag for potential inflation. Cannot rank until independent validation appears.
Best for: TypeScript-heavy teams wanting autonomous CI fix and programmatic agent orchestration
4,510 stars, MIT, 30 parallel agents, 40K LoC TypeScript. Autonomous CI fix and merge conflict resolution — unique in category. Dogfooded: 86/102 PRs built by agents.
⚡ Only 1 month old. No HN engagement. Press limited to MarkTechPost. Too early to rank — revisit in 60 days.
Best for: Teams prioritizing benchmark performance in multi-agent research tasks (not coding-specific)
GAIA Level 3 #1 (61.5%), DeepSearchQA #1 (87.6%, beat Perplexity by 8.1%). 109pts HN / 69 comments. YC S23. Visual canvas model is differentiated.
⚡ Not a coding tool — research/deliverable platform. Proprietary, private repo. Does not belong in coding-specific rankings unless it ships coding features.
Best for: Issue-level repair with strong academic benchmark credibility
18.7K stars, MIT, Princeton NLP. 79.2% SWE-bench Verified with Opus 4.5. Best single-agent issue fixer.
⚡ Not a multi-agent system. No parallel execution or orchestration. Recommendation: move to Software Factories or Coding CLIs category.
Best for: Loop-pattern reference implementation
Clean loop pattern with Vercel/Anthropic adoption. VentureBeat and The Register coverage.
⚡ No fresh public signals. Loop pattern now offered by Emdash, Composio, and others with more features.
Best for: Python teams building production multi-agent systems — complex stateful workflows, model-agnostic, enterprise observability
40.2M PyPI/month — #1 Python framework by 7×. Independently verified Fortune 500 deployments: LinkedIn, Uber, Klarna, Cloudflare, Coinbase, Home Depot, Workday. ~400 companies on LangGraph Platform. LangSmith rated best-in-class observability. v1.x stable API. Checkpointing + state persistence.
⚡ Steeper learning curve than CrewAI. '40% faster to production with CrewAI' is a widely-cited finding — accept the tradeoff consciously.
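The checkpointing and state persistence called out above are what suit LangGraph to long-running stateful workflows. A minimal plain-Python sketch of the underlying pattern (illustrative only, not LangGraph's actual API): persist state after every node so a crashed or paused run resumes mid-graph instead of restarting.

```python
import json
import tempfile
from pathlib import Path

# Illustrative sketch of the checkpoint-and-resume pattern that stateful
# graph frameworks provide. Plain Python; function names are hypothetical.

def checkpoint(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state))

def resume(path: Path) -> dict:
    # Fresh start if no checkpoint exists yet.
    return json.loads(path.read_text()) if path.exists() else {"step": 0, "log": []}

def run_workflow(path: Path, steps) -> dict:
    state = resume(path)                      # pick up where we left off
    for i in range(state["step"], len(steps)):
        state["log"].append(steps[i](state))  # run one node
        state["step"] = i + 1
        checkpoint(path, state)               # persist after every node
    return state

ckpt = Path(tempfile.mkdtemp()) / "state.json"
steps = [lambda s: "plan", lambda s: "code", lambda s: "review"]
final = run_workflow(ckpt, steps)
```

Re-running `run_workflow` against the same checkpoint file is a no-op once all steps have completed; that restartability is the property the verdict is pricing in for long-running agent workflows.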
Best for: Python teams wanting fastest path to prototype — role-based workflow automation with Fortune 500 enterprise customers
5.7M PyPI/month (3× growth in 6 months). Fortune 500: PwC, IBM, NVIDIA, DocuSign (multi-source confirmed). YAML-driven role-based orchestration — consistently rated 'fastest to prototype' in 2026 independent reviews. v1.11.0 (2026-03-18) includes CVE fix.
⚡ Observability less mature than LangGraph without AMP Suite. Long-running stateful workflows favor LangGraph architecture.
Best for: Linear handoff-chain agents in pure OpenAI environments — lowest friction if locked into OpenAI models
17.5M PyPI/month. v0.12.4 (2026-03-18). 58 open issues against 3,300 forks — tightly maintained. Built-in tracing. Rowboat ecosystem (161 HN pts). A functional multi-agent system in under 100 lines (particula.tech).
⚡ Pre-1.0 (v0.x). No native state persistence — Temporal required. No MCP. OpenAI model lock-in. Do not choose if you need model-agnosticism, state persistence, or stable API.
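The "linear handoff chain" named above is the SDK's core shape: each agent handles its piece of the task, then passes control to the next. A plain-Python sketch of that pattern (the names here are illustrative, not the OpenAI Agents SDK's real API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of a linear handoff chain: each agent transforms
# the message, then hands off to its successor until the chain ends.

@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]
    handoff_to: Optional["Agent"] = None

def run_chain(agent: Optional[Agent], message: str) -> str:
    while agent is not None:
        message = agent.handle(message)  # each agent does its part of the task
        agent = agent.handoff_to         # then passes control down the chain
    return message

reviewer = Agent("reviewer", lambda m: m + " -> reviewed")
coder = Agent("coder", lambda m: m + " -> coded", handoff_to=reviewer)
triage = Agent("triage", lambda m: m + " -> triaged", handoff_to=coder)

result = run_chain(triage, "fix bug")
```

Note what the sketch does not include: persistence of `message` between process runs. That absence is exactly the "no native state persistence — Temporal required" caveat above.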
Best for: TypeScript/JavaScript teams — the default choice with no comparable competitor
2.0M npm/month — only JS-native framework at this scale. 22,115 stars; 442-pt HN Show HN (2025-02-19) + 213-pt v1.0 launch (2026-01-20) — strongest JS agent community signal. $13M YC W25 (Paul Graham, Guillermo Rauch, Amjad Masad). Named customers: Replit, PayPal, Adobe.
⚡ Custom license — not MIT/Apache, review before production use. YC W25 = early-stage; production evidence mostly self-reported.
Best for: Teams committed to GCP/Vertex AI deployment — native Cloud Run + Vertex AI Agent Engine integration
4.3M PyPI/month for a framework under 10 months old. Multi-language: Python, TypeScript, Go, Java. Model-agnostic despite Google origins. Native Cloud Run + Vertex AI Agent Engine integration is a unique GCP advantage. ADK 2.0 Alpha adds graph workflows.
⚡ No independently verified production case studies outside Google-controlled publications. Lock-in tradeoff unfavorable vs LangGraph for non-GCP teams.
Best for: Teams building on AWS Bedrock who want official AWS-supported tooling
5.5M PyPI/month. Official AWS SDK. A2A protocol support. Internal AWS usage: Amazon Q Developer, AWS Glue. v1.30.0 (2026-03-11).
⚡ Anomalous download/star ratio (1,038 DL/star vs CrewAI 122) — CI/CD pipeline inflation concern. Zero HN organic discussion despite 14M claimed cumulative downloads. High lock-in penalty for non-AWS teams.
Best for: Orchestrating SaaS integrations and processes visually with AI nodes — NOT for building agent systems in code
179,860 stars — largest OSS workflow repo by 2×. 3,000+ enterprise customers, $60M Series B. 1,100+ integrations. Native AI Agent node + MCP. HN 195 pts. If you're wiring together SaaS tools with AI, this is the clear answer.
⚡ Category fit caveat: if you're building an agent system from code, use LangGraph or CrewAI. Do not mix use cases.
Skills comparison
GitHub stars and evidence count for top-ranked skills.
Star growth over time
GitHub stars trajectory for top skills in this category.
Head to head
Emdash: Tier 1 vs Tier 2 independent classification, Best-of-N feature, 22+ agents, Linear/Jira/GitHub Issues integration. Superset: 2.7x more stars, 90 HN comments (more discussion), Apache 2.0, zero telemetry. Emdash for teams with issue trackers; Superset for individuals wanting tmux-for-agents.
Different lanes entirely. OpenHands is a full autonomous platform — delegate whole tasks. Emdash is an orchestration layer — supervise parallel agents yourself. Platform vs multiplexer.
OMC has 3.7x more stars and fastest growth, but is Claude Code-only. Emdash supports 22+ agents and has Ry Walker Tier 1 independent validation. If Claude Code remains dominant, OMC’s lock-in is a strength. If market fragments, Emdash’s agent-agnostic approach wins.
Public signals
40.2M downloads/month — 7× CrewAI (5.7M). ~400 companies on LangGraph Platform. Independently verified Fortune 500 deployments across multiple sources. LangSmith rated best-in-class observability. Stable v1.x API, full state persistence, MCP support.
5.7M downloads/month, tripled in 6 months. Fortune 500 names (PwC, IBM, Capgemini, NVIDIA, DocuSign) confirmed across independent publications. YAML-driven, role-based — ‘fastest to prototype’ consensus. CVE-responsive.
Only serious JS/TS framework at scale — 2.0M npm/month with no comparable competitor. 442-pt HN Show HN (2025-02-19) + 213-pt v1.0 launch. $13M seed: YC W25, Paul Graham, Guillermo Rauch (Vercel), Amjad Masad (Replit). Named customers: Replit, PayPal, Adobe.
GitHub API verified 69,352 stars. 1M/month PyPI downloads (3.4M all-time, accelerating). 455 contributors. Biweekly releases (v1.5.0 on 2026-03-11). AMD Lemonade collaboration. SWE-bench Verified 72% with Claude 4.5 Extended Thinking. Multi-SWE-Bench #1 (8 languages).
Best-of-N (run multiple agents on same task, ship the best diff), issue tracker integration (Linear, Jira, GitHub Issues — unique), 21 agents, remote SSH. v0.4.37 (2026-03-17), 60K total downloads. Expert endorsement outweighs modest 2,741 stars.
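Best-of-N is conceptually simple: run N agents on the same task, score each candidate, ship the winner. A hedged sketch of the selection step (the scoring function here is a stand-in such as tests passed; Emdash's actual implementation is not documented in these signals):

```python
# Illustrative Best-of-N selection: fan the same task out to several
# agents, score each candidate result, and keep the best one.
# All names and the scoring metric are hypothetical.

def best_of_n(task, agents, score):
    candidates = [(agent(task), name) for name, agent in agents.items()]
    return max(candidates, key=lambda c: score(c[0]))

# Stub "agents" that each return a candidate diff plus a quality signal.
agents = {
    "agent_a": lambda t: {"diff": "patch A", "tests_passed": 7},
    "agent_b": lambda t: {"diff": "patch B", "tests_passed": 9},
    "agent_c": lambda t: {"diff": "patch C", "tests_passed": 5},
}

winner, name = best_of_n("fix issue #123", agents, lambda r: r["tests_passed"])
```

The hard part in practice is the `score` function, which is why a human reviewing competing diffs side by side (Emdash's UI) remains the common fallback.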
96pts HN with 90 comments (highest comment count in category). desktop-v1.2.1 (2026-03-18). Dogfooded: ‘We use Superset to build Superset.’ Named enterprise users: Amazon, Google, ServiceNow (self-reported).
Danny Aziz (GM of Spiral): ‘canceled Claude + ChatGPT Max plans for Droid.’ Robert Matsuoka: ‘great vision, flawed execution, not ready for serious work.’ Enterprise customers: MongoDB, EY, Bayer, Zapier. Reviews 5 months apart — unclear if UX improved.
32 agents, 5 execution modes, smart model routing. But zero HN posts, zero Reddit, zero independent reviews despite 10K stars. Red flag for potential star inflation. Cannot rank until validation appears.
VS Code 1.109 positions as ‘Your Home for Multi-Agent Development.’ Claude Code Agent Teams (experimental) enables peer-to-peer messaging. Codex App runs background container-based agents. The remaining wedge: agent-AGNOSTIC orchestration that spans providers.
wshobson/agents: 0.99% watcher ratio, zero HN/Reddit, 8 issues for 31K stars (estimated real: 2-5K). ruvnet/ruflo: owner pattern across repos (RuView: 37K stars, 0.58% watcher ratio). Both excluded from all catalog consideration.
JFrog (CVE-2025-9959, CVSS 7.6): sandbox escape via dunder attribute validation bypass in LocalPythonExecutor. NCC Group (2025-07-28): arbitrary file read/write + RCE via prompt injection — no code-level patch possible. Docker or E2B sandboxing is an architectural requirement. Research/experimentation only.
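Since no code-level patch is possible, isolation has to live outside the interpreter. As a floor (not a fix), agent-generated code can at least run in a separate interpreter process with isolated mode and a timeout; this sketch only illustrates the direction and does not meet the Docker/E2B bar the audits require.

```python
import subprocess
import sys

# Minimal process-level isolation sketch: execute untrusted code in a
# separate Python process. -I (isolated mode) ignores environment
# variables and user site-packages; timeout bounds runtime.
# This is NOT equivalent to the Docker/E2B sandboxing the audits demand.

def run_untrusted(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )

result = run_untrusted("print(2 + 2)")
```

A subprocess still shares the host filesystem and network, which is precisely the file read/write surface NCC Group exploited; hence the architectural requirement for container-level sandboxing.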
VentureBeat (2026-02-19): Microsoft officially confirmed bug-fixes and security patches only, no new features. 399K PyPI/month (7% of CrewAI). Last release python-v0.7.5 (2025-09-30). Replaced by Microsoft Agent Framework (RC 2026-02-19, GA ~Q2 2026) which combines AutoGen + Semantic Kernel.
What changes this
LangGraph dethronable if: CrewAI’s AMP Suite reaches parity with LangSmith observability AND independent production case studies match LangGraph’s depth. Current gap is too large to close quickly.
OpenAI Agents SDK promotes to #2 (Python) if: state persistence is added natively, API reaches v1.0 stability, and MCP support ships. The download volume already supports #2 — the gaps are structural.
Google ADK promotes in ranking if: independent (non-Google) production case studies emerge with verifiable named users. Currently all evidence runs through Google-controlled publications.
Strands removed from catalog if: CI/CD pipeline download inflation is confirmed. Zero HN signal + anomalous download/star ratio (1,038 vs CrewAI 122) warrants further investigation.
Microsoft Agent Framework enters when GA (~Q2 2026): combines AutoGen + Semantic Kernel, multi-provider (Azure OpenAI, OpenAI, Claude, Bedrock, Ollama), A2A + AG-UI + MCP protocols.
smolagents upgrades from research-only if: HuggingFace ships a production-safe executor (not LocalPythonExecutor) with sandboxing built in and JFrog/NCC Group re-audit with no new findings.
If oh-my-claudecode gets independent reviews (HN launch, blog reviews, comparison posts), it could enter rankings at #3-4 if validation is positive. Currently blocked by evidence gap.
If Emdash crosses 10K stars or ships v1.0, it solidifies #2. The star gap to Superset is the main counterargument.
If Superset ships Linux/Windows support, it eliminates its biggest limitation and could challenge Emdash for #2 among devs who don’t need issue trackers.
If Claude Code Agent Teams exits experimental, it directly compresses oh-my-claudecode and partially compresses Emdash/Superset.
If a credible head-to-head Emdash vs Superset review appears, it would resolve #2/#3 ordering with higher confidence.
Emdash/Superset reorder if one player ships monetization or enterprise contracts, or if one stops shipping for 60 days. Both are pre-revenue early-stage tools — shelf life uncertain.