OpenHands leads on stars (69K vs 610), open-source licensing, the AMD hardware partnership, and a community roughly 10× larger. Factory wins on funding ($50M vs $18.8M), Terminal-Bench #1, and its managed service. But Factory's two contradictory reviews (Every.to vs hyperdev) signal high-variance UX. Different buyers: self-hosted vs managed.
Teams of Agents / Multi-Agent Orchestration
Four distinct buyer segments with almost no cross-over: (1) Agent frameworks/SDKs — build multi-agent systems in code (LangGraph, CrewAI, OpenAI Agents SDK, Mastra); (2) Autonomous coding agents — delegate software development to an agent (OpenHands, Factory AI); (3) Parallel agent IDEs — run multiple coding agents simultaneously and compare results (Emdash, Superset); (4) Workflow automation with agents — orchestrate integrations visually (n8n). Ranking all on a single list is misleading — each serves a different buyer.
Verdict
FRAMEWORKS (Python): LangGraph is #1 — 40.2M PyPI/month (7× #2), Fortune 500 deployments independently verified (LinkedIn, Uber, Klarna, Cloudflare), LangSmith best-in-class observability. CrewAI is #2 — fastest to prototype, real Fortune 500 customers, 5.7M/month. OpenAI Agents SDK is #3 for OpenAI-committed teams — pre-1.0, no state persistence, no MCP. Google ADK is #4 for GCP/Vertex teams. Strands is #5 for AWS Bedrock only (anomalous DL/star ratio). smolagents: research/experimentation only — LocalPythonExecutor must NOT be used in production (JFrog CVE-2025-9959 CVSS 7.6 + NCC Group RCE confirmed).
FRAMEWORKS (TypeScript): Mastra is #1 and the only serious option — 2.0M npm/month, 442-pt HN launch, $13M YC W25. No comparable JS-native competitor exists.
AUTONOMOUS CODING AGENTS: OpenHands is #1 — 69,352 stars, SWE-bench Verified 72% (Claude 4.5 Extended Thinking), $18.8M Series A, AMD partnership. Gap to #2 is enormous. Factory AI is #2 — Terminal-Bench #1 (self-run benchmark), $50M Series B, enterprise-only.
PARALLEL AGENT IDEs: Emdash is #1 — Best-of-N differentiator, Ry Walker Tier 1, YC W26, 206-pt HN, triple issue-tracker. Superset is #2 — 7,386 stars (2.7×), zero telemetry, Apache 2.0, desktop-v1.2.1.
WORKFLOW AUTOMATION: n8n is #1 — 179K stars, 3,000+ enterprise customers, 1,100+ integrations. Do not use for code-first agent systems — use LangGraph/CrewAI for that.
AutoGen (Microsoft): archived — officially in maintenance mode (VentureBeat 2026-02-19), 1 commit/month, replaced by Microsoft Agent Framework (RC 2026-02-19, GA ~Q2 2026).
The deeper read
The category has four distinct buyer segments with almost no cross-over: (1) Agent frameworks/SDKs — LangGraph dominates Python (40.2M/month), Mastra dominates TypeScript (2.0M/month, no competitor); (2) Autonomous coding agents — OpenHands is the open-source leader (72% SWE-bench), Factory AI is enterprise-only; (3) Parallel agent IDEs — Emdash (Best-of-N, issue tracker) vs Superset (privacy-first, more stars); (4) Workflow automation — n8n (179K stars, 3K+ enterprises). Ranking all on a single list is misleading.
First-party multi-agent is the biggest threat to the orchestration layer: VS Code 1.109 (Feb 2026) positions as ‘Your Home for Multi-Agent Development,’ Claude Code Agent Teams (experimental) enables peer-to-peer messaging, Codex App runs background container-based agents. The remaining wedge for third-party orchestrators is agent-AGNOSTIC coordination that spans providers.
AutoGen is archived — Microsoft officially confirmed maintenance mode (VentureBeat, 2026-02-19). 1 commit in last 30 days. Replaced by Microsoft Agent Framework (RC 2026-02-19, GA ~Q2 2026). Directing developers to AutoGen in 2026 is directing them to dead-end tooling.
smolagents security: LocalPythonExecutor must NOT be used in production. CVE-2025-9959 (JFrog, CVSS 7.6) + NCC Group (2025-07-28) independently confirmed sandbox escape and RCE paths. Docker/E2B sandboxing is an architectural requirement, not optional.
Excluded from rankings due to star inflation: wshobson/agents (31K stars, INFLATED — 0.99% watcher ratio, zero HN/Reddit), ruvnet/ruflo (21K stars, HIGHLY INFLATED — owner pattern across repos).
The biggest structural risk for parallel agent IDEs: if Claude Code Agent Teams exits experimental or VS Code 1.109 multi-agent matures, third-party orchestrators face existential pressure. Tools that only orchestrate a single provider’s agent (oh-my-claudecode) are most at risk.
Current ranking
Best for: End-to-end autonomous coding platform — self-hostable, model-agnostic, enterprise-validated
69,352 stars (verified), 1M PyPI downloads/month (3.4M all-time, accelerating), $18.8M Series A (Madrona, Menlo, Pillar), AMD Lemonade collaboration, 455 contributors, SWE-bench Verified 72% with Claude 4.5 Extended Thinking, Multi-SWE-Bench #1 (8 languages). Gap to #2 is enormous on every axis.
⚡ HN engagement under ‘OpenHands’ name is weak (11pts) — community still associates with ‘OpenDevin.’ Enterprise claims are self-reported.
Best for: Multi-agent orchestration with Best-of-N comparison and issue-tracker integration (Linear, Jira, GitHub Issues)
Ry Walker Tier 1 — only orchestrator in top 8 of 38-tool comparison. 206pts HN / 71 comments (strongest of any orchestrator). 21 supported coding agents. v0.4.37 (2026-03-17), 60K total downloads. YC W26.
⚡ 2,741 stars and 60K total downloads — modest. YC Tier List rates it B (‘existential competitive pressure’). Pre-v1.0.
Best for: Simple, privacy-respecting parallel agent execution — the ‘tmux for agents’ buyer on macOS
7,386 stars (verified), 96pts HN with 90 comments, 512 Product Hunt upvotes, desktop-v1.2.1 (2026-03-18). Apache 2.0, zero telemetry, BYOK. Dogfooded: ‘We use Superset to build Superset.’
⚡ macOS only. Not in Ry Walker’s Tier 1 (notable absence). No issue tracker integration or Best-of-N. Bootstrapped 3-person team — execution risk.
Best for: Enterprise teams needing benchmark-leading performance with dedicated onboarding support
Terminal-Bench #1 at 58.8% (beat Claude Code 43.2%, Codex CLI 42.8%). $50M Series B at $300M valuation (NEA, Sequoia, Nvidia). Enterprise customers: MongoDB, EY, Bayer, Zapier.
⚡ Sharply polarized UX: Danny Aziz ‘canceled my AI subscriptions for it’ vs Robert Matsuoka ‘not ready for serious work.’ Closed-source, enterprise pricing only. Two reviews 5 months apart — unclear if UX improved.
Best for: Claude Code power users wanting swarm/parallel agent features as an extension
10,110 stars in ~10 weeks — extraordinary growth rate. 32 agents, 5 execution modes (including Ultrapilot 3-5x parallel, Swarm), smart model routing. Addy Osmani cited it.
⚡ Zero HN posts, zero Reddit discussion, zero independent reviews. 10K stars in 10 weeks with no public discourse is a red flag for potential inflation. Cannot rank until independent validation appears.
Best for: TypeScript-heavy teams wanting autonomous CI fix and programmatic agent orchestration
4,510 stars, MIT, 30 parallel agents, 40K LoC TypeScript. Autonomous CI fix and merge conflict resolution — unique in category. Dogfooded: 86/102 PRs built by agents.
⚡ Only 1 month old. No HN engagement. Press limited to MarkTechPost. Too early to rank — revisit in 60 days.
Best for: Teams prioritizing benchmark performance in multi-agent research tasks (not coding-specific)
GAIA Level 3 #1 (61.5%), DeepSearchQA #1 (87.6%, beat Perplexity by 8.1%). 109pts HN / 69 comments. YC S23. Visual canvas model is differentiated.
⚡ Not a coding tool — research/deliverable platform. Proprietary, private repo. Does not belong in coding-specific rankings unless it ships coding features.
Best for: Issue-level repair with strong academic benchmark credibility
18.7K stars, MIT, Princeton NLP. 79.2% SWE-bench Verified with Opus 4.5. Best single-agent issue fixer.
⚡ Not a multi-agent system. No parallel execution or orchestration. Recommendation: move to Software Factories or Coding CLIs category.
Best for: Loop-pattern reference implementation
Clean loop pattern with Vercel/Anthropic adoption. VentureBeat and The Register coverage.
⚡ No fresh public signals. Loop pattern now offered by Emdash, Composio, and others with more features.
Best for: Python teams building production multi-agent systems — complex stateful workflows, model-agnostic, enterprise observability
40.2M PyPI/month — #1 Python framework by 7×. Independently verified Fortune 500 deployments: LinkedIn, Uber, Klarna, Cloudflare, Coinbase, Home Depot, Workday. ~400 companies on LangGraph Platform. LangSmith rated best-in-class observability. v1.x stable API. Checkpointing + state persistence.
⚡ Steeper learning curve than CrewAI. '40% faster to production with CrewAI' is a widely-cited finding — accept the tradeoff consciously.
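The checkpointing and state persistence called out above are what suit LangGraph to long-running stateful workflows. A minimal plain-Python sketch of the underlying pattern (illustrative only, not LangGraph's actual API): persist state after every node so a crashed or paused run resumes mid-graph instead of restarting.

```python
import json
import tempfile
from pathlib import Path

# Illustrative sketch of the checkpoint-and-resume pattern that stateful
# graph frameworks provide. Plain Python; function names are hypothetical.

def checkpoint(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state))

def resume(path: Path) -> dict:
    # Fresh start if no checkpoint exists yet.
    return json.loads(path.read_text()) if path.exists() else {"step": 0, "log": []}

def run_workflow(path: Path, steps) -> dict:
    state = resume(path)                      # pick up where we left off
    for i in range(state["step"], len(steps)):
        state["log"].append(steps[i](state))  # run one node
        state["step"] = i + 1
        checkpoint(path, state)               # persist after every node
    return state

ckpt = Path(tempfile.mkdtemp()) / "state.json"
steps = [lambda s: "plan", lambda s: "code", lambda s: "review"]
final = run_workflow(ckpt, steps)
```

Re-running `run_workflow` against the same checkpoint file is a no-op once all steps have completed; that restartability is the property the verdict is pricing in for long-running agent workflows.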
Best for: Python teams wanting fastest path to prototype — role-based workflow automation with Fortune 500 enterprise customers
5.7M PyPI/month (3× growth in 6 months). Fortune 500: PwC, IBM, NVIDIA, DocuSign (multi-source confirmed). YAML-driven role-based orchestration — consistently rated 'fastest to prototype' in 2026 independent reviews. v1.11.0 (2026-03-18) includes CVE fix.
⚡ Observability less mature than LangGraph without AMP Suite. Long-running stateful workflows favor LangGraph architecture.
Best for: Linear handoff-chain agents in pure OpenAI environments — lowest friction if locked into OpenAI models
17.5M PyPI/month. v0.12.4 (2026-03-18). 58 open issues against 3,300 forks — tightly maintained. Built-in tracing. Rowboat ecosystem (161 HN pts). A functional multi-agent system in under 100 lines (particula.tech).
⚡ Pre-1.0 (v0.x). No native state persistence — Temporal required. No MCP. OpenAI model lock-in. Do not choose if you need model-agnosticism, state persistence, or stable API.
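The "linear handoff chain" named above is the SDK's core shape: each agent handles its piece of the task, then passes control to the next. A plain-Python sketch of that pattern (the names here are illustrative, not the OpenAI Agents SDK's real API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch of a linear handoff chain: each agent transforms
# the message, then hands off to its successor until the chain ends.

@dataclass
class Agent:
    name: str
    handle: Callable[[str], str]
    handoff_to: Optional["Agent"] = None

def run_chain(agent: Optional[Agent], message: str) -> str:
    while agent is not None:
        message = agent.handle(message)  # each agent does its part of the task
        agent = agent.handoff_to         # then passes control down the chain
    return message

reviewer = Agent("reviewer", lambda m: m + " -> reviewed")
coder = Agent("coder", lambda m: m + " -> coded", handoff_to=reviewer)
triage = Agent("triage", lambda m: m + " -> triaged", handoff_to=coder)

result = run_chain(triage, "fix bug")
```

Note what the sketch does not include: persistence of `message` between process runs. That absence is exactly the "no native state persistence — Temporal required" caveat above.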
Best for: TypeScript/JavaScript teams — the default choice with no comparable competitor
2.0M npm/month — only JS-native framework at this scale. 22,115 stars; 442-pt HN Show HN (2025-02-19) + 213-pt v1.0 launch (2026-01-20) — strongest JS agent community signal. $13M YC W25 (Paul Graham, Guillermo Rauch, Amjad Masad). Named customers: Replit, PayPal, Adobe.
⚡ Custom license — not MIT/Apache, review before production use. YC W25 = early-stage; production evidence mostly self-reported.
Best for: Teams committed to GCP/Vertex AI deployment — native Cloud Run + Vertex AI Agent Engine integration
4.3M PyPI/month for a framework under 10 months old. Multi-language: Python, TypeScript, Go, Java. Model-agnostic despite Google origins. Native Cloud Run + Vertex AI Agent Engine integration is a unique GCP advantage. ADK 2.0 Alpha adds graph workflows.
⚡ No independently verified production case studies outside Google-controlled publications. Lock-in tradeoff unfavorable vs LangGraph for non-GCP teams.
Best for: Teams building on AWS Bedrock who want official AWS-supported tooling
5.5M PyPI/month. Official AWS SDK. A2A protocol support. Internal AWS usage: Amazon Q Developer, AWS Glue. v1.30.0 (2026-03-11).
⚡ Anomalous download/star ratio (1,038 DL/star vs CrewAI 122) — CI/CD pipeline inflation concern. Zero HN organic discussion despite 14M claimed cumulative downloads. High lock-in penalty for non-AWS teams.
Best for: Orchestrating SaaS integrations and processes visually with AI nodes — NOT for building agent systems in code
179,860 stars — largest OSS workflow repo by 2×. 3,000+ enterprise customers, $60M Series B. 1,100+ integrations. Native AI Agent node + MCP. HN 195 pts. If you're wiring together SaaS tools with AI, this is the clear answer.
⚡ Category fit caveat: if you're building an agent system from code, use LangGraph or CrewAI. Do not mix use cases.
Skills comparison
GitHub stars and evidence count for top-ranked skills.
Star growth over time
GitHub stars trajectory for top skills in this category.
Head to head
Emdash: Tier 1 vs Tier 2 independent classification, Best-of-N feature, 22+ agents, Linear/Jira/GitHub Issues integration. Superset: 2.7x more stars, 90 HN comments (more discussion), Apache 2.0, zero telemetry. Emdash for teams with issue trackers; Superset for individuals wanting tmux-for-agents.
Different lanes entirely. OpenHands is a full autonomous platform — delegate whole tasks. Emdash is an orchestration layer — supervise parallel agents yourself. Platform vs multiplexer.
OMC has 3.7x more stars and fastest growth, but is Claude Code-only. Emdash supports 22+ agents and has Ry Walker Tier 1 independent validation. If Claude Code remains dominant, OMC’s lock-in is a strength. If market fragments, Emdash’s agent-agnostic approach wins.
Public signals
40.2M downloads/month — 7× CrewAI (5.7M). ~400 companies on LangGraph Platform. Independently verified Fortune 500 deployments across multiple sources. LangSmith rated best-in-class observability. Stable v1.x API, full state persistence, MCP support.
5.7M downloads/month, tripled in 6 months. Fortune 500 names (PwC, IBM, Capgemini, NVIDIA, DocuSign) confirmed across independent publications. YAML-driven, role-based — ‘fastest to prototype’ consensus. CVE-responsive.
Only serious JS/TS framework at scale — 2.0M npm/month with no comparable competitor. 442-pt HN Show HN (2025-02-19) + 213-pt v1.0 launch. $13M seed: YC W25, Paul Graham, Guillermo Rauch (Vercel), Amjad Masad (Replit). Named customers: Replit, PayPal, Adobe.
GitHub API verified 69,352 stars. 1M/month PyPI downloads (3.4M all-time, accelerating). 455 contributors. Biweekly releases (v1.5.0 on 2026-03-11). AMD Lemonade collaboration. SWE-bench Verified 72% with Claude 4.5 Extended Thinking. Multi-SWE-Bench #1 (8 languages).
Best-of-N (run multiple agents on same task, ship the best diff), issue tracker integration (Linear, Jira, GitHub Issues — unique), 21 agents, remote SSH. v0.4.37 (2026-03-17), 60K total downloads. Expert endorsement outweighs modest 2,741 stars.
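Best-of-N is conceptually simple: run N agents on the same task, score each candidate, ship the winner. A hedged sketch of the selection step (the scoring function here is a stand-in such as tests passed; Emdash's actual implementation is not documented in these signals):

```python
# Illustrative Best-of-N selection: fan the same task out to several
# agents, score each candidate result, and keep the best one.
# All names and the scoring metric are hypothetical.

def best_of_n(task, agents, score):
    candidates = [(agent(task), name) for name, agent in agents.items()]
    return max(candidates, key=lambda c: score(c[0]))

# Stub "agents" that each return a candidate diff plus a quality signal.
agents = {
    "agent_a": lambda t: {"diff": "patch A", "tests_passed": 7},
    "agent_b": lambda t: {"diff": "patch B", "tests_passed": 9},
    "agent_c": lambda t: {"diff": "patch C", "tests_passed": 5},
}

winner, name = best_of_n("fix issue #123", agents, lambda r: r["tests_passed"])
```

The hard part in practice is the `score` function, which is why a human reviewing competing diffs side by side (Emdash's UI) remains the common fallback.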
96pts HN with 90 comments (highest comment count in category). desktop-v1.2.1 (2026-03-18). Dogfooded: ‘We use Superset to build Superset.’ Named enterprise users: Amazon, Google, ServiceNow (self-reported).
Danny Aziz (GM of Spiral): ‘canceled Claude + ChatGPT Max plans for Droid.’ Robert Matsuoka: ‘great vision, flawed execution, not ready for serious work.’ Enterprise customers: MongoDB, EY, Bayer, Zapier. Reviews 5 months apart — unclear if UX improved.
32 agents, 5 execution modes, smart model routing. But zero HN posts, zero Reddit, zero independent reviews despite 10K stars. Red flag for potential star inflation. Cannot rank until validation appears.
VS Code 1.109 positions as ‘Your Home for Multi-Agent Development.’ Claude Code Agent Teams (experimental) enables peer-to-peer messaging. Codex App runs background container-based agents. The remaining wedge: agent-AGNOSTIC orchestration that spans providers.
wshobson/agents: 0.99% watcher ratio, zero HN/Reddit, 8 issues for 31K stars (estimated real: 2-5K). ruvnet/ruflo: owner pattern across repos (RuView: 37K stars, 0.58% watcher ratio). Both excluded from all catalog consideration.
JFrog (CVE-2025-9959, CVSS 7.6): sandbox escape via dunder attribute validation bypass in LocalPythonExecutor. NCC Group (2025-07-28): arbitrary file read/write + RCE via prompt injection — no code-level patch possible. Docker or E2B sandboxing is an architectural requirement. Research/experimentation only.
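Since no code-level patch is possible, isolation has to live outside the interpreter. As a floor (not a fix), agent-generated code can at least run in a separate interpreter process with isolated mode and a timeout; this sketch only illustrates the direction and does not meet the Docker/E2B bar the audits require.

```python
import subprocess
import sys

# Minimal process-level isolation sketch: execute untrusted code in a
# separate Python process. -I (isolated mode) ignores environment
# variables and user site-packages; timeout bounds runtime.
# This is NOT equivalent to the Docker/E2B sandboxing the audits demand.

def run_untrusted(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )

result = run_untrusted("print(2 + 2)")
```

A subprocess still shares the host filesystem and network, which is precisely the file read/write surface NCC Group exploited; hence the architectural requirement for container-level sandboxing.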
VentureBeat (2026-02-19): Microsoft officially confirmed bug-fixes and security patches only, no new features. 399K PyPI/month (7% of CrewAI). Last release python-v0.7.5 (2025-09-30). Replaced by Microsoft Agent Framework (RC 2026-02-19, GA ~Q2 2026) which combines AutoGen + Semantic Kernel.
What changes this
LangGraph dethronable if: CrewAI’s AMP Suite reaches parity with LangSmith observability AND independent production case studies match LangGraph’s depth. Current gap is too large to close quickly.
OpenAI Agents SDK promotes to #2 (Python) if: state persistence is added natively, API reaches v1.0 stability, and MCP support ships. The download volume already supports #2 — the gaps are structural.
Google ADK promotes in ranking if: independent (non-Google) production case studies emerge with verifiable named users. Currently all evidence runs through Google-controlled publications.
Strands removed from catalog if: CI/CD pipeline download inflation is confirmed. Zero HN signal + anomalous download/star ratio (1,038 vs CrewAI 122) warrants further investigation.
Microsoft Agent Framework enters when GA (~Q2 2026): combines AutoGen + Semantic Kernel, multi-provider (Azure OpenAI, OpenAI, Claude, Bedrock, Ollama), A2A + AG-UI + MCP protocols.
smolagents upgrades from research-only if: HuggingFace ships a production-safe executor (not LocalPythonExecutor) with sandboxing built in and JFrog/NCC Group re-audit with no new findings.
If oh-my-claudecode gets independent reviews (HN launch, blog reviews, comparison posts), it could enter rankings at #3-4 if validation is positive. Currently blocked by evidence gap.
If Emdash crosses 10K stars or ships v1.0, it solidifies #2. The star gap to Superset is the main counterargument.
If Superset ships Linux/Windows support, it eliminates its biggest limitation and could challenge Emdash for #2 among devs who don’t need issue trackers.
If Claude Code Agent Teams exits experimental, it directly compresses oh-my-claudecode and partially compresses Emdash/Superset.
If a credible head-to-head Emdash vs Superset review appears, it would resolve #2/#3 ordering with higher confidence.
Emdash/Superset reorder if one player ships monetization or enterprise contracts, or if one stops shipping for 60 days. Both are pre-revenue early-stage tools — shelf life uncertain.