skillpack.co

Software Factories

Autonomous coding agents that plan, write, test, and ship code with minimal human oversight. Claude Code leads on benchmarks, community signal, and platform distribution (Apple Xcode). Cursor leads on revenue and event-driven automation. Gemini CLI is the free-tier disruptor. The category has split into CLI-first (Claude Code, Codex CLI, Gemini CLI), IDE-integrated (Cursor, Copilot, Cline), open-source (OpenHands), and enterprise-managed (Augment, Factory). SWE-bench Verified is dead — Pro is the new standard.

18 Ranked · 12 Signals

Current ranking

1. Claude Code · 98

Best for: Developers who want the most capable coding agent — complex multi-file refactors, greenfield projects, long-running autonomous execution

80,078 stars, 51 commits/month, v2.1.80 (2026-03-19). SWE-bench Pro 49.8% (custom scaffolding) — top tier. SWE-bench Verified 80.9% (Opus 4.5) — #1 overall. HN: 2,127 pts top story; 16,000+ pts across top-20 stories — more than all competitors combined. 46% 'most loved' (morphllm survey) vs Cursor 19%, Copilot 9%. Apple Xcode 26.3 native integration (2026-02-26). ~$2.5B run-rate (unconfirmed).

Coding CLIs / Code Agents #1
2. Cursor Automations · 39

Best for: Teams that want an all-in-one IDE with autonomous background agents AND event-driven automation (Slack → agent writes PR → human reviews)

$2B ARR doubled in 90 days (Bloomberg, 2026-03-02), 1M+ DAU, $29.3B valuation. SWE-bench Pro 50.2% (custom) — marginally #1 on Pro. Automations: event-driven triggers (Slack, PagerDuty, Linear, webhooks, cron) launched 2026-03-05. 35% of Cursor's own PRs merged by agents. Enterprise 60% of revenue. Named customers: OpenAI, Midjourney, Perplexity, Shopify.
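The event-driven model described above can be sketched in a few lines. This is an illustrative pattern only, not Cursor's actual API: an incoming event (a Slack message, PagerDuty alert, or cron tick) is matched against registered triggers, and each match dispatches an agent run that ends in a PR for human review. All names here (`Event`, `Trigger`, `dispatch`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Event:
    source: str   # e.g. "slack", "pagerduty", "cron"
    payload: str  # free-form event body

@dataclass
class Trigger:
    source: str                     # which event source to match
    action: Callable[[Event], str]  # runs the agent; returns a PR identifier

def dispatch(event: Event, triggers: list[Trigger]) -> list[str]:
    """Run every trigger registered for this event's source."""
    return [t.action(event) for t in triggers if t.source == event.source]

# Example: a Slack message triggers an (illustrative) agent run.
triggers = [Trigger("slack", lambda e: f"PR opened for: {e.payload}")]
prs = dispatch(Event("slack", "fix flaky login test"), triggers)
# prs == ["PR opened for: fix flaky login test"]
```

The point of the pattern is that the trigger, not a human prompt, starts the agent; the human's role moves to the end of the pipeline, reviewing the PR.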

3. Gemini CLI · 88

Best for: Budget-conscious developers, Google Cloud/Android shops, and anyone wanting a capable free coding CLI

98,380 stars (#1 in category). SWE-bench Verified 80.6% (Gemini 3.1 Pro) — near Claude Code. SWE-bench Pro 43.3% (Gemini 3 Pro SEAL). Free tier: 1,000 req/day. HN: 1,428 pts top story. 709 commits/month — very active.

Coding CLIs / Code Agents #2
4. Codex CLI · 87

Best for: Developers in the OpenAI ecosystem wanting a fast, token-efficient coding CLI with strong benchmark scores

66,359 stars. SWE-bench Pro 57.0% (custom) — #1 on Pro by this metric. Terminal-Bench 77.3% (#2). 896 commits/month — highest commit velocity. Rust-based, sandbox-first execution. HN: 587 pts top story. 1M+ first-month users. Free with ChatGPT subscription.

Coding CLIs / Code Agents #4
5. GitHub Copilot Coding Agent · 42

Best for: Enterprise teams already on GitHub needing zero-friction async coding with compliance baked in

20M+ users, 4.7M paid (75% YoY growth), ~90% Fortune 100. Multi-model GA (Claude + Codex, Feb 2026). Agentic Code Review GA (Mar 2026). CLI GA (Feb 2026). SWE-bench Verified 56.0%. Jira integration public preview.

6. Augment Code / Intent Agent · 40

Best for: Large enterprises with massive monorepos needing deep codebase understanding via enterprise sales

$252M total funding (confirmed TechCrunch). SWE-bench Pro 51.8% (#1 custom). 70.6% Verified (third-party, unaudited). Context Engine indexes entire codebases. Tekion case study: time-to-merge dropped 60%.

7. OpenHands · 88

Best for: Regulated industries needing on-prem/air-gapped deployment with MIT-licensed, model-agnostic agents

69,425 stars, 455 contributors, MIT license. $23.8M raised. ICLR paper. #1 Multi-SWE-Bench (8 languages). SWE-bench Verified 43.2%. Planning Agent v1.5.0 (Mar 2026). 272 commits/month.

Coding CLIs / Code Agents #5 · Teams of Agents / Multi-Agent Orchestration #1
8. Cline (cline.bot) · 73

Best for: VS Code users wanting BYOM flexibility with the largest IDE-agent install base

59,157 stars, 5M+ installs across platforms. Apache 2.0, BYOM. $32M funding (Emergence Capital). Named enterprise customers: Salesforce, Samsung, SAP. 155 commits/month, v3.74.0 (2026-03-19).

Coding CLIs / Code Agents #6
9. Aider · 86

Best for: CLI power users wanting open-source, multi-model, git-native pair programming with zero lock-in

42,157 stars. 49.2% SWE-bench Verified — independently reproducible. Apache 2.0, BYOK. Good multi-model and git integration. Pioneer in CLI-based AI coding.

Coding CLIs / Code Agents #10
10. Devin (Cognition) · 41

Best for: Teams wanting a fully sandboxed autonomous agent — Windsurf acquisition could change the picture

$10.2B valuation, ~$900M total funding. Enterprise customers: Goldman Sachs, Santander, Nubank. Full browser + terminal + editor sandbox. Acquired Windsurf — may gain IDE distribution. 502 HN pts.

Below the cut line
11. Replit Agent 4 · 45

Best for: Non-developer / vibe-coding audience building full-stack apps with minimal code knowledge

Parallel sub-agents (auth, DB, backend, frontend simultaneously), mobile app generation, infinite canvas design variants. ChatGPT distribution partnership. 2.28M monthly visits.

12. Factory AI (Droids) · 62

Best for: Large enterprises (5,000+ engineers) wanting vendor-managed, compliance-friendly coding agent with white-glove support

$70M total funding. Terminal-Bench contribution (ICLR 2026 paper). Wipro partnership (tens of thousands of engineers). Customers: MongoDB, EY, Bayer, Zapier, Clari. Sequoia/NVIDIA backed.

Teams of Agents / Multi-Agent Orchestration #5
13. Kiro (AWS) · 42

Best for: Currently suspended after a safety incident (6.3M orders lost), pending a 90-day safety reset

AWS GovCloud launch (Feb 2026). Spec-driven workflow is a genuine differentiator. AWS backing provides distribution floor. Free-$39/mo.

Coding CLIs / Code Agents #22
14. Jules (Google) · 38

Best for: Free experimentation — proactive task scanning is unique but not yet a daily driver

Google backing. 2.28M beta visits. Proactive task scanning (finds TODOs unprompted) — unique. Free 15 tasks/day. 534 + 339 HN pts.

15. Goose (Block) · 82

Best for: Free, open-source, provider-agnostic alternative — AAIF founding project

33,269 stars, 395 contributors, Apache 2.0. Linux Foundation AAIF founding member. 338 commits/month. 60% Block employee adoption. MCP reference implementation.

Coding CLIs / Code Agents #7
16. Amp (Amp Inc.) · 52

Best for: CLI-first power users who value Sourcegraph code intelligence lineage and BYOK flexibility

Co-founded by Quinn Slack and Beyang Liu (Sourcegraph). Self-reported profitable; Sequoia/a16z backing. CLI-first with Smart/Rush/Deep modes; Agent Skills system.

Coding CLIs / Code Agents #13
17. SWE-agent · 79

Best for: Research-grade autonomous bug fixing and benchmark reproducibility

Princeton research project. 18.7K stars. MIT licensed. mini-swe-agent (100 lines) scores >74% on SWE-bench.

Coding CLIs / Code Agents #14 · Teams of Agents / Multi-Agent Orchestration #9
18. Ralph Loop Agent · 60

Best for: Simple, controllable autonomous loops with human-readable state

Simplest pattern: while-true prompt loop with file persistence. Adopted by Anthropic, Vercel, Block's Goose.

Teams of Agents / Multi-Agent Orchestration #10
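The pattern above is simple enough to sketch in full. A minimal, illustrative version (not any project's actual code): a bounded while-style loop re-prompts an agent with state persisted in a plain text file, so a human can read or edit the state between iterations. `run_agent` is a stand-in for a real model call.

```python
from pathlib import Path

STATE = Path("ralph_state.txt")

def run_agent(state: str) -> str:
    # Stand-in for an LLM call; appends one "work" line per iteration.
    return state + f"step {state.count('step') + 1} done\n"

def ralph_loop(max_iters: int = 3) -> str:
    STATE.write_text("")            # start with empty, human-readable state
    for _ in range(max_iters):      # bounded here; the real pattern is while True
        state = STATE.read_text()   # re-read: a human may have edited it between runs
        STATE.write_text(run_agent(state))
    return STATE.read_text()

print(ralph_loop())
```

Because all state lives in one file, the loop can be killed, inspected, corrected by hand, and restarted at any point, which is the controllability the pattern is valued for.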

Head to head

Copilot Coding Agent vs Cursor Automations

Copilot: 15M devs, 12K orgs, 564 HN pts, proven track record. Cursor: $2B ARR, event-driven triggers, 7M MAU but 7 HN pts on Automations. Copilot wins on distribution and trust; Cursor wins on autonomy model. Copilot for now; Cursor could challenge within 6 months.

Copilot Coding Agent vs OpenHands

Different lanes. Copilot: zero-friction, GitHub-native, no setup. OpenHands: self-hostable, model-agnostic, MIT, scales to 1000s of parallel tasks. Copilot for GitHub-native teams; OpenHands for control, self-hosting, data sovereignty.

Cursor Automations vs Devin / Cognition

Cursor: 13x Devin's pre-acquisition revenue, novel event-driven triggers, $2B ARR. Devin: longer autonomy track record, 67% merge rate on defined tasks, $10.2B valuation. Cursor wins on revenue and the trigger model; Devin wins on autonomy pedigree but has weaker product evidence.

OpenHands vs Devin / Cognition

OpenHands: 68K stars, MIT, free, model-agnostic, broader enterprise logos (AMD, Apple, Google, NVIDIA). Devin: $10.2B valuation, $150M+ ARR, but 15% complex-task success. OpenHands for control and cost; Devin for turnkey defined-task automation.

Factory vs the field

Factory: Terminal-Bench #1, Sequoia+NVIDIA backing, enterprise customers. But 7 HN pts with 0 comments = near-zero community validation. Investor excitement ≠ developer adoption. Needs independent verification to move up.

Public signals

What changes this

If Augment Code / Intent publishes a verified SWE-bench submission → moves to #2 or #1 if score holds at 70%+.

If Amp publishes an open benchmark or third-party review → enters top 5.

If Kiro publishes a SWE-bench result and clarifies the outage → moves from #9 to #6-7.

If Devin submits a current, verified SWE-bench run → re-enters top tier from archived.

If Jules publishes any quantitative capability evidence → exits watch status.

If Cursor Automations community reception grows (HN currently 7 pts) → confirms or denies whether event-driven paradigm has real developer demand.

If SWE-bench Pro adoption becomes standard → all scores above 50% should be discounted ~20%; recalibrate entire ranking.
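The recalibration rule above is easy to make concrete. A minimal sketch, assuming the stated rule (scores above 50% take a ~20% haircut, lower scores stand); the cutoff and discount values come straight from the bullet:

```python
def recalibrate(score: float, cutoff: float = 50.0, discount: float = 0.20) -> float:
    """Discount scores above the cutoff by ~20%; leave the rest unchanged."""
    return round(score * (1 - discount), 1) if score > cutoff else score

print(recalibrate(57.0))  # Codex CLI's Pro score after the haircut: 45.6
print(recalibrate(43.3))  # Gemini's Pro score is below the cutoff: unchanged
```

Applied across the board, the rule compresses the top of the table: a 57.0% and a 51.8% land within a couple of points of the undiscounted mid-40s scores, which is why the bullet calls for recalibrating the entire ranking rather than adjusting one entry.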

If OpenHands closes a large named enterprise deal → strengthens #3 claim on enterprise trust.

If another major safety incident at any ranked tool → that tool drops 2+ ranks and gets safety warning.