14 failures, 3 successes, 3 inconclusive across 20 tasks. 'Tasks it can do are so small and well-defined that I may as well do them myself.'
Devin (Cognition)
active
Pioneered the async autonomous coding agent category. $10.2B valuation, ~$900M total funding, $150M+ ARR (incl. Windsurf). Enterprise customers: Goldman Sachs, Santander, Nubank. Independent eval (Answer.AI): 15% success on complex tasks.
Trust: 38/100
Stars: N/A
Evidence: 4
Videos
Reviews, tutorials, and comparisons from the community.
Devin AI Explained for Beginners (AI Coding Assistant for Software Engineers)
Devin 2.0: First-Ever AI Software Engineer IS TRULY INSANE! (Devin IDE, CLI Coder, & More!)
I Tried Devin (AI Software Engineer) — Full Review vs Cursor & Copilot
Editorial verdict
Highest-funded pure-play in the category, but the gap between the self-reported 67% merge rate and the independent 15% success rate is the defining data point. Business metrics are strong; product evidence on complex tasks is weak.
Source
Public evidence
Highest-valued player in the autonomous coding category. Windsurf acquisition adds IDE and 350+ enterprise customers.
$150M+ estimated ARR including Windsurf's $82M ARR. Goldman Sachs piloting 'hundreds to thousands of Devins.'
67% PR merge rate on well-defined tasks. Contradicted by Answer.AI's 15% on complex tasks. Gap suggests capability limited to narrow, well-scoped work.
Where it wins
$10.2B valuation, ~$900M total funding — highest-funded in category
$150M+ combined ARR (incl. Windsurf acquisition)
Enterprise customers: Goldman Sachs, Santander, Nubank
67% PR merge rate on well-defined tasks
530 Hacker News points on launch and 502 points on the Windsurf acquisition: high engagement
Where to be skeptical
Answer.AI independent eval: 15% success rate (3/20 tasks) — only rigorous independent test
Self-reported 67% merge rate (on well-defined tasks) is at odds with the independent eval (15% on complex tasks)
Staff exits 3 weeks post-Windsurf acquisition raise integration questions
96% price cut ($500→$20) signals competitive pressure
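The percentages quoted above can be rechecked from the raw figures on this page. The snippet below is a minimal sketch, not part of the original evaluation: the helper name `percent` and the variable names are illustrative assumptions, and the inputs are only the numbers stated here (the 3/14/3 Answer.AI tally and the $500 and $20 price points).

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Answer.AI tallies quoted on this page: 3 successes, 14 failures, 3 inconclusive.
outcomes = {"success": 3, "failure": 14, "inconclusive": 3}
total_tasks = sum(outcomes.values())
rates = {name: percent(count, total_tasks) for name, count in outcomes.items()}

# Price points quoted on this page (old and new price, in dollars).
old_price, new_price = 500, 20
price_cut = percent(old_price - new_price, old_price)

print(total_tasks, rates, price_cut)
```

The tallies sum to the 20 tasks cited, and the success share and price reduction match the 15% and 96% figures listed above.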
Similar skills
Aider
Score: 91
Open-source AI pair programming CLI. The only tool in the coding CLI category with a fully verifiable, independent download number: 191,828/week PyPI installs, 5.7M lifetime, 15B tokens/week (homepage stat). Multi-model, git-native, no vendor lock-in. v0.86.2 released 2026-02-12.
OpenHands
Score: 88
Category leader in multi-agent orchestration: 69,352 stars (verified), $18.8M Series A, AMD hardware partnership, 455 contributors, 1M downloads/month on PyPI (3.4M all-time). SWE-Bench Verified 72% with Claude 4.5 Extended Thinking (updated 2026-03-19), Multi-SWE-Bench #1 across 8 languages. Gap to #2 is enormous on every axis.
Cline (cline.bot)
Score: 64
Cline is primarily a VS Code extension (3,354,473 VS Code installs, 5M across editors) with a CLI component. $32M raise (Emergence Capital). Named enterprise customers: Salesforce, Samsung, SAP. 59K stars. Supply chain incident: v2.3.0 'OpenClaw' compromise is a documented trust flag. v3.73.0 released 2026-03-16.

Factory AI (Droids)
Score: 53
Terminal-Bench #1 (58.75%), $50M Series B at $300M valuation (Sequoia, NEA, NVIDIA). Wipro partnership (tens of thousands of engineers) is the largest enterprise deployment commitment. A previously claimed 84.8% SWE-bench score is unverified. Zero grassroots developer adoption.