Devin (Cognition)

active

Pioneered the async autonomous coding agent category. $10.2B valuation, ~$900M total funding, $150M+ ARR (incl. Windsurf). Enterprise customers: Goldman Sachs, Santander, Nubank. Independent eval (Answer.AI): 15% success on complex tasks.

Generator · Orchestrator

Trust: 38/100

Stars: N/A

Evidence

Videos

Reviews, tutorials, and comparisons from the community.

Devin AI Explained for Beginners (AI Coding Assistant for Software Engineers)

The Cutting Edge School · 2025-07-29

Devin 2.0: First-Ever AI Software Engineer IS TRULY INSANE! (Devin IDE, CLI Coder, & More!)

WorldofAI · 2025-08-11

I Tried Devin (AI Software Engineer) — Full Review vs Cursor & Copilot

Cloud Champ · 2025-07-02

Editorial verdict

Cognition is the highest-funded pure-play in the category, but the gap between its self-reported 67% PR merge rate (on well-defined tasks) and Answer.AI's independent 15% success rate (on complex real-world tasks) is the defining data point. Business metrics are strong; product evidence on complex tasks is weak.

Source

Public evidence

strong · 2025-01

Answer.AI: Devin achieves ~15% success on complex real-world tasks

14 failures, 3 successes, 3 inconclusive across 20 tasks. 'Tasks it can do are so small and well-defined that I may as well do them myself.'

Named researchers, detailed methodology · Answer.AI (independent research)

strong · 2025-09

Cognition acquires Windsurf, raises to $10.2B valuation

Highest-valued player in the autonomous coding category. The Windsurf acquisition adds an IDE and 350+ enterprise customers.

CNBC (financial press) · CNBC (independent)

moderate · 2026

Devin ARR estimated at $150M+ — Sacra

$150M+ estimated ARR, including Windsurf's $82M ARR. Goldman Sachs is piloting 'hundreds to thousands of Devins.'

Market research estimate · Sacra (independent market research)

moderate · Self-reported · 2026

67% merge rate on defined tasks — self-reported

67% PR merge rate on well-defined tasks. Contradicted by Answer.AI's independent 15% success rate on complex tasks; the gap suggests Devin's capability is limited to narrow, well-scoped work.

Product page claim · Cognition (self-reported)

Where it wins

$10.2B valuation, ~$900M total funding — highest-funded in category

$150M+ combined ARR (incl. Windsurf acquisition)

Enterprise customers: Goldman Sachs, Santander, Nubank

67% PR merge rate on well-defined tasks

530 Hacker News points on launch and 502 points on the Windsurf acquisition: high community engagement

Where to be skeptical

Answer.AI independent eval: 15% success rate (3/20 tasks), the only rigorous independent test

Self-reported 67% merge rate directly contradicts the independent eval

Staff exits three weeks after the Windsurf acquisition raise integration questions

96% price cut ($500→$20) signals competitive pressure

Ranking in categories

Software Factories

#11 of 15

Archived from top tier: benchmark stale since 2024

Similar skills

Aider

Open-source AI pair programming CLI. The only tool in the coding CLI category with a fully verifiable, independent download number: 191,828/week PyPI installs, 5.7M lifetime, 15B tokens/week (homepage stat). Multi-model, git-native, no vendor lock-in. v0.86.2 released 2026-02-12.

OpenHands

Category leader in multi-agent orchestration: 69,352 stars (verified), $18.8M Series A, AMD hardware partnership, 455 contributors, 1M downloads/month on PyPI (3.4M all-time). SWE-Bench Verified 72% with Claude 4.5 Extended Thinking (updated 2026-03-19), Multi-SWE-Bench #1 across 8 languages. The gap to #2 is large on every axis.

Cline (cline.bot)

Cline is primarily a VS Code extension (3,354,473 VS Code installs, 5M across editors) with a CLI component. $32M raise (Emergence Capital). Named enterprise customers: Salesforce, Samsung, SAP. 59K stars. Supply chain incident: v2.3.0 'OpenClaw' compromise is a documented trust flag. v3.73.0 released 2026-03-16.

Factory AI (Droids)

Terminal-Bench #1 (58.75%), $50M Series B at $300M valuation (Sequoia, NEA, NVIDIA). Wipro partnership (tens of thousands of engineers), the largest enterprise deployment commitment. The previously claimed 84.8% SWE-bench score is UNVERIFIED. Zero grassroots developer adoption.