Factory AI (Droids)

active

Terminal-Bench #1 (58.75%). $50M Series B at $300M valuation (Sequoia, NEA, NVIDIA, J.P. Morgan). The Wipro partnership (tens of thousands of engineers) is the largest enterprise deployment commitment in the category. The previously claimed 84.8% SWE-bench score is unverified by any independent source. Zero grassroots developer adoption.

Score 62

Where it wins

Terminal-Bench #1 at 58.75% (beat Claude Code 43.2%, Codex CLI 42.8%)

$50M Series B at $300M valuation — Sequoia, NEA, NVIDIA, J.P. Morgan

Wipro partnership: tens of thousands of engineers — largest enterprise deployment commitment in category

Enterprise customers: MongoDB, EY, Bayer, Zapier, Clari

Danny Aziz (GM of Spiral): 'canceled Claude + ChatGPT Max plans for Droid'

Where to be skeptical

Zero grassroots developer adoption — no significant HN threads, Reddit, or independent reviews

Previously claimed 84.8% SWE-bench score has not been verified by any independent source

Robert Matsuoka: 'great vision, flawed execution, not ready for serious work'

Closed-source, no self-hosting, enterprise pricing only

Only 610 GitHub stars, so no open-source community path

Editorial verdict

Enterprise-only with legitimate backing, but zero grassroots signal and an unverified benchmark claim. Large enterprises with white-glove support needs may benefit; not for individual developers or small teams.

Videos

Reviews, tutorials, and comparisons from the community.

These Factory AI Droids Built My App in 10 Minutes! (Rust + TypeScript!) 👉 Code, Debug & Ship Apps

AI LABS · 2025-06-01

AI Droids, Dev Velocity, and Bulletproof Security | Inside Factory

Max Abram · 2025-09-17

Teams of Agents / Multi-Agent Orchestration

#05 of 23

Enterprise teams needing a managed coding agent service with support contracts

Software Factories

#12 of 18

Large enterprises (5,000+ engineers) wanting a vendor-managed, compliance-friendly coding agent with white-glove support

Claude Code

Anthropic's official agentic coding CLI. v2.1.81 (Mar 20) shipped `--bare`, smarter worktree resume, and improved MCP OAuth while the repo crossed 82,204 stars and logged ~14 commits/week across 10+ maintainers. Terminal-native, tool-use-driven, with deep file system + shell access, #1 SWE-bench Pro standardized (45.89%), ~4% of GitHub public commits (SemiAnalysis), $2.5B annualized revenue. 8M+ npm weekly downloads. Opus 4.6 with 1M context.

LangGraph

#1 Python agent framework by production evidence — 40.2M PyPI downloads/month, Fortune 500 deployments (LinkedIn, Uber, Replit, Elastic, Klarna, Cloudflare, Coinbase), ~400 LangGraph Platform companies, LangSmith rated best-in-class observability. Stable v1.x API, model-agnostic, MCP support.

Pydantic AI

#3 Python agent framework by downloads — 15.6M PyPI/month. Built by the Pydantic team. Runtime type enforcement is a genuine differentiator no other framework offers. V1 shipped with Temporal integration for durable execution and Logfire observability. Emerging pattern: 'Pydantic AI for agent logic, LangGraph for orchestration' (ZenML).

AutoGen (Microsoft)

⚠️ MAINTENANCE MODE — Microsoft officially confirmed bug fixes and security patches only, no new features (VentureBeat 2026-02-19). 55.9K stars but only 1.57M PyPI/month — DL/star ratio of 28, the most inflated among active frameworks. Being replaced by Microsoft Agent Framework (AutoGen + Semantic Kernel merge, GA targeted ~Q2 2026). Teams on AutoGen should plan migration.
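The download-to-star ratio quoted for AutoGen can be reproduced with a minimal sketch like the one below. It assumes the ratio means monthly PyPI downloads divided by GitHub stars, which matches the figures given but is not stated explicitly; the function name `downloads_per_star` is hypothetical and used only for illustration.

```python
def downloads_per_star(monthly_downloads: float, stars: float) -> float:
    """Return monthly package downloads per GitHub star.

    Assumption: this is how the listing's "DL/star ratio" is defined.
    """
    if stars <= 0:
        raise ValueError("stars must be positive")
    return monthly_downloads / stars


# Figures quoted in the AutoGen entry: 1.57M PyPI downloads/month, 55.9K stars.
autogen_ratio = downloads_per_star(1_570_000, 55_900)
print(f"AutoGen downloads per star: {autogen_ratio:.1f}")
```

Under this reading, a low ratio suggests that star count overstates real usage, which is presumably what "most inflated" refers to.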

Public evidence

strong · 2026-03

Every.to — reviewer canceled Claude and ChatGPT Max for Factory Droid

Canceled Max plans for both Claude and ChatGPT in favor of Droid. One of the strongest individual endorsements in the category.

Named reviewer, detailed usage report · Danny Aziz (GM of Spiral, via Every.to)

strong · 2026-03

hyperdev — 'Promising Concept, Premature Execution'

Specific technical criticisms: slow response times, inferior code quality vs Claude, unusable history UI. Directly contradicts the positive Every.to review.

Detailed independent technical review · Robert Matsuoka (hyperdev, independent reviewer)

strong · 2026-03

Terminal-Bench #1 — 58.8% (beat Claude Code, Codex CLI)

Droid with Opus scored 58.8% on Terminal-Bench, beating Claude Code (43.2%) and Codex CLI (42.8%). Strongest benchmark result in the autonomous platform subcategory.

Public benchmark results · Terminal-Bench (independent benchmark)

strong · Self-reported · 2026

$50M Series B at $300M valuation — Sequoia, NEA, NVIDIA, J.P. Morgan

Highest funding in the multi-agent category. Investor quality (Sequoia, NEA, NVIDIA) signals strong enterprise thesis.

Major venture funding round · Sequoia, NEA, NVIDIA (investors)

strong · Self-reported · 2026

Wipro partnership — tens of thousands of engineers deploying Factory Droids

Largest enterprise deployment commitment in the software factory category. Claims 31x faster feature delivery and 96.1% shorter migration times.

BusinessWire, major enterprise partnership · Wipro + Factory AI (joint announcement)
