Coding CLIs / Code Agents

The hottest category right now. Ten+ serious CLI agents competing across three tiers. SWE-bench Pro (standardized) is necessary but no longer sufficient — METR found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers. Rankings weight benchmarks alongside practical tests, adoption, safety, and independent evaluations.

Ranked

Signals

Current ranking

Claude Code98

Best for: Complex multi-file refactors, framework migrations, architecture — any task where first-pass quality matters most

7.88M npm downloads/week — 3x nearest rival. ~4% of GitHub public commits (SemiAnalysis). #1 SWE-bench Pro standardized (45.89%, SEAL). ~95% first-pass correctness (Educative.io). $2.5B annualized revenue. Morphllm: 'best AI coding agent for most developers.'

Software Factories #1

Gemini CLI88

Best for: Budget-conscious developers, students, exploratory/prototyping work, massive context window tasks

Terminal-Bench 2.0: 78.4% (#1). v0.35.0 (Mar 24) shipped keybinding/policy/telemetry fixes while the repo hit 98,957 stars and 12,593 forks. Docs confirm 1M native context and 1,000 requests/day free, and independent write-ups (FreeAcademy, GuruSup) recommend starting large personal projects here before escalating to Claude.

Software Factories #3

GitHub Copilot CLI10

Best for: Teams already on GitHub Copilot, enterprise environments requiring governance and audit trails

15M Copilot subscriber distribution — largest channel in category. v1.0.11 (Mar 23) shipped `/clear`, multi-extension hook merges, and cross-platform assets with MCP configs now respected. Reddit’s GA announcement (185 upvotes) plus a GitHub demo tweet (61 likes) show real teams recreating the launch workflow, and the Enterprise Agent Control Plane is still the most mature admin layer.

Codex CLI87

Best for: High-volume daily coding, speed-sensitive workflows, air-gapped/locked-down environments (Rust binary)

2.49M npm downloads/week — still massive even after Gemini's surge. Rust rewrite — no Node.js dependency. Terminal-Bench 77.3% (#2). GPT-5.4 shipped March 2026. 240+ tok/s with 3-4x better token efficiency than Claude. Cleanest security record in Tier 1. 1M+ first-month users.

Software Factories #4

OpenHands88

Best for: Sandboxed agent execution, security-conscious environments, model flexibility without vendor lock-in

69K stars (#2 among agents). $18.8M Series A. Docker-sandboxed execution — isolated Kubernetes environments. Model-agnostic, MCP support, Planning Agent with Plan/Code mode. CLI release Feb 2026. 72% SWE-bench Verified.

Teams of Agents / Multi-Agent Orchestration #1Software Factories #7

Cline (cline.bot)73

Best for: VS Code-centric developers, enterprise teams with existing VS Code infrastructure

5M installs across VS Code, JetBrains, Cursor, Windsurf — largest VS Code agent user base. $32M funding (Emergence Capital). Named enterprise customers: Salesforce, Samsung, SAP. $1M open-source grant program.

Software Factories #8

Goose (Block)82

Best for: Free, open-source terminal agent; BYOM flexibility; MCP-heavy workflows; Block ecosystem

33K stars, v1.28.0 released 2026-03-18. Free, Apache 2.0. MCP-first architecture. ACP integration (March 19, 2026) lets devs use existing subscriptions. 60% of Block (12K employees) use it weekly (self-reported). Linux Foundation AAIF governance.

Software Factories #15

Qwen Code (Alibaba / QwenLM)82

Best for: Local-first/offline coding agents, privacy-sensitive environments, budget-limited teams

Only contender with a competitive local model (Qwen3-Coder-Next: 3B active / 80B MoE, runs on consumer hardware). SEAL standardized 38.70%. Gemini CLI fork — inherits proven architecture. 1K free req/day. 256K context (extendable to 1M). 20K stars.

RooCode (RooVeterinaryInc)68

Best for: Security-conscious enterprise teams needing a VS Code agent, specialized agent personas

SOC 2 Type 2 compliance — only VS Code agent with independent auditor verification. Custom Modes (security reviewer, test writer, architect personas with scoped permissions). 5.0/5 VS Code rating on 1.37M installs. 22K stars. Clean security record.

Aider86

Best for: Git-heavy workflows, multi-provider flexibility, mature battle-tested tool

Category pioneer — original terminal coding agent. 42K stars, 4.1M+ total installations (Morphllm). Best git workflow integration (automatic commits, diff-aware editing). Multi-provider: works with every major LLM provider. 191K PyPI/week.

Software Factories #9

Below the cut line

Junie CLI (JetBrains)35

Best for: JetBrains loyalists wanting BYOK pricing with institutional IDE vendor support

JetBrains distribution: 13M+ user base. BYOK pricing. One-click migration from Claude Code. LLM-agnostic, static analysis integration.

No preview

Continue (Continuous AI)82

Best for: Background agents enforcing code quality on PRs

2,372,585 VS Code installs (second-highest in IDE segment). 31,935 stars. Last release v1.2.17 (2026-03-13). Pivoted to async CI agents for PR enforcement.

No preview

Amp (Amp Inc.)52

Best for: Teams with large, complex codebases needing deep code intelligence

Most sophisticated sub-agent architecture (Oracle, Librarian, Painter). Sourcegraph code intelligence DNA. 36K npm weekly downloads. Free tier + BYOK.

Software Factories #16

No preview

SWE-agent79

Best for: Benchmark research, academic reference, issue-level repair evaluation

18,777 stars. Princeton NLP origin. SWE-agent scaffold: 79.2% SWE-bench Verified on Opus 4.5 — original SWE-bench paper.

Teams of Agents / Multi-Agent Orchestration #9Software Factories #17

No preview

Crush (Charmbracelet)82

Best for: Terminal-first developers wanting polished UX, multi-platform support

Best terminal UX in the category. Charmbracelet proven track record (Bubble Tea, 25K+ apps). Multi-model, LSP, MCP, cross-platform. 21K stars, v0.50.1 (2026-03-17). HN: 367 pts.

No preview

OpenCode88

Best for: Maximum model flexibility, open-source-first teams

126K+ stars — largest AI coding repo by raw count. 393K npm/week. v1.2.27 active. OpenAI official partnership. 75+ model providers.

No preview

Auggie CLI (Augment Code)53

Best for: Teams wanting highest raw benchmark number, semantic codebase indexing

51.80% SWE-bench Pro on Augment scaffold — highest raw number in category. Enterprise logos (MongoDB, Spotify). Augment Context Engine.

No preview

Kimi Code (Moonshot AI)60

Best for: Chinese developer ecosystem, teams using Moonshot AI models

7.2K stars, 124K PyPI weekly downloads. K2.5 model (HN: 388 pts). Moonshot AI $1B+ funding.

No preview

Kilo Code (Kilo-Org)65

Best for: Privacy-conscious teams, OpenRouter users

16.8K stars, 131K npm weekly downloads. $8M seed funding. Orchestrator Mode, 500+ models. Growing fast.

No preview

Cursor (Anysphere)35

Best for: Polished commercial IDE with integrated AI

$29.3B valuation, most adopted commercial AI IDE. Strong UX, agent modes (Jan 2026).

No preview

Warp (Warp Technologies)65

Best for: Terminal-first developers who want an integrated AI environment

26K+ stars, 75.8% SWE-bench Verified, TIME Best Inventions. Full terminal replacement.

No preview

Kiro (AWS)42

Best for: Spec-driven development, AWS integration, GovCloud

Amazon-backed. GovCloud focus. Spec-driven development approach. CLI v1.27 (2026-03-02).

Software Factories #13

No preview

Head to head

Claude CodevsCodex CLI

Claude Code leads SWE-bench Pro standardized (45.89% vs 41.04%), Morph Tier 1 'deepest reasoning', Educative 1h17m single-shot. Codex CLI leads Terminal-Bench (77.3% GPT-5.3-Codex), 3-4x more token-efficient, 240+ tok/s. Emerging consensus: use both — Claude for planning, Codex for implementation.

Gemini CLIvsCopilot CLI

Gemini CLI leads Terminal-Bench (78.4% vs no published scores), has SWE-bench Pro 43.30%, and best free tier. Copilot CLI has 15M subscriber distribution and Enterprise Agent Control Plane. Gemini wins on proven capability; Copilot wins on enterprise governance and distribution.

Gemini CLIvsCodex CLI

Gemini CLI: free tier (1K req/day), 1M context, 98K stars, Deep Think mode. Codex CLI: Terminal-Bench 77.3% (GPT-5.3-Codex), sandbox-first safety, free with ChatGPT. Gemini wins on cost and context; Codex wins on proven terminal performance and speed.

AidervsOpenCode

Both are model-agnostic, but Aider hasn't shipped a release in 7 months while OpenCode ships daily. Aider has verifiable PyPI downloads (183K/week); OpenCode's 5M MAD claim is unverified. Aider's token efficiency (4.2x less than Claude Code) is unmatched.

Gemini CLIvsClaude Code

Gemini CLI now at 43.30% SWE-bench Pro standardized vs Claude Code's 45.89% — gap narrowed to 2.59pp. Gemini wins overwhelmingly on cost (free 1K req/day) and context (1M native). Claude Code wins on adoption (8M vs 647K npm/wk), revenue ($2.5B ARR), and HN mindshare (2,127 vs 1,428 pts). Tool-calling weaknesses keep Gemini at #3.

AmpvsClaude Code

Amp has the most sophisticated sub-agent architecture (Oracle, Librarian, Painter) from Sourcegraph's code intelligence DNA. Claude Code has 58x more npm downloads (8M vs 139K), published benchmarks (SWE-bench Pro #1), and 24x more HN engagement. Amp is a bet on code intelligence depth; Claude Code is the proven all-rounder.

Public signals

Ranking update2026-03

Gemini CLI #2, Copilot CLI #3 after March 25 signal run

Weekly Gemini releases (v0.35.0) plus doc-confirmed 1M context / 1K free requests elevated it above Codex. Copilot CLI’s v1.0.11 release, 185-upvote Reddit GA thread, and GitHub demo tweet pushed it to #3 despite <10K stars.

Release cadence2026-03

Claude Code v2.1.81 adds `--bare` and smarter MCP OAuth

100 commits between Jan 29 and Mar 20 (~14/week, 10+ maintainers) underpin the release that shipped `--bare`, improved worktree resume, and richer MCP OAuth — a key reason Claude keeps the top slot.

Reliability watch2026-01

HN outage thread (46532075) + issue #16673 resolved fast

181-point thread captured a v2.1.0 startup failure; maintainers patched it immediately (GitHub issue #16673). We track these regression cycles because Claude’s trust premium only holds if fixes stay public and fast.

Security watch2026-02

PromptArmor exploit shows Copilot CLI downloading payloads

Attackers can pipe `env curl ... | env sh` straight into Copilot CLI without approval. The 62-point HN thread (47183940) amplified it — enterprise rollouts need network policies or sandboxes ready.

Adoption signal2026-03

Copilot CLI GA thread hits 185 upvotes on r/GithubCopilot

Community reproduced GitHub’s GA repo to demo `/clear`, MCP configs, and Autopilot. Small but real usage signal that balances the security watchlist.

Orchestration layer2026-02

Emdash Show HN hits 206 points — 21 supported CLIs

Pre-warmed git worktrees (<1s startup), SSH remotes, and first-class support for Claude, Codex, Gemini, Droid, Amp, etc. make it the leading ADE for juggling ranked CLIs.

Orchestration layer2025-12

Superset isolates git worktrees + live agent notifications

Multi-worktree multitasking (Codex/Claude/Gemini) with push notifications when an agent finishes and spotlighted terminals. Teams report 2-3× faster runs — on-prem alternative to Emdash.

Memory layer2026-01

Grov proxy shares agent reasoning, cuts tokens 50–70%

Preview→expand memory capture feeds Claude/Gemini/Codex sessions with team knowledge while drift detection intervenes mid-task. Addresses the persistent context gap in CLI workflows.

Catalog correction2026-03

OpenCode demoted to Watch — two security incidents, controversy-driven star surge

126K+ stars are real but driven by Anthropic OAuth controversy. CVE-2026-22812 (CVSS 8.8-10.0) is a second security incident after the unauthenticated RCE (fixed v1.1.10+). Two serious security incidents make trust story the weakest in category. Drops from #5 to Watch list.

Benchmark verified2026-03

Terminal-Bench 2.0: Gemini CLI #1 (78.4%), Codex CLI #2 (77.3%), Claude Code #3 (74.7%)

Terminal-Bench 2.0 results: Gemini CLI leads terminal-native tasks at 78.4%, Codex CLI close at 77.3%, Claude Code at 74.7%. All three are competitive — gap between #1 and #3 is only 3.7pp.

Trust flag2026-03

Claude Code CVEs: CVE-2025-59536 (CVSS 8.7) and CVE-2026-21852 — both patched

Two patched security incidents: hooks injection RCE (fixed v1.0.111) and API token exfiltration (fixed). Codex CLI has the cleanest security record in Tier 1 — zero documented incidents.

Quality signal2026-03

First-pass correctness gap: Claude Code ~95% vs Gemini CLI ~50-60% (Educative.io)

Educative.io practical test: Claude Code ~95% first-pass correctness vs Gemini CLI ~50-60%. This 2x quality gap is wider than SWE-bench Pro scores suggest. Practical quality is weighted heavily in this ranking.

New entrant promoted2026-03

OpenHands promoted to #5 — 69K stars, $18.8M Series A, Docker-sandboxed execution

69K stars (#2 among agents), Docker-sandboxed Kubernetes environments, model-agnostic with MCP support. CLI release Feb 2026. Planning Agent with Plan/Code mode. Cloud-first adds latency vs terminal-native tools.

Benchmark caveat2026-03

METR: ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers

METR's March 2026 study: ~50% of SWE-bench-passing PRs would NOT be merged. A 46% SEAL score ≈ 23% 'would actually ship' rate. Practical quality signals (first-pass correctness, switching patterns) carry weight in this ranking.

Scaffold insight2026-03

Morphllm: 'same model can score 17 problems apart in different agents'

Morphllm independent test of 15 agents: scaffold maturity gap is 17 problems on same model. Claude Code named 'best AI coding agent for most developers.' Codex CLI's custom scaffold (56.8%) vs SEAL standardized (41.04%) illustrates the gap.

Hybrid pattern2026-03

Emerging consensus: Claude Code for planning, Codex CLI for implementation

Multiple independent sources converge on using Claude Code for planning/architecture and Codex CLI for implementation. Not a compromise — may be the optimal workflow.

Usage signal2026-03

SemiAnalysis: Claude Code at ~4% of GitHub public commits, $2.5B annualized revenue

~4% of public GitHub commits, projected 20%+ by EOY 2026. 42,896x growth in 13 months. $2.5B annualized revenue. 8M+ npm weekly downloads — 3x Codex, 12x Gemini.

Quality monitoring2026-03

MarginLab: no Claude Code degradation detected (p<0.05)

Independent daily monitoring with 56% baseline pass rate and no statistically significant degradation. Quality regression perception (1,085 HN pts) is community sentiment, not measured reality.

What changes this

Auggie CLI public GA release + independent SWE-bench Pro reproduction → could move to Tier 1 if the 51.80% scaffold advantage holds outside Augment's own benchmark setup.

Gemini CLI publishing a credible SEAL SWE-bench Pro number → could move to #1 or #2 depending on result; currently ranked on traction alone.

Junie CLI post-beta community evidence → JetBrains' 11M+ installed base is large enough that strong first 60 days of public reception would immediately justify a Tier 2 slot.

Cline publishing a credible third-party security audit → would restore trust score and move it back into active Tier 2 consideration.

Aider publishing a SWE-bench Pro standardized number → would likely lock in #2 slot; currently its install verifiability is the strongest non-Anthropic signal in the category.

OpenCode resuming active development and patching the RCE → minimum bar to re-enter the ranking.

Claude Code quality regression persisting (the 'dumbed down' thread had 1,085 pts / 702 comments) → if perception hardens into documented capability regression, Tier 1 position is at risk.

If Gemini CLI fixes the file deletion pattern and files a clean safety record for 3+ months, its free tier + 1M context makes it a serious #2 contender.

If Codex CLI closes the SWE-bench standardized gap while maintaining cost/speed advantages, the #3/#4 ordering could shift.