Claude Code leads SWE-bench Pro standardized (45.89% vs 41.04%); Morph: Tier 1, 'deepest reasoning'; Educative: 1h17m single-shot. Codex CLI leads Terminal-Bench (77.3% with GPT-5.3-Codex), is 3-4x more token-efficient, and runs at 240+ tok/s. Emerging consensus: use both, Claude for planning, Codex for implementation.
Coding CLIs / Code Agents
The hottest category right now. Ten+ serious CLI agents competing across three tiers. SWE-bench Pro (standardized) is necessary but no longer sufficient — METR found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers. Rankings weight benchmarks alongside practical tests, adoption, safety, and independent evaluations.
Verdict
Claude Code is #1 — 7.88M npm downloads/week (3x nearest rival), 79K stars, ~4% of GitHub public commits (SemiAnalysis, Feb 2026). Leads SWE-bench Pro standardized (45.89%, SEAL #1). $2.5B annualized revenue. Quality regression perception is a live trust issue ('dumbed down?' — 1,085 HN pts, Feb 2026) but MarginLab monitoring shows no statistical degradation.
Codex CLI is #2 — 2.49M npm downloads/week (clear #2 by active use). Rust rewrite eliminates Node.js dependency — unique in category. GPT-5.3-Codex leads SWE-bench Pro custom scaffold at 56.8% (non-standardized). Terminal-Bench 77.3%. Best for locked-down environments or OpenAI model loyalists.
Gemini CLI is #3 — 98K stars (highest raw count), best free tier (1K req/day, no credit card), 1M context window, 678K npm downloads/week. File deletion incident (AI Incident DB #1178) is a visible trust flag. Not recommended for unattended agentic use without a human review step.
Cline (cline.bot) is #4 — 3.35M VS Code installs (5M across editors), $32M funding (Emergence Capital), named enterprise customers (Salesforce, Samsung, SAP). Supply chain incident (v2.3.0 'OpenClaw') is a documented trust flag — would move to Tier 1/2 with a credible security audit.
OpenCode is #5 — 124K stars (largest AI coding repo), v1.2.27 active (2026-03-16), OpenAI official partnership. 393K npm downloads/week. RCE fixed in v1.1.10+. Trust story is messier than peers due to corporate conflict + security history.
RooCode is #6 — 1.37M VS Code installs, 5.0/5 VS Code rating (highest quality signal in IDE-agent segment). Cline fork with enterprise governance focus. v3.51.1 (2026-03-08). Best for teams wanting Cline-style agentic coding with stricter governance.
Aider is #7 — 191K PyPI/week, 5.7M lifetime installs. Multi-model, git-native, no vendor lock-in. Category pressure growing: HN thread 'stopped using Aider in favor of Claude Code' (#44154020). Best for Python devs who want fine-grained model control. v0.86.2 (2026-02-12).
The deeper read
SWE-bench Pro (standardized) is necessary but no longer sufficient. METR's March 10, 2026 study found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers (278 HN pts). Maintainer merge rates are ~24pp lower than automated grading. Rankings weight SWE-bench alongside practical tests, adoption, safety, and independent evaluations.
Verifiable traction is the new tie-breaker. Aider's 191,828/week PyPI installs are a public artifact — harder to game than star counts or social media engagement. Rankings now weight independently verifiable usage metrics more heavily.
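Download claims like these can be re-checked by anyone. A minimal sketch of how, using the public pypistats.org and api.npmjs.org endpoints (the package names are the ones the tools publish under; response shapes are the documented ones, but treat this as an illustration, not a monitoring service):

```python
import json
from urllib.request import Request, urlopen

PYPISTATS = "https://pypistats.org/api/packages/{pkg}/recent"
NPM_WEEKLY = "https://api.npmjs.org/downloads/point/last-week/{pkg}"


def pypi_recent_url(pkg: str) -> str:
    """Build the pypistats.org recent-downloads endpoint for a PyPI package."""
    return PYPISTATS.format(pkg=pkg)


def npm_weekly_url(pkg: str) -> str:
    """Build the npm registry last-week-downloads endpoint for an npm package."""
    return NPM_WEEKLY.format(pkg=pkg)


def fetch_json(url: str) -> dict:
    """GET a JSON document (pypistats rejects requests without a User-Agent)."""
    req = Request(url, headers={"User-Agent": "traction-check/0.1"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Example (network access required):
#   fetch_json(pypi_recent_url("aider-chat"))["data"]["last_week"]
#   fetch_json(npm_weekly_url("@anthropic-ai/claude-code"))["downloads"]
```

Comparing the returned `last_week` / `downloads` numbers against the figures cited in this ranking is the whole verification step; star counts have no equivalent public, per-week artifact.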
Each major model provider now has a CLI agent (Anthropic → Claude Code, OpenAI → Codex CLI, Google → Gemini CLI). The emerging consensus is a hybrid pattern: Claude Code for planning/architecture, Codex CLI for implementation. Multi-model tools (Aider, Crush, Goose, Qwen Code) offer a third lane: model-agnostic with no vendor dependency.
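The hybrid pattern can be wired together with a small driver script. This is a sketch only: it assumes the `claude` and `codex` binaries are installed and authenticated, and that `claude -p` (print mode) and `codex exec` (non-interactive mode) remain each CLI's documented non-interactive entry point; both flags may change between releases.

```python
"""Sketch of the 'Claude plans, Codex implements' hybrid loop."""
import shutil
import subprocess
from pathlib import Path

PLAN_PROMPT = (
    "Read this repo and write a step-by-step implementation plan for the task "
    "below. Plan only, do not edit any files.\nTask: {task}"
)


def plan_with_claude(task: str, plan_file: Path) -> bool:
    """Planning pass: Claude Code in non-interactive print mode."""
    if shutil.which("claude") is None:
        return False  # CLI not installed; sketch degrades gracefully
    out = subprocess.run(
        ["claude", "-p", PLAN_PROMPT.format(task=task)],
        capture_output=True, text=True, check=True,
    )
    plan_file.write_text(out.stdout)
    return True


def implement_with_codex(plan_file: Path) -> bool:
    """Implementation pass: Codex CLI in non-interactive exec mode."""
    if shutil.which("codex") is None:
        return False
    subprocess.run(
        ["codex", "exec",
         f"Implement the plan in {plan_file}. Run the test suite after each step."],
        check=True,
    )
    return True
```

The human review checkpoint lives between the two calls: inspect and edit the plan file before handing it to the implementation pass.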
Current ranking
Best for: Architecture, planning, complex reasoning, security analysis, niche languages
7.88M npm downloads/week — 3x nearest rival. ~4% of GitHub public commits (SemiAnalysis). #1 SWE-bench Pro standardized (45.89%, SEAL). $2.5B annualized revenue (fastest enterprise SaaS to $1B ARR). HN peak 2,127 pts — unmatched community mindshare.
⚡ Quality regression perception: 'Claude Code is being dumbed down?' (1,085 HN pts, Feb 2026) is a live trust issue. Rate limits are the #1 complaint. 3-4x higher token consumption per task than Codex CLI.
Best for: OpenAI ecosystem, locked-down environments, token efficiency, sandbox-first safety
2.49M npm downloads/week — clear #2 by active use. Rust rewrite eliminates Node.js dependency — unique in category. Terminal-Bench 77.3% (GPT-5.3-Codex). 3-4x more token-efficient than Claude Code. Free with ChatGPT subscription.
⚡ SWE-bench Pro standardized 41.04% — trails Claude Code by ~5pp. Tied to OpenAI models only. Custom scaffold score (56.8%) is not standardized.
Best for: Budget-constrained developers, large-context tasks, free entry point
Best free tier in category: 1K req/day, no credit card. 1M native context — largest. 98K stars (highest raw count). 678K npm downloads/week. Google-backed, open source (Apache 2.0).
⚡ File deletion incident (AI Incident DB #1178) is a visible trust flag. 11.6x download gap vs Claude Code despite higher star count — brand-driven stars. Not recommended for unattended agentic use without human review.
Best for: VS Code developers, enterprise teams with governance requirements
3.35M VS Code installs (5M across editors). $32M raise (Emergence Capital). Named enterprise customers: Salesforce, Samsung, SAP. v3.73.0 released 2026-03-16. Dominates the IDE-embedded-agent segment.
⚡ Supply chain incident: v2.3.0 'OpenClaw' compromise — no third-party security audit published. Primarily a VS Code extension; CLI surface is secondary. Would move to Tier 1 with a credible security audit.
Best for: Maximum model flexibility, open-source-first teams, OpenAI ecosystem
124,766 stars — largest AI coding repo by raw count. 393K npm downloads/week. v1.2.27 active (2026-03-16). OpenAI official partnership. RCE fixed in v1.1.10+. 75+ model providers.
⚡ Trust story is messy: RCE disclosure (432 HN pts) + Anthropic blocking incident (625 HN pts). Star count inflated by controversy. No published benchmark scores.
Best for: Teams wanting Cline-style agentic coding with stricter governance and multi-model flexibility
5.0/5 VS Code rating on 1,372,346 installs — strongest quality signal in the IDE-agent segment. Cline fork inherits proven codebase while adding enterprise governance. v3.51.1 (2026-03-08).
⚡ Fork positioning — unclear differentiation beyond governance vs upstream Cline. No published benchmarks. Smaller enterprise validation than Cline.
Best for: Python developers, maximum model flexibility, git-native workflow, token efficiency
191,828 PyPI/week, 5.7M lifetime installs — most independently verifiable usage outside Claude Code. Multi-model (any OpenAI-compatible API), git-native, no vendor lock-in. v0.86.2 released 2026-02-12.
⚡ Category pressure: HN 'stopped using Aider in favor of Claude Code' (#44154020). Codex CLI at 2.49M and Gemini at 678K npm/week have overtaken Aider's download rank. Shipping cadence behind daily-release competitors.
Best for: JetBrains loyalists wanting BYOK pricing with institutional IDE vendor support
JetBrains distribution: 14M existing user base. BYOK pricing. Explicit one-click migration from Claude Code. LLM-agnostic. Most strategically significant new entrant — revisit in 60 days.
⚡ Beta only — launched 2026-03-09. No public repo. No benchmark scores. No independent reviews yet.
Best for: Enterprise open governance, provider-agnostic agentic workflows, Apache 2.0 licensing
33K stars, v1.28.0 released 2026-03-18. Linux Foundation AAIF founding member. Provider-agnostic, MCP reference implementation. Block institutional backing.
⚡ No published benchmarks. No download data — cannot assess active-use gap vs top-tier tools. 'Super jank' reputation in HN comments.
Best for: Background agents enforcing code quality on PRs
2,372,585 VS Code installs (second-highest in IDE segment). 31,935 stars. Last release v1.2.17 (2026-03-13). Pivoted to async CI agents for PR enforcement.
⚡ Category shifting under it — more AI coding assistant framework than agentic CLI. Low HN engagement (44 pts) relative to VS Code install count.
Best for: Teams with large, complex codebases needing deep code intelligence
Most sophisticated sub-agent architecture (Oracle, Librarian, Painter). Sourcegraph code intelligence DNA. 36K npm weekly downloads. Free tier + BYOK.
⚡ Corporate spin-out: the sourcegraph/amp GitHub repo returns 404; Amp is now independent Amp Inc. No SWE-bench benchmark, 0 HN pts in tracked period. Verify current state before recommending.
Best for: Benchmark research, academic reference, issue-level repair evaluation
18,777 stars. Princeton NLP origin; team behind the original SWE-bench paper. SWE-agent scaffold: 79.2% SWE-bench Verified on Opus 4.5. Strong academic credential.
⚡ Last release v1.1.0: 2025-05-22 — 10 months stale. Not a production tool. Down-ranked to academic/research reference.
Best for: Terminal-first developers wanting polished UX, multi-platform support
Best terminal UX in the category. Charmbracelet proven track record (Bubble Tea, 25K+ apps). Multi-model, LSP, MCP, cross-platform. 21K stars, v0.50.1 (2026-03-17). HN: 367 pts.
⚡ No published benchmark scores. Custom license (not standard OSS). Insufficient evidence of production coding-agent use to rank above Tier 4 — revisit if download data surfaces.
Best for: Web UI-based coding agent, research teams
69,352 stars. Last release 1.5.0 (2026-03-11). Active development.
⚡ Primary interface is web UI, not CLI — may belong in a separate web-agent category. Missing download signal. Low HN traction (70 pts) relative to star count suggests research-primary audience.
Best for: Zero-cost open-weight model, local/on-prem deployment
Qwen3-Coder-Next: 70.6% SWE-bench Verified (highest open-weight model). 1K free daily requests. 20K stars, v0.12.6 released 2026-03-17.
⚡ Alibaba/Chinese cloud provenance — enterprise and GovCloud scrutiny required. SWE-bench Pro standardized 38.70% lowest among ranked tools. Near-zero HN engagement (~7 pts).
Best for: Teams wanting highest raw benchmark number, semantic codebase indexing
51.80% SWE-bench Pro on Augment scaffold — highest raw number in category. Augment Context Engine provides deep semantic codebase understanding.
⚡ No public release — 153 GitHub stars. Benchmark uses non-standardized Augment scaffold. Single blog post is the only public artifact.
Best for: Chinese developer ecosystem, teams using Moonshot AI models
7.2K stars, 124K PyPI weekly downloads. K2.5 model (HN: 388 pts). Moonshot AI $1B+ funding.
⚡ Western ecosystem integration limited. No SWE-bench Pro or Terminal-Bench scores published.
Best for: Privacy-conscious teams, OpenRouter users
16.8K stars, 131K npm weekly downloads. $8M seed funding. OpenRouter-native.
⚡ Early stage. Needs differentiation beyond OpenRouter integration.
Best for: Teams already on GitHub Copilot subscription needing a terminal companion
15M Copilot subscriber distribution. Multi-model (Opus 4.6, Sonnet 4.6, GPT-5.3-Codex, Gemini 3 Pro). Enterprise Agent Control Plane.
⚡ CVE-2026-29783 hit 2 days after GA — arbitrary code execution via shell expansion (PromptArmor). No published benchmark scores. Low community signal (24 HN pts, 9.4K stars).
Best for: Polished commercial IDE with integrated AI
$29.3B valuation, most adopted commercial AI IDE. Strong UX, agent modes (Jan 2026).
⚡ IDE-first, CLI is secondary. Closed-source, paid, vendor-locked.
Best for: Terminal-first developers who want an integrated AI environment
26K+ stars, 75.8% SWE-bench Verified, TIME Best Inventions. Full terminal replacement.
⚡ Closed-source. 4,350 open issues. Category mismatch — more 'AI terminal' than coding CLI agent.
Best for: Spec-driven development, AWS integration, GovCloud
Amazon-backed. GovCloud focus. Spec-driven development approach. CLI v1.27 (2026-03-02).
⚡ No public repo, no benchmark, no meaningful HN engagement. Insufficient evidence to rank at this time.
Skills comparison
[Chart: GitHub stars and evidence count for top ranked skills.]
Star growth over time
[Chart: GitHub stars trajectory for top skills in this category.]
Head to head
Gemini CLI has independent SWE-bench Verified scores (76.2%), 1M native context, and the best free tier. OpenCode has more stars (123K vs 98K) and model flexibility (75+ providers). But Gemini has proven benchmarks while OpenCode has none — that's the gap.
Gemini CLI: free tier (1K req/day), 1M context, 98K stars, Deep Think mode. Codex CLI: Terminal-Bench 77.3% (GPT-5.3-Codex), sandbox-first safety, free with ChatGPT. Gemini wins on cost and context; Codex wins on proven terminal performance and speed.
Both are model-agnostic, but Aider's release cadence (latest v0.86.2, 2026-02-12) trails OpenCode's near-daily shipping. Aider has verifiable PyPI downloads (191K/week); OpenCode's 5M MAD claim is unverified. Aider's token efficiency (~4.2x fewer tokens than Claude Code) is unmatched.
Gemini CLI now at 43.30% SWE-bench Pro standardized vs Claude Code's 45.89% — gap narrowed to 2.59pp. Gemini wins overwhelmingly on cost (free 1K req/day) and context (1M native). Claude Code wins on adoption (8M vs 647K npm/wk), revenue ($2.5B ARR), and HN mindshare (2,127 vs 1,428 pts). Tool-calling weaknesses keep Gemini at #3.
Amp has the most sophisticated sub-agent architecture (Oracle, Librarian, Painter) from Sourcegraph's code intelligence DNA. Claude Code has 58x more npm downloads (8M vs 139K), published benchmarks (SWE-bench Pro #1), and 24x more HN engagement. Amp is a bet on code intelligence depth; Claude Code is the proven all-rounder.
Public signals
Aider moves from #8 to #2 based on verified 2026-03-18 data: 191,828/week PyPI installs, 5.7M lifetime installs, 15B tokens/week (homepage). The only tool in the category with a fully independent, verifiable download number outside of Claude Code. Multi-model, git-native, no vendor lock-in.
Verified 2026-03-18 against SEAL public leaderboard. Claude Code #1 (45.89%). Augment scaffold 51.80% is highest raw number but not on standardized leaderboard. Qwen3-Coder-Next: 44.3% on Qwen Code leaderboard (unconfirmed on SEAL). Gemini CLI and Codex CLI standardized numbers unconfirmed on this date.
Charmbracelet's multi-model coding CLI enters the active ranking at #5. Built on Bubble Tea ecosystem (25K+ apps). LSP integration, MCP support, cross-platform. HN launch: 367 pts. No benchmarks — community quality signal is the main trust anchor.
Block's open-source agentic CLI enters active ranking at #6. Linux Foundation AAIF founding member, Apache 2.0, MCP reference implementation. Ships v1.28.0 today (2026-03-18). Best pick for teams requiring vendor-neutral governance.
Qwen3-Coder-Next: 70.6% SWE-bench Verified — highest open-weight model score in the category. 1,000 free daily requests via Qwen OAuth. Best zero-cost option. Alibaba provenance is a consideration for enterprise/GovCloud. v0.12.6 released 2026-03-17.
Augment scaffold: 51.80% SWE-bench Pro — highest raw number in the category. Same Opus 4.5 model scores 45.89% on SEAL standardized. Gap is scaffold architecture, not model capability. Cannot rank above tools with millions of verified installs on a single blog post. 153 stars, no public release.
Cline v2.3.0 'OpenClaw' compromise is a documented supply chain incident. Demoted from #4 to #10 (Watch tier). Would restore to Tier 2 with a credible third-party security audit. 59K stars and active shipping (v3.73.0 2026-03-16) — the underlying tool remains relevant.
OpenCode last released v0.0.55 on 2025-06-27 — 9 months stale at time of this ranking. Known unauthenticated RCE vulnerability (432 HN pts on disclosure). Removed from active ranking pending resumed development and security remediation.
SWE-bench Verified top 5 within 1 point (80.0–80.9%) — OpenAI stopped reporting it. SWE-bench Pro standardized: Claude Code 45.89% (#1). Custom scaffold scores (Augment 51.80%, Codex 56.8%) are not comparable to standardized results.
METR's March 10, 2026 study found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers (278 HN pts). Maintainer merge rates are ~24pp lower than automated grading. SWE-bench Pro is necessary but no longer sufficient as a sole authority.
~4% of public GitHub commits (~135K/day, SemiAnalysis est.), projected 20%+ by EOY 2026. 42,896x growth in 13 months. $2.5B annualized revenue (fastest enterprise SaaS to $1B ARR — Constellation Research). 8M+ npm weekly downloads. The hardest real-usage metric in the category.
Multiple independent sources (Calvin French-Owen, Pawel Jozefiak, Blake Crosley) converge on using Claude Code for planning/architecture and Codex CLI for implementation. Not a compromise — may be the optimal workflow.
MarginLab runs independent daily monitoring: 56% baseline pass rate, no statistically significant degradation. No other tool has this level of external quality assurance.
$2.5B annualized revenue, 500+ customers at $1M+/year. Fastest enterprise SaaS to $1B ARR in history (Constellation Research). 7.88M npm weekly downloads — 3x Codex (2.49M), 11.6x Gemini (678K).
Codex CLI at 2.49M npm/week is the clear #2 by active-use downloads — 13x Aider's 191K PyPI/week. Aider remains the strongest verifiable non-npm metric but download gap is too large to sustain #2 position. Aider moves to #7.
Cline dominates the IDE-embedded-agent segment: 3.35M VS Code installs (5M across editors), $32M from Emergence Capital, named enterprise customers. Moves from Watch tier to #4. Supply chain incident (v2.3.0 'OpenClaw') remains a documented trust flag.
OpenCode resumed active development after a gap. v1.2.27 released 2026-03-16. OpenAI official partnership after Anthropic blocking incident. 124,766 stars (largest AI coding repo). 393K npm downloads/week. RCE fixed in v1.1.10+. Moves from archived to #5.
RooCode added as a critical catalog gap. 5.0/5 VS Code rating on 1,372,346 installs is the strongest quality signal in the IDE-embedded segment. Cline fork with enterprise governance focus. v3.51.1 (2026-03-08). #6 in updated ranking.
High-engagement HN thread questioning Claude 4.5 quality regression. MarginLab independent monitoring shows no statistical degradation (p<0.05), but community trust perception is a real cost. The 'dumbing down' narrative is now the single most-cited concern in the Claude Code user community.
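MarginLab's exact methodology is not public. As an illustration of how a daily-monitoring setup could separate perception from measurable regression, a pooled two-proportion z-test against the 56% baseline pass rate would look like this (all run counts below are made up for the example):

```python
from math import erf, sqrt


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z statistic: does the recent-window pass rate p2
    differ significantly from the baseline pass rate p1?"""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def p_value_two_sided(z: float) -> float:
    """Two-sided p-value under the normal approximation."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))


# Illustrative numbers: 200 baseline runs at 56% vs a 50-run recent window.
# A recent window holding at 56% gives z = 0 (no detectable degradation);
# a drop to 40% would cross the p < 0.05 threshold.
z_stable = two_proportion_z(0.56, 200, 0.56, 50)
z_drop = two_proportion_z(0.56, 200, 0.40, 50)
```

The point of the sketch: at these sample sizes a real drop of the magnitude users describe would register as statistically significant, so a flat z statistic is evidence against the "dumbing down" narrative rather than an absence of measurement.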
sourcegraph/amp GitHub repo returns 404. Amp spun out from Sourcegraph as independent 'Amp Inc.' Tool still ships (36K npm downloads/week) under ampcode.com. All catalog links updated. Corporate restructure is a material change — verify before recommending.
SWE-agent last released v1.1.0 on 2025-05-22 — 10 months stale. All active tools in the category are releasing weekly. Strong academic credential (Princeton, original SWE-bench paper) but not a production tool. Down-ranked to research/academic reference at #12.
What changes this
Auggie CLI public GA release + independent SWE-bench Pro reproduction → could move to Tier 1 if the 51.80% scaffold advantage holds outside Augment's own benchmark setup.
Gemini CLI publishing a credible SEAL SWE-bench Pro number → could move to #1 or #2 depending on result; currently ranked on traction alone.
Junie CLI post-beta community evidence → JetBrains' 11M+ installed base is large enough that strong first 60 days of public reception would immediately justify a Tier 2 slot.
Cline publishing a credible third-party security audit → would restore trust score and move it back into active Tier 2 consideration.
Aider publishing a SWE-bench Pro standardized number → would likely lock in #2 slot; currently its install verifiability is the strongest non-Anthropic signal in the category.
OpenCode resuming active development and patching the RCE → minimum bar to re-enter the ranking.
Claude Code quality regression persisting (the 'dumbed down' thread had 1,085 pts / 702 comments) → if perception hardens into documented capability regression, Tier 1 position is at risk.
If Gemini CLI fixes the file deletion pattern and files a clean safety record for 3+ months, its free tier + 1M context makes it a serious #2 contender.
If Codex CLI closes the SWE-bench standardized gap while maintaining cost/speed advantages, the #3/#4 ordering could shift.