
Coding CLIs / Code Agents

The hottest category right now. Ten-plus serious CLI agents competing across multiple tiers. SWE-bench Pro (standardized) is necessary but no longer sufficient — METR found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers. Rankings weight benchmarks alongside practical tests, adoption, safety, and independent evaluations.

22 ranked · 21 signals

Verdict

Claude Code is #1 — 7.88M npm downloads/week (3x nearest rival), 79K stars, ~4% of GitHub public commits (SemiAnalysis, Feb 2026). Leads SWE-bench Pro standardized (45.89%, SEAL #1). $2.5B annualized revenue. Quality regression perception is a live trust issue ('dumbed down?' — 1,085 HN pts, Feb 2026) but MarginLab monitoring shows no statistical degradation.

Codex CLI is #2 — 2.49M npm downloads/week (clear #2 by active use). Rust rewrite eliminates Node.js dependency — unique in category. GPT-5.3-Codex leads SWE-bench Pro custom scaffold at 56.8% (non-standardized). Terminal-Bench 77.3%. Best for locked-down environments or OpenAI model loyalists.

Gemini CLI is #3 — 98K stars (highest raw count), best free tier (1K req/day, no credit card), 1M context window, 678K npm downloads/week. File deletion incident (AI Incident DB #1178) is a visible trust flag. Not recommended for unattended agentic use without a human review step.

Cline (cline.bot) is #4 — 3.35M VS Code installs (5M across editors), $32M funding (Emergence Capital), named enterprise customers (Salesforce, Samsung, SAP). Supply chain incident (v2.3.0 'OpenClaw') is a documented trust flag — would move to Tier 1/2 with a credible security audit.

OpenCode is #5 — 124K stars (largest AI coding repo), v1.2.27 active (2026-03-16), OpenAI official partnership. 393K npm downloads/week. RCE fixed in v1.1.10+. Trust story is messier than peers due to corporate conflict + security history.

RooCode is #6 — 1.37M VS Code installs, 5.0/5 VS Code rating (highest quality signal in IDE-agent segment). Cline fork with enterprise governance focus. v3.51.1 (2026-03-08). Best for teams wanting Cline-style agentic coding with stricter governance.

Aider is #7 — 191K PyPI/week, 5.7M lifetime installs. Multi-model, git-native, no vendor lock-in. Category pressure growing: HN thread 'stopped using Aider in favor of Claude Code' (#44154020). Best for Python devs who want fine-grained model control. v0.86.2 (2026-02-12).

The deeper read

SWE-bench Pro (standardized) is necessary but no longer sufficient. METR's March 10, 2026 study found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers (278 HN pts). Maintainer merge rates are ~24pp lower than automated grading. Rankings weight SWE-bench alongside practical tests, adoption, safety, and independent evaluations.
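
To see what that gap means in practice, here is a back-of-envelope calculation — a sketch using this page's own figures, assuming the ~50% rejection rate applies uniformly across passing PRs:

```python
# Back-of-envelope: what METR's finding implies for a headline score.
# Assumes the ~50% merge rejection applies uniformly across passing PRs
# (a simplification; the real distribution varies by task).
swe_bench_pass_rate = 0.4589   # Claude Code's standardized SWE-bench Pro score
merge_rate_of_passes = 0.50    # ~50% of benchmark-passing PRs judged mergeable

merge_adjusted = swe_bench_pass_rate * merge_rate_of_passes
print(f"Benchmark pass rate: {swe_bench_pass_rate:.1%}")   # 45.9%
print(f"Merge-adjusted rate: {merge_adjusted:.1%}")        # 22.9%
# The ~23pp drop is in the same range as the ~24pp maintainer gap above.
```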

Verifiable traction is the new tie-breaker. Aider's 191,828/week PyPI installs are a public artifact — harder to game than star counts or social media engagement. Rankings now weight independently verifiable usage metrics more heavily.
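
Anyone can reproduce that kind of number. A minimal sketch against pypistats.org's public API, assuming `aider-chat` as Aider's PyPI distribution name:

```python
# Minimal sketch: fetch recent download counts from pypistats.org's
# public API. "aider-chat" is Aider's PyPI distribution name.
import json
import urllib.request

def weekly_downloads(package: str) -> int:
    url = f"https://pypistats.org/api/packages/{package}/recent"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["last_week"]

print(weekly_downloads("aider-chat"))  # ~191K at the time of this ranking
```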

Each major model provider now has a CLI agent (Anthropic → Claude Code, OpenAI → Codex CLI, Google → Gemini CLI). The emerging consensus is a hybrid pattern: Claude Code for planning/architecture, Codex CLI for implementation. Multi-model tools (Aider, Crush, Goose, Qwen Code) offer a third lane: model-agnostic with no vendor dependency.
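
In script form, the hybrid pattern looks roughly like this — a sketch assuming both CLIs' non-interactive modes (`claude -p` print mode, `codex exec`); verify the flags against your installed versions:

```python
# Sketch of the plan-with-Claude, implement-with-Codex hybrid pattern.
# Assumes `claude -p` (print mode) and `codex exec` (non-interactive);
# check --help on your installed versions before relying on these flags.
import subprocess

def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

task = "Add retry-with-backoff to the HTTP client in src/client.py"  # example task

# 1. Planning/architecture pass with Claude Code.
plan = run(["claude", "-p", f"Write a step-by-step implementation plan (no code changes): {task}"])

# 2. Implementation pass with Codex CLI, fed the plan.
run(["codex", "exec", f"Implement this plan:\n{plan}"])
```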

Current ranking

01 · Claude Code · Official — 80K+ stars · +121% · score 90

Best for: Architecture, planning, complex reasoning, security analysis, niche languages

7.88M npm downloads/week — 3x nearest rival. ~4% of GitHub public commits (SemiAnalysis). #1 SWE-bench Pro standardized (45.89%, SEAL). $2.5B annualized revenue (fastest enterprise SaaS to $1B ARR). HN peak 2,127 pts — unmatched community mindshare.

Quality regression perception: 'Claude Code is being dumbed down?' (1,085 HN pts, Feb 2026) is a live trust issue. Rate limits are the #1 complaint. 3-4x higher token consumption per task than Codex CLI.

02 · Codex CLI · Official — 66K+ stars · +8% · score 85

Best for: OpenAI ecosystem, locked-down environments, token efficiency, sandbox-first safety

2.49M npm downloads/week — clear #2 by active use. Rust rewrite eliminates Node.js dependency — unique in category. Terminal-Bench 77.3% (GPT-5.3-Codex). 3-4x more token-efficient than Claude Code. Free with ChatGPT subscription.

SWE-bench Pro standardized 41.04% — trails Claude Code by ~5pp. Tied to OpenAI models only. Custom scaffold score (56.8%) is not standardized.

03 · Gemini CLI · Official — 98K+ stars · +15% · score 86

Best for: Budget-constrained developers, large-context tasks, free entry point

Best free tier in category: 1K req/day, no credit card. 1M native context — largest. 98K stars (highest raw count). 678K npm downloads/week. Google-backed, open source (Apache 2.0).

File deletion incident (AI Incident DB #1178) is a visible trust flag. 11.6x download gap vs Claude Code despite higher star count — brand-driven stars. Not recommended for unattended agentic use without human review.
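
The "human review step" can be enforced mechanically. A generic sketch of such a gate — a wrapper pattern for any agent's proposed shell actions, not a Gemini CLI feature:

```python
# Generic human-in-the-loop gate for agent-proposed shell commands.
# Not a Gemini CLI feature — a wrapper pattern for any agent's output.
import shlex

DESTRUCTIVE = {"rm", "rmdir", "mv", "dd", "truncate"}  # tune to your risk model

def approve(command: str) -> bool:
    tokens = shlex.split(command)
    if not tokens or tokens[0] not in DESTRUCTIVE:
        return True  # auto-approve benign commands
    answer = input(f"Agent wants to run {command!r} — allow? [y/N] ")
    return answer.strip().lower() == "y"

proposed = "rm -rf ./build"  # hypothetical agent-proposed command
print("approved" if approve(proposed) else "blocked")
```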

04 · Cline

Best for: VS Code developers, enterprise teams with governance requirements

3.35M VS Code installs (5M across editors). $32M raise (Emergence Capital). Named enterprise customers: Salesforce, Samsung, SAP. v3.73.0 released 2026-03-16. Dominates the IDE-embedded-agent segment.

Supply chain incident: v2.3.0 'OpenClaw' compromise — no third-party security audit published. Primarily a VS Code extension; CLI surface is secondary. Would move to Tier 1 with a credible security audit.

05 · OpenCode — 125K+ stars · score 83

Best for: Maximum model flexibility, open-source-first teams, OpenAI ecosystem

124,766 stars — largest AI coding repo by raw count. 393K npm downloads/week. v1.2.27 active (2026-03-16). OpenAI official partnership. RCE fixed in v1.1.10+. 75+ model providers.

Trust story is messy: RCE disclosure (432 HN pts) + Anthropic blocking incident (625 HN pts). Star count inflated by controversy. No published benchmark scores.

06 · RooCode — 22K+ stars · score 53

Best for: Teams wanting Cline-style agentic coding with stricter governance and multi-model flexibility

5.0/5 VS Code rating on 1,372,346 installs — strongest quality signal in the IDE-agent segment. Cline fork inherits proven codebase while adding enterprise governance. v3.51.1 (2026-03-08).

Fork positioning — unclear differentiation beyond governance vs upstream Cline. No published benchmarks. Smaller enterprise validation than Cline.

07 · Aider — 42K+ stars · +11% · score 91

Best for: Python developers, maximum model flexibility, git-native workflow, token efficiency

191,828 PyPI/week, 5.7M lifetime installs — most independently verifiable usage outside Claude Code. Multi-model (any OpenAI-compatible API), git-native, no vendor lock-in. v0.86.2 released 2026-02-12.
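
Aider also ships a Python scripting entry point alongside the CLI. A sketch based on its scripting docs — treat the exact names and signatures as assumptions to verify against your installed version:

```python
# Sketch of Aider's Python scripting interface (per its scripting docs;
# verify names/signatures against your installed version).
from aider.coders import Coder
from aider.models import Model

model = Model("gpt-4o")  # any model id Aider supports, incl. OpenAI-compatible APIs
coder = Coder.create(main_model=model, fnames=["app.py"])
coder.run("add input validation to the parse_args function")
# Git-native workflow: aider commits each change it makes.
```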

Category pressure: HN 'stopped using Aider in favor of Claude Code' (#44154020). Codex CLI at 2.49M and Gemini at 678K npm/week have overtaken Aider's download rank. Shipping cadence behind daily-release competitors.

Below the cut line
08 · Junie CLI (JetBrains) · Official — stars N/A · score 28

Best for: JetBrains loyalists wanting BYOK pricing with institutional IDE vendor support

JetBrains distribution: 14M existing user base. BYOK pricing. Explicit one-click migration from Claude Code. LLM-agnostic. Most strategically significant new entrant — revisit in 60 days.

Beta only — launched 2026-03-09. No public repo. No benchmark scores. No independent reviews yet.

09 · Goose (Block) — 33K+ stars · score 68

Best for: Enterprise open governance, provider-agnostic agentic workflows, Apache 2.0 licensing

33K stars, v1.28.0 released 2026-03-18. Linux Foundation AAIF founding member. Provider-agnostic, MCP reference implementation. Block institutional backing.

No published benchmarks. No download data — cannot assess active-use gap vs top-tier tools. 'Super jank' reputation in HN comments.

10

Best for: Background agents enforcing code quality on PRs

2,372,585 VS Code installs (second-highest in IDE segment). 31,935 stars. Last release v1.2.17 (2026-03-13). Pivoted to async CI agents for PR enforcement.

Category shifting under it — more AI coding assistant framework than agentic CLI. Low HN engagement (44 pts) relative to VS Code install count.

11 · Amp (Amp Inc.) — stars N/A · score 28

Best for: Teams with large, complex codebases needing deep code intelligence

Most sophisticated sub-agent architecture (Oracle, Librarian, Painter). Sourcegraph code intelligence DNA. 36K npm weekly downloads. Free tier + BYOK.

Corporate spin-out: the sourcegraph/amp GitHub repo returns 404; Amp is now independent Amp Inc. No SWE-bench benchmark, 0 HN pts in the tracked period. Verify current state before recommending.

12 · SWE-agent (Princeton NLP)

Best for: Benchmark research, academic reference, issue-level repair evaluation

18,777 stars. Princeton NLP origin — the team behind the original SWE-bench paper. SWE-agent scaffold: 79.2% SWE-bench Verified with Opus 4.5. Strong academic credential.

Last release v1.1.0: 2025-05-22 — 10 months stale. Not a production tool. Down-ranked to academic/research reference.

13 · Crush (Charmbracelet)

Best for: Terminal-first developers wanting polished UX, multi-platform support

Best terminal UX in the category. Charmbracelet proven track record (Bubble Tea, 25K+ apps). Multi-model, LSP, MCP, cross-platform. 21K stars, v0.50.1 (2026-03-17). HN: 367 pts.

No published benchmark scores. Custom license (not standard OSS). Insufficient evidence of production coding-agent use to rank above Tier 4 — revisit if download data surfaces.

14 · OpenHands (All Hands AI) — 69K+ stars · +12% · score 88

Best for: Web UI-based coding agent, research teams

69,352 stars. Last release 1.5.0 (2026-03-11). Active development.

Primary interface is web UI, not CLI — may belong in a separate web-agent category. Missing download signal. Low HN traction (70 pts) relative to star count suggests research-primary audience.

15 · Qwen Code · Official — 21K+ stars · score 68

Best for: Zero-cost open-weight models, local/on-prem deployment

Qwen3-Coder-Next: 70.6% SWE-bench Verified (highest open-weight model). 1K free daily requests. 20K stars, v0.12.6 released 2026-03-17.

Alibaba/Chinese cloud provenance — enterprise and GovCloud scrutiny required. SWE-bench Pro standardized 38.70% — lowest among ranked tools. Near-zero HN engagement (~7 pts).

16 · Auggie CLI (Augment) · Official — 153 stars · score 43

Best for: Teams wanting highest raw benchmark number, semantic codebase indexing

51.80% SWE-bench Pro on Augment's own scaffold — one of the highest raw numbers in the category. Augment Context Engine provides deep semantic codebase understanding.

No public release — 153 GitHub stars. Benchmark uses non-standardized Augment scaffold. Single blog post is the only public artifact.

17 · Kimi Code (Moonshot AI) · Official — 7K+ stars · score 38

Best for: Chinese developer ecosystem, teams using Moonshot AI models

7.2K stars, 124K PyPI weekly downloads. K2.5 model (HN: 388 pts). Moonshot AI $1B+ funding.

Western ecosystem integration limited. No SWE-bench Pro or Terminal-Bench scores published.

18

Best for: Privacy-conscious teams, OpenRouter users

16.8K stars, 131K npm weekly downloads. $8M seed funding. OpenRouter-native.

Early stage. Needs differentiation beyond OpenRouter integration.

19 · GitHub Copilot CLI · Official — 1.1K+ stars · score 10

Best for: Teams already on GitHub Copilot subscription needing a terminal companion

15M Copilot subscriber distribution. Multi-model (Opus 4.6, Sonnet 4.6, GPT-5.3-Codex, Gemini 3 Pro). Enterprise Agent Control Plane.

CVE-2026-29783 hit 2 days after GA — arbitrary code execution via shell expansion (PromptArmor). No published benchmark scores. Low community signal (24 HN pts, 9.4K stars).
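
For context, the bug class is easy to illustrate generically — this is not a reconstruction of CVE-2026-29783, just the pattern: interpolating untrusted agent output into a shell string lets metacharacters like `;` execute attacker-chosen commands.

```python
# Illustration of the shell-expansion bug class — generic, NOT a
# reconstruction of CVE-2026-29783.
import subprocess

untrusted = "README.md; touch /tmp/pwned"  # attacker-influenced agent output

# Vulnerable pattern: shell=True expands `;` into a second command.
# subprocess.run(f"cat {untrusted}", shell=True)  # would run the payload

# Safer: pass argv as a list — no shell, no expansion; the payload is
# treated as a single (nonexistent) filename.
subprocess.run(["cat", untrusted], check=False)
```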

20 · Cursor · Official — stars N/A · score 28

Best for: Polished commercial IDE with integrated AI

$29.3B valuation, most adopted commercial AI IDE. Strong UX, agent modes (Jan 2026).

IDE-first, CLI is secondary. Closed-source, paid, vendor-locked.

21 · Warp · Official — 26K+ stars · score 43

Best for: Terminal-first developers who want an integrated AI environment

26K+ stars, 75.8% SWE-bench Verified, TIME Best Inventions. Full terminal replacement.

Closed-source. 4,350 open issues. Category mismatch — more 'AI terminal' than coding CLI agent.

22 · Kiro CLI (Amazon) · Official — stars N/A · score 38

Best for: Spec-driven development, AWS integration, GovCloud

Amazon-backed. GovCloud focus. Spec-driven development approach. CLI v1.27 (2026-03-02).

No public repo, no benchmark, no meaningful HN engagement. Insufficient evidence to rank at this time.


Skills comparison

[Chart: GitHub stars and evidence items (strong/moderate) for top-ranked skills; 12 more not shown.]

Star growth over time

[Chart: GitHub stars trajectory for Claude Code, Codex CLI, and Gemini CLI.]

Head to head

Claude Code vs Codex CLI

Claude Code leads SWE-bench Pro standardized (45.89% vs 41.04%), Morph Tier 1 'deepest reasoning', Educative 1h17m single-shot. Codex CLI leads Terminal-Bench (77.3% GPT-5.3-Codex), 3-4x more token-efficient, 240+ tok/s. Emerging consensus: use both — Claude for planning, Codex for implementation.

Gemini CLI vs OpenCode

Gemini CLI has independent SWE-bench Verified scores (76.2%), 1M native context, and the best free tier. OpenCode has more stars (125K vs 98K) and model flexibility (75+ providers). But Gemini has proven benchmarks while OpenCode has none — that's the gap.

Gemini CLI vs Codex CLI

Gemini CLI: free tier (1K req/day), 1M context, 98K stars, Deep Think mode. Codex CLI: Terminal-Bench 77.3% (GPT-5.3-Codex), sandbox-first safety, free with ChatGPT. Gemini wins on cost and context; Codex wins on proven terminal performance and speed.

Aider vs OpenCode

Both are model-agnostic, but OpenCode ships near-daily while Aider's cadence is slower (last release v0.86.2, 2026-02-12). Aider has verifiable PyPI downloads (191K/week); OpenCode's 5M MAD claim is unverified. Aider's token efficiency (4.2x less than Claude Code) is unmatched.

Gemini CLI vs Claude Code

Gemini CLI now at 43.30% SWE-bench Pro standardized vs Claude Code's 45.89% — gap narrowed to 2.59pp. Gemini wins overwhelmingly on cost (free 1K req/day) and context (1M native). Claude Code wins on adoption (7.88M vs 678K npm/wk), revenue ($2.5B ARR), and HN mindshare (2,127 vs 1,428 pts). Tool-calling weaknesses keep Gemini at #3.

Amp vs Claude Code

Amp has the most sophisticated sub-agent architecture (Oracle, Librarian, Painter) from Sourcegraph's code intelligence DNA. Claude Code has published benchmarks (SWE-bench Pro #1), roughly 200x the npm downloads (7.88M vs 36K/wk), and far more HN engagement (Amp registered 0 pts in the tracked period). Amp is a bet on code intelligence depth; Claude Code is the proven all-rounder.


Public signals

Ranking updated · 2026-03
2026-03-18 re-rank: Aider re-weighted on verified install data

Aider's position reflects verified 2026-03-18 data: 191,828/week PyPI installs, 5.7M lifetime installs, 15B tokens/week (homepage). It is the only tool in the category besides Claude Code with a fully independent, verifiable download number. Multi-model, git-native, no vendor lock-in.

Benchmark verified · 2026-03
SEAL SWE-bench Pro leaderboard (verified 2026-03-18): Claude Code #1 at 45.89%

Verified 2026-03-18 against the SEAL public leaderboard. Claude Code #1 (45.89%). Augment's scaffold-specific 51.80% is among the highest raw numbers but is not on the standardized leaderboard. Qwen3-Coder-Next: 44.3% on Qwen Code's own leaderboard (unconfirmed on SEAL). Gemini CLI and Codex CLI standardized numbers were unconfirmed on this date.

Archived · 2026-03 (since superseded)
OpenCode archived — last release 9 months stale, known RCE

At the time of that snapshot, OpenCode's last release was v0.0.55 (2025-06-27) and it carried a known unauthenticated RCE (432 HN pts on disclosure), so it was removed from the active ranking. Both conditions have since cleared — development resumed (v1.2.27, 2026-03-16) and the RCE was fixed in v1.1.10+ — and OpenCode is back in the ranking at #5.

What changes this

Auggie CLI public GA release + independent SWE-bench Pro reproduction → could move to Tier 1 if the 51.80% scaffold advantage holds outside Augment's own benchmark setup.

Gemini CLI confirming a credible SEAL SWE-bench Pro number → could move to #1 or #2 depending on the result; its 43.30% standardized figure is not yet confirmed on SEAL.

Junie CLI post-beta community evidence → JetBrains' 14M installed base is large enough that a strong first 60 days of public reception would immediately justify a Tier 2 slot.

Cline publishing a credible third-party security audit → would restore trust score and move it back into active Tier 2 consideration.

Aider publishing a SWE-bench Pro standardized number → would likely lock in #2 slot; currently its install verifiability is the strongest non-Anthropic signal in the category.

OpenCode keeping a clean security record → it has cleared the re-entry bar (development resumed, RCE patched in v1.1.10+); another RCE-class incident would drop it below the cut line again.

Claude Code quality regression persisting (the 'dumbed down' thread had 1,085 pts / 702 comments) → if perception hardens into documented capability regression, Tier 1 position is at risk.

If Gemini CLI fixes the file deletion pattern and builds a clean safety record for 3+ months, its free tier + 1M context makes it a serious #2 contender.

If Codex CLI closes the SWE-bench standardized gap while maintaining its cost/speed advantages, the #1/#2 ordering could shift.