All Problems
23 problem spaces, each with ranked solutions and public evidence. A problem is the narrow task an agent needs to solve.
Coding CLIs / Code Agents
The hottest category right now. Ten+ serious CLI agents competing across three tiers. SWE-bench Pro (standardized) is necessary but no longer sufficient — METR found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers. Rankings weight benchmarks alongside practical tests, adoption, safety, and independent evaluations.
Ranking
Open full report →
Web Browsing / Browser Automation
The category has split into four lanes: full-autonomy agents (Browser Use, Skyvern), MCP/CLI tools for coding agents (Chrome DevTools MCP, Playwright MCP/CLI, Vercel Agent Browser), frameworks/SDKs for building products (Stagehand), and consumer agentic browsers (BrowserOS). CLI-over-MCP is settled consensus (13+ independent sources). Browser Use hits 1M+ weekly PyPI downloads — unchallenged in Lane 1. BrowserOS just crossed 10K stars as the Lane 4 leader.
Ranking
Open full report →
Product / Business Development
Eight distinct lanes confirmed by independent traction data: Research/Extraction (Firecrawl #1, Exa #2), Enterprise Operating Surface (mcp-atlassian #1, Rovo #2), Startup Operating Surface (Notion), CRM (HubSpot #1, Salesforce #2, Dynamics 365 #3 NEW), Business Automation (Zapier #1, n8n watch), Product Analytics (PostHog #1, Amplitude #2 NEW, Mixpanel watch), Project/PM (Linear #1, Monday watch, Asana watch), and Communication (Slack, real but early).
Ranking
Open full report →
Teams of Agents / Multi-Agent Orchestration
Five distinct segments with almost no cross-over: (1) Python agent frameworks — build multi-agent systems in code (LangGraph #1, OpenAI Agents SDK #2, Pydantic AI #3, CrewAI #4, plus cloud-native: Strands/AWS, ADK/GCP, Semantic Kernel/Azure); (2) TypeScript framework — Mastra (no competitor); (3) Autonomous coding agents — delegate software development to an agent (OpenHands, Factory AI); (4) Parallel agent IDEs — run multiple coding agents simultaneously (Emdash, ccpm, Superset); (5) Workflow automation — orchestrate integrations visually (n8n, Sim Studio). Ranking all on a single list is misleading — each serves a different buyer.
Ranking
Open full report →
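The supervisor/worker split that segment (1) frameworks formalize can be sketched in plain Python. This is an illustrative sketch only: the routing rule and function names below are hypothetical, not the API of LangGraph or any other framework.

```python
# Supervisor/worker sketch: a "supervisor" routes each task to the
# specialist worker best suited for it. All names are hypothetical.

def research_agent(task: str) -> str:
    # Stand-in for an agent that gathers and summarizes information.
    return f"research notes for: {task}"

def coding_agent(task: str) -> str:
    # Stand-in for an agent that produces a code change.
    return f"patch implementing: {task}"

WORKERS = {"research": research_agent, "code": coding_agent}

def supervisor(task: str) -> str:
    # Trivial keyword rule standing in for an LLM routing decision.
    lane = "code" if "implement" in task else "research"
    return WORKERS[lane](task)

print(supervisor("implement retry logic"))  # routed to coding_agent
print(supervisor("compare vector DBs"))     # routed to research_agent
```

Real frameworks add state passing, retries, and streaming on top of this core routing loop, which is why ranking them against workflow tools or parallel IDEs on one list misleads.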
UX / UI
Five lanes: (A) read-only Figma context (Official #1, Framelink #2), (B) bidirectional write-access (Console MCP #1, Grab #2, figma-use #3), (C) alternative platforms (Penpot, Excalidraw), (D) specialized design-to-code agents (Kombai — 75–80% fidelity), (E) AI-native design creation (Google Stitch — Figma stock -8.8%, Onlook). Uber uSpec remains strongest enterprise validation. Google Stitch is provisional (2 days old).
Ranking
Open full report →
Software Factories
Autonomous coding agents that plan, write, test, and ship code with minimal human oversight. Claude Code leads on benchmarks, community signal, and platform distribution (Apple Xcode). Cursor leads on revenue and event-driven automation. Gemini CLI is the free-tier disruptor. The category has split into CLI-first (Claude Code, Codex CLI, Gemini CLI), IDE-integrated (Cursor, Copilot, Cline), open-source (OpenHands), and enterprise-managed (Augment, Factory). SWE-bench Verified is dead — Pro is the new standard.
Ranking
Open full report →
Search & News
Web search, scraping, and deep research tools for AI agents. The category has split into three lanes: search APIs (Brave, Exa, Tavily), scrape/crawl tools (Firecrawl, Crawl4AI), and deep research APIs (Parallel, Perplexity Sonar). Most serious agent workflows need tools from the first two lanes. MCP support is table stakes — the real differentiators are benchmark quality, latency, index independence, and license. Deep research lane is still immature — WideSearch academic benchmark shows near 0% success on broad tasks.
Ranking
Open full report →
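The claim that serious agent workflows need both of the first two lanes can be sketched as a search-then-scrape pipeline. The stub functions are hypothetical stand-ins for a search API (the Brave/Exa/Tavily lane) and a scrape tool (the Firecrawl/Crawl4AI lane), not any vendor's real client.

```python
# Two-lane research pipeline sketch: a search API returns candidate
# URLs, then a scrape tool fetches readable content for each. Both
# functions are hypothetical stubs, not real provider APIs.

def search(query: str) -> list[str]:
    # A real search API would return ranked result URLs.
    return [f"https://example.com/{i}" for i in range(3)]

def scrape(url: str) -> str:
    # A real scrape tool would return cleaned page markdown.
    return f"content of {url}"

def research(query: str) -> list[str]:
    # The agent chains the lanes: search for sources, scrape each one.
    return [scrape(url) for url in search(query)]

docs = research("agent search benchmarks")
print(len(docs))  # 3
```

The deep-research lane effectively bundles this loop (plus synthesis) behind a single API call, which is why its maturity lags the primitives.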
Marketing
Skills for SEO, content optimization, ad copy, social media calendars, competitor analysis, and growth automation.
Ranking
Open full report →
Business
Skills for pitch decks, financial modeling, contract review, OKR frameworks, invoicing, and business operations.
Ranking
Open full report →
Content & Writing
Skills for blog posts, newsletters, technical writing, style guide enforcement, and editorial workflows.
Ranking
Open full report →
Research
Deep research agents, academic tools, and research infrastructure. The category has split: platform deep research (Perplexity, OpenAI, Google), open-source agents (GPT Researcher, Tongyi, STORM), academic specialists (Elicit, Consensus), and infrastructure (Tavily, Firecrawl). Speed, citation quality, and self-hosting are the real differentiators.
Ranking
Open full report →
Automation
Three sub-categories with distinct buyers: Workflow Automation (n8n #1, Activepieces, Zapier), Code-First Orchestration (Windmill, Trigger.dev, Inngest, Kestra), Agent Integration (Composio, Pipedream MCP). n8n dominates overall with 180K stars, n8n-mcp (15.4K stars), and dedicated Claude Code skills. Inngest leads npm downloads (499K/wk). Composio is provisional — zero HN traction despite 27K stars.
Ranking
Open full report →
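The node-pipeline model that workflow tools in this category expose can be sketched in a few lines: each node is a step that transforms the payload from the previous one. Purely illustrative; this is not the API of n8n, Windmill, or any listed product.

```python
# Tiny workflow-runner sketch: nodes run in order, each transforming
# the payload produced by the previous node. Hypothetical illustration.

from typing import Any, Callable

Node = Callable[[Any], Any]

def run_workflow(nodes: list[Node], payload: Any) -> Any:
    for node in nodes:
        payload = node(payload)
    return payload

workflow = [
    lambda p: {**p, "fetched": True},         # e.g. an HTTP-request node
    lambda p: {**p, "rows": [1, 2, 3]},       # e.g. a transform node
    lambda p: {**p, "total": sum(p["rows"])}, # e.g. an aggregate node
]

result = run_workflow(workflow, {"job": "daily-report"})
print(result["total"])  # 6
```

The sub-categories differ mainly in who authors these nodes: visual builders (Workflow Automation), developers in code (Code-First Orchestration), or agents via tool catalogs (Agent Integration).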
Security
Skills for SAST scanning, secret detection, agent/MCP security scanning, and offensive security. The category splits into four sub-themes: SAST/code scanning (Semgrep MCP #1), secret detection (GitGuardian MCP #1), agent/MCP security scanning (Snyk Agent Scan #1), and offensive security (HexStrike AI #1). Agent security scanning is the fastest-growing sub-theme — these tools scan your agents, skills, and MCP servers, not your application code.
Ranking
Open full report →
Documentation
Docs-as-code frameworks, API documentation generators, documentation SaaS platforms, and documentation automation tools. The category splits into four overlapping lanes: OSS docs frameworks (Fumadocs #1, Starlight #2, Docusaurus #3), API docs (Fern #4, Redocly #6, Swagger UI #8), SaaS platforms (Mintlify #5, GitBook #10), and automation (Promptless #9). Fumadocs is the momentum winner: 3x YoY growth and a backlog of only five open issues make it the clear pick for Next.js teams.
Ranking
Open full report →
Data & Analytics
AI-powered data analysis tools, reactive notebooks, BI-as-code platforms, conversational data agents, and ML training aids. The category has split into reactive notebooks (Marimo), AI visualization (Data Formulator), BI-as-code (Evidence, Observable), app deployment (Streamlit), conversational data (PandasAI), and prompt-to-ML (Plexe). Marimo is the clear #1 — strongest combined signal across stars, downloads, HN attention, and independent validation.
Ranking
Open full report →
Personal Assistants
How do I interact with AI for everyday tasks? ChatGPT, Claude, Gemini, and emerging contenders like OpenClaw.
Ranking
Open full report →
Memory Systems
How do agents remember context across sessions? Vector DBs, context management, and persistent memory for AI workflows.
Ranking
Open full report →
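The core retrieval step behind vector-DB-backed memory can be sketched with stdlib math: store (embedding, text) pairs and recall the entry most similar to a query embedding by cosine similarity. Real systems use an embedding model and an indexed vector store; the 2-d vectors here are toy data.

```python
# Bare-bones "vector memory" sketch: cosine-similarity recall over
# stored (embedding, text) pairs. Toy 2-d vectors, no embedding model.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class VectorMemory:
    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], text: str) -> None:
        self.entries.append((embedding, text))

    def recall(self, query: list[float]) -> str:
        # Return the stored text whose embedding is closest to the query.
        return max(self.entries, key=lambda e: cosine(e[0], query))[1]

memory = VectorMemory()
memory.add([1.0, 0.0], "user prefers TypeScript")
memory.add([0.0, 1.0], "project deadline is Friday")
print(memory.recall([0.9, 0.1]))  # user prefers TypeScript
```

Context management layers decide what gets written into this store and when it gets injected back into the prompt; the similarity search itself stays this simple.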
Performance
How do I profile, benchmark, and optimize AI/agent workloads? Speed and efficiency tools for production deployments.
Ranking
Open full report →
Analytics & LLM Tracing
How do I observe and trace LLM calls and agent runs? PostHog, Braintrust, LangSmith, Helicone, and more.
Ranking
Open full report →
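The core of what these tracing platforms capture can be sketched as a decorator that records each call's name, arguments, result, and latency. This is a hypothetical illustration with stdlib only, not the SDK of PostHog, LangSmith, or any vendor listed.

```python
# Minimal tracing sketch: wrap a function so every call appends a span
# record (name, args, result, latency) to an in-memory trace buffer.

import functools
import time

TRACE: list[dict] = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "name": fn.__name__,
            "args": args,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real model call; a vendor SDK would also record
    # token counts, model name, and cost here.
    return f"completion for: {prompt}"

fake_llm_call("summarize the report")
print(TRACE[0]["name"])  # fake_llm_call
```

Production tools differ mainly in where these spans go (hosted UI vs. self-hosted store) and what they attach (evals, costs, user feedback), not in this capture pattern.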
Web Development & UI Frameworks
How do I build AI-powered UIs? Frontend frameworks and tools for shipping AI products — Vercel AI SDK, Streamlit, v0, Bolt, Lovable.
Ranking
Open full report →
Agent Harnesses
How do I orchestrate and run agents? Frameworks for building, deploying, and managing AI agent workflows — LangChain, CrewAI, Pydantic AI, Claude Agent SDK.
Ranking
Open full report →
Knowledge Management
How do I organize and retrieve team knowledge? Notion, Google Workspace, Obsidian — the foundations MCP tools build on.
Ranking
Open full report →
AI Adoption & Best Practices
How do I adopt AI effectively? Meta-tracking, best practices, ecosystem navigation. SkillPack itself lives in this problem space.
Ranking
Open full report →