All Problems
23 problem spaces, each with ranked solutions and public evidence. A problem is the narrow task an agent needs to solve.
Coding CLIs / Code Agents
The hottest category right now. Ten+ serious CLI agents competing across three tiers. SWE-bench Pro (standardized) is necessary but no longer sufficient — METR found ~50% of SWE-bench-passing PRs would NOT be merged by real maintainers. Rankings weight benchmarks alongside practical tests, adoption, safety, and independent evaluations.
Ranking
Open full report →
Web Browsing / Browser Automation
The category has split into four lanes: full-autonomy agents (Browser Use, Skyvern), MCP/CLI tools for coding agents (Chrome DevTools MCP, Playwright MCP/CLI, Vercel Agent Browser), frameworks/SDKs for building products (Stagehand), and consumer agentic browsers (BrowserOS). CLI-over-MCP is settled consensus (13+ independent sources). Browser Use hits 1M+ weekly PyPI downloads — unchallenged in Lane 1. BrowserOS just crossed 10K stars as the Lane 4 leader.
Ranking
Open full report →
Product / Business Development
Eight distinct lanes confirmed by independent traction data: Research/Extraction (Firecrawl #1, Exa #2), Enterprise Operating Surface (mcp-atlassian #1, Rovo #2), Startup Operating Surface (Notion), CRM (HubSpot #1, Salesforce #2, Dynamics 365 #3 NEW), Business Automation (Zapier #1, n8n watch), Product Analytics (PostHog #1, Amplitude #2 NEW, Mixpanel watch), Project/PM (Linear #1, Monday watch, Asana watch), and Communication (Slack, real but early).
Ranking
Open full report →
Teams of Agents / Multi-Agent Orchestration
Five distinct segments with almost no cross-over: (1) Python agent frameworks — build multi-agent systems in code (LangGraph #1, OpenAI Agents SDK #2, Pydantic AI #3, CrewAI #4, plus cloud-native: Strands/AWS, ADK/GCP, Semantic Kernel/Azure); (2) TypeScript framework — Mastra (no competitor); (3) Autonomous coding agents — delegate software development to an agent (OpenHands, Factory AI); (4) Parallel agent IDEs — run multiple coding agents simultaneously (Emdash, ccpm, Superset); (5) Workflow automation — orchestrate integrations visually (n8n, Sim Studio). Ranking all on a single list is misleading — each serves a different buyer.
Ranking
Open full report →
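The supervisor/worker split that segment (1) frameworks formalize can be sketched in plain Python. This is an illustrative sketch only: the routing rule and function names below are hypothetical, not the API of LangGraph or any other framework.

```python
# Supervisor/worker sketch: a "supervisor" routes each task to the
# specialist worker best suited for it. All names are hypothetical.

def research_agent(task: str) -> str:
    # Stand-in for an agent that gathers and summarizes information.
    return f"research notes for: {task}"

def coding_agent(task: str) -> str:
    # Stand-in for an agent that produces a code change.
    return f"patch implementing: {task}"

WORKERS = {"research": research_agent, "code": coding_agent}

def supervisor(task: str) -> str:
    # Trivial keyword rule standing in for an LLM routing decision.
    lane = "code" if "implement" in task else "research"
    return WORKERS[lane](task)

print(supervisor("implement retry logic"))  # routed to coding_agent
print(supervisor("compare vector DBs"))     # routed to research_agent
```

Real frameworks add state passing, retries, and streaming on top of this core routing loop, which is why ranking them against workflow tools or parallel IDEs on one list misleads.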
UX / UI
Five lanes: (A) read-only Figma context (Official #1, Framelink #2), (B) bidirectional write-access (Console MCP #1, Grab #2, figma-use #3), (C) alternative platforms (Penpot, Excalidraw), (D) specialized design-to-code agents (Kombai — 75–80% fidelity), (E) AI-native design creation (Google Stitch — Figma stock -8.8%, Onlook). Uber uSpec remains strongest enterprise validation. Google Stitch is provisional (2 days old).
Ranking
Open full report →
Software Factories
Autonomous coding agents that plan, write, test, and ship code with minimal human oversight. Claude Code leads on benchmarks, community signal, and platform distribution (Apple Xcode). Cursor leads on revenue and event-driven automation. Gemini CLI is the free-tier disruptor. The category has split into CLI-first (Claude Code, Codex CLI, Gemini CLI), IDE-integrated (Cursor, Copilot, Cline), open-source (OpenHands), and enterprise-managed (Augment, Factory). SWE-bench Verified is dead — Pro is the new standard.
Ranking
Open full report →
Search & News
Web search, scraping, and deep research tools for AI agents. The category has split into three lanes: search APIs (Brave, Exa, Tavily), scrape/crawl tools (Firecrawl, Crawl4AI), and deep research APIs (Parallel, Perplexity Sonar). Most serious agent workflows need tools from the first two lanes. MCP support is table stakes — the real differentiators are benchmark quality, latency, index independence, and license. Deep research lane is still immature — WideSearch academic benchmark shows near 0% success on broad tasks.
Ranking
Open full report →
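The claim that serious agent workflows need both of the first two lanes can be sketched as a search-then-scrape pipeline. The stub functions are hypothetical stand-ins for a search API (the Brave/Exa/Tavily lane) and a scrape tool (the Firecrawl/Crawl4AI lane), not any vendor's real client.

```python
# Two-lane research pipeline sketch: a search API returns candidate
# URLs, then a scrape tool fetches readable content for each. Both
# functions are hypothetical stubs, not real provider APIs.

def search(query: str) -> list[str]:
    # A real search API would return ranked result URLs.
    return [f"https://example.com/{i}" for i in range(3)]

def scrape(url: str) -> str:
    # A real scrape tool would return cleaned page markdown.
    return f"content of {url}"

def research(query: str) -> list[str]:
    # The agent chains the lanes: search for sources, scrape each one.
    return [scrape(url) for url in search(query)]

docs = research("agent search benchmarks")
print(len(docs))  # 3
```

The deep-research lane effectively bundles this loop (plus synthesis) behind a single API call, which is why its maturity lags the primitives.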
Marketing
Skills for SEO, content optimization, ad copy, social media calendars, competitor analysis, and growth automation.
Ranking
Open full report →
Business
Skills for pitch decks, financial modeling, contract review, OKR frameworks, invoicing, and business operations.
Ranking
Open full report →
Content & Writing
Skills for blog posts, newsletters, technical writing, style guide enforcement, and editorial workflows.
Ranking
Open full report →
Research
Deep research agents, academic tools, and research infrastructure. The category has split: platform deep research (Perplexity, OpenAI, Google), open-source agents (GPT Researcher, Tongyi, STORM), academic specialists (Elicit, Consensus), and infrastructure (Tavily, Firecrawl). Speed, citation quality, and self-hosting are the real differentiators.
Ranking
Open full report →
Automation
Three sub-categories with distinct buyers: Workflow Automation (n8n #1, Activepieces, Zapier), Code-First Orchestration (Windmill, Trigger.dev, Inngest, Kestra), Agent Integration (Composio, Pipedream MCP). n8n dominates overall with 180K stars, n8n-mcp (15.4K stars), and dedicated Claude Code skills. Inngest leads npm downloads (499K/wk). Composio is provisional — zero HN traction despite 27K stars.
Ranking
Open full report →
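The node-pipeline model that workflow tools in this category expose can be sketched in a few lines: each node is a step that transforms the payload from the previous one. Purely illustrative; this is not the API of n8n, Windmill, or any listed product.

```python
# Tiny workflow-runner sketch: nodes run in order, each transforming
# the payload produced by the previous node. Hypothetical illustration.

from typing import Any, Callable

Node = Callable[[Any], Any]

def run_workflow(nodes: list[Node], payload: Any) -> Any:
    for node in nodes:
        payload = node(payload)
    return payload

workflow = [
    lambda p: {**p, "fetched": True},         # e.g. an HTTP-request node
    lambda p: {**p, "rows": [1, 2, 3]},       # e.g. a transform node
    lambda p: {**p, "total": sum(p["rows"])}, # e.g. an aggregate node
]

result = run_workflow(workflow, {"job": "daily-report"})
print(result["total"])  # 6
```

The sub-categories differ mainly in who authors these nodes: visual builders (Workflow Automation), developers in code (Code-First Orchestration), or agents via tool catalogs (Agent Integration).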
Security
Skills for SAST scanning, secret detection, agent/MCP security scanning, and offensive security. The category splits into four sub-themes: SAST/code scanning (Semgrep MCP #1), secret detection (GitGuardian MCP #1), agent/MCP security scanning (Snyk Agent Scan #1), and offensive security (HexStrike AI #1). Agent security scanning is the fastest-growing sub-theme — these tools scan your agents, skills, and MCP servers, not your application code.
Ranking
Open full report →
Documentation
Docs-as-code frameworks, API documentation generators, documentation SaaS platforms, and documentation automation tools. The category splits into four overlapping lanes: OSS docs frameworks (Fumadocs #1, Starlight #2, Docusaurus #3), API docs (Fern #4, Redocly #6, Swagger UI #8), SaaS platforms (Mintlify #5, GitBook #10), and automation (Promptless #9). Fumadocs is the momentum winner: 3x YoY growth and a backlog of only five open issues make it the clear pick for Next.js teams.
Ranking
Open full report →
Data & Analytics
AI-powered data analysis tools, reactive notebooks, BI-as-code platforms, conversational data agents, and ML training aids. The category has split into reactive notebooks (Marimo), AI visualization (Data Formulator), BI-as-code (Evidence, Observable), app deployment (Streamlit), conversational data (PandasAI), and prompt-to-ML (Plexe). Marimo is the clear #1 — strongest combined signal across stars, downloads, HN attention, and independent validation.
Ranking
Open full report →
Personal Assistants
How do I interact with AI for everyday tasks? ChatGPT, Claude, Gemini, and emerging contenders like OpenClaw.
Ranking
Open full report →
Memory Systems
How do agents remember context across sessions? Vector DBs, context management, and persistent memory for AI workflows.
Ranking
Open full report →
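The core retrieval step behind vector-DB-backed memory can be sketched with stdlib math: store (embedding, text) pairs and recall the entry most similar to a query embedding by cosine similarity. Real systems use an embedding model and an indexed vector store; the 2-d vectors here are toy data.

```python
# Bare-bones "vector memory" sketch: cosine-similarity recall over
# stored (embedding, text) pairs. Toy 2-d vectors, no embedding model.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class VectorMemory:
    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], text: str) -> None:
        self.entries.append((embedding, text))

    def recall(self, query: list[float]) -> str:
        # Return the stored text whose embedding is closest to the query.
        return max(self.entries, key=lambda e: cosine(e[0], query))[1]

memory = VectorMemory()
memory.add([1.0, 0.0], "user prefers TypeScript")
memory.add([0.0, 1.0], "project deadline is Friday")
print(memory.recall([0.9, 0.1]))  # user prefers TypeScript
```

Context management layers decide what gets written into this store and when it gets injected back into the prompt; the similarity search itself stays this simple.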
Performance
How do I profile, benchmark, and optimize AI/agent workloads? Speed and efficiency tools for production deployments.
Ranking
Open full report →
Analytics & LLM Tracing
How do I observe and trace LLM calls and agent runs? PostHog, Braintrust, LangSmith, Helicone, and more.
Ranking
Open full report →
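The core of what these tracing platforms capture can be sketched as a decorator that records each call's name, arguments, result, and latency. This is a hypothetical illustration with stdlib only, not the SDK of PostHog, LangSmith, or any vendor listed.

```python
# Minimal tracing sketch: wrap a function so every call appends a span
# record (name, args, result, latency) to an in-memory trace buffer.

import functools
import time

TRACE: list[dict] = []

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "name": fn.__name__,
            "args": args,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real model call; a vendor SDK would also record
    # token counts, model name, and cost here.
    return f"completion for: {prompt}"

fake_llm_call("summarize the report")
print(TRACE[0]["name"])  # fake_llm_call
```

Production tools differ mainly in where these spans go (hosted UI vs. self-hosted store) and what they attach (evals, costs, user feedback), not in this capture pattern.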
Web Development & UI Frameworks
How do I build AI-powered UIs? Frontend frameworks and tools for shipping AI products — Vercel AI SDK, Streamlit, v0, Bolt, Lovable.
Ranking
Open full report →
Agent Harnesses
How do I orchestrate and run agents? Frameworks for building, deploying, and managing AI agent workflows — LangChain, CrewAI, Pydantic AI, Claude Agent SDK.
Ranking
Open full report →
Knowledge Management
How do I organize and retrieve team knowledge? Notion, Google Workspace, Obsidian — the foundations MCP tools build on.
Ranking
Open full report →
AI Adoption & Best Practices
How do I adopt AI effectively? Meta-tracking, best practices, ecosystem navigation. SkillPack itself lives in this problem space.
Ranking
Open full report →