skillpack.co

Research

Deep research agents, academic tools, and research infrastructure. The category has split: platform deep research (Perplexity, OpenAI, Google), open-source agents (GPT Researcher, Tongyi, STORM), academic specialists (Elicit, Consensus), and infrastructure (Tavily, Firecrawl). Speed, citation quality, and self-hosting are the real differentiators.

12 ranked · 9 signals

Current ranking

1. Perplexity Deep Research · 43

Best for: Speed-sensitive research, daily use, citation-critical work

Fastest (15-30s vs minutes). 93.9% SimpleQA accuracy (Helicone). 50+ sources/report. Highest citation reliability in independent tests. 368 HN pts. Reddit r/PhD endorsement. $20/mo.

2. OpenAI Deep Research · 42

Best for: Expert-level reasoning, complex multi-step research, enterprise MCP workflows

26.6% HLE (highest among the proprietary platforms ranked here; open-source Tongyi DeepResearch reports 32.9%). 72.57% GAIA (best reported). MCP support (Feb 2026). API available. Domain-restricted searches.

3. Google NotebookLM + Deep Research · 42

Best for: Multimodal research, long-document analysis, research-to-presentation pipelines

100+ sources/query (highest coverage). 1M token context. Gemini 3 engine. Multimodal (video/audio/PDF). 907 HN pts (highest in category). $20/mo.

4. GPT Researcher · 89

Best for: Self-hosted research pipelines, agent integration, customizable workflows

CMU DeepResearchGym #1 on citation quality, report quality, and information coverage, ahead of Perplexity and OpenAI. 25.8K stars. 15.9K weekly PyPI downloads. MCP server. Apache 2.0.

5. Tongyi DeepResearch · 82

Best for: Privacy-first deep research, local deployment, cost-sensitive teams

HLE 32.9% (exceeds OpenAI's 26.6%). GAIA 70.9%. Runs locally on consumer hardware: a MoE model with about 30.5B total parameters, of which only 3.3B are active per token. 18.5K stars. 365 HN pts. Apache 2.0.
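Active parameters set the per-token compute, but every expert still has to sit in memory. As a rough sketch, assuming the published 30B-A3B configuration (about 30.5B total parameters) and ignoring KV cache, activations, and runtime overhead, weight memory at common quantization levels works out as follows. `weight_memory_gb` is a hypothetical helper for this arithmetic, not part of any Tongyi tooling.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    bytes_total = total_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumption: ~30.5B total parameters. All experts must be resident in memory,
# even though only ~3.3B parameters are active for any given token.
TOTAL_PARAMS = 30.5e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

At 16-bit the weights alone need about 61 GB; at 4-bit they drop to roughly 15 GB, which is what makes a local run on high-end consumer hardware plausible once the model is quantized. The small active-parameter count is what keeps per-token speed tolerable.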

6. STORM (Stanford) · 66

Best for: Structured knowledge curation, article-length synthesis, academic workflows

28K stars. 84.8% citation recall / 85.2% precision (peer-reviewed). Wikipedia-style article generation. Co-STORM collaborative mode (EMNLP 2024). MIT license.
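Citation recall and precision measure two different failure modes: recall asks whether each cited sentence is actually supported by its sources, while precision asks whether each individual citation pulls its weight. A simplified sketch of both, assuming the per-sentence and per-citation support judgments already exist (in practice they come from an NLI model or a human rater). `CitedSentence` and the two functions are illustrative names; this is not STORM's evaluation code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CitedSentence:
    text: str
    # True if the cited sources, taken together, fully support the sentence.
    supported: bool
    # One flag per citation on the sentence: does that source back the claim?
    citation_relevant: list[bool]


def citation_recall(sentences: list[CitedSentence]) -> float:
    """Fraction of sentences whose citations fully support them."""
    if not sentences:
        return 0.0
    return sum(s.supported for s in sentences) / len(sentences)


def citation_precision(sentences: list[CitedSentence]) -> float:
    """Fraction of all attached citations that actually back their sentence."""
    flags = [flag for s in sentences for flag in s.citation_relevant]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


report = [
    CitedSentence("Claim A.", supported=True, citation_relevant=[True, True]),
    CitedSentence("Claim B.", supported=True, citation_relevant=[True, False]),
    CitedSentence("Claim C.", supported=False, citation_relevant=[False]),
]
print(citation_recall(report), citation_precision(report))
```

In this toy report, two of three sentences are supported (recall about 0.67) and three of five citations are relevant (precision 0.6). A padded citation lowers precision without hurting recall, which is why the two numbers are reported together.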

7. Claude Research · 38

Best for: Long-form synthesis, custom research agent development

Multi-agent architecture (Opus 4 lead + Sonnet 4 subagents). Outperformed single-agent Opus 4 by 90.2% on Anthropic's internal research eval (methodology published). 1M context. Agent SDK for custom pipelines.
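The lead/subagent split is an orchestrator-worker pattern: the lead agent plans, subagents explore subtopics in parallel, and the lead synthesizes their findings. A minimal asyncio sketch of that control flow, where `plan_subtopics` and `run_subagent` are hypothetical stand-ins for model calls; this is not Anthropic's implementation or the Agent SDK API.

```python
from __future__ import annotations

import asyncio


def plan_subtopics(question: str) -> list[str]:
    """Hypothetical lead-agent planning step. A real lead agent would ask a
    stronger model to decompose the question."""
    return [
        f"{question}: background",
        f"{question}: recent results",
        f"{question}: open problems",
    ]


async def run_subagent(subtopic: str) -> str:
    """Hypothetical worker. A real subagent would call a smaller model with
    its own search tools and return condensed findings."""
    await asyncio.sleep(0)  # stand-in for model and tool latency
    return f"findings for {subtopic!r}"


async def research(question: str) -> str:
    subtopics = plan_subtopics(question)
    # Fan out: subagents work on their subtopics concurrently.
    findings = await asyncio.gather(*(run_subagent(t) for t in subtopics))
    # Fan in: the lead synthesizes (here, a plain join as a placeholder).
    return "\n".join(findings)


if __name__ == "__main__":
    print(asyncio.run(research("solid-state batteries")))
```

The design choice that matters is the fan-out: each subagent gets its own context window for one subtopic, so the lead only has to read condensed results rather than every raw page.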

8. Grok DeepSearch / DeeperSearch · 35

Best for: Breaking news orientation, social sentiment research, trend analysis

Only deep research tool with native real-time X/Twitter integration. 4-agent parallel system. Journalists use it to get oriented on breaking stories.

9. Elicit · 40

Best for: Systematic reviews, meta-analyses, clinical research, data extraction

138M papers + 545K clinical trials. API (Mar 2026). Claude Opus 4.5 extraction. Reddit-endorsed as 'essential for meta-analyses.' Sentence-level citations. $10/mo.
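Sentence-level citation means each extracted value points back to the exact source sentence it came from, so a reviewer can verify it in seconds. A minimal sketch of that grounding step using plain substring matching; the helper names are hypothetical, the naive splitter is an assumption, and Elicit's own extraction is model-based rather than string matching.

```python
from __future__ import annotations

import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace. Real pipelines
    need a proper tokenizer to handle abbreviations like 'et al.'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cite_sentence(text: str, value: str) -> tuple[int, str] | None:
    """Return (index, sentence) for the first sentence containing `value`,
    or None if the extracted value cannot be grounded in the source."""
    for i, sentence in enumerate(split_sentences(text)):
        if value.lower() in sentence.lower():
            return i, sentence
    return None


abstract = (
    "We enrolled 240 adults. Mean age was 54 years. "
    "The intervention reduced HbA1c by 0.8%."
)
print(cite_sentence(abstract, "240 adults"))
print(cite_sentence(abstract, "placebo"))
```

A value that returns None has no supporting sentence, which is exactly the case a systematic-review workflow should flag for manual checking.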

10. Consensus · 38

Best for: Quick claim-level evidence checking — 'does the literature support X?'

200M+ papers (via Semantic Scholar). Unique consensus meter showing scientific agreement. 1,000+ papers/query with Deep Search. 126 HN pts.
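A consensus meter reduces to classifying each relevant paper's answer to a yes/no question and reporting the distribution. A minimal sketch, assuming each paper has already been labeled with a stance; the three labels and the `consensus_meter` helper are assumptions for illustration, not Consensus's actual categories or pipeline.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical stance labels; the product's own categories may differ.
STANCES = ("yes", "possibly", "no")


def consensus_meter(labels: list[str]) -> dict[str, float]:
    """Percentage of labeled papers taking each stance on a yes/no question."""
    counts = Counter(label for label in labels if label in STANCES)
    total = sum(counts.values())
    if total == 0:
        return {stance: 0.0 for stance in STANCES}
    return {stance: 100 * counts[stance] / total for stance in STANCES}


labels = ["yes"] * 14 + ["possibly"] * 4 + ["no"] * 2
print(consensus_meter(labels))
```

With 20 labeled papers this reports 70% yes, 20% possibly, and 10% no. The hard part in practice is the labeling step, not the tally.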

Below the cut line
11. Tavily · 40

Best for: Building research agents — de facto standard search backend

Default search backend for GPT Researcher, CrewAI, LangChain. Acquired by Nebius (Feb 2026). /research endpoint GA. MCP server. Domain governance.

Search & News #5
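Domain governance boils down to allow and deny lists applied to result URLs before anything reaches the model. Tavily's search API accepts include/exclude domain lists for this; the sketch below is a local, hypothetical equivalent (not Tavily's code) that shows the matching rule, including why a suffix check needs the leading dot.

```python
from __future__ import annotations

from urllib.parse import urlparse


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains.

    Matching on "." + domain stops "notnih.gov" from passing as "nih.gov".
    """
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def filter_results(
    urls: list[str],
    include: list[str] | None = None,
    exclude: list[str] | None = None,
) -> list[str]:
    """Drop URLs on blocked domains, then keep only allowed domains.

    A missing or empty include list means "allow everything not excluded".
    """
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if exclude and any(host_matches(host, d) for d in exclude):
            continue
        if include and not any(host_matches(host, d) for d in include):
            continue
        kept.append(url)
    return kept


urls = [
    "https://www.nih.gov/news",
    "https://pubmed.ncbi.nlm.nih.gov/123/",
    "https://example-blog.com/post",
    "https://notnih.gov/x",
]
print(filter_results(urls, include=["nih.gov"]))
print(filter_results(urls, exclude=["example-blog.com"]))
```

Applying the exclude list first means a domain on both lists is always blocked, which is the safer default for governance.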
12. Firecrawl MCP Server · 72

Best for: Web data extraction for research pipelines

~40K stars. Web scraping → clean markdown for LLMs. MCP server. Research agent mode. AGPL-3.0.

Product / Business Development #1
Search & News #2
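The core job here is turning noisy HTML into compact markdown an LLM can read cheaply. A deliberately tiny stdlib sketch of that transformation, covering only headings, paragraphs, list items, and links; the tag handling and skip list are assumptions, and this is not Firecrawl's implementation or API.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-markdown converter for headings, paragraphs, list items, links.

    Content inside SKIP tags is dropped: a crude stand-in for real boilerplate removal.
    """

    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0
        self.href: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return  # inside skipped content
        elif tag in ("h1", "h2", "h3"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag == "a":
            self.parts.append(f"]({self.href})" if self.href else "]")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs but keep word boundaries next to tags.
            self.parts.append(re.sub(r"\s+", " ", data))

    def markdown(self) -> str:
        return "".join(self.parts).strip()


page = (
    "<html><nav><a href='/'>Home</a></nav><h1>Title</h1>"
    "<p>See <a href='https://example.com'>docs</a> now.</p>"
    "<ul><li>One</li><li>Two</li></ul><script>var x = 1;</script></html>"
)
parser = MarkdownExtractor()
parser.feed(page)
parser.close()
print(parser.markdown())
```

Even this toy version drops the navigation and script text and keeps the link target inline, which is the token saving that makes markdown output attractive for research pipelines.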

Head to head

Perplexity Deep Research vs OpenAI Deep Research

Perplexity wins on speed (15-30s vs 3-15 min), citation quality, SimpleQA accuracy (93.9%), and price ($20 vs $200). OpenAI wins on expert reasoning (HLE 26.6% vs 21.1%) and GAIA (72.57%). Use case dependent: speed vs depth.

Google NotebookLM vs Perplexity Deep Research

Google leads on source coverage (100+ vs 10-30), multimodal (video/audio/PDF), and HN buzz (907 vs 368 pts). Perplexity leads on speed and citation quality. NotebookLM is a research workbench; Perplexity is a fast answer engine.

GPT Researcher vs Tongyi DeepResearch

GPT Researcher has proven CMU benchmark validation and real PyPI adoption (15.9K/wk). Tongyi has stronger published benchmark scores (HLE 32.9%, where GPT Researcher has no published HLE result) and runs locally. GPT Researcher = established leader; Tongyi = disruptive newcomer.

STORM (Stanford) vs GPT Researcher

STORM produces better structured Wikipedia-style articles with peer-reviewed citation quality (85.2% precision). GPT Researcher is more practical for agentic research workflows with MCP integration. STORM = knowledge artifacts; GPT Researcher = research pipelines.

Elicit vs Consensus

Elicit wins for deep academic workflows (systematic reviews, data extraction, API). Consensus wins for quick claim-level evidence checking with unique consensus meter. Complementary rather than competitive.


What changes this

Independent benchmark for Claude Research — if it matches OpenAI on HLE/GAIA with third-party validation, moves to #2-3

Tongyi DeepResearch adoption data — if real PyPI/usage stats emerge, could leapfrog GPT Researcher to #4

OpenAI drops Deep Research price — if available at $20 unlimited, threatens Perplexity's #1

Perplexity ships MCP — would close the enterprise gap with OpenAI and solidify #1 across all use cases

NotebookLM ships API — would make Google a serious infrastructure play competing with Tavily

Fresh CMU benchmarks against Tongyi and updated OpenAI — would clarify open-source leader question

Consensus Deep Search gains traction — 11 HN pts is concerning; if adoption stays flat, risks becoming a footnote

Karpathy AutoResearch gets 'knowledge research' mode — would immediately enter main ranking, likely top 5 given 43K stars