Research
Deep research agents, academic tools, and research infrastructure. The category has split into four camps: platform deep research (Perplexity, OpenAI, Google), open-source agents (GPT Researcher, Tongyi, STORM), academic specialists (Elicit, Consensus), and infrastructure (Tavily, Firecrawl). Speed, citation quality, and self-hosting are the real differentiators.
12 ranked · 9 signals
Current ranking
1. Perplexity
Best for: Speed-sensitive research, daily use, citation-critical work
Fastest (15-30s vs minutes). 93.9% SimpleQA accuracy (Helicone). 50+ sources/report. Highest citation reliability in independent tests. 368 HN pts. Reddit r/PhD endorsement. $20/mo.
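For scripted use rather than the $20/mo app, Perplexity also exposes an OpenAI-compatible API. A minimal sketch, assuming the endpoint URL and the sonar-deep-research model id below are still current:

```python
# Minimal sketch: Perplexity's API speaks the OpenAI wire format, so the
# standard openai client works with a swapped base_url. The model id
# "sonar-deep-research" is an assumption; check Perplexity's model list.
from openai import OpenAI

client = OpenAI(
    api_key="PPLX_API_KEY",                # issued in Perplexity account settings
    base_url="https://api.perplexity.ai",  # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="sonar-deep-research",           # assumed deep-research model id
    messages=[{"role": "user", "content": "Survey recent work on MoE routing."}],
)
print(response.choices[0].message.content)
```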

2. OpenAI Deep Research
Best for: Expert-level reasoning, complex multi-step research, enterprise MCP workflows
26.6% HLE (highest among the closed platforms; Tongyi's open-source 32.9 exceeds it). 72.57% GAIA (best reported). MCP support (Feb 2026). API available. Domain-restricted searches.
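A hedged sketch of driving Deep Research through the API via the openai SDK's Responses interface; the model id and tool types below are assumptions to verify against current docs:

```python
# Sketch only: model id "o3-deep-research" and the tool types are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="o3-deep-research",            # assumed deep-research model id
    input="Compare MCP-based research pipelines with bespoke agent stacks.",
    tools=[
        {"type": "web_search_preview"},  # deep research requires a search tool
        # MCP servers can reportedly be attached the same way, e.g.:
        # {"type": "mcp", "server_label": "internal-docs", "server_url": "https://..."},
    ],
)
print(response.output_text)
```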

3. Google Gemini Deep Research
Best for: Multimodal research, long-document analysis, research-to-presentation pipelines
100+ sources/query (highest coverage). 1M token context. Gemini 3 engine. Multimodal (video/audio/PDF). 907 HN pts (highest in category). $20/mo.

4. GPT Researcher
Best for: Self-hosted research pipelines, agent integration, customizable workflows
CMU DeepResearchGym #1 (citation quality, report quality, info coverage — beat Perplexity and OpenAI). 25.8K stars. 15.9K weekly PyPI downloads. MCP server. Apache 2.0.
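Basic usage mirrors the pattern in the project README; a sketch (report types and config options vary by version):

```python
# pip install gpt-researcher; set OPENAI_API_KEY and TAVILY_API_KEY.
import asyncio
from gpt_researcher import GPTResearcher

async def main() -> None:
    researcher = GPTResearcher(
        query="Tradeoffs of MoE models for local inference",
        report_type="research_report",   # other report types exist
    )
    await researcher.conduct_research()  # plan queries, gather and curate sources
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
```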

5. Tongyi DeepResearch
Best for: Privacy-first deep research, local deployment, cost-sensitive teams
HLE 32.9 (exceeds OpenAI's 26.6). GAIA 70.9. Runs locally on consumer hardware (3.3B active params via MoE). 18.5K stars. 365 HN pts. Apache 2.0.
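Since the weights are open, local inference is plain transformers. A sketch, with the Hugging Face repo id below an assumption to verify against the official release:

```python
# MoE keeps ~3.3B params active per token, which is what makes
# consumer-hardware inference plausible despite the 30B total size.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Alibaba-NLP/Tongyi-DeepResearch-30B-A3B"  # assumed repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

prompt = "Summarize open problems in retrieval-augmented research agents."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```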

6. STORM
Best for: Structured knowledge curation, article-length synthesis, academic workflows
28K stars. 84.8% citation recall / 85.2% precision (peer-reviewed). Wikipedia-style article generation. Co-STORM collaborative mode (EMNLP 2024). MIT license.

7. Claude Research (Anthropic)
Best for: Long-form synthesis, custom research agent development
Multi-agent architecture (Opus 4 lead + Sonnet 4 subagents). 90.2% improvement over single-agent (methodology published). 1M context. Agent SDK for custom pipelines.
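The lead/subagent split is easy to mimic with the anthropic SDK. An illustrative sketch, not Anthropic's implementation, with placeholder model ids:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

LEAD = "claude-opus-4-20250514"   # placeholder lead-agent model id
SUB = "claude-sonnet-4-20250514"  # placeholder subagent model id

def ask(model: str, prompt: str) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

# Lead agent decomposes the question into subtasks, one per line.
plan = ask(LEAD, "Break 'state of open-source deep research agents' "
                 "into 3 research subtasks, one per line.")
subtasks = [line for line in plan.splitlines() if line.strip()]

# Subagents run in parallel in the real system; sequential here for brevity.
findings = [ask(SUB, f"Research and summarize: {task}") for task in subtasks]

# Lead agent synthesizes subagent findings into the final report.
print(ask(LEAD, "Synthesize these findings into one report:\n\n" + "\n\n".join(findings)))
```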

8. Grok (xAI)
Best for: Breaking news orientation, social sentiment research, trend analysis
Only deep research tool with native real-time X/Twitter integration. 4-agent parallel system. Journalists use it for breaking-story orientation.

9. Elicit
Best for: Systematic reviews, meta-analyses, clinical research, data extraction
138M papers + 545K clinical trials. API (Mar 2026). Claude Opus 4.5 extraction. Reddit-endorsed as 'essential for meta-analyses.' Sentence-level citations. $10/mo.

10. Consensus
Best for: Quick claim-level evidence checking ('does the literature support X?')
200M+ papers (via Semantic Scholar). Unique consensus meter showing scientific agreement. 1,000+ papers/query with Deep Search. 126 HN pts.

11. Tavily
Best for: Building research agents; the de facto standard search backend
Default search backend for GPT Researcher, CrewAI, LangChain. Acquired by Nebius (Feb 2026). /research endpoint GA. MCP server. Domain governance.
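Typical backend usage via tavily-python; parameter names follow the public docs but may drift between versions:

```python
# pip install tavily-python
from tavily import TavilyClient

client = TavilyClient(api_key="tvly-...")  # issued at app.tavily.com

results = client.search(
    query="open-source deep research agents benchmark",
    search_depth="advanced",                      # deeper crawl, costs more credits
    include_domains=["arxiv.org", "github.com"],  # domain governance in practice
    max_results=5,
)
for hit in results["results"]:
    print(hit["title"], hit["url"])
```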

12. Firecrawl
Best for: Web data extraction for research pipelines
~40K stars. Web scraping → clean markdown for LLMs. MCP server. Research agent mode. AGPL-3.0.
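A hedged sketch with firecrawl-py; the client API has changed between major versions, so treat method and parameter names as assumptions and check the current README:

```python
# pip install firecrawl-py
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")  # issued at firecrawl.dev

# Fetch one page as LLM-ready markdown for a research pipeline.
doc = app.scrape_url("https://example.com/paper", formats=["markdown"])
print(doc.markdown)
```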

Head to head
Perplexity wins on speed (15-30s vs 3-15 min), citation quality, SimpleQA accuracy (93.9%), and price ($20/mo vs $200/mo). OpenAI wins on expert reasoning (HLE 26.6% vs 21.1%) and GAIA (72.57%). Use-case dependent: speed vs depth.
Google leads on source coverage (100+ vs 10-30), multimodal (video/audio/PDF), and HN buzz (907 vs 368 pts). Perplexity leads on speed and citation quality. NotebookLM is a research workbench; Perplexity is a fast answer engine.
GPT Researcher has proven CMU benchmark validation and real PyPI adoption (15.9K/wk). Tongyi has superior benchmark scores (HLE 32.9; GPT Researcher has none reported) and runs locally. GPT Researcher = established leader; Tongyi = disruptive newcomer.
STORM produces better structured Wikipedia-style articles with peer-reviewed citation quality (85.2% precision). GPT Researcher is more practical for agentic research workflows with MCP integration. STORM = knowledge artifacts; GPT Researcher = research pipelines.
Elicit wins for deep academic workflows (systematic reviews, data extraction, API). Consensus wins for quick claim-level evidence checking with unique consensus meter. Complementary rather than competitive.
Public signals
CMU DeepResearchGym: independent academic benchmark in which GPT Researcher beat Perplexity, OpenAI, OpenDeepSearch, and HuggingFace.
Independent comparison of deep research benchmarks. OpenAI leads on reasoning; Perplexity leads on speed.
Tongyi DeepResearch: open-source model matching or exceeding closed-source leaders; dubbed a 'DeepSeek moment for AI agents.'
GPT Researcher's 15.9K weekly PyPI downloads: a real adoption signal for an open-source research agent.
NotebookLM's audio overview feature drove massive attention: 5 HN threads over 200 pts.
Elicit: organic academic endorsement for systematic review workflows.
Acquisition validates Tavily as critical research infrastructure. De facto standard for agent search.
OpenAI: first major platform to add MCP to deep research. Enterprise integration signal.
Elicit: API (Mar 2026) enables systematic reviews programmatically. Claude Opus 4.5 extraction.
What changes this
Independent benchmark for Claude Research — if it matches OpenAI on HLE/GAIA with third-party validation, moves to #2-3
Tongyi DeepResearch adoption data — if real PyPI/usage stats emerge, could leapfrog GPT Researcher to #4
OpenAI drops Deep Research price — if available at $20 unlimited, threatens Perplexity's #1
Perplexity ships MCP — would close the enterprise gap with OpenAI and solidify #1 across all use cases
NotebookLM ships API — would make Google a serious infrastructure play competing with Tavily
Fresh CMU benchmarks against Tongyi and updated OpenAI — would clarify open-source leader question
Consensus Deep Search gains traction — 11 HN pts is concerning; if adoption stays flat, risks becoming a footnote
Karpathy AutoResearch gets 'knowledge research' mode — would immediately enter main ranking, likely top 5 given 43K stars