OpenAI Deep Research

active

Agentic research mode powered by o3/o4-mini. 26.6% on HLE (highest reported at launch), 72.57% on GAIA, MCP support (Feb 2026). Slower (3-15 min per query) but offers the deepest reasoning.

Score 42

Where it wins

26.6% on Humanity's Last Exam, the highest reported at launch (Helicone)

72.57% GAIA benchmark — best reported result

MCP support added Feb 2026 — first major platform to do so

API access via the /responses endpoint (Responses API) with the deep research models (o3-deep-research, o4-mini-deep-research)

Domain-restricted searches, real-time progress tracking
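A minimal sketch of starting a deep research run over the API and tracking its progress, using only the Python standard library. The model name, the `web_search_preview` and `mcp` tool shapes, `background` mode, and the `queued`/`in_progress` status values are assumptions based on OpenAI's Responses API documentation at the time of writing; check the current docs before relying on them. The MCP server URL and `server_label` are hypothetical, and domain restriction is not shown.

```python
import json
import os
import time
import urllib.request

# Assumed endpoint and field names, based on OpenAI's Responses API docs at time of writing.
API_URL = "https://api.openai.com/v1/responses"


def build_deep_research_request(question, model="o4-mini-deep-research", mcp_server_url=None):
    """Return the JSON body for a background deep research run."""
    # Deep research needs at least one data source; web search is the default here.
    tools = [{"type": "web_search_preview"}]
    if mcp_server_url:
        # Hypothetical private MCP server; deep research is assumed to require
        # require_approval="never" for MCP tools.
        tools.append({
            "type": "mcp",
            "server_label": "internal_docs",  # hypothetical label
            "server_url": mcp_server_url,
            "require_approval": "never",
        })
    return {
        "model": model,
        "input": question,
        "background": True,  # return immediately; poll the response id for progress
        "tools": tools,
    }


def _call(method, url, body=None):
    """Send one authenticated JSON request and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_deep_research(question, poll_seconds=15, **kwargs):
    """Start a background run and poll until it leaves the queued/in_progress states."""
    job = _call("POST", API_URL, build_deep_research_request(question, **kwargs))
    while job["status"] in ("queued", "in_progress"):
        time.sleep(poll_seconds)
        job = _call("GET", f"{API_URL}/{job['id']}")
        print("status:", job["status"])  # crude progress indicator
    return job


def final_text(job):
    """Concatenate the output_text parts of message items in a finished response."""
    parts = []
    for item in job.get("output", []):
        if item.get("type") == "message":
            for content in item.get("content", []):
                if content.get("type") == "output_text":
                    parts.append(content["text"])
    return "\n".join(parts)
```

Usage would look like `print(final_text(run_deep_research("Summarize recent work on X")))`, which blocks for several minutes while polling every 15 seconds. Background mode plus polling is used here because a single synchronous request would have to stay open for the whole run.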

Where to be skeptical

3-15 min per query — 10-60x slower than Perplexity

$200/mo for unlimited (Pro); Plus ($20/mo) limited to 10 queries/mo

Closed source, no self-hosting option

DeepResearch Bench: o3 standalone outperformed Deep Research mode in some evals (FutureSearch)

Editorial verdict

#2 in research. Reasoning king: its 26.6% HLE and 72.57% GAIA scores were the best reported results at launch, although open-source Tongyi DeepResearch now reports 32.9% on HLE. MCP support (Feb 2026) enables enterprise toolchain integration. Slower and pricier than Perplexity, but unmatched for PhD-level questions.

Source

Docs: openai.com


Research

#2 of 12

Best for: expert-level reasoning, complex multi-step research, enterprise MCP workflows

GPT Researcher

Open-source autonomous deep research agent. Ranked #1 on CMU's DeepResearchGym for citation quality, report quality, and information coverage. 25.8K GitHub stars, 15.9K weekly PyPI downloads. Apache 2.0.

Tongyi DeepResearch

First fully open-source deep research agent to match closed-source leaders on benchmarks: 32.9% on HLE (vs. 26.6% for OpenAI Deep Research), 30.5B total parameters with 3.3B active per token (mixture-of-experts), and it runs locally. 18.5K GitHub stars. Apache 2.0.

STORM (Stanford)

Stanford's LLM-powered knowledge curation system. Generates Wikipedia-style articles with citations in ~3 min. 28K stars, 84.8% citation recall / 85.2% precision (peer-reviewed). MIT license.
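Two back-of-envelope checks on figures quoted in the cards above, as a sketch. For a mixture-of-experts model such as Tongyi DeepResearch, only 3.3B of 30.5B parameters (about 11%) are used per token, but the full set of expert weights typically still has to be loaded, so running it locally means budgeting memory for all 30.5B. The bytes-per-parameter value is an assumption about the precision you run at (2 bytes for 16-bit weights, roughly 0.5 for 4-bit quantization), and the estimate ignores KV cache and runtime overhead. STORM's quoted citation precision and recall are combined into a single F1 score. These are derived calculations, not figures published by either project.

```python
def moe_weight_footprint(total_params_b, active_params_b, bytes_per_param):
    """Return (approximate weight memory in GB, fraction of parameters active per token).

    Counts weights only: KV cache, activations and runtime overhead are ignored.
    """
    weights_gb = total_params_b * bytes_per_param  # 1e9 params * bytes/param = GB (1e9 bytes)
    active_fraction = active_params_b / total_params_b
    return weights_gb, active_fraction


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Tongyi DeepResearch: 30.5B total / 3.3B active, assuming 16-bit (2-byte) weights.
weights_gb, active = moe_weight_footprint(30.5, 3.3, bytes_per_param=2)
print(f"~{weights_gb:.0f} GB of weights, {active:.1%} of params active per token")

# STORM: 85.2% citation precision, 84.8% citation recall.
print(f"citation F1 ~ {f1_score(0.852, 0.848):.1%}")
```

Under these assumptions this works out to about 61 GB of 16-bit weights (about 15 GB at 4-bit), roughly 10.8% of parameters active per token, and a citation F1 of about 85% for STORM.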