Search & News

Web search, scraping, and deep research tools for AI agents. The category has split into three lanes: search APIs (Brave, Exa, Tavily), scrape/crawl tools (Firecrawl, Crawl4AI), and deep research APIs (Parallel, Perplexity Sonar). Most serious agent workflows need tools from the first two lanes. MCP support is table stakes — the real differentiators are benchmark quality, latency, index independence, and license. Deep research lane is still immature — WideSearch academic benchmark shows near 0% success on broad tasks.

Ranked

Signals

Current ranking

Brave Search API72

Best for: Default search API for AI agents — fastest, broadest MCP tooling, independent benchmark winner

#1 Agent Score (14.89) in AIMultiple 2026, confirmed by Data4AI independently. Fastest latency (669ms). Only independent commercial search API after Bing API shutdown (Aug 2025). Independent index (40B pages). 6-tool MCP server. SOC 2 Type II attested (Oct 2025). 35K+ API customers, 2,700+ paid. Free tier (2K queries/month).

Firecrawl MCP Server72

Best for: Web scraping, structured extraction, turning messy pages into LLM-ready content — the research/extraction workhorse

95,324 GitHub stars — highest traction in category. Agent Score 14.58 (#2, highest relevance 4.30). 1.23M combined weekly downloads. ScrapeOps 10/10 ('best tool you can get'). MCP server: 5,809 stars (highest in category). Search + scrape + autonomous /agent endpoint — only tool covering all three lanes. FIRE-1 agent, Spark model family, parallel agents. Multi-language SDKs (Python, JS, Go, Rust, Java). 95.3% success rate. Browser MCP: fastest extraction (7s, 83% success).

Product / Business Development #1Research #12

Exa MCP Server87

Best for: Semantic search, similarity search, market mapping — strongest where traditional keyword search fails

Agent Score 14.39 (#3), statistically tied with top tier. Best HN traction in category (412 pts). 940K weekly downloads (670K PyPI + 270K npm). SOC 2 Type II. Enterprise customers: Notion, Cursor, AWS, Databricks. Neural index: people (1B+ LinkedIn profiles), company, code verticals. Exa Deep launched March 4 2026 — enters deep research lane ($12-15/1K). HumAI: 94.9% SimpleQA (highest), 81% complex retrieval. $85M Series B at $700M, Nvidia-backed.

Product / Business Development #8

No preview

SearXNG88

Best for: Privacy-first, self-hosted meta-search — no API keys, no vendor lock-in, no cost

26,644 stars, active development (last commit 2026-03-15). Zero cost, zero API keys — aggregates 70+ search engines. Privacy guarantee: no query ever leaves your infrastructure. Rolling Docker releases. HN: 302 pts + 134 pts. AGPL-3.0.

Tavily40

Best for: LangChain-native workflows where Tavily is the path of least resistance — fastest response time and highest uptime

1.18M weekly downloads (#2, 1.03M PyPI + 155K npm). Default search tool in LangChain. Full platform now: search + extract + crawl + /research endpoint (GA). HumAI: fastest response (187ms), highest uptime (99.94%). Acquired by Nebius for $275-400M — strongest financial runway. Enterprise pricing $0.0002/query at volume.

Research #11

Jina Reader60

Best for: Simplest URL-to-markdown conversion (one-line API) with ReaderLM-v2 for local extraction

10,292 stars. ReaderLM-v2 (1.5B SLM, 512K context, 29 languages) presented at ICLR 2025. Hosted API remains active. The r.jina.ai URL-prefix pattern is the simplest possible interface for single-page reads.

Crawl4AI93

Best for: Free, open-source self-hosted crawling — Apache-2.0, no vendor dependency, full developer control

62,249 GitHub stars (#2 in category). 6,353 forks — nearly matches Firecrawl's 6,516 (heavy dev usage). Apache-2.0 license. Completely free. 384K weekly PyPI downloads. Actively maintained — v0.8.5 released 2026-03-18. ScrapeOps: 'Best open source' (7/10). v0.8.x: deep crawl crash recovery, prefetch mode (5-10x faster), adaptive intelligence.

Below the cut line

Parallel AI Search40

Best for: Deep research where quality matters more than speed

Agent Score 14.21 (#4) — top tier on quality. Self-reported BrowseComp: 48%/58% accuracy vs GPT-4 browsing 1%. $740M valuation (Kleiner Perkins, Index Ventures). Founded by Parag Agrawal (ex-Twitter CEO).

No preview

You.com Search API35

Best for: OpenAI-native search — the provider OpenAI already uses

OpenAI integrated You.com as core search provider — strongest distribution signal in category. Self-reported: 93% SimpleQA, #1 DeepSearchQA. MCP server launched. 1,000 free API queries/month.

No preview

Perplexity Sonar API40

Best for: Highest raw answer accuracy (87% in HumAI) with citation synthesis

87% accuracy (highest in HumAI). 94% citation quality. Sonar Deep Research for multi-step retrieval. Official MCP server. Citation tokens no longer billed (Feb 2026).

No preview

Linkup35

Best for: AI-native web search API with sub-second speed and strong angel backing

$10M seed (Feb 2026, Gradient). Angels: Olivier Pomel (Datadog CEO), Arthur Mensch (Mistral CEO). Customers include KPMG, Artisan. /fast endpoint for sub-second search. MCP integrated with Claude Desktop.

No preview

Bright Data MCP65

Best for: Enterprise scraping behind aggressive anti-bot defenses — perfect accuracy where others fail

#1 Browser MCP Benchmark (AIMultiple 2026): 100% extraction success, 90% automation, 77% scalability. 2,214 stars. 60+ MCP tools — broadest MCP tooling. Industrial-grade anti-bot (CAPTCHA solving, proxy rotation, geo-unblocking). Free MCP tier. The only option for sites that actively block bots.

No preview

Hyperbrowser MCP50

Best for: AI-native browser automation with stealth capabilities — Claude Computer Use / OpenAI Computer Use

90% browser automation (tied #1 with Bright Data, AIMultiple 2026). Stealth-first: CAPTCHA solving, IP rotation, fingerprint management. Supports Claude + OpenAI Computer Use agents. 63 HN pts on launch. 10,000 concurrent browsers.

No preview

ScrapeGraphAI82

Best for: LLM-graph-based extraction — describe what you want, AI builds the extraction pipeline

23,033 GitHub stars. 194 HN pts (strongest in category). Active development v1.74.0 (Mar 15, 2026). arXiv paper Feb 2026. Open-source + hosted API dual model. 'You Only Scrape Once' graph reuse. Apache-2.0 OSS.

No preview

Valyu DeepSearch35

Best for: High-stakes knowledge work (finance, economics, medical) — if claims hold

Claims 94% SimpleQA and 79% FreshQA (vs Google 39%). 50+ proprietary data sources (SEC, clinical trials). a16z backed. LangChain integration. DeepSearch v2.0 with tool calling.

No preview

Serper35

Best for: Cheapest Google SERP access ($0.30/1K queries)

3-10x cheaper than alternatives. LangChain integration. 2,500 free searches on signup.

No preview

Spider Cloud58

Best for: High-volume scraping performance (Rust-based)

Claims 100K pages/sec, 7x Firecrawl throughput. MIT license. 2,332 stars. Rust-based zero-copy parsing. Cost advantage at scale (~$48/100K pages vs Firecrawl ~$240).

No preview

Google Grounding with Search35

Best for: Gemini-native workflows only

Native to Gemini API. 5,000 free prompts/month. Gemini Deep Research preview: 59.2% BrowseComp, 46.4% HLE (highest reported).

No preview

Head to head

Brave SearchvsExa

Brave wins on speed (669ms vs ~1,200ms), benchmark score (14.89 vs 14.39), MCP breadth (6 tools vs 1), and free tier (2K queries/mo). Exa wins on semantic depth (neural embeddings, people/company/code verticals), weekly downloads (940K vs REST-only), HN traction (412 vs 95 pts), and now deep research (Exa Deep, March 2026). Use Brave as the default; switch to Exa when you need meaning-based search, vertical lookups, or deep research.

Brave SearchvsTavily

AIMultiple benchmark: ~1pt gap (14.89 vs 13.67) described as 'meaningful, not random' — confirmed independently by Data4AI. Brave: independent index (only one after Bing shutdown), SOC 2. Tavily: full platform (search + extract + /research GA), enterprise pricing ($0.0002/query at volume), 1.18M downloads. Meta Llama Stack now defaults to Brave over Tavily. Brave is objectively stronger on search quality; Tavily evolving into a platform play.

FirecrawlvsCrawl4AI

Firecrawl wins on features (search+scrape+agent), enterprise compliance, multi-language SDKs, benchmark score (14.58), success rate (95.3% vs 89.7%), and ScrapeOps rating (10/10 vs 7/10). Crawl4AI wins on license (Apache-2.0 vs AGPL), cost (free at any scale), fork ratio (6,353 vs 6,516 — nearly equal dev usage), and local LLM support. Crawl4AI actively maintained — v0.8.5 released March 18.

ExavsTavily

Exa wins on quality (14.39 vs 13.67 Agent Score, 81% vs 71% complex retrieval, 96% vs 85% citations), neural index, and now deep research (Exa Deep). Tavily wins on distribution (1.18M weekly downloads, LangChain default), response time (187ms), and full platform breadth (search+extract+research GA). Exa is the better tool; Tavily is the more convenient one. Both have strategic risk — Tavily's Nebius acquisition, Exa's Deep claims unverified.

FirecrawlvsJina Reader

Firecrawl does everything Jina Reader does, plus search, structured extraction, batch processing, and agent endpoint. Jina Reader's OSS repo is stale (no commits for 10+ months). Firecrawl is the superset choice for new projects. Firecrawl 4-5x cheaper at volume.

SearXNGvsBrave Search

SearXNG: free, self-hosted, private, 70+ aggregated engines. Brave: higher quality (14.89 benchmark), faster (669ms), managed, SOC 2. Privacy vs quality tradeoff. SearXNG is the only option for teams that can't send queries to third-party APIs.

Parallel AIvsPerplexity Sonar

Both serve deep research. Parallel: 48% BrowseComp vs Perplexity 8%. Both slow (13,600ms vs 11,000ms+). Parallel delivers on depth; Perplexity has higher raw accuracy (87% HumAI) but truncation issues. Parallel wins on deep research quality if you can tolerate latency and cost.

Public signals

benchmark2026-02

AIMultiple 2026 Agentic Search Benchmark — Brave #1, Firecrawl #2, Exa #3

100 queries, GPT-5.2 judge, bootstrap resampling with 10K resamples for 95% CI. Top 4 (Brave, Firecrawl, Exa, Parallel Pro) statistically indistinguishable. Brave-Tavily gap (~1pt) is 'meaningful, not random.'

acquisition2026-02-10

Nebius acquires Tavily for $275M (up to $400M with milestones)

Bloomberg-confirmed. API unchanged post-acquisition. No pricing changes yet — watch Q2 for Nebius integration shifts.

traction2026-03-17

Brave MCP Server — 799 stars, SOC 2 Type II, only independent search API after Bing shutdown

799 stars. 6 tools, SOC 2 Type II. 35K+ free customers, 2,700+ paid. Microsoft shut down Bing API (Aug 2025) — Brave is now the only independent commercial search API at scale. Snowflake + AWS integrations.

traction2026-03-19

Firecrawl: 95,324 stars, 1.23M downloads, ScrapeOps 10/10 — dominant scrape/crawl tool

95,324 main repo stars, 5,809 MCP stars (highest in category). 1.23M combined weekly downloads (764K PyPI + 470K npm). ScrapeOps: 'best tool you can get' (10/10). 143 contributors. AGPL-3.0 license is main blocker.

benchmark2026-02

AIMultiple 2026 Browser MCP Benchmark — Bright Data #1 (100%), Firecrawl #3 (7s fastest)

8 MCP servers, 4 tasks × 5 runs, LangGraph + Claude 3.5 Sonnet. Bright Data: 100% extraction, 90% automation, 77% scalability. Firecrawl: 83% extraction, fastest speed (7s). Hyperbrowser: 90% automation. Exa: 23% extraction (best for search, not scrape).

traction2026-03-04

Exa: 940K weekly downloads, Exa Deep launch (March 4 2026) enters deep research lane

940K weekly downloads (670K PyPI + 270K npm). Exa Deep: two tiers — deep ($12/1K, 4-12s) and deep-reasoning ($15/1K, 12-50s). HumAI: 94.9% SimpleQA (highest). MCP server: 4,031 stars but only 23% web search success — use SDK directly.

funding2025-09-03

Exa raises $85M Series B at $700M valuation, Nvidia NVentures backing

Infrastructure-grade credibility. Exa serves web search to thousands of companies including Cursor. Feb 2026: Exa Fast (sub-500ms) and Exa Deep (agentic) launched.

comparison2026

HumAI: Perplexity vs Tavily vs Exa vs You.com complete comparison 2026

Independent comparison. Perplexity highest accuracy (87%), Tavily fastest (187ms, 99.94% uptime), Exa best citations (96%). Exa vs Tavily: 81% vs 71% on complex retrieval.

comparison2026

Data4AI independently confirms Brave > Tavily gap is statistically significant

Second independent source confirming AIMultiple findings. Brave 14.89 vs Tavily 13.67 gap 'held up across repeated statistical tests.' Strengthens Brave #1 position.

benchmark2026

WideSearch academic benchmark: deep research systems score near 0% on broad tasks

arXiv preprint from academic researchers. Over 10 agentic search systems evaluated, best performer only 5%. Tempers ALL 'deep research' claims. The entire deep research lane is immature.

comparison2026

ScrapeOps: Firecrawl 10/10, ScrapeGraphAI 10/10, Crawl4AI 7/10 ('best OSS')

Independent hands-on testing. Firecrawl: 'best tool you can get.' Crawl4AI: 'best open source.' ScrapeGraphAI tied with Firecrawl. 'Almost all open source tools are a complete waste of time' — only Crawl4AI viable.

comparison2026-01

Three independent Crawl4AI vs Firecrawl comparisons — Firecrawl wins quality, Crawl4AI wins license/cost

Capsolver, Bright Data, Apify all agree: Firecrawl 95.3% success vs Crawl4AI 89.7%. Crawl4AI wins on license/cost. Crawl4AI actively maintained (v0.8.5, Mar 18).

funding2026-02

Linkup raises $10M seed — AI-native search API (Gradient, Mistral/Datadog founders)

New contender. Angels include Datadog CEO and Mistral CEO. Customers include KPMG. Sub-second /fast endpoint. Below cut line — too early.

traction2026-03

Bright Data MCP: 2,213 stars, 60+ MCP tools, free tier — NEW contender

Industrial-grade anti-bot (CAPTCHA, proxies, geo-unblocking). Not a search API — web access infrastructure. Only option for sites with aggressive bot defenses.

watch2026-03

Gemini Deep Research API in preview — 59.2% BrowseComp, highest HLE (46.4%)

Best BrowseComp and HLE scores reported, but preview status, locked to Gemini API, no MCP server, no independent verification. If exits preview with MCP support, deep research lane shifts.

What changes this

If Brave ships a deep research endpoint — would make it the clear #1 across all lanes.

If Exa Deep gets independent benchmark validation — could leapfrog to #2 overall. Self-reported claims (March 2026) need verification.

If Tavily closes the AIMultiple gap — the Nebius resources make this plausible; next benchmark update will be telling. If pricing changes post-acquisition, it drops.

If Crawl4AI ships a search API or MCP server — would threaten Firecrawl's all-in-one position with a permissive license.

If WideSearch-style broad research tasks become solvable (>20% success) — reshuffles the deep research lane entirely.

Gemini Deep Research API is now in preview (59.2% BrowseComp, 46.4% HLE). If it exits preview with MCP support, deep research lane shifts dramatically.

If You.com's benchmark claims (93% SimpleQA, #1 DeepSearchQA) are independently verified, it enters ranked list on OpenAI distribution moat alone.

If Linkup gets independent benchmark validation — $10M seed and claimed parity; if confirmed, enters top-5.

Jina Reader — on track for delisting by mid-2026 if no repo activity. Effectively dead (10+ months no commits).

A second independent benchmark beyond AIMultiple would add confidence or reveal blind spots in current rankings.