
Search & News

Web search, scraping, and deep research tools for AI agents. The category has split into three lanes: search APIs (Brave, Exa, Tavily), scrape/crawl tools (Firecrawl, Crawl4AI), and deep research APIs (Parallel, Perplexity Sonar). Most serious agent workflows need tools from the first two lanes. MCP support is table stakes — the real differentiators are benchmark quality, latency, index independence, and license. The deep research lane is still immature: the academic WideSearch benchmark shows near-0% success on broad tasks.

18 ranked · 15 signals

Current ranking

1. Brave Search API (72)

Best for: Default search API for AI agents — fastest, broadest MCP tooling, independent benchmark winner

#1 Agent Score (14.89) in AIMultiple 2026, confirmed by Data4AI independently. Fastest latency (669ms). Only independent commercial search API after Bing API shutdown (Aug 2025). Independent index (40B pages). 6-tool MCP server. SOC 2 Type II attested (Oct 2025). 35K+ API customers, 2,700+ paid. Free tier (2K queries/month).
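The 6-tool MCP server is the main integration point for agents. A minimal client config sketch, under the assumption that the server is published on npm as `@brave/brave-search-mcp-server` and reads a `BRAVE_API_KEY` environment variable — verify both names against Brave's current README before using:

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@brave/brave-search-mcp-server"],
      "env": { "BRAVE_API_KEY": "<your-key>" }
    }
  }
}
```

The free tier (2K queries/month) is enough to trial this wiring before committing to a paid plan.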

2. Firecrawl MCP Server (72)

Best for: Web scraping, structured extraction, turning messy pages into LLM-ready content — the research/extraction workhorse

95,324 GitHub stars — highest traction in category. Agent Score 14.58 (#2, highest relevance 4.30). 1.23M combined weekly downloads. ScrapeOps 10/10 ('best tool you can get'). MCP server: 5,809 stars (highest in category). Search + scrape + autonomous /agent endpoint — only tool covering all three lanes. FIRE-1 agent, Spark model family, parallel agents. Multi-language SDKs (Python, JS, Go, Rust, Java). 95.3% success rate. Browser MCP: fastest extraction (7s, 83% success).

Product / Business Development #1 · Research #12
3. Exa MCP Server (87)

Best for: Semantic search, similarity search, market mapping — strongest where traditional keyword search fails

Agent Score 14.39 (#3), statistically tied with top tier. Best HN traction in category (412 pts). 940K weekly downloads (670K PyPI + 270K npm). SOC 2 Type II. Enterprise customers: Notion, Cursor, AWS, Databricks. Neural index: people (1B+ LinkedIn profiles), company, code verticals. Exa Deep launched March 4 2026 — enters deep research lane ($12-15/1K). HumAI: 94.9% SimpleQA (highest), 81% complex retrieval. $85M Series B at $700M, Nvidia-backed.

Product / Business Development #8
4. SearXNG (88)

Best for: Privacy-first, self-hosted meta-search — no API keys, no vendor lock-in, no cost

26,644 stars, active development (last commit 2026-03-15). Zero cost, zero API keys — aggregates 70+ search engines. Privacy guarantee: no query ever leaves your infrastructure. Rolling Docker releases. HN: 302 pts + 134 pts. AGPL-3.0.
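The privacy guarantee above can be exercised through SearXNG's JSON search API. A minimal sketch in Python, assuming `format: json` is enabled in the instance's `settings.yml` (it is off by default) and that the instance runs at `localhost:8080` — both are deployment-specific assumptions:

```python
from urllib.parse import urlencode

def searxng_query_url(base: str, query: str, engines: str = "") -> str:
    """Build a search URL against your own SearXNG instance.

    Because `base` points at your own deployment, the query never
    leaves your infrastructure.
    """
    params = {"q": query, "format": "json"}
    if engines:  # optional comma-separated engine filter, e.g. "brave,duckduckgo"
        params["engines"] = engines
    return f"{base.rstrip('/')}/search?{urlencode(params)}"

# Fetch the returned URL with any HTTP client, e.g.:
#   results = requests.get(searxng_query_url("http://localhost:8080", "mcp servers")).json()
print(searxng_query_url("http://localhost:8080", "mcp servers"))
# → http://localhost:8080/search?q=mcp+servers&format=json
```

Since there are no API keys, switching instances is just a different `base` — useful for rotating between a local deployment and a trusted public one.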

5. Tavily (40)

Best for: LangChain-native workflows where Tavily is the path of least resistance — fastest response time and highest uptime

1.18M weekly downloads (#2, 1.03M PyPI + 155K npm). Default search tool in LangChain. Full platform now: search + extract + crawl + /research endpoint (GA). HumAI: fastest response (187ms), highest uptime (99.94%). Acquired by Nebius for $275-400M — strongest financial runway. Enterprise pricing $0.0002/query at volume.

Research #11
6. Jina Reader (60)

Best for: Simplest URL-to-markdown conversion (one-line API) with ReaderLM-v2 for local extraction

10,292 stars. ReaderLM-v2 (1.5B SLM, 512K context, 29 languages) presented at ICLR 2025. Hosted API remains active. The r.jina.ai URL-prefix pattern is the simplest possible interface for single-page reads.
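The URL-prefix pattern needs no SDK at all — the entire interface is string concatenation. A minimal sketch (the example target URL is made up):

```python
def reader_url(target: str) -> str:
    """Wrap a page URL in the r.jina.ai reader prefix.

    A GET on the returned URL yields the page as LLM-ready markdown.
    """
    return "https://r.jina.ai/" + target

print(reader_url("https://example.com/post"))
# → https://r.jina.ai/https://example.com/post
```

That one-liner quality is the whole pitch: any HTTP client, curl included, becomes a page-to-markdown converter.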

7. Crawl4AI (93)

Best for: Free, open-source self-hosted crawling — Apache-2.0, no vendor dependency, full developer control

62,249 GitHub stars (#2 in category). 6,353 forks — nearly matches Firecrawl's 6,516 (heavy dev usage). Apache-2.0 license. Completely free. 384K weekly PyPI downloads. Actively maintained — v0.8.5 released 2026-03-18. ScrapeOps: 'Best open source' (7/10). v0.8.x: deep crawl crash recovery, prefetch mode (5-10x faster), adaptive intelligence.

Below the cut line
8. Parallel AI Search (40)

Best for: Deep research where quality matters more than speed

Agent Score 14.21 (#4) — top tier on quality. Self-reported BrowseComp: 48%/58% accuracy vs GPT-4 browsing 1%. $740M valuation (Kleiner Perkins, Index Ventures). Founded by Parag Agrawal (ex-Twitter CEO).

9. You.com Search API (35)

Best for: OpenAI-native search — the provider OpenAI already uses

OpenAI integrated You.com as a core search provider — the strongest distribution signal in the category. Self-reported: 93% SimpleQA, #1 DeepSearchQA. MCP server launched. 1,000 free API queries/month.

10. Perplexity Sonar API (40)

Best for: Highest raw answer accuracy (87% in HumAI) with citation synthesis

87% accuracy (highest in HumAI). 94% citation quality. Sonar Deep Research for multi-step retrieval. Official MCP server. Citation tokens no longer billed (Feb 2026).

11. Linkup (35)

Best for: AI-native web search API with sub-second speed and strong angel backing

$10M seed (Feb 2026, Gradient). Angels: Olivier Pomel (Datadog CEO), Arthur Mensch (Mistral CEO). Customers include KPMG, Artisan. /fast endpoint for sub-second search. MCP integrated with Claude Desktop.

12. Bright Data MCP (65)

Best for: Enterprise scraping behind aggressive anti-bot defenses — perfect accuracy where others fail

#1 Browser MCP Benchmark (AIMultiple 2026): 100% extraction success, 90% automation, 77% scalability. 2,214 stars. 60+ MCP tools — broadest MCP tooling. Industrial-grade anti-bot (CAPTCHA solving, proxy rotation, geo-unblocking). Free MCP tier. The only option for sites that actively block bots.

13. Hyperbrowser MCP (50)

Best for: AI-native browser automation with stealth capabilities — Claude Computer Use / OpenAI Computer Use

90% browser automation (tied #1 with Bright Data, AIMultiple 2026). Stealth-first: CAPTCHA solving, IP rotation, fingerprint management. Supports Claude + OpenAI Computer Use agents. 63 HN pts on launch. 10,000 concurrent browsers.

14. ScrapeGraphAI (82)

Best for: LLM-graph-based extraction — describe what you want, AI builds the extraction pipeline

23,033 GitHub stars. 194 HN pts. Active development — v1.74.0 (Mar 15, 2026). arXiv paper Feb 2026. Open-source + hosted API dual model. 'You Only Scrape Once' graph reuse. Apache-2.0 OSS.

15. Valyu DeepSearch (35)

Best for: High-stakes knowledge work (finance, economics, medical) — if claims hold

Claims 94% SimpleQA and 79% FreshQA (vs Google 39%). 50+ proprietary data sources (SEC, clinical trials). a16z backed. LangChain integration. DeepSearch v2.0 with tool calling.

16. Serper (35)

Best for: Cheapest Google SERP access ($0.30/1K queries)

3-10x cheaper than alternatives. LangChain integration. 2,500 free searches on signup.

17. Spider Cloud (58)

Best for: High-volume scraping performance (Rust-based)

Claims 100K pages/sec, 7x Firecrawl throughput. MIT license. 2,332 stars. Rust-based zero-copy parsing. Cost advantage at scale (~$48/100K pages vs Firecrawl ~$240).

18. Google Grounding with Search (35)

Best for: Gemini-native workflows only

Native to Gemini API. 5,000 free prompts/month. Gemini Deep Research preview: 59.2% BrowseComp, 46.4% HLE (highest reported).

Head to head

Brave Search vs Exa

Brave wins on speed (669ms vs ~1,200ms), benchmark score (14.89 vs 14.39), MCP breadth (6 tools vs 1), and free tier (2K queries/mo). Exa wins on semantic depth (neural embeddings; people/company/code verticals), SDK distribution (940K weekly downloads, where Brave is REST-only with no comparable figure), HN traction (412 vs 95 pts), and now deep research (Exa Deep, March 2026). Use Brave as the default; switch to Exa when you need meaning-based search, vertical lookups, or deep research.

Brave Search vs Tavily

AIMultiple benchmark: a ~1.2-point gap (14.89 vs 13.67) described as 'meaningful, not random' and confirmed independently by Data4AI. Brave: independent index (the only one left after the Bing shutdown), SOC 2. Tavily: full platform (search + extract + /research GA), enterprise pricing ($0.0002/query at volume), 1.18M weekly downloads. Meta's Llama Stack now defaults to Brave over Tavily. Brave is objectively stronger on search quality; Tavily is evolving into a platform play.

Firecrawl vs Crawl4AI

Firecrawl wins on features (search + scrape + agent), enterprise compliance, multi-language SDKs, benchmark score (14.58), success rate (95.3% vs 89.7%), and ScrapeOps rating (10/10 vs 7/10). Crawl4AI wins on license (Apache-2.0 vs AGPL), cost (free at any scale), developer usage (6,353 forks, nearly matching Firecrawl's 6,516), and local LLM support. Crawl4AI is actively maintained; v0.8.5 released March 18.

Exa vs Tavily

Exa wins on quality (14.39 vs 13.67 Agent Score, 81% vs 71% complex retrieval, 96% vs 85% citations), neural index, and now deep research (Exa Deep). Tavily wins on distribution (1.18M weekly downloads, LangChain default), response time (187ms), and full platform breadth (search+extract+research GA). Exa is the better tool; Tavily is the more convenient one. Both have strategic risk — Tavily's Nebius acquisition, Exa's Deep claims unverified.

Firecrawl vs Jina Reader

Firecrawl does everything Jina Reader does, plus search, structured extraction, batch processing, and an agent endpoint. Jina Reader's OSS repo is stale (no commits in 10+ months). Firecrawl is the superset choice for new projects, and it is 4-5x cheaper at volume.

SearXNG vs Brave Search

SearXNG: free, self-hosted, private, 70+ aggregated engines. Brave: higher quality (14.89 benchmark), faster (669ms), managed, SOC 2. Privacy vs quality tradeoff. SearXNG is the only option for teams that can't send queries to third-party APIs.

Parallel AI vs Perplexity Sonar

Both serve deep research. Parallel: 48% BrowseComp vs Perplexity 8%. Both slow (13,600ms vs 11,000ms+). Parallel delivers on depth; Perplexity has higher raw accuracy (87% HumAI) but truncation issues. Parallel wins on deep research quality if you can tolerate latency and cost.

Public signals

What changes this

If Brave ships a deep research endpoint — would make it the clear #1 across all lanes.

If Exa Deep gets independent benchmark validation — could leapfrog to #2 overall. Self-reported claims (March 2026) need verification.

If Tavily closes the AIMultiple gap — the Nebius resources make this plausible; the next benchmark update will be telling. If pricing changes post-acquisition, it drops.

If Crawl4AI ships a search API or MCP server — would threaten Firecrawl's all-in-one position with a permissive license.

If WideSearch-style broad research tasks become solvable (>20% success) — reshuffles the deep research lane entirely.

Gemini Deep Research API is now in preview (59.2% BrowseComp, 46.4% HLE). If it exits preview with MCP support, deep research lane shifts dramatically.

If You.com's benchmark claims (93% SimpleQA, #1 DeepSearchQA) are independently verified, it climbs the ranking on the OpenAI distribution moat alone.

If Linkup gets independent benchmark validation — it has a $10M seed and claims parity with the leaders; if confirmed, it enters the top 5.

Jina Reader — on track for delisting by mid-2026 absent new repo activity; the OSS repo is effectively dead (no commits in 10+ months).

A second independent benchmark beyond AIMultiple would add confidence or reveal blind spots in current rankings.