Strongest HN launch score in the search/scrape category. Strong developer interest in the LLM-graph extraction concept.
ScrapeGraphAI
Status: active
LLM-graph-based web scraper: describe what you want, and the AI builds the extraction graph. 23K stars, 194 HN pts, active development (v1.74.0, Mar 2026). Open-source + hosted API.
Trust: 68/100
Stars: 23K+
Evidence: 4
Repo health
Last push: 21h ago
Open issues: 1
Forks: 2,019
Contributors: 117
Editorial verdict
Below cut line in search-news. 23K stars, 194 HN pts (strongest HN score in category), unique LLM graph pipeline approach. But only 14.6K weekly PyPI downloads vs Firecrawl's 752K, so the star count is likely inflated by viral novelty. Stars per 1,000 weekly downloads: about 1,580 vs Firecrawl's 81.
Source
GitHub: ScrapeGraphAI/Scrapegraph-ai
Docs: docs.scrapegraphai.com
Public evidence
Active development with frequent versioned releases. 30+ contributors.
100K dataset for LLM-based web information extraction — academic validation of the approach.
23K stars against 14.6K weekly downloads is an anomalously high ratio, roughly 1.6 stars per weekly download. Firecrawl (94K stars, 752K/week) and Crawl4AI (62K stars, 372K/week) both sit well below 1 star per weekly download (about 0.13 and 0.17 on these figures). ScrapeGraphAI likely benefited from viral novelty rather than real production adoption.
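The ratio behind this evidence item can be recomputed from the figures quoted on this page. The snippet below is a minimal sketch, assuming "stars-to-downloads" means GitHub stars per 1,000 weekly PyPI downloads (an interpretation, not something the listing defines); the function name is hypothetical. Results may differ from ratios quoted elsewhere on the page if those were computed from a different snapshot of the counts.

```python
# Figures as quoted in this listing: (GitHub stars, weekly PyPI downloads).
projects = {
    "ScrapeGraphAI": (23_033, 14_611),
    "Firecrawl": (94_000, 752_000),
    "Crawl4AI": (62_000, 372_000),
}


def stars_per_1k_downloads(stars: int, weekly_downloads: int) -> float:
    """Hypothetical helper: GitHub stars per 1,000 weekly downloads."""
    return 1000 * stars / weekly_downloads


# Round to whole stars per 1,000 downloads for easier comparison.
ratios = {
    name: round(stars_per_1k_downloads(stars, downloads))
    for name, (stars, downloads) in projects.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} stars per 1,000 weekly downloads")
```

On these inputs ScrapeGraphAI comes out roughly an order of magnitude above the other two, which is the gap the evidence item points at.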
Where it wins
Unique LLM graph pipeline — describe extraction goal, AI builds the graph
23,033 GitHub stars
194 HN pts at launch — strongest HN engagement in this category
Active development: v1.74.0 released Mar 15, 2026
arXiv paper Feb 2026 — academic credibility
Apache-2.0 (OSS) + hosted API dual model
Where to be skeptical
14,611 weekly PyPI downloads: star inflation (about 1,580 stars per 1,000 weekly downloads vs Firecrawl's 81)
~$85/mo Growth plan for 10K structured extractions — expensive vs free Crawl4AI
No benchmark placement in AIMultiple 2026
Lower real adoption than star count suggests
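For the pricing point above, the per-extraction cost follows directly from the quoted plan. This is a back-of-the-envelope sketch that takes the approximate $85/month and 10K extractions at face value and assumes the full quota is used; the variable names are illustrative.

```python
# Approximate Growth plan figures as quoted in this listing.
monthly_price_usd = 85.0
included_extractions = 10_000

# Effective unit cost, assuming the whole monthly quota is consumed.
cost_per_extraction = monthly_price_usd / included_extractions
cost_per_thousand = cost_per_extraction * 1000

print(f"${cost_per_extraction:.4f} per extraction")
print(f"${cost_per_thousand:.2f} per 1,000 extractions")
```

If fewer extractions are run in a month, the effective unit cost rises proportionally.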
Similar skills
Crawl4AI
Score: 83. Free, open-source web scraping (Apache-2.0). 62K stars, actively maintained (v0.8.5, 2026-03-18), 372K weekly PyPI downloads. Best open-source alternative to Firecrawl.
SearXNG
Score: 78. Privacy-first, self-hosted meta-search engine aggregating 70+ upstream engines. Zero cost, zero API keys, full data sovereignty.
Firecrawl MCP Server
Score: 70. Official Firecrawl MCP for scraping, extraction, and deep research workflows. 50K+ npm weekly downloads, backed by $14.5M Series A.
Brave Search API
Score: 64. Independent web search API with own 40B-page index. #1 on AIMultiple 2026 Agentic Search Benchmark. 6-tool MCP server, SOC 2 Type II, 669ms latency.
Raw GitHub source
GitHub README peek
Constrained peek so you can sanity-check the source material without leaving the site.
🚀 Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)? Check out our enhanced version at ScrapeGraphAI.com! 🚀
🕷️ ScrapeGraphAI: You Only Scrape Once
ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).
Just say which information you want to extract and the library will do it for you!
🚀 Integrations
ScrapeGraphAI offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python or Node.js, using LLM frameworks, or working with no-code platforms, we've got you covered with our comprehensive integration options.
You can find more information at the following links.
Integrations:
- API: Documentation
- SDKs: Python, Node
- LLM Frameworks: Langchain, Llama Index, Crew.ai, Agno, CamelAI
- Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, Dify, Toolhouse
🚀 Quick install
The reference page for Scrapegraph-ai is available on PyPI.
pip install scrapegraphai
# IMPORTANT (for fetching websites content)
playwright install
Note: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
💻 Usage
There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).
The most common one is the SmartScraperGraph, which extracts information from a single page given a user prompt and a source URL.
import json

from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "model_tokens": 8192,
        "format": "json",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

# Run the pipeline and pretty-print the resulting dictionary
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
Note: for OpenAI and other models you just need to change the llm config!
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}
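Since only the llm section differs between the two configurations shown in the README, one way to avoid duplicating the dictionary is a small builder. This is a sketch, not part of the library: `build_graph_config` is a hypothetical helper, and it only reproduces the keys that appear in the README excerpts above.

```python
from typing import Any, Optional


def build_graph_config(
    model: str,
    api_key: Optional[str] = None,
    verbose: bool = True,
    headless: bool = False,
    **llm_options: Any,
) -> dict:
    """Hypothetical helper: assemble a config dict shaped like the README examples."""
    llm: dict = {}
    if api_key is not None:
        llm["api_key"] = api_key
    llm["model"] = model
    # Extra provider-specific keys, e.g. model_tokens or format for a local model.
    llm.update(llm_options)
    return {"llm": llm, "verbose": verbose, "headless": headless}


# Local model, matching the first README configuration.
local_config = build_graph_config("ollama/llama3.2", model_tokens=8192, format="json")

# Hosted model, matching the second README configuration.
openai_config = build_graph_config(
    "openai/gpt-4o-mini", api_key="YOUR_OPENAI_API_KEY"
)

print(local_config)
print(openai_config)
```

Either dictionary would then be passed as the `config` argument in the same way as `graph_config` above.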
The output will be a dictionary like the following:
{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
    "founders": [
        {
            "name": "",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/"
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ],
    "social_media_links": {
        "linkedin": "https://www.linkedin.com/company/101881123",
        "twitter": "https://x.com/scrapegraphai",
        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
    }
}
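The sample output above includes a founder entry with an empty name, so downstream code probably should not assume every field is populated. The following is a minimal sketch of a defensive pass over such a result; the `result` literal is an abbreviated copy of the sample above (description omitted), and the variable names are illustrative.

```python
# Abbreviated copy of the sample output shown above.
result = {
    "founders": [
        {
            "name": "",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/",
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/",
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/",
        },
    ],
}

# Use .get() so a missing key is treated the same as an empty value.
founders = result.get("founders", [])

# Entries with a usable name.
named_founders = [f["name"] for f in founders if f.get("name")]

# Entries the model left incomplete, kept for manual review.
incomplete = [f for f in founders if not f.get("name")]

print("Named founders:", named_founders)
print("Entries needing review:", len(incomplete))
```

Separating incomplete entries rather than dropping them keeps the partially extracted data (role, LinkedIn URL) available for a follow-up check.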
There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.
| Pipeline Name | Description |
|---|---|
| SmartScraperGraph | Single-page scraper that only needs a user prompt and an input source. |
| SearchGraph | Multi-page scraper that extracts information from the top n search results of a search engine. |
| SpeechGraph | Single-page scraper that extracts information from a website and generates an audio file. |