ScrapeGraphAI

active

LLM-graph-based web scraper — describe what you want, AI builds the extraction graph. 23K stars, 194 HN pts, active development (v1.74.0, Mar 2026). Open-source + hosted API.

Connector

Composite

Complexity

search, research

68/100

Trust

23K+

Stars

Evidence

Repo health

68/100

21h ago

Last push

Open issues

2,019

Forks

117

Contributors

Editorial verdict

Below the cut line in search-news. 23K stars, 194 HN pts (strongest HN score in category), and a unique LLM graph pipeline approach. But it draws only 14.6K weekly PyPI downloads vs Firecrawl's 752K, so the star count is likely inflated by viral novelty. Stars per 1,000 weekly downloads: about 1,580 vs Firecrawl's 81.

Source

GitHub: ScrapeGraphAI/Scrapegraph-ai

Docs: docs.scrapegraphai.com

Public evidence

strong, 2024-05

HN: 194 pts 'ScrapeGraphAI: Web scraping using LLM and direct graph logic'

Strongest HN launch score in the search/scrape category. Strong developer interest in the LLM-graph extraction concept.

194 points, 63 comments on HN (HN community)

strong, 2026-03-15

v1.74.0 released Mar 15, 2026 — actively maintained

Active development with frequent versioned releases. 30+ contributors.

Regular releases, 30+ contributors (ScrapeGraphAI maintainers)

moderate, 2026-02

arXiv paper: ScrapeGraphAI-100K dataset (Feb 2026)

100K dataset for LLM-based web information extraction — academic validation of the approach.

arXiv preprint (ScrapeGraphAI research team)

moderate, 2026-03

Star inflation flag: about 1,580 stars per 1,000 weekly downloads vs Firecrawl's 81

23K stars against 14.6K weekly downloads is an anomalously high ratio: about 1.58 stars per weekly download, or roughly 1,580 per 1,000. Firecrawl (94K stars, 752K/week) and Crawl4AI (62K stars, 372K/week) both sit on the order of 100 per 1,000. ScrapeGraphAI likely benefited from viral novelty rather than real production adoption.

23,033 stars, 14,611 weekly PyPI downloads (GitHub / PyPI metrics)
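The ratio behind this flag can be reproduced with a few lines of arithmetic. The sketch below assumes the page's "1,580:1" means stars per 1,000 weekly downloads, since 23,033 stars over 14,611 downloads is about 1.58 per download. The function name is hypothetical, and the Firecrawl and Crawl4AI inputs are the rounded figures quoted above.

```python
def stars_per_thousand_downloads(stars: int, weekly_downloads: int) -> float:
    """Return GitHub stars per 1,000 weekly package downloads."""
    if weekly_downloads <= 0:
        raise ValueError("weekly_downloads must be positive")
    return stars / weekly_downloads * 1000


# (stars, weekly downloads) as quoted on this page.
# The Firecrawl and Crawl4AI numbers are rounded, so treat their ratios as approximate.
projects = {
    "ScrapeGraphAI": (23_033, 14_611),
    "Firecrawl": (94_000, 752_000),
    "Crawl4AI": (62_000, 372_000),
}

ratios = {
    name: stars_per_thousand_downloads(stars, downloads)
    for name, (stars, downloads) in projects.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} stars per 1,000 weekly downloads")
```

With these inputs the script reports about 1,576 for ScrapeGraphAI, 125 for Firecrawl, and 167 for Crawl4AI. The rounded Firecrawl figures do not reproduce the quoted 81, so that number presumably rests on different underlying counts; the order-of-magnitude gap is the same either way.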

Where it wins

Unique LLM graph pipeline — describe extraction goal, AI builds the graph

23,033 GitHub stars

194 HN pts at launch — strongest HN engagement in this category

Active development: v1.74.0 released Mar 15, 2026

arXiv paper Feb 2026 — academic credibility

Apache-2.0 (OSS) + hosted API dual model

Where to be skeptical

14,611 weekly PyPI downloads: star inflation (about 1,580 stars per 1,000 weekly downloads vs Firecrawl's 81)

~$85/mo Growth plan for 10K structured extractions — expensive vs free Crawl4AI

No benchmark placement in AIMultiple 2026

Lower real adoption than star count suggests

Ranking in categories

Search & News

#14 of 18

LLM-graph-based extraction — describe what you want, AI builds the extraction pipeline

Similar skills

Crawl4AI

Free, open-source web scraping (Apache-2.0). 62K stars, actively maintained (v0.8.5, 2026-03-18), 372K weekly PyPI downloads. Best open-source alternative to Firecrawl.

SearXNG

Privacy-first, self-hosted meta-search engine aggregating 70+ upstream engines. Zero cost, zero API keys, full data sovereignty.

Firecrawl MCP Server

Official Firecrawl MCP for scraping, extraction, and deep research workflows. 50K+ npm weekly downloads, backed by $14.5M Series A.

Brave Search API

Independent web search API with own 40B-page index. #1 on AIMultiple 2026 Agentic Search Benchmark. 6-tool MCP server, SOC 2 Type II, 669ms latency.

Raw GitHub source

GitHub README peek

Constrained peek so you can sanity-check the source material without leaving the site.

🚀 Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)? Check out our enhanced version at ScrapeGraphAI.com! 🚀

🕷️ ScrapeGraphAI: You Only Scrape Once

ScrapeGraphAI is a web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).

Just say which information you want to extract and the library will do it for you!

🚀 Integrations

ScrapeGraphAI offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python or Node.js, using LLM frameworks, or working with no-code platforms, we've got you covered with our comprehensive integration options.

You can find more information at the following links.

Integrations:

API: Documentation
SDKs: Python, Node
LLM Frameworks: Langchain, Llama Index, Crew.ai, Agno, CamelAI
Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, Dify, Toolhouse

🚀 Quick install

The reference page for Scrapegraph-ai is available on PyPI.

pip install scrapegraphai

# IMPORTANT (for fetching websites content)
playwright install

Note: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

💻 Usage

There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).

The most common one is the SmartScraperGraph, which extracts information from a single page given a user prompt and a source URL.

import json

from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "model_tokens": 8192,
        "format": "json",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()

# Pretty-print the extracted data
print(json.dumps(result, indent=4))

Note: for OpenAI and other models, you only need to change the llm config:
graph_config = {
   "llm": {
       "api_key": "YOUR_OPENAI_API_KEY",
       "model": "openai/gpt-4o-mini",
   },
   "verbose": True,
   "headless": False,
}
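The two configurations differ only in their llm block, so one way to switch between them is a small helper that reads the API key from the environment instead of hard-coding it. This is a minimal sketch and not part of the library: the build_graph_config name and the OPENAI_API_KEY variable name are assumptions made for illustration, while the dictionary keys and model strings are the ones shown above.

```python
import os


def build_graph_config(provider: str = "ollama") -> dict:
    """Build a graph_config dict for a local Ollama model or for OpenAI."""
    if provider == "ollama":
        llm = {"model": "ollama/llama3.2", "model_tokens": 8192, "format": "json"}
    elif provider == "openai":
        # OPENAI_API_KEY is an assumed environment variable name.
        api_key = os.environ.get("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError("Set OPENAI_API_KEY before using the OpenAI config")
        llm = {"api_key": api_key, "model": "openai/gpt-4o-mini"}
    else:
        raise ValueError(f"Unsupported provider: {provider!r}")
    # Shared, non-LLM settings copied from the README example.
    return {"llm": llm, "verbose": True, "headless": False}


graph_config = build_graph_config("ollama")
print(graph_config["llm"]["model"])
```

The returned dictionary has the same shape as the hand-written ones, so it can be passed as the config argument in the same way.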

Running the pipeline returns a dictionary like the following:

{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
    "founders": [
        {
            "name": "",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/"
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ],
    "social_media_links": {
        "linkedin": "https://www.linkedin.com/company/101881123",
        "twitter": "https://x.com/scrapegraphai",
        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
    }
}
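Because the result is produced by an LLM, fields can come back empty or missing, as the blank founder name in the sample shows. The sketch below is one defensive way to read such a result. The founder_lines helper and its placeholder strings are hypothetical, and the input is a trimmed copy of the sample output above rather than a live run.

```python
def founder_lines(result: dict) -> list[str]:
    """Format each founder as 'name: role', tolerating missing or empty fields."""
    lines = []
    for founder in result.get("founders") or []:
        name = (founder.get("name") or "").strip() or "(name missing)"
        role = (founder.get("role") or "").strip() or "(role missing)"
        lines.append(f"{name}: {role}")
    return lines


# Trimmed copy of the sample output shown above.
sample_result = {
    "founders": [
        {"name": "", "role": "Founder & Technical Lead"},
        {"name": "Marco Vinciguerra", "role": "Founder & Software Engineer"},
        {"name": "Lorenzo Padoan", "role": "Founder & Product Engineer"},
    ],
}

lines = founder_lines(sample_result)
for line in lines:
    print(line)
```

The helper returns an empty list when the founders key is absent, so downstream code does not need to special-case a prompt that yielded nothing.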

There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.

Pipeline Name	Description
SmartScraperGraph	Single-page scraper that only needs a user prompt and an input source.
SearchGraph	Multi-page scraper that extracts information from the top n search results of a search engine.
SpeechGraph	Single-page scraper that extracts information from a website and generates an audio file.
