ScrapeGraphAI

active

LLM-graph-based web scraper: describe what you want, and AI builds the extraction graph. 23K stars, 194 HN pts, active development (v1.75.0, Mar 2026). Open-source + hosted API.

Score 82

Where it wins

Unique LLM graph pipeline — describe extraction goal, AI builds the graph

23,033 GitHub stars

194 HN pts at launch — strongest HN engagement in this category

Active development: v1.75.0 released Mar 18, 2026 (2 releases in 3 days)

arXiv paper Feb 2026 — academic credibility

Apache-2.0 (OSS) + hosted API dual model

Where to be skeptical

14,611 weekly PyPI downloads suggest star inflation (about 1,580 stars per 1,000 weekly downloads vs about 125 for Firecrawl)

~$85/mo Growth plan for 10K structured extractions — expensive vs free Crawl4AI

No benchmark placement in AIMultiple 2026

Lower real adoption than star count suggests

Editorial verdict

Below the cut line in Search & News. 23K stars and 194 HN points (the strongest HN score in the category) for a unique LLM graph pipeline approach. But only 14.6K weekly PyPI downloads vs Firecrawl's 752K, so the star count is likely inflated by viral novelty: about 1,580 stars per 1,000 weekly downloads vs about 125 for Firecrawl.

Source

GitHub: ScrapeGraphAI/Scrapegraph-ai

Docs: docs.scrapegraphai.com


Videos

Reviews, tutorials, and comparisons from the community.

Game-Changing AI Web Scraping: ScrapeGraphAI Tutorial & Founder Secrets

Made By Agents · 2025-03-15

Search & News

#14 of 18

LLM-graph-based extraction — describe what you want, AI builds the extraction pipeline

Crawl4AI

Free, open-source web scraping (Apache-2.0). 62K stars, 6,353 forks (nearly matches Firecrawl), actively maintained (v0.8.5, 2026-03-18), 384K weekly PyPI downloads. Best open-source alternative to Firecrawl.

SearXNG

Privacy-first, self-hosted meta-search engine aggregating 70+ upstream engines. Zero cost, zero API keys, full data sovereignty.

Exa MCP Server

Official Exa MCP for fast web search and crawling when the workflow is search-first rather than page-ops-first.

Firecrawl MCP Server

Official Firecrawl MCP for scraping, extraction, and deep research workflows. 95K+ GitHub stars (main repo), 1.23M combined weekly downloads, backed by $14.5M Series A. ScrapeOps 10/10.

Public evidence

strong · 2024-05

HN: 194 pts 'ScrapeGraphAI: Web scraping using LLM and direct graph logic'

Strongest HN launch score in the search/scrape category. Strong developer interest in the LLM-graph extraction concept.

194 points, 63 comments on HN · HN community

strong · 2026-03-15

v1.74.0 released Mar 15, 2026 — actively maintained

Active development with frequent versioned releases. 30+ contributors.

Regular releases, 30+ contributors · ScrapeGraphAI maintainers

moderate · 2026-02

arXiv paper: ScrapeGraphAI-100K dataset (Feb 2026)

100K dataset for LLM-based web information extraction — academic validation of the approach.

arXiv preprint · ScrapeGraphAI research team

moderate · 2026-03

Star inflation flag: about 1,580 stars per 1,000 weekly downloads vs about 125 for Firecrawl

23K stars vs 14.6K weekly downloads is an anomalously high ratio (roughly 1.6 stars per weekly download). Firecrawl (94K stars, 752K/week) and Crawl4AI (62K stars, 372K/week) sit at roughly 125 and 167 stars per 1,000 weekly downloads, about a tenth of ScrapeGraphAI's figure. ScrapeGraphAI likely benefited from viral novelty rather than real production adoption.

23,033 stars, 14,611 weekly PyPI downloads · GitHub / PyPI metrics
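The stars-to-downloads comparison is plain arithmetic, and it only makes sense once the unit is fixed. The sketch below recomputes it from the figures quoted on this page, expressed as GitHub stars per 1,000 weekly PyPI downloads (the unit the 1,580 figure implies). The numbers are point-in-time snapshots copied from this page, not live data, and `stars_per_1k_downloads` is a hypothetical helper name.

```python
# Figures as quoted on this page: (GitHub stars, weekly PyPI downloads).
# These are point-in-time snapshots, not live data.
PROJECTS = {
    "ScrapeGraphAI": (23_033, 14_611),
    "Firecrawl": (94_000, 752_000),
    "Crawl4AI": (62_000, 372_000),
}


def stars_per_1k_downloads(stars: int, weekly_downloads: int) -> float:
    """Hypothetical helper: GitHub stars per 1,000 weekly PyPI downloads."""
    return stars / weekly_downloads * 1_000


for name, (stars, downloads) in PROJECTS.items():
    ratio = stars_per_1k_downloads(stars, downloads)
    print(f"{name}: {ratio:,.0f} stars per 1,000 weekly downloads")
```

Run as-is, it prints about 1,576 for ScrapeGraphAI, 125 for Firecrawl and 167 for Crawl4AI, which is where the "roughly ten times higher" reading comes from.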

strong · 2026

ScrapeOps: ScrapeGraphAI rated 10/10 — tied with Firecrawl

Tied with Firecrawl at 10/10 in an independent review. 'Just write your prompt and go.' Schema-based extraction. $2,000/million pages (3x more expensive than Firecrawl).

Independent hands-on review · ScrapeOps (independent scraping platform)
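The two ScrapeGraphAI price points on this page use different units: a monthly plan (~$85 for 10K structured extractions) and a volume rate ($2,000 per million pages, per ScrapeOps). A rough normalization to USD per 1,000 makes them comparable. This is a sketch under stated assumptions: the Firecrawl figure is only implied by the "3x more expensive" claim, plan prices change over time, and `cost_per_1k` is a hypothetical helper name.

```python
def cost_per_1k(price_usd: float, units: int) -> float:
    """Hypothetical helper: normalize a price covering `units` pages/extractions to USD per 1,000."""
    return price_usd / units * 1_000


growth_plan = cost_per_1k(85, 10_000)             # ~$85/mo Growth plan, 10K structured extractions
scrapegraph_rate = cost_per_1k(2_000, 1_000_000)  # $2,000 per million pages (ScrapeOps figure)
firecrawl_implied = scrapegraph_rate / 3          # implied by "3x more expensive than Firecrawl"

for label, value in [
    ("ScrapeGraphAI Growth plan", growth_plan),
    ("ScrapeGraphAI (ScrapeOps rate)", scrapegraph_rate),
    ("Firecrawl (implied)", firecrawl_implied),
]:
    print(f"{label}: ${value:.2f} per 1,000")
```

At face value the two ScrapeGraphAI figures differ ($8.50 vs $2.00 per 1,000), which probably reflects different plans or volume tiers; check current pricing before relying on either.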

moderate · 2026

AppSumo reviews: 9/10 user rating

Users praise accuracy and time-saving; some report UI/API inconsistencies. AppSumo promotion likely inflated star count.

Crowd user reviews · AppSumo users (crowd)

Raw GitHub source

GitHub README peek


🚀 Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)? Check out our enhanced version at ScrapeGraphAI.com! 🚀

🕷️ ScrapeGraphAI: You Only Scrape Once

ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).

Just say which information you want to extract and the library will do it for you!
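As a rough mental model of "direct graph logic" (a toy sketch, not ScrapeGraphAI's actual internals), a pipeline can be pictured as a chain of nodes that each read and extend a shared state dict: fetch, then parse, then generate an answer. The node functions, the `Graph` class and the `fake_llm` stub below are all hypothetical; the stub stands in for the LLM call, and no network access is used.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable

State = dict  # shared state handed from node to node


class _TextAndLinks(HTMLParser):
    """Collects visible text chunks and <a href> targets from an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def fetch_node(state: State) -> State:
    # Hypothetical: a real fetch step would download state["source"];
    # here the source is already an HTML string, so it is passed through.
    state["html"] = state["source"]
    return state


def parse_node(state: State) -> State:
    parser = _TextAndLinks()
    parser.feed(state["html"])
    parser.close()
    state["text"] = " ".join(parser.text)
    state["links"] = parser.links
    return state


def generate_answer_node(state: State) -> State:
    # Stand-in for the LLM step: prompt plus parsed content go to a callable.
    state["answer"] = state["llm"](state["prompt"], state["text"], state["links"])
    return state


@dataclass
class Graph:
    nodes: list[Callable[[State], State]]

    def run(self, state: State) -> State:
        # Linear chain (the simplest graph): each node's output is the next node's input.
        for node in self.nodes:
            state = node(state)
        return state


def fake_llm(prompt: str, text: str, links: list[str]) -> dict:
    # Hypothetical stub: filters social links instead of querying a model.
    return {"social_media_links": [u for u in links if "x.com" in u or "linkedin.com" in u]}


html = '<h1>Acme</h1><p>We build widgets.</p><a href="https://x.com/acme">X</a><a href="/about">About</a>'
graph = Graph([fetch_node, parse_node, generate_answer_node])
result = graph.run({"source": html, "prompt": "List social media links", "llm": fake_llm})
print(result["answer"])  # {'social_media_links': ['https://x.com/acme']}
```

Keeping each step as a separate node is what lets a library swap one stage (a different fetcher or model) without touching the others.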

🚀 Integrations

ScrapeGraphAI offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python or Node.js, using LLM frameworks, or working with no-code platforms, we've got you covered with our comprehensive integration options.

You can find more information at the following link.

Integrations:

API: Documentation
SDKs: Python, Node
LLM Frameworks: LangChain, LlamaIndex, Crew.ai, Agno, CamelAI
Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, Dify, Toolhouse

🚀 Quick install

The reference page for Scrapegraph-ai is available on PyPI.

pip install scrapegraphai

# IMPORTANT: required for fetching website content
playwright install

Note: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

💻 Usage

There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).

The most common one is the SmartScraperGraph, which extracts information from a single page given a user prompt and a source URL.

import json

from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",  # local model served by Ollama
        "model_tokens": 8192,        # token budget for this model
        "format": "json",            # ask the model for JSON output
    },
    "verbose": True,
    "headless": False,  # False opens a visible browser window while fetching
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

# Run the pipeline and pretty-print the extracted dictionary
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))

Note: for OpenAI and other models, you only need to change the llm config:

graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}
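Hardcoding the key as in the OpenAI snippet is fine for a quick test; a common alternative is to read it from an environment variable so it never lands in source control. A minimal sketch, assuming the variable is named `OPENAI_API_KEY` (a convention, not something ScrapeGraphAI requires) and using a hypothetical `make_openai_config` helper that returns the same dict shape as the config above.

```python
import os


def make_openai_config(model: str = "openai/gpt-4o-mini",
                       env_var: str = "OPENAI_API_KEY") -> dict:
    """Hypothetical helper: build the OpenAI graph_config, reading the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of an auth error mid-scrape.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "llm": {"api_key": api_key, "model": model},
        "verbose": True,
        "headless": False,
    }


# Usage: graph_config = make_openai_config()
```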

The output will be a dictionary like the following:

{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
    "founders": [
        {
            "name": "",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/"
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ],
    "social_media_links": {
        "linkedin": "https://www.linkedin.com/company/101881123",
        "twitter": "https://x.com/scrapegraphai",
        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
    }
}
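LLM output is not guaranteed to be complete: in the sample output, the first founder's `name` came back as an empty string. A small post-processing check can flag such gaps before the data is used downstream. `find_empty_fields` is a hypothetical helper, not part of ScrapeGraphAI, and the dict below is a trimmed copy of the sample output.

```python
def find_empty_fields(obj, path: str = "$") -> list[str]:
    """Return a path for every empty string, None, empty list, or empty dict in obj."""
    empty: list[str] = []
    if isinstance(obj, dict):
        if not obj:
            empty.append(path)
        for key, value in obj.items():
            empty.extend(find_empty_fields(value, f"{path}.{key}"))
    elif isinstance(obj, list):
        if not obj:
            empty.append(path)
        for i, value in enumerate(obj):
            empty.extend(find_empty_fields(value, f"{path}[{i}]"))
    elif obj is None or (isinstance(obj, str) and not obj.strip()):
        empty.append(path)
    return empty


# Trimmed copy of the sample output.
result = {
    "founders": [
        {"name": "", "role": "Founder & Technical Lead"},
        {"name": "Marco Vinciguerra", "role": "Founder & Software Engineer"},
    ],
    "social_media_links": {"twitter": "https://x.com/scrapegraphai"},
}
print(find_empty_fields(result))  # ['$.founders[0].name']
```

Flagged paths can then be re-queried with a narrower prompt or filled in manually.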
