skillpack.co
ScrapeGraphAI

active

LLM-graph-based web scraper — describe what you want, AI builds the extraction graph. 23K stars, 194 HN pts, active development (v1.75.0, Mar 2026). Open-source + hosted API.

Score 82

Where it wins

Unique LLM graph pipeline — describe extraction goal, AI builds the graph

23,033 GitHub stars

194 HN pts at launch — strongest HN engagement in this category

Active development: v1.75.0 released Mar 18, 2026 (2 releases in 3 days)

arXiv paper Feb 2026 — academic credibility

Apache-2.0 (OSS) + hosted API dual model

Where to be skeptical

14,611 weekly PyPI downloads against 23,033 stars, a sign of star inflation (about 1,580 stars per 1,000 weekly downloads vs. 81 for Firecrawl)

~$85/mo Growth plan for 10K structured extractions — expensive vs free Crawl4AI

No benchmark placement in AIMultiple 2026

Lower real adoption than star count suggests

Editorial verdict

Below the cut line in search-news. 23K stars, 194 HN pts (strongest HN score in category), and a unique LLM graph pipeline approach, but only 14.6K weekly PyPI downloads vs. Firecrawl's 752K; the star count is likely inflated by viral novelty. Stars-to-downloads ratio: about 1,580 stars per 1,000 weekly downloads vs. Firecrawl's 81.

Videos

Reviews, tutorials, and comparisons from the community.

Game-Changing AI Web Scraping: ScrapeGraphAI Tutorial & Founder Secrets

Made By Agents·2025-03-15

Public evidence

moderate · 2026-03
Star inflation flag: about 1,580 stars per 1,000 weekly downloads vs. 81 for Firecrawl

23K stars against 14.6K weekly downloads is an anomalously high ratio. Firecrawl (94K stars, 752K/week) and Crawl4AI (62K stars, 372K/week) are both on the order of 100 stars per 1,000 weekly downloads. ScrapeGraphAI likely benefited from viral novelty rather than real production adoption.

23,033 stars, 14,611 weekly PyPI downloads · GitHub / PyPI metrics
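
The ratio can be recomputed from the figures quoted on this page. The sketch below assumes the ratio is expressed as stars per 1,000 weekly downloads, which is an interpretation and not something the page states; the helper name `stars_per_thousand_downloads` is hypothetical. With the 94K-star figure, Firecrawl works out closer to 125 than the quoted 81, which may reflect a different star count at the time the ratio was computed.

```python
# Star and weekly-download figures as quoted on this page (rounded where the page rounds).
projects = {
    "ScrapeGraphAI": (23_033, 14_611),
    "Firecrawl": (94_000, 752_000),
    "Crawl4AI": (62_000, 372_000),
}


def stars_per_thousand_downloads(stars: int, weekly_downloads: int) -> float:
    """Stars per 1,000 weekly PyPI downloads (assumed unit for the page's ratio)."""
    return stars / weekly_downloads * 1000


ratios = {
    name: stars_per_thousand_downloads(stars, downloads)
    for name, (stars, downloads) in projects.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} stars per 1,000 weekly downloads")
```

Whatever unit is used, ScrapeGraphAI's ratio comes out roughly ten times higher than either comparison project's, which is the point of the flag.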
strong · 2026
ScrapeOps: ScrapeGraphAI rated 10/10 — tied with Firecrawl

Tied with Firecrawl at 10/10 in independent review. 'Just write your prompt and go.' Schema-based extraction. $2,000/million pages (3x more expensive than Firecrawl).

Independent hands-on review · ScrapeOps (independent scraping platform)
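
For a sense of scale, the pricing figures quoted on this page can be reduced to per-unit costs. This is only a rough sketch: the variable names are illustrative, the Firecrawl figure is merely what the "3x" multiplier implies, and "pages" (ScrapeOps) and "structured extractions" (Growth plan) may not be the same unit, so the two rates are not directly comparable.

```python
# Figures as quoted on this page.
scrapeops_usd_per_million_pages = 2_000.0  # ScrapeOps review
firecrawl_multiplier = 3.0                 # "3x more expensive than Firecrawl"
growth_plan_usd_per_month = 85.0           # approximate, per the page ("~$85/mo")
growth_plan_extractions = 10_000

# Per-page cost implied by the ScrapeOps figure.
cost_per_page = scrapeops_usd_per_million_pages / 1_000_000

# Firecrawl price per million pages implied by the 3x multiplier.
implied_firecrawl_per_million = scrapeops_usd_per_million_pages / firecrawl_multiplier

# Per-extraction cost on the Growth plan if the full quota is used.
cost_per_extraction = growth_plan_usd_per_month / growth_plan_extractions

print(f"ScrapeOps figure: ${cost_per_page:.4f} per page")
print(f"Implied Firecrawl: ${implied_firecrawl_per_million:,.0f} per million pages")
print(f"Growth plan: ${cost_per_extraction:.4f} per extraction")
```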
moderate · 2026
AppSumo reviews: 9/10 user rating

Users praise accuracy and time-saving; some report UI/API inconsistencies. AppSumo promotion likely inflated star count.

Crowd user reviews · AppSumo users (crowd)

Raw GitHub source

GitHub README peek

Constrained peek so you can sanity-check the source material without leaving the site.

🚀 Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)? Check out our enhanced version at ScrapeGraphAI.com! 🚀


🕷️ ScrapeGraphAI: You Only Scrape Once

ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).

Just say which information you want to extract and the library will do it for you!

🚀 Integrations

ScrapeGraphAI offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python or Node.js, using LLM frameworks, or working with no-code platforms, we've got you covered with our comprehensive integration options.

You can find more information at the following link.

Integrations:

  • API: Documentation
  • SDKs: Python, Node
  • LLM Frameworks: Langchain, Llama Index, Crew.ai, Agno, CamelAI
  • Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, Dify, Toolhouse

🚀 Quick install

The reference page for Scrapegraph-ai is available on PyPI: pypi.

pip install scrapegraphai

# IMPORTANT (for fetching website content)
playwright install

Note: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱

💻 Usage

There are multiple standard scraping pipelines that can be used to extract information from a website (or local file).

The most common one is the SmartScraperGraph, which extracts information from a single page given a user prompt and a source URL.

import json

from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "model_tokens": 8192,
        "format": "json",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract useful information from the webpage, including a description of what the company does, founders and social media links",
    source="https://scrapegraphai.com/",
    config=graph_config
)

# Run the pipeline and pretty-print the extracted data
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))

Note: for OpenAI and other models, you just need to change the llm config:

graph_config = {
   "llm": {
       "api_key": "YOUR_OPENAI_API_KEY",
       "model": "openai/gpt-4o-mini",
   },
   "verbose": True,
   "headless": False,
}
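
Rather than pasting a key into the config, it can be read from the environment. The following is a minimal sketch, not part of the library: `build_graph_config` is a hypothetical helper, and `OPENAI_API_KEY` as the variable name is an assumption. The config keys themselves are the ones shown in the snippet above.

```python
import os


def build_graph_config(
    model: str = "openai/gpt-4o-mini",
    env_var: str = "OPENAI_API_KEY",  # assumed variable name, not mandated by the README
) -> dict:
    """Build a graph_config dict with the API key taken from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of passing an empty key downstream.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "llm": {
            "api_key": api_key,
            "model": model,
        },
        "verbose": True,
        "headless": False,
    }
```

The returned dictionary has the same shape as the config above, so it could be passed as `config=build_graph_config()` when creating the `SmartScraperGraph` instance.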

The output will be a dictionary like the following:

{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics. It offers an AI-powered API for effortless and cost-effective data extraction.",
    "founders": [
        {
            "name": "",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/"
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ],
    "social_media_links": {
        "linkedin": "https://www.linkedin.com/company/101881123",
        "twitter": "https://x.com/scrapegraphai",
        "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
    }
}
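
Because the result is a plain dictionary produced by an LLM, individual fields can come back empty, as the first founder's name does in the sample output above. A small check before using the data might look like the sketch below; `founders_missing_name` is a hypothetical helper, and the sample is trimmed from the output shown above.

```python
def founders_missing_name(result: dict) -> list[str]:
    """Return the LinkedIn URLs of founder entries whose name is empty or absent."""
    return [
        founder.get("linkedin", "<no linkedin>")
        for founder in result.get("founders", [])
        if not (founder.get("name") or "").strip()
    ]


# Trimmed copy of the sample output shown above.
sample_result = {
    "founders": [
        {
            "name": "",
            "role": "Founder & Technical Lead",
            "linkedin": "https://www.linkedin.com/in/perinim/",
        },
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/",
        },
    ]
}

incomplete = founders_missing_name(sample_result)
print(incomplete)
```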

There are other pipelines that can be used to extract information from multiple pages, generate Python scripts, or even generate audio files.

| Pipeline Name | Description |
| --- | --- |
| SmartScraperGraph | Single-page scraper that only needs a user prompt and an input source. |
| SearchGraph | Multi-page scraper that extracts information from the top n search results of a search engine. |
| SpeechGraph | Single-page scraper that extracts information from a website and generates an audio file. |