Skyvern

active

Vision-LLM browser automation for enterprise workflows. Combines computer vision with LLM reasoning to handle websites it has never seen before. Backed by YC (S23), with CAPTCHA solving, 2FA handling, and proxy networks.

Connector

Composite

Complexity

browser · web

80/100

Trust

21K+

Stars

Evidence

492.8 MB

Repo size

Videos

Reviews, tutorials, and comparisons from the community.

This Browser Agent Automates ANYTHING (N8N + Skyvern)

Ben AI·2025-02-11

Repo health

80/100

14h ago

Last push

151

Open issues

1,855

Forks

Contributors

Editorial verdict

Best pick for enterprise workflow automation on websites without APIs — form filling, data entry, procurement. Overkill for developer/coding agent browser tasks.

Source

GitHub: Skyvern-AI/skyvern

Docs: docs.skyvern.com

Public evidence

strong · 2024-03

Show HN: Skyvern — Browser automation using LLMs and computer vision — 422 points

Very high engagement on initial launch. Community validated the vision-LLM approach for browser automation.

422 HN points, 139 comments · HN community

strong · 2024-10

Launch HN: Skyvern (YC S23) — open-source AI agent for browser automations — 327 points

Second major HN appearance with sustained community interest. YC S23 batch backing adds institutional credibility.

327 HN points, 74 comments · HN community

moderate · 2026-03

WebVoyager benchmark: 85.85% (Steel.dev leaderboard)

Skyvern scored 85.85% on the WebVoyager benchmark. Solid but below Browser Use (89.1%). Validates the vision-LLM approach for enterprise automation.

Independent benchmark leaderboard · Steel.dev (independent)

moderate · Self-reported · 2026-03

Skyvern-AI/skyvern: 20.8K stars — YC-backed vision-LLM browser agent

Now at v1.x (production-ready). AGPL-3.0 license. Active weekly releases. Enterprise-focused with CAPTCHA, 2FA, proxy support.

20,823 stars, 1,849 forks, 30+ contributors · Open-source community

moderate · 2026-03

Automateed.com independent review: 7/10 rating

Balanced review: strengths in vision+LLM approach and natural language automation. Weaknesses in pricing opacity, steep learning curve, and AGPL license.

Independent review site · Automateed.com (independent)

strong · 2026-03

rtrvr.ai benchmark: Skyvern 64.4% success vs Browser Use Cloud 43.9% — wins on reliability

Skyvern achieved 64.4% success rate vs Browser Use Cloud's 43.9%. Skyvern won on reliability while Browser Use won on speed (2x faster) and cost (2x cheaper per task).

Independent benchmark comparison · rtrvr.ai (independent)
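
To put the reliability gap in concrete terms, the short calculation below works through the two success rates quoted in the rtrvr.ai comparison. Only the two percentages come from the evidence above; the variable names are illustrative.

```python
# Success rates as quoted in the rtrvr.ai comparison above (percent).
skyvern_success = 64.4
browser_use_cloud_success = 43.9

# Absolute gap, in percentage points.
gap_points = round(skyvern_success - browser_use_cloud_success, 1)

# Relative gap: tasks Skyvern completes per task Browser Use Cloud completes.
ratio = round(skyvern_success / browser_use_cloud_success, 2)

print(f"gap: {gap_points} percentage points, ratio: {ratio}x")
```

The gap is 20.5 percentage points, or roughly 1.47 successful Skyvern tasks for every successful Browser Use Cloud task. That figure should be weighed against the speed and per-task cost advantages reported for Browser Use.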


Where it wins

Vision-LLM approach — handles websites never seen before, resilient to layout changes

Enterprise features: CAPTCHA solving, 2FA handling, proxy networks, geo-targeting

Multi-step workflow engine for complex business processes

YC S23 backed with $2.7M raised

Where to be skeptical

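AGPL-3.0 license: modified versions offered over a network must be released under the same license, which complicates closed-source commercial use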

Enterprise/RPA focus — overkill for coding agent browser tasks

Python-only

Pricing opacity noted by independent reviewers

Ranking in categories

Web Browsing / Browser Automation

#06 of 9

Enterprise workflow automation on websites without APIs — form filling, procurement, data entry


Similar skills

Chrome DevTools MCP

Google Chrome team's official MCP server for Chrome DevTools. Gives coding agents deep debugging, performance profiling, and Core Web Vitals analysis through 26 tools across 6 categories.

Playwright MCP

Microsoft's official MCP server for Playwright. Uses accessibility snapshots instead of screenshots for structured browser control. Auto-configured in GitHub Copilot's Coding Agent.

Vercel Agent Browser

Token-efficient browser automation CLI for AI agents. Rust core with sub-50ms boot. Claims 93% context reduction vs Playwright MCP through ref-based element selection on accessibility snapshots.

Browser Use

Python library for controlling a real browser with vision and DOM extraction, built for agent workflows.

Raw GitHub source

GitHub README peek

Constrained peek so you can sanity-check the source material without leaving the site.

🐉 Automate Browser-based workflows using LLMs and Computer Vision 🐉

Skyvern automates browser-based workflows using LLMs and computer vision. It provides a Playwright-compatible SDK that adds AI functionality on top of Playwright, as well as a no-code workflow builder to help both technical and non-technical users automate manual workflows on any website, replacing brittle or unreliable automation solutions.

Traditional approaches to browser automation required writing custom scripts for each website, often relying on DOM parsing and XPath-based interactions, which would break whenever the website layout changed.

Instead of only relying on code-defined XPath interactions, Skyvern relies on Vision LLMs to learn and interact with the websites.
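
The brittleness described above can be reproduced with a minimal sketch. It uses only Python's standard-library `xml.etree.ElementTree` on two made-up page snippets; the markup, the path, and the function name are all hypothetical, and no Skyvern or Playwright code is involved. The same hard-coded path finds the button in the old layout and misses it once the button is wrapped in an extra `div`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same page. The redesign wraps the
# button in an extra <div>, which is enough to break a fixed path.
OLD_LAYOUT = (
    "<html><body><form>"
    "<button id='submit'>Send</button>"
    "</form></body></html>"
)
NEW_LAYOUT = (
    "<html><body><form><div class='actions'>"
    "<button id='submit'>Send</button>"
    "</div></form></body></html>"
)

# A code-defined path of the kind a traditional script would hard-code.
BRITTLE_PATH = "./body/form/button"


def find_button(markup):
    """Return the element at BRITTLE_PATH, or None if the path no longer matches."""
    root = ET.fromstring(markup)
    return root.find(BRITTLE_PATH)


old_hit = find_button(OLD_LAYOUT)
new_hit = find_button(NEW_LAYOUT)
print("old layout matched:", old_hit is not None)
print("new layout matched:", new_hit is not None)
```

The button is still on the page and still says "Send"; only its position in the tree changed. Locating elements by what they look like and mean, rather than by a fixed path, is meant to tolerate exactly this kind of change.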

How it works

Skyvern was inspired by the Task-Driven autonomous agent design popularized by BabyAGI and AutoGPT -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like Playwright.

Skyvern uses a swarm of agents to comprehend a website, then plan and execute its actions.

This approach has a few advantages:

Skyvern can operate on websites it's never seen before, as it's able to map visual elements to actions necessary to complete a workflow, without any customized code
Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
Skyvern is able to take a single workflow and apply it to a large number of websites, as it's able to reason through the interactions necessary to complete the workflow

A detailed technical report can be found here.

Demo

https://github.com/user-attachments/assets/5cab4668-e8e2-4982-8551-aab05ff73a7f

Quickstart

Skyvern Cloud

Skyvern Cloud is a managed cloud version of Skyvern that allows you to run Skyvern without worrying about the infrastructure. It allows you to run multiple Skyvern instances in parallel and comes bundled with anti-bot detection mechanisms, proxy network, and CAPTCHA solvers.

If you'd like to try it out, navigate to app.skyvern.com and create an account.

Run Locally (UI + Server)

Choose your preferred setup method:

Option A: pip install (Recommended)

Dependencies needed:

Python 3.11.x (3.12 also works; 3.13 is not yet supported)
NodeJS & NPM

Additionally, for Windows:

Rust
VS Code with C++ dev tools and Windows SDK
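
Before installing, it can help to confirm that the interpreter matches the Python requirement listed above. The check below is a minimal sketch, not part of Skyvern: the function name and the `SUPPORTED_MINORS` set are illustrative. Treating versions older than 3.11 as unsupported is an assumption, since the list only mentions 3.11, 3.12, and 3.13.

```python
import sys

# Version policy taken from the dependency list: 3.11.x is the target,
# 3.12 works, 3.13 is not ready yet. Versions below 3.11 are not
# mentioned there; rejecting them is an assumption of this sketch.
SUPPORTED_MINORS = {11, 12}


def python_version_ok(version_info=sys.version_info) -> bool:
    """Return True if the given version tuple is Python 3.11 or 3.12."""
    return version_info[0] == 3 and version_info[1] in SUPPORTED_MINORS


if __name__ == "__main__":
    status = "supported" if python_version_ok() else "not supported"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {status}")
```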

1. Install Skyvern

pip install skyvern

2. Run Skyvern

skyvern quickstart

Option B: Docker Compose

Install Docker Desktop

Clone the repository:

git clone https://github.com/skyvern-ai/skyvern.git && cd skyvern

Run quickstart with Docker Compose:
```
pip install skyvern && skyvern quickstart
```
When prompted, choose "Docker Compose" for the full containerized setup.
Navigate to http://localhost:8080

SDK

Skyvern is a Playwright extension that adds AI-powered browser automation. It gives you the full power of Playwright with additional AI capabilities—use natural language prompts to interact with elements, extract data, and automate complex multi-step workflows.

Installation:

Python: `pip install skyvern`, then run `skyvern quickstart` for local setup
TypeScript: `npm install @skyvern/client`

AI-Powered Page Commands

Skyvern adds four core AI commands directly on the page object:

| Command | Description |
| --- | --- |
| `page.act(prompt)` | Perform actions using natural language (e.g., "Click the login button") |
| `page.extract(prompt, schema)` | Extract structured data from the page with optional JSON schema |
| `page.validate(prompt)` | Validate page state, returns `bool` (e.g., "Check if user is logged in") |
| `page.prompt(prompt, schema)` | Send arbitrary prompts to the LLM with optional response schema |

Additionally, page.agent provides higher-level workflow commands:

| Command | Description |
| --- | --- |
| `page.agent.run_task(prompt)` | Execute complex multi-step tasks |
| `page.agent.login(credential_type, credential_id)` | Authenticate with stored credentials (Skyvern, Bitwarden, 1Password) |
| `page.agent.download_files(prompt)` | Navigate and download files |
| `page.agent.run_workflow(workflow_id)` | Execute pre-built workflows |
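
As a rough illustration of how the agent-level commands could be sequenced, the sketch below logs in, runs a task, and downloads a file. `StubAgent` and `StubPage` are hypothetical stand-ins that record calls instead of driving a browser. The credential type string `"bitwarden"` and the ID `"cred_123"` are placeholders, since the excerpt does not specify the accepted values.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for page.agent: records calls, performs no automation."""

    def __init__(self):
        self.calls = []

    async def login(self, credential_type, credential_id):
        self.calls.append(("login", credential_type, credential_id))

    async def run_task(self, prompt):
        self.calls.append(("run_task", prompt))

    async def download_files(self, prompt):
        self.calls.append(("download_files", prompt))


class StubPage:
    """Hypothetical page object exposing only the .agent attribute."""

    def __init__(self):
        self.agent = StubAgent()


async def fetch_latest_invoice(page, credential_id):
    """Authenticate, navigate with a task prompt, then download a file."""
    # "bitwarden" is a placeholder; the real credential_type values are not shown above.
    await page.agent.login("bitwarden", credential_id)
    await page.agent.run_task("Open the billing page")
    await page.agent.download_files("Download the most recent invoice")


page = StubPage()
asyncio.run(fetch_latest_invoice(page, "cred_123"))
print(page.agent.calls)
```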

AI-Augmented Playwright Actions

All standard Playwright actions support an optional prompt parameter for AI-powered element location:

| Action | Playwright | AI-Augmented |
| --- | --- | --- |
| Click | `page.click("#btn")` | `page.click(prompt="Click login button")` |
| Fill | `page.fill("#email", "a@b.com")` | `page.fill(prompt="Email field", value="a@b.com")` |
| Select | `page.select_option("#country", "US")` | `page.select_option(prompt="Country dropdown", value="US")` |
| Upload | `page.upload_file("#file", "doc.pdf")` | `page.upload_file(prompt="Upload area", files="doc.pdf")` |
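
One way to combine the two columns is to try the selector first and use the prompt only when the selector fails. The helper below is an illustrative sketch, not part of the SDK. `StubPage` is a hypothetical page whose `click` accepts either a selector or a `prompt` keyword, as in the table, and it raises `LookupError` to simulate a missing element. A real page would raise a Playwright error instead.

```python
import asyncio


class StubPage:
    """Hypothetical page: selector clicks fail unless the selector is known."""

    def __init__(self, known_selectors):
        self.known_selectors = set(known_selectors)
        self.log = []

    async def click(self, selector=None, prompt=None):
        if selector is not None:
            if selector not in self.known_selectors:
                # Stand-in for the error a real page raises on a missing element.
                raise LookupError(f"selector not found: {selector}")
            self.log.append(("selector", selector))
        else:
            self.log.append(("prompt", prompt))


async def click_with_fallback(page, selector, prompt):
    """Click by selector; if that fails, retry with a natural-language prompt."""
    try:
        await page.click(selector)
        return "selector"
    except LookupError:
        await page.click(prompt=prompt)
        return "prompt"


stable = StubPage(known_selectors={"#btn"})   # layout unchanged
changed = StubPage(known_selectors=set())     # layout changed, "#btn" is gone
print(asyncio.run(click_with_fallback(stable, "#btn", "Click login button")))
print(asyncio.run(click_with_fallback(changed, "#btn", "Click login button")))
```

Trying the selector first keeps the common case fast and deterministic, and reserves the AI-assisted lookup for pages where the layout has drifted.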

Three interaction modes:

# 1. Traditional Playwright - CSS/XPath selectors
await page.click("#submit-button")

# 2. AI-powered - natural language
