
lrlna / Puppeteer Walker

Licence: apache-2.0
a puppeteer walker 🕷 🕸

Programming Languages

javascript

Projects that are alternatives to, or similar to, Puppeteer Walker

Ppspider
Web spider framework built on Puppeteer. It supports task queues and task scheduling via decorators, convenient data storage in nedb or MongoDB, and data visualization with user interaction.
Stars: ✭ 237
Mutual labels: crawler, spider, puppeteer, headless
Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125
Mutual labels: crawler, chrome, puppeteer
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171
Mutual labels: crawler, spider, puppeteer
Chromium for spider
Dynamic crawler for web vulnerability scanners
Stars: ✭ 220
Mutual labels: crawler, spider, puppeteer
Serverless Puppeteer Layers
Serverless Framework + AWS Lambda Layers + Puppeteer = ❤️
Stars: ✭ 247
Mutual labels: chrome, puppeteer, headless
Sms Boom
SMS bomber that uses Chrome's headless mode to simulate user registrations
Stars: ✭ 507
Mutual labels: chrome, puppeteer, headless
Jvppeteer
Headless Chrome For Java (Java crawler)
Stars: ✭ 193
Mutual labels: crawler, chrome, puppeteer
Puppeteer Api Zh cn
📖 Chinese documentation for Puppeteer (the officially designated Chinese translation)
Stars: ✭ 697
Mutual labels: chrome, puppeteer, headless
Puppetron
Puppeteer (Headless Chrome Node API)-based rendering solution.
Stars: ✭ 429
Mutual labels: chrome, puppeteer, headless
Webster
a reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364
Mutual labels: crawler, spider, puppeteer
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129
Mutual labels: crawler, chrome, puppeteer
Url To Pdf Api
Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
Stars: ✭ 6,544
Mutual labels: chrome, puppeteer, headless
Crawlab
Distributed web crawler admin platform for managing spiders, regardless of language or framework
Stars: ✭ 8,392
Mutual labels: crawler, spider
Ferrum
Headless Chrome Ruby API
Stars: ✭ 1,009
Mutual labels: chrome, headless
Avbook
AV movie management system: crawlers for avmoo, javbus, and javlibrary; an online AV video library; an AV magnet-link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)
Stars: ✭ 8,133
Mutual labels: crawler, spider
Puppeteer Deep
Puppeteer, Headless Chrome: scraping the book "es6标准入门" (Introduction to the ES6 Standard), automatically posting articles to Juejin, and site performance analysis; advanced crawling, automated UI testing, and performance analysis.
Stars: ✭ 1,033
Mutual labels: chrome, puppeteer
Lizard
💐 Full Amazon Automatic Download
Stars: ✭ 41
Mutual labels: crawler, spider
Photon
Incredibly fast crawler designed for OSINT.
Stars: ✭ 8,332
Mutual labels: crawler, spider
Daily Signin
Website check-in scripts
Stars: ✭ 52
Mutual labels: puppeteer, headless
Car Prices
Golang crawler that scrapes the used-car product catalog on Autohome (汽车之家)
Stars: ✭ 57
Mutual labels: crawler, spider

puppeteer-walker


A crawler that walks through a given site in headless Chrome using Puppeteer. Returns an object containing the host, the current path, and the current DOM object.
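Here is a minimal breadth-first sketch that makes the idea of walking a site concrete. It is not puppeteer-walker's implementation. `walkSite` and `getLinks(url)` are hypothetical names: `getLinks` stands in for loading a page in headless Chrome and extracting its links. Restricting the walk to the starting host is also an assumption. Each visited page yields a `{ host, path }` record, which covers part of what the description above says is returned (the DOM object is omitted).

```javascript
// Illustrative breadth-first walk over a single host. NOT puppeteer-walker's
// code: `getLinks(url)` is a hypothetical async callback that loads a page
// and resolves to the hrefs found on it.
async function walkSite (startUrl, getLinks) {
  const start = new URL(startUrl)
  start.hash = ''
  const host = start.host
  const queue = [start.href]
  const seen = new Set(queue)
  const visited = []

  while (queue.length > 0) {
    const url = queue.shift()
    // Record host and path for each page (DOM omitted in this sketch).
    visited.push({ host, path: new URL(url).pathname })

    for (const link of await getLinks(url)) {
      const next = new URL(link, url) // resolve relative links against the page
      next.hash = '' // treat /b and /b#top as the same page
      if (next.host === host && !seen.has(next.href)) {
        seen.add(next.href)
        queue.push(next.href)
      }
    }
  }
  return visited
}
```

A FIFO queue (rather than recursion) visits pages in breadth-first order. The `seen` set ensures each URL is queued at most once, even when pages link back to each other.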

Usage

var Walker = require('puppeteer-walker')

var walker = Walker()

walker.on('end', () => console.log('finished walking'))
walker.on('error', (err) => console.log('error', err))
walker.on('page', async (page) => {
  var title = await page.title()
  console.log(`title: ${title}`)
})

walker.walk('https://avocado.choo.io')

API

walker = PuppeteerWalker()

Create a new walker instance.

walker.on('page', async cb(Page, push))

Listen for page events. The callback receives an instance of Puppeteer's Page class and must be an async function.
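Here is a sketch of what an async page handler can do. The function below records each page's title and its h1/h2 headings, using the standard Puppeteer Page methods `page.url()`, `page.title()`, and `page.$$eval()`. The `collectPage` name, the `results` map, and the choice of selectors are illustrative assumptions, not part of puppeteer-walker.

```javascript
// Hypothetical store: one record per visited URL.
const results = new Map()

// An async 'page' handler built only on Puppeteer Page methods.
async function collectPage (page) {
  const url = page.url() // synchronous in Puppeteer
  const title = await page.title() // returns a Promise
  // The mapping function runs inside the page and returns plain strings.
  const headings = await page.$$eval('h1, h2', (els) =>
    els.map((el) => el.textContent.trim()))
  results.set(url, { title, headings })
}
```

It would be registered like the handler in the usage example: `walker.on('page', collectPage)`.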

Call push(url), the callback's second argument, to add more pages to the walker's internal queue. This is useful for getting past login forms and the like.
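Here is a hedged sketch of the login case. The handler fills in a login form when the walker reaches it and then pushes pages that sit behind the login. Several details are assumptions about your site and the walker, not documented behaviour:

- the `makeLoginHandler` helper itself;
- the selectors (`#username`, `#password`, `button[type=submit]`);
- that `push` accepts absolute URLs;
- that the session cookie carries over to pages visited later.

The Page calls used (`url`, `type`, `click`, `waitForNavigation`) are standard Puppeteer methods.

```javascript
// Hypothetical helper: returns a 'page' handler that logs in once, then
// queues pages that need a session. Selectors are placeholders.
function makeLoginHandler ({ loginPath, user, pass, afterLogin }) {
  let loggedIn = false

  return async function onPage (page, push) {
    const current = page.url()
    if (loggedIn || new URL(current).pathname !== loginPath) return

    await page.type('#username', user)
    await page.type('#password', pass)
    // Start waiting for the navigation before clicking so it is not missed.
    await Promise.all([
      page.waitForNavigation(),
      page.click('button[type=submit]')
    ])
    loggedIn = true

    // Assumption: push() accepts absolute URLs on the same site.
    for (const path of afterLogin) push(new URL(path, current).href)
  }
}
```

For example: `walker.on('page', makeLoginHandler({ loginPath: '/login', user: 'me', pass: process.env.SITE_PASSWORD, afterLogin: ['/dashboard'] }))`.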

walker.on('error', cb(err))

Listen for error events.

walker.on('end', cb)

Listen for the end event, which fires when the walk has finished.

walker.walk(url)

Start walking from the given URL.
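Because the walker reports completion through events, it can be convenient to wrap a walk in a Promise so it can be awaited. The `walkToEnd` helper below is hypothetical and relies only on the documented `on('end')`, `on('error')`, and `walk(url)` calls. It treats the first error as fatal. That may be stricter than you want if the walker emits errors for individual pages and keeps going.

```javascript
// Hypothetical helper: resolves when the walker emits 'end' and rejects on
// the first 'error'. A settled Promise ignores later resolve/reject calls,
// so events arriving after the first one are harmless.
function walkToEnd (walker, url) {
  return new Promise((resolve, reject) => {
    walker.on('end', () => resolve())
    walker.on('error', (err) => reject(err))
    walker.walk(url)
  })
}
```

Register the 'page' handler first, then, inside an async function: `await walkToEnd(walker, 'https://avocado.choo.io')`.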

See Also

License

Apache-2.0
