A web crawler framework built on puppeteer. It provides flexible task-queue management and scheduling via decorators, convenient data storage (nedb / mongodb), and data visualization with user interaction.

✭ 237

typescript nodejs angular node mongodb proxy crawler spider puppeteer headless task-queue

Skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.

✭ 231

kotlin hacktoberfest testing crawler dom scraper parse test-automation integration-testing html-parser jsoup

Ecommercecrawlers

Gitee repository: AJay13/ECommerceCrawlers. GitHub repository: DropsDevopsOrg/ECommerceCrawlers. Project showcase platform: http://wechat.doonsec.com

✭ 3,073

python CSS wechat crawler scrapy baidu boss lagou douban-movie baidu-tieba xianyu douban-music ctrip zhilianzhaopin sohu taobao-spider fofa dazhong-spider alitask baotu quanjing

Filesensor

A crawler-based tool for dynamically detecting sensitive files

✭ 227

python pentesting crawler fuzzing scrapy

Awesome Java Crawler

This repository collects and organizes crawler-related resources, mainly for development in Java

✭ 228

java chrome crawler selenium jsoup

Annie

👾 Fast and simple video download library and CLI tool written in Go

✭ 16,369

go video crawler youtube downloader scraper bilibili qq tumblr download hacktoberfest youku iqiyi

Laravel Crawler Detect

A Laravel wrapper for CrawlerDetect - the web crawler detection library

✭ 227

laravel bot crawler spider detect

Selenops

A Swift Web Crawler 🕷

✭ 225

swift scripting web crawler command-line-tool

Arachnid

Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.

✭ 224

crawler seo scraping

Proxybroker

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS 🎭

✭ 2,767

python proxy privacy crawler http-proxy proxy-server socks anonymity proxies proxy-list proxypool anonymous proxy-checker

Ruiji.net

crawler framework, distributed crawler extractor

✭ 220

crawler scraper netcore scrapy headless-chrome

Chromium for spider

dynamic crawler for web vulnerability scanner

✭ 220

html security crawler spider puppeteer chromium

Pychromeless

Python Lambda Chrome Automation (naming pending)

✭ 219

python automation chrome crawler aws-lambda selenium chromium

Sitemap Generator Cli

Creates an XML-Sitemap by crawling a given site.

✭ 214

javascript cli google crawler seo sitemap

Jd mask robot

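JD.com face-mask stock-monitoring crawler (not Selenium-based): QR-code login, price checking, add to cart, order placement, and flash-sale purchasing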

✭ 216

python python3 crawler spider

Webvideobot

Web crawler.

✭ 214

java crawler spider

Gorecon

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester or bug hunter might want to add to their arsenal

✭ 208

go crawler dns scanner recon subdomain-scanner

Tumblthree

A Tumblr Backup Application

✭ 211

csharp windows crawler mvvm wpf downloader backup internationalization tumblr

Goose Parser

Universal scraping tool that allows you to extract data using multiple environments

✭ 211

javascript nodejs docker parser browser crawler scraper parsing scraping phantomjs

Algoliasearch Netlify

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler

✭ 208

typescript search crawler netlify jamstack algolia

Media Scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

✭ 206

python crawler twitter instagram scraper reddit pixiv tumblr

Tianyancha

A batteries-included, pip-installable scraper API for Tianyancha, the best Chinese business data and investigation platform. It saves the business registration information of one or more specified companies to Excel/JSON in one step.

✭ 206

python python3 data crawler pandas selenium scraper china business

Colly

Elegant Scraper and Crawler Framework for Golang

✭ 15,535

go HTML framework crawler spider scraper scraping crawling

Woid

Simple news aggregator displaying top stories in real time

✭ 204

python django crawler news

Jssoup

JavaScript + BeautifulSoup = JSSoup

✭ 203

javascript html nodejs react-native parser crawler spider beautifulsoup

Googlescraper

A Python module to scrape several search engines (such as Google, Yandex, Bing, and DuckDuckGo), with asynchronous networking support.

✭ 2,363

python HTML crawler search-engine scraping search-engines search-engine-optimization

Querylist

🕷️ The elegant, progressive PHP crawler framework!

✭ 2,392

PHP HTML crawler spider scraper querylist

Zhihuspider

Multi-threaded Zhihu user crawler, based on Python 3

✭ 201

python python3 crawler spider zhihu multi-threading

Videoserver

A video resource backend implemented in Node.js with Express and a crawler

✭ 200

javascript node video crawler

Laosj

Lightweight image crawler written in Go

✭ 199

go image crawler downloader douban

Antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

✭ 198

go golang framework crawler scraping crawling web-crawler

Ok ip proxy pool

🍿 Proxy IP pool for crawlers, written in Python 🍟 A decent ("OK") IP proxy pool

✭ 196

python python3 http proxy flask async crawler sqlite spider aiohttp ip pool proxypool

Fooproxy

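A robust, efficient, score-based, target-specific IP proxy pool plus API service. You can plug in your own collectors to crawl proxy IPs at any time, and it builds a separate database of valid proxies for each of the one or more target websites your crawler visits. Supports MongoDB 4.0 and uses Python 3.7.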

✭ 195