Top 615 crawler open source projects

Tiebamanager

(Abandoned) A moderation tool for Baidu Tieba that automatically scans posts and handles those that break the rules

✭ 119

crawler

Php Crawler

A PHP crawler that finds email addresses on the internet

✭ 119

vue laravel vuejs crawler webscraping

Free proxy website

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

✭ 119

python proxy crawler spider ip

Sentinel Crawler

Xenomorph Crawler, a concise, declarative, and observable distributed crawler (Node / Go / Java / Rust) for Web, RDB, and OS; it can also act as a monitor (with Prometheus) or ETL for infrastructure 💫 Multi-language executor, distributed crawler

✭ 118

javascript react nodejs crawler monitor koa2 etl

Docs

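Source code for "Data Collection: From Getting Started to Giving Up". Contents: introduction to crawlers, the job market, and crawler engineer interview questions; introduction to the HTTP protocol; using Requests; introduction to the XPath parser; MongoDB and MySQL; multithreaded crawlers; introduction to Scrapy; introduction to Scrapy-redis; deploying with Docker; managing a Docker cluster with Nomad; querying Docker logs with EFK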

✭ 118

python docker mysql http mongodb crawler scrapy requests xpath

Moodle Downloader 2

A Moodle downloader that quickly downloads course content from Moodle (e.g. lecture PDFs)

✭ 118

python bot pdf crawler telegram downloader fast xmpp assets content

Decryptlogin

APIs for logging in to some websites using requests.

✭ 1,861

python crawler twitter spider pypi login requests bilibili xiaomi baidu weibo zhihu stackoverflow taobao tencent baiduyun xiami 12306 jingdong migu

Examples Of Web Crawlers

Some very interesting Python crawler examples that are friendly to beginners, mainly crawling Taobao, Tmall, WeChat, Douban, QQ, and other sites.

✭ 10,724

python HTML javascript CSS wechat crawler spider example selenium multithreading stock taobao pyquery tmall fund agent-pool wechat-report

Baiducrawler

Sample of using proxies to crawl Baidu search results.

✭ 116

python proxy crawler baidu proxies

Prerender Java

java framework for prerender

✭ 115

java crawler seo prerender

Memex Explorer

Viewers for statistics and dashboarding of Domain Search Engine data

✭ 115

python dashboard crawler apache anaconda

Bilibili member crawler

Bilibili user crawler. Yay, it's a crawler!

✭ 115

python python3 mysql web crawler spider queue requests bilibili multithreading

Jianso movie

🎬 Movie resource crawler and movie image scraping script, Flask|Nginx|wsgi

✭ 114

python flask crawler sqlite

Patentcrawler

Scrapy patent crawler (no longer maintained)

✭ 114

python visualization data crawler scrapy echarts

Douban Movie

Golang crawler that scrapes the Douban Movie Top 250

✭ 114

go golang crawler spider movie douban

Pkulaw spider

Crawls the PKULaw (北大法宝) website http://www.pkulaw.cn/Case/

✭ 113

python ai crawler spider law

Lcrawl

An elegant crawler for the Zhengfang academic affairs system.

✭ 112

crawler

Graphquery

GraphQuery is a query language and execution engine tied to any backend service.

✭ 112

go html css sql graph crawler xml query xpath regexp

Baiduspider

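BaiduSpider, a crawler for Baidu search results. It currently supports Baidu web search, Baidu image search, Baidu Zhidao search, Baidu video search, Baidu news search, Baidu Wenku search, Baidu Jingyan search, and Baidu Baike search.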

✭ 105

python search crawler spider baidu

Google Play Scraper

Node.js scraper to get data from Google Play

✭ 1,606

javascript nodejs api crawler scraper google-play

Instagram Profilecrawl

💻 Quickly crawl the information (e.g. followers, tags) of an Instagram profile. No login required!

✭ 110

javascript script nodejs automation browser crawler instagram selenium chromedriver

Pylinkvalidator

pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web site and reports errors (e.g., 500 and 404 errors) encountered.

✭ 109

python networking crawler

Linkcrawler

Cross-platform persistent and distributed web crawler 🔗

✭ 109

go web crawler

Lumberjack

An automated website accessibility scanner and CLI

✭ 109

javascript cli crawler accessibility a11y

Fawkes

Fawkes is a tool to search for targets vulnerable to SQL injection. It performs the search using the Google search engine.

✭ 108

python security hacking google crawler sql-injection

Scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

✭ 42,343

python hacktoberfest framework crawler scraping crawling

Webmagic

A scalable web crawler framework for Java.

✭ 10,186

java HTML javascript kotlin ruby groovy framework crawler scraping

Not Your Average Web Crawler

A web crawler (for bug hunting) that gathers more than you can imagine.

✭ 107

python security crawler spider scanner scraper vulnerability custom request bug-bounty

Crawler Detect

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

✭ 1,549

PHP hacktoberfest crawler spider bots user-agent detect

Crawler

Crawlers, HTTP proxies, simulated login!

✭ 106

python crawler scrapy

D4n155

OWASP D4N155 - Intelligent and dynamic wordlist using OSINT

✭ 105

shell tool google crawler osint dynamic scraping wordlist duckduckgo

Andvaranaut

A dungeon crawler

✭ 103

c crawler crawl

Dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

✭ 100

csharp crawler dotnetcore scrapy scraping entity-framework-core webscraping ddd-architecture crawling

Ruia

Async Python 3.6+ web scraping micro-framework based on asyncio

✭ 1,366

python crawler asyncio spider aiohttp

Crawlerpack

Java web data crawler package

✭ 99

java html json crawler xml jsoup

Antispider

✭ 99

javascript python crawler

Douyinsdk

Douyin SDK: data collection and crawling are no longer just a dream

✭ 99

python sdk crawler spider

Gopa Abandoned

GOPA, a spider written in Go. (NOTE: this project moved to https://github.com/infinitbyte/gopa)

✭ 98

go golang crawler spider lightweight

Thesaurusspider

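A Python crawler that downloads dictionary files for the Sogou, Baidu, and QQ input methods; can be used to build vocabularies for different industries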

✭ 98

python crawler multithreading

Amazonrobot

A Python crawler for driving traffic to Amazon products

✭ 97

python crawler selenium amazon

Infinitycrawler

A simple but powerful web crawler library for .NET

✭ 97

hacktoberfest crawler web-crawler

Scaleable Crawler With Docker Cluster

A scalable and efficient crawler with a Docker cluster; crawls a million pages in 2 hours on a single machine

✭ 96

python docker crawler rabbitmq distributed cluster celery

Lightcrawler

Crawl a website and run it through Google Lighthouse

✭ 1,339

javascript chrome crawler

Gf Secrets

Secret and/or credential patterns used for gf.

✭ 96

shell crawler infosec bugbounty

Scrapoxy

Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!

✭ 1,322

javascript nodejs proxy cloud crawler angularjs scraper scrapy

Hotnewsanalysis

Analysis of trending news topics using text mining techniques

✭ 93

python crawler news word2vec

Ktspeechcrawler

Automatically constructing a corpus for automatic speech recognition from YouTube videos

✭ 92

python crawler youtube speech-recognition asr

Proxy Pool

A proxy IP pool service for crawlers; other crawler programs can obtain proxies through a REST API

✭ 91

java crawler proxypool

Weibo Album Crawler

A multithreaded crawler for full-size images in Sina Weibo albums.

✭ 83

python python3 crawler requests weibo futures concurrent

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

✭ 1,246

go crawler spider scraper scraping

Tumblr crawler

Tumblr parsing website

✭ 83

python crawler tumblr

Taiwan News Crawlers

Scrapy-based Crawlers for news of Taiwan

✭ 83

python crawler scrapy news taiwan

Acm Statistics

An online tool (crawler) to analyze users' performance in online judges (coding competition websites). Supported OJ: POJ, HDU, ZOJ, HYSBZ, CodeForces, UVA, ICPC Live Archive, FZU, SPOJ, Timus (URAL), LeetCode_CN, CSU, LibreOJ, 洛谷, 牛客OJ, Lutece (UESTC), AtCoder, AIZU, CodeChef, El Judge, BNUOJ, Codewars, UOJ, NBUT, 51Nod, DMOJ, VJudge

✭ 83

javascript csharp vue nodejs docker crawler acm-icpc

Is Google

Verify that a request is from Google crawlers using Google's DNS verification steps

✭ 82

javascript js nodejs bot google crawler dns ip verify

Work crawler

Download tool for comics and novels. Comics: 腾讯漫画, 大角虫漫画, 有妖气, 知音漫客, 咪咕, SF漫画, 哦漫画, 看漫画, 漫画柜, 汗汗酷漫, 動漫伊甸園, 快看漫画, 微博动漫, 733动漫网, 大古漫画网, 漫画DB, 無限動漫, 動漫狂, 卡推漫画, 动漫之家, 动漫屋, 古风漫画网, 36漫画网, 亲亲漫画网, 乙女漫画, comico, webtoons, 咚漫, ニコニコ静画, ComicWalker, ヤングエースUP, モアイ, pixivコミック, サイコミ; novels: アルファポリス, カクヨム, ハーメルン, 小説家になろう, 起点中文网, 八一中文网, 顶点小说, 落霞小说网, 努努书坊, 笔趣阁 → epub.

✭ 1,224

javascript crawler downloader ebook epub manga comics

Wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

✭ 1,220

ruby dsl crawler scraper

Swiftlinkpreview

It makes a preview from a URL, grabbing information such as the title, relevant text, and images.

✭ 1,216