Top 615 crawler open source projects

Anticrawlersolution

Covers the blocking principles behind most anti-crawling strategies and the corresponding solutions. 👽👽👽👽

✭ 77

machine-learning algorithm crawler

Crawler examples

Some classic web crawler projects.

✭ 74

python web crawler spider

Project that collects university admission cutoff scores for 2014-2018 and analyzes the data

✭ 73

python crawler data-mining

Golang pkg to quickly return a preview of a webpage (title/description/images)

✭ 72

go golang image crawler website engine scraper url icon preview page webpage

Python crawler for JD.com (京东): automatic login and online flash purchasing of products

✭ 1,174

python crawler scraper

Scrapy Examples

Some Scrapy and web.py examples

✭ 71

python crawler scrapy

python crawler spider

✭ 70

python crawler spider

Powerful web scraping framework for Crystal

✭ 68

crystal bot crawler spider web-scraping crawling web-scraper

Python Testing Crawler

A crawler for automated functional testing of a web application

✭ 68

python django testing flask crawler

Play with Zhihu elegantly

✭ 67

python crawler zhihu

Tracker Radar Collector

🕸 Modular, multithreaded, puppeteer-based crawler

✭ 67

javascript crawler puppeteer

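A collection of crawler examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQiyi, Ctrip, 12306, 58, Sohu, Baidu Index, VIP (维普) and Wanfang (万方), Zlibraty, Oalib, novels, tender sites, procurement sites, and Xiaohongshu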

✭ 60

python wechat crawler weixin weibo douban taobao

Terpene Profile Parser For Cannabis Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

✭ 63

python database data-science crawler bioinformatics analysis scrapy health web-crawler

Easily download all the photos/videos from Tumblr blogs.

✭ 1,118

python crawler photos videos tumblr

hproxy - Asynchronous IP proxy pool for crawlers that aims to make getting a proxy as convenient as possible.

✭ 62

python proxy crawler asyncio schedule sanic

When you solve a problem on Baekjoon Online Judge, it automatically commits and pushes to the remote repository.

✭ 60

python nodejs git algorithm crawler phantomjs

A document viewer; fuzzy match incremental search.

✭ 59

javascript electron crawler

Beanbun is a multi-process web crawler framework written in PHP, with good openness and high extensibility, based on Workerman.

✭ 1,096

Auto Lighthouse

A utility package for automating Lighthouse reporting

✭ 58

A powerful dynamic crawler for web vulnerability scanners

✭ 1,088

golang crawler chromium headless headless-chrome chrome-devtools vulnerability-scanner

Golang crawler that scrapes the used-car product catalog of Autohome (汽车之家)

✭ 57

go golang crawler spider

Awesome Python Primer

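An index of high-quality Chinese-language resources for self-taught Python beginners, including books / documentation / videos, suitable for the crawler / Web / data analysis / machine learning tracks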

✭ 57

python python3 awesome awesome-list django web flask book crawler learning spider scraping learn

Picacomic downloader

Downloader for Picacomic (哔咔漫画) favorites

✭ 57

Leetcode Ranking Search

Leetcode Contest Ranking Searcher

✭ 51

javascript python html vue vuejs crawler leetcode

Images Web Crawler

This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images, and merge folders.

✭ 51

python machine-learning image-processing dataset crawler image-classification images

Get the lyrics for the song currently playing on Spotify

✭ 49

python python3 crawler spotify lyrics

A Node.js-based fund data crawler; the crawled data is stored on GitHub at @nullpointer/fund-data.

✭ 46

javascript crawler travis-ci

A collection of scripts for crawling data from Vietnamese-language social networks and websites

✭ 47

python crawler youtube instagram scraper requests

A Strong, Fast and Flexible Pixiv Client based on .NET Core and WPF

✭ 1,031

csharp api windows crawler wpf desktop-app desktop-application pixiv

Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos

✭ 1,019

python crawler weibo

Incredibly fast crawler designed for OSINT.

✭ 8,332

python Dockerfile crawler spider osint information-gathering

Adult video (AV) management system with crawlers for avmoo, javbus, and javlibrary; an online Japanese adult video library and adult video magnet link database

✭ 8,133

PHP Blade laravel database crawler spider scraper magnet magnet-link guzzlehttp adult javbus javlibrary avmoo adult-video

Distributed web crawler admin platform for managing spiders regardless of language or framework.

✭ 8,392

go Dockerfile Makefile shell docker crawler spider scrapy platform web-crawler webcrawler scrapyd-ui webspider crawling-tasks crawlab spiders-management

vulnx 🕷️ is an intelligent bot auto shell injector that detects vulnerabilities in multiple types of CMS { `wordpress, joomla, drupal, prestashop ..` }

✭ 1,009

python bot hacking security-tools crawler pentest hacking-tool vulnerability exploitation information-gathering vulnerability-detection vulnerability-assessment

💐 Full Amazon Automatic Download

✭ 41

go golang crawler spider distributed amazon

Deepweb Scappering

Discover hidden deepweb pages

✭ 40

python python3 hacking crawler hacking-tool tor internet kali tor-network

Rust Web Crawler saving pages on Redis

✭ 39

rust web http crawler spider web-crawler

🔍 A simple search engine built with the Django framework

✭ 39

html django crawler search-engine

Pixivcrawleriii

A Python 3 crawler for Pixiv's top-ranked works and all artworks by any illustrator

✭ 38

python crawler crypto multithreading pixiv

Find web directories without bruteforce

✭ 983

python security security-tools pentesting crawler

A GUI client of schannel powered by therecipe/qt and golang

✭ 36

go golang linux crawler qt5 client-side

The fast website crawler

✭ 35

go golang command-line crawler

Those years I spent crawling 北科 (Beike). A focused-crawler tutorial that progresses from simple to advanced.

✭ 35

python crawler tutorials

File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch

✭ 977

python elasticsearch crawler storage filesystem metadata botnet disk-space

Web Crawler written in C#

✭ 34

news-please - an integrated web crawler and information extractor for news that just works.

✭ 969

python json nlp elasticsearch crawler news

[DEPRECATED] Simple, flexible, delightful web crawler/spider package

✭ 33

typescript web node async crawler spider promise pipeline crawl

Douyin crawler. Crawls users' posts and likes through a mobile phone proxy

✭ 33

java crawler vertx

Leboncoin Crawler

Crawler for leboncoin.fr

✭ 32

🐞 A simple, lightweight Java crawler framework; with just a little knowledge of regular expressions and CSS selectors, you can easily collect data.

✭ 32

Google, Naver multiprocess image web crawler (Selenium)

✭ 957

python deep-learning google crawler selenium bigdata customizable thread chromedriver

Universityrecruitment Ssurvey

Uses serious data to answer the question: "What kinds of companies recruit at what kinds of universities?"

✭ 30

python redis data crawler analysis university beautifulsoup

Toutiaohao (头条号) crawler example

✭ 30

Crawler used to crawl papers

✭ 20

python crawler paper cvpr

Scrapy Azuresearch Crawler Samples

Scrapy as a Web Crawler for Azure Search Samples

✭ 20

python python3 search crawler azure scrapy

Tor website crawler (specific for Alphabay at the time)

✭ 15

python parser crawler tor onion

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

✭ 14

python pdf crawler scraper

Crawl websites for accessibility issues from the command line.

✭ 12

coffeescript electron command-line crawler csv accessibility chromium headless a11y

Sina Stock Crawler

Sina stock options crawler with CSV output (crawls Sina's Shanghai Stock Exchange ETF options data)

✭ 12

python python3 crawler stock stock-market stocks sina

Simple CORPORA list crawler

✭ 11
