Top 615 crawler open source projects

Anticrawlersolution
Covers the blocking principles behind most anti-crawling strategies and the corresponding solutions. 👽👽👽👽
Crawler examples
Some classic web crawler projects.
Bee University
Project that collects university admission cutoff scores for 2014-2018 and analyzes the data
Goscraper
Golang pkg to quickly return a preview of a webpage (title/description/images)
Jd Autobuy
Python crawler for JD.com: automatic login and online flash purchasing of products
Scrapy Examples
Some Scrapy and web.py examples
Spider
python crawler spider
Arachnid
Powerful web scraping framework for Crystal
Python Testing Crawler
A crawler for automated functional testing of a web application
Zhihuvapi
Use Zhihu elegantly
Tracker Radar Collector
🕸 Modular, multithreaded, puppeteer-based crawler
Lxspider
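A collection of crawler examples, including but not limited to: Taobao, JD.com, Tmall, Douban, Douyin, Kuaishou, Weibo, WeChat, Alibaba, Toutiao, pdd, Youku, iQiyi, Ctrip, 12306, 58, Sohu, Baidu Index, VIP and Wanfang, Zlibraty, Oalib, novels, bidding sites, procurement sites, and Xiaohongshu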
Terpene Profile Parser For Cannabis Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Tumblr Crawler
Easily download all the photos and videos from specified Tumblr blogs.
Hproxy
hproxy - Asynchronous IP proxy pool for crawlers that aims to make getting a proxy as convenient as possible.
Boj Autocommit
When you solve a problem on Baekjoon Online Judge, it automatically commits and pushes to the remote repository.
Chemrtron
A document viewer; fuzzy match incremental search.
Beanbun
Beanbun is a multi-process web crawler framework written in PHP, with good openness and high extensibility, based on Workerman.
Auto Lighthouse
A utility package for automating lighthouse reporting
Crawlergo
A powerful dynamic crawler for web vulnerability scanners
Car Prices
Golang crawler that scrapes the Autohome used-car product catalog
Awesome Python Primer
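An index of high-quality Chinese-language resources for self-taught Python beginners, including books / documentation / videos, suited to crawling / Web / data analysis / machine learning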
Picacomic downloader
Downloader for your Picacomic favorites folder
Images Web Crawler
This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images, and merge folders.
Lyrics Crawler
Get the lyrics for the song currently playing on Spotify
Fund Crawler
A Node.js-based fund data crawler; the crawled data is stored on GitHub at @nullpointer/fund-data.
Social Scraper
A collection of scripts for crawling data from Vietnamese-language social networks and websites
Pixeval
A Strong, Fast and Flexible Pixiv Client based on .NET Core and WPF
Weibo Crawler
Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos
Photon
Incredibly fast crawler designed for OSINT.
Avbook
AV movie management system; crawlers for avmoo, javbus, and javlibrary; online Japanese Adult Video library and adult video magnet link database
Crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Vulnx
vulnx 🕷️ is an intelligent bot auto shell injector that detects vulnerabilities in multiple types of CMS { `wordpress, joomla, drupal, prestashop ..` }
Lizard
💐 Full Amazon Automatic Download
Maman
Rust Web Crawler saving pages on Redis
Dbworld Search
🔍 A simple search engine built with the Django framework
Pixivcrawleriii
A Python 3 crawler for the Pixiv top rankings and for all artworks of any illustrator
Dirhunt
Find web directories without bruteforce
Schannel Qt5
A GUI client of schannel powered by therecipe/qt and golang
Gargantua
The fast website crawler
Ustbcrawlers
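Those years I spent crawling USTB (University of Science and Technology Beijing). A targeted-crawler tutorial that progresses from basic to advanced.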
Diskover
File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
Ncrawler
Web Crawler written in C#
News Please
news-please - an integrated web crawler and information extractor for news that just works.
Nodespider
[DEPRECATED] Simple, flexible, delightful web crawler/spider package
Douyin Crawler
Douyin crawler. Crawls a user's posts and likes through a mobile phone proxy
Leboncoin Crawler
Crawler for leboncoin.fr
Vw Crawler
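🐞 A simple, lightweight Java crawler framework; with only a little knowledge of simple regular expressions and CSS selectors, you can collect data easily.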
Autocrawler
Google, Naver multiprocess image web crawler (Selenium)
Universityrecruitment Ssurvey
Uses rigorous data to answer the question: what kinds of companies recruit at what kinds of universities?
Toutiaocrawler
Crawler examples for Toutiao accounts (Toutiaohao)
Papercrawler
Crawler used to crawl papers
Scrapy Azuresearch Crawler Samples
Scrapy as a Web Crawler for Azure Search Samples
Onion Crawler
Tor website crawler (specific to Alphabay at the time)
Pypergrabber
Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.
Axegrinder
Crawl websites for accessibility issues from the command line.
Sina Stock Crawler
Sina stock options crawler with CSV output (crawls Sina data for Shanghai Stock Exchange ETF options)
Ccrawl
Simple CORPORA list crawler