Top 615 crawler open source projects

s3recon
Amazon S3 bucket finder and crawler.
hupu spider
Crawler for Hupu's Buxingjie (步行街) forum.
baidu-chain-dog
Crawler for Baidu Laici Dog (莱茨狗).
scrapy-kafka-redis
Distributed crawling/scraping, Kafka And Redis based components for Scrapy
instastories-backup
Backup your friends' Instagram Stories forever and get to keep them even after 24 hours.
bthello
Python 3 DHT magnet torrent crawler, torrent parsing, and torrent search; demo site.
DeadPool
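A crawler application built on Celery as its core framework. Crawl tasks can be added flexibly and crawlers for multiple sites can run at the same time. All components natively support large-scale concurrency and distribution, and combined with Celery's native distributed invocation this enables large-scale concurrency.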
doc crawler.py
Explore a website recursively and download all the wanted documents (PDF, ODT…)
urlbuster
Powerful mutable web directory fuzzer to bruteforce existing and/or hidden files or directories.
PTTmineR
Parallel Searching and Crawling Data from PTT 🚀
tiktok-crawler
This is a TikTok crawler app.
Broken-Link-Crawler
🤖 Python bot that crawls your website looking for dead stuff
ArticleSpider
Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: implementation roadmap, distributed-crawler and coping with anti-crawling strategies).
advanced-php-crawler
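Crawler for Sina Blog articles and the wenku8 light novel library. It can fetch and save images and build an e-book in one click. A great tool for Kindle readers!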
Spider
💫 Spider is a PHP library with easy module integration for crawling websites and scraping information.
INMET-API-temperature
Crawler for meteorological data from INMET conventional weather stations (BDMEP).
All-IT-eBooks-Spider
[Updated] A simple python crawler for my tutorial blog at http://www.jianshu.com/p/8fb5bc33c78e
iranian-calendar-events
Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website
php-crawler
🕷️ A simple crawler (spider) written in PHP just for fun, with zero dependencies
crawl
Lightweight library for scalable crawlers in Go.
crawler
Crawler framework for Node.js.
DouyuBarrage-Pro
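(Latest for 2020) Second version of the Douyu danmaku (bullet comment) capture and visual management platform. It provides danmaku capture, real-time visualization of danmaku sending speed, capture history queries, danmaku download, custom keyword statistics, loyal-fan statistics, automatic capture of highlight moments, and word clouds of high-frequency danmaku. Take off!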
vietnam-ecommerce-crawler
Crawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs
asyncpy
A lightweight asynchronous coroutine web crawler framework built with asyncio and aiohttp.
grapy
Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.
crawler-chrome-extensions
Chrome extensions commonly used by crawler developers.
googleplay api
Google Play Unofficial Python 3 API Library
netease-music-cracker
🎵 Converts cache files to MP3 files.
Web-Iota
Iota is a web scraper which can find all of the images and links/suburls on a webpage
web-crawler
Python Web Crawler with Selenium and PhantomJS
frisbee
Collect email addresses by crawling search engine results.
crawlzone
Crawlzone is a fast asynchronous internet crawling framework for PHP.
findmeaflat
Get notified of new listings on popular German real estate portals.
scrapy helper
Dynamically configurable crawler.
nasty
NASTY Advanced Search Tweet Yielder
crawler
Nodejs crawler for cnbeta.com
lopez
Crawling and scraping the Web for fun and profit
actor-youtube-scraper
Apify actor to scrape YouTube search results. You can set the maximum number of videos to scrape per page as well as the date from which to start scraping.
crawler-client
Crawler dev tools using Electron webview.
NEEA-TOEFL-Testseat-Crawler
NEEA TOEFL test seat crawler.
jd-autobuy
Python crawler with automatic JD.com (京东) login and online rush purchasing of products.
qr-pirate
Crawl QR codes from search engines and look for Bitcoin private keys.
CrawlerSamples
Sample crawler console apps using Puppeteer and AngleSharp, written in C# 7.1 and built with .NET Core.