s3recon - Amazon S3 bucket finder and crawler.
robots.txt - 🤖 robots.txt as a service. Crawls, downloads, and parses robots.txt files so their rules can be checked through an API.
scrapy-kafka-redis - Distributed crawling/scraping: Kafka- and Redis-based components for Scrapy.
instastories-backup - Back up your friends' Instagram Stories and keep them even after the 24 hours are over.
ungoliant - 🕷️ The pipeline for the OSCAR corpus.
bthello - Python 3 DHT magnet-link torrent crawler with torrent parsing and torrent search. Demo site
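DeadPool - A crawler application built on Celery as its core framework. It can flexibly add crawler tasks and run crawlers for multiple sites at the same time. All components natively support concurrency and distribution at scale, and together with Celery's native distributed task calls this enables large-scale concurrent crawling.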
doc crawler.py - Recursively explores a website and downloads all the wanted documents (PDF, ODT…).
urlbuster - Powerful mutable web directory fuzzer for brute-forcing existing and/or hidden files or directories.
PTTmineR - Parallel searching and crawling of data from PTT 🚀
ArticleSpider - Crawls Zhihu, Jobbole, and Lagou with Scrapy, and uses Elasticsearch + Django to build a search engine website. README_zh.md covers the implementation roadmap, distributed crawling, and coping with anti-crawling strategies.
trafilatura - Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments.
Spider - 💫 Spider is a PHP library with easy module integration for crawling websites and scraping information.
All-IT-eBooks-Spider - [Updated] A simple Python crawler for my tutorial blog at http://www.jianshu.com/p/8fb5bc33c78e
php-crawler - 🕷️ A simple crawler (spider) written in PHP just for fun, with zero dependencies.
crawl - Lightweight library for scalable crawlers in Go.
crawler - Crawler framework for Node.js.
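DouyuBarrage-Pro - (Latest, 2020) Second version of a Douyu danmaku (bullet-comment) scraping and visual management platform. Features include danmaku scraping, real-time visualization of the danmaku sending rate, scraping-history queries, danmaku downloads, custom keyword statistics, loyal-fan statistics, automatic capture of highlight moments, word clouds of high-frequency danmaku, and more. Take off~~~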
asyncpy - Lightweight asynchronous, coroutine-based web crawler framework built with asyncio and aiohttp.
grapy - Grapy, a fast high-level web crawling framework for Python 3.3 or later, based on asyncio.
Web-Iota - Iota is a web scraper that finds all of the images and links/sub-URLs on a web page.
sponge - sponge is a command-line tool for crawling websites and downloading links.
web-crawler - Python web crawler with Selenium and PhantomJS.
frisbee - Collect email addresses by crawling search engine results.
crawlzone - Crawlzone is a fast asynchronous internet crawling framework for PHP.
findmeaflat - Get notified of new listings on popular German real estate portals.
nasty - NASTY Advanced Search Tweet Yielder
crawler - Node.js crawler for cnbeta.com
lopez - Crawling and scraping the Web for fun and profit
actor-youtube-scraper - Apify actor to scrape YouTube search results. You can set the maximum number of videos to scrape per page as well as the date from which to start scraping.
diskover-community - Diskover Community Edition: open-source file indexer, file search engine, and data management and analytics platform, powered by Elasticsearch.
qr-pirate - Crawl QR codes from search engines and look for Bitcoin private keys.
CrawlerSamples - Sample Puppeteer + AngleSharp crawler console apps, written in C# 7.1 and built with .NET Core.
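The robots.txt entry above describes checking crawl rules. The following is a minimal sketch of that kind of rule check. It is not the API of that project; it uses Python's standard-library `urllib.robotparser` instead. The robots.txt content, the `example.com` URLs, and the `MyCrawler`/`BadBot` user-agent names are all hypothetical. A real crawler would first download `/robots.txt` from each host before testing URLs against it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption, for illustration only).
# A real crawler would fetch this from https://<host>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let `user_agent` fetch `url`."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical agents and URLs used to exercise the rules above.
    for agent, url in [
        ("MyCrawler", "https://example.com/index.html"),
        ("MyCrawler", "https://example.com/private/data.html"),
        ("BadBot", "https://example.com/index.html"),
    ]:
        print(agent, url, is_allowed(rp, agent, url))
```

In this sketch, `MyCrawler` falls back to the `*` group, so it can fetch `/index.html` but not anything under `/private/`. `BadBot` matches its own group, which disallows everything. Parsing the text directly keeps the rule logic separate from fetching, so a crawler can cache one parser per host.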