Top 80 crawling open source projects

pumba
Fetch, store and access user agent strings for different browsers
zcrawl
An open source web crawling platform
telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
crawling-framework
Easily crawl news portals or blog sites using StormCrawler.
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
auctus
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
the-seinfeld-chronicles
A dataset for textual analysis on arguably the best written comedy television show ever.
xXx dead xXx
blogging in the dark
socials
👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
tech-seo-crawler
Build a small, 3-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.
mal-analysis
GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
core
The complete web scraping toolkit for PHP.
scrape-github-trending
Tutorial for web scraping / crawling with Node.js.
puppet-master
Puppeteer as a service hosted on Saasify.
BaiduSpider
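The project has moved to: https://github.com/BaiduSpider/BaiduSpider !! A crawler that scrapes Baidu search results. It currently supports Baidu web search, Baidu image search, Baidu Zhidao search, Baidu video search, Baidu news search, Baidu Wenku search, Baidu Jingyan search, and Baidu Baike search.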