pumba: Fetch, store and access user agent strings for different browsers
zcrawl: An open source web crawling platform
telegram-crawler: 🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
diffbot-php-client: [Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
auctus: Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
xXx dead xXxb̶̡̪̬͒l̸̰̗̝̀ỏ̷̡̩g̴͇̑g̶̲̱̽͐i̵̹͗n̶̤̥͂̅̆g̴̮̾̅͜ ̷̧͎͆i̷̛͒͜͠n̸̥̺͒ ̶͚͚͊̿͜t̸̺͙̭̆̊̈́ḧ̶̟́̐e̸̱͔̟̓̓͝ ̶̨͔̾͛̑d̵̥̣̏ȧ̷̼̊r̷̰̝̥̅̌͝k̵̟̥̞̉̍͛
socials: 👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
tech-seo-crawler: Build a small three-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.
mal-analysis: GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
core: The complete web scraping toolkit for PHP.
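BaiduSpider: The project has moved to: https://github.com/BaiduSpider/BaiduSpider !! A crawler that scrapes Baidu search results. It currently supports Baidu web search, Baidu image search, Baidu Zhidao (Q&A) search, Baidu video search, Baidu news search, Baidu Wenku (documents) search, Baidu Jingyan (how-to) search, and Baidu Baike (encyclopedia) search.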