zcrawl: An open source web crawling platform
Stars: ✭ 21 (-4.55%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+3486.36%)
scrapy-fieldstats: A Scrapy extension that logs item coverage when the spider shuts down
Stars: ✭ 17 (-22.73%)
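To illustrate what "item coverage" means, here is a minimal plain-Python sketch. It is not scrapy-fieldstats' actual code. For each field name seen across the scraped items, it reports the percentage of items in which that field is present and non-empty. The function name `field_coverage` and the rule that `None` and empty values count as missing are assumptions. In a real Scrapy extension, a computation like this would typically run from a handler connected to the `spider_closed` signal.

```python
from collections import Counter


def field_coverage(items):
    """Return {field: % of items where the field is present and non-empty}."""
    total = len(items)
    if total == 0:
        return {}
    seen = set()        # every field name that appears in any item
    filled = Counter()  # number of items with a non-empty value for each field
    for item in items:
        for field, value in item.items():
            seen.add(field)
            # Assumption: None and empty strings/lists/dicts count as "not covered".
            if value not in (None, "", [], {}):
                filled[field] += 1
    return {field: 100.0 * filled[field] / total for field in sorted(seen)}


items = [
    {"title": "A", "price": 10},
    {"title": "B", "price": None},
    {"title": "C"},
]
coverage = field_coverage(items)  # title: 100.0, price: about 33.3
```

A field that is missing from an item is treated the same as one present with an empty value, which matches how coverage is usually read: "how often did the spider actually fill this field?"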
custom-crawler: 🌌 High-productivity semi-automatic crawler generator 🛠️🧰
Stars: ✭ 33 (+50%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+136.36%)
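URL-agnostic deduplication means a payload is stored once even when it is served under different URLs. Later copies can then be written as WARC revisit records that point at the first capture. The sketch below is a hypothetical illustration of that idea, not Wget-AT's implementation. It keys responses by a SHA-1 digest of the payload alone, so the URL plays no part in the comparison. The class name `PayloadDeduplicator` and its return values are assumptions.

```python
import hashlib


class PayloadDeduplicator:
    """Hypothetical sketch: deduplicate responses by payload digest, ignoring the URL."""

    def __init__(self):
        self._first_seen = {}  # payload digest -> URL of the first capture

    def check(self, url, payload: bytes):
        """Return ("response", None) for new payloads, ("revisit", first_url) for repeats."""
        digest = hashlib.sha1(payload).hexdigest()
        first_url = self._first_seen.get(digest)
        if first_url is None:
            self._first_seen[digest] = url
            return ("response", None)
        # Same bytes were seen before, possibly under a different URL.
        return ("revisit", first_url)


dedup = PayloadDeduplicator()
print(dedup.check("https://a.example/logo.png", b"same image bytes"))      # ('response', None)
print(dedup.check("https://cdn.b.example/logo.png", b"same image bytes"))  # ('revisit', 'https://a.example/logo.png')
```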
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. It enables development of data extraction and web automation jobs, including (but not limited to) jobs that use headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+14236.36%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+1972.73%)
go-scrapy: Web crawling and scraping framework for Golang
Stars: ✭ 17 (-22.73%)
Antch: A fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+800%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+209.09%)
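ARGUS itself is built on Scrapy. As a framework-free illustration of "collecting hyperlinks between websites", this sketch uses only Python's standard-library `html.parser` to gather `<a href>` targets from one page, resolve them against the page URL, and keep the external domains the page links to. The helper names `LinkCollector` and `external_domains` are hypothetical and not part of ARGUS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on a single page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, href))


def external_domains(base_url, html):
    """Return the sorted set of other domains this page links to."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own = urlparse(base_url).netloc
    domains = {urlparse(link).netloc for link in collector.links}
    # Drop the page's own domain and links without a host (e.g. mailto:).
    return sorted(d for d in domains if d not in ("", own))


html = '<a href="/about">About</a> <a href="https://example.org/p">x</a> <a href="http://other.net">y</a>'
print(external_domains("https://example.com/", html))  # ['example.org', 'other.net']
```

Running this over every crawled page and recording (page domain, linked domain) pairs yields the kind of inter-website link network the description refers to.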
diffbot-php-client: [Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+140.91%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core that outputs to Entity Framework Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but to be extendable for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+354.55%)
Spidermon: Scrapy extension for monitoring spider execution.
Stars: ✭ 309 (+1304.55%)
Sasila: A flexible, friendly crawler framework
Stars: ✭ 286 (+1200%)
Linkedin Profile Scraper: 🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+677.27%)
Grawler: Grawler is a tool written in PHP, with a web interface, that automates running Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+345.45%)
pomp: Screen scraping and web crawling framework
Stars: ✭ 61 (+177.27%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (+168.18%)
Crawly: A high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+1900%)
Gopa: [WIP] GOPA, a spider written in Golang for Elasticsearch. Demo: http://index.elasticsearch.cn
Stars: ✭ 277 (+1159.09%)
Scrapy: A fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+192368.18%)
Awesome Puppeteer: A curated list of awesome Puppeteer resources.
Stars: ✭ 1,728 (+7754.55%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+72.73%)
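RedisBloom provides Bloom filters as a Redis module (with commands such as `BF.ADD` and `BF.EXISTS`), which distributed crawlers can use to share a "seen requests" set. The in-memory sketch below shows only the underlying data structure, not scrapy-distributed's code. It sizes the bit array with the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, then derives k bit positions per URL by double hashing a SHA-256 digest. Membership checks can return false positives but never false negatives. The class name and its parameters are assumptions.

```python
import hashlib
import math


class BloomFilter:
    """In-memory Bloom filter sized for `capacity` items at a target false-positive rate."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k positions from two 64-bit halves of a single digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids degenerate cycles
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(capacity=10_000, error_rate=0.01)
url = "https://example.com/page/1"
if url not in seen:
    seen.add(url)  # a crawler would schedule the request here
```

The trade-off is memory for certainty. About 96,000 bits (roughly 12 KB) track 10,000 URLs at a 1% false-positive rate, and a false positive only means an unseen URL is occasionally skipped.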
Memorious: Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+1027.27%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+459.09%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+21886.36%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+131.82%)
Colly: Elegant scraper and crawler framework for Golang
Stars: ✭ 15,535 (+70513.64%)
socials: 👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+68.18%)
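The sketch below is a simplified, hypothetical illustration of social account extraction with regular expressions. It is not the `socials` library's API. Its two patterns (GitHub and Twitter profile URLs) are assumptions that ignore many real-world URL forms. The trailing lookahead makes each pattern match only profile-root URLs, so a repository URL such as `github.com/user/repo` is not reported as an account.

```python
import re

# Hypothetical, simplified patterns that match only profile-root URLs.
# The lookahead requires the URL to end (whitespace, quote, angle bracket or end
# of text) right after the handle, so deeper paths such as repositories are skipped.
_END = r"/?(?=[\s\"'<>]|$)"
PATTERNS = {
    "github": re.compile(r"https?://(?:www\.)?github\.com/([A-Za-z0-9-]+)" + _END),
    "twitter": re.compile(r"https?://(?:www\.)?twitter\.com/([A-Za-z0-9_]{1,15})" + _END),
}


def extract_accounts(text):
    """Return {platform: sorted list of handles} for every platform with a match."""
    found = {}
    for platform, pattern in PATTERNS.items():
        handles = sorted({m.group(1) for m in pattern.finditer(text)})
        if handles:
            found[platform] = handles
    return found


text = ("Follow https://twitter.com/jack and https://github.com/octocat/ "
        "but not https://github.com/octocat/Hello-World")
print(extract_accounts(text))  # {'github': ['octocat'], 'twitter': ['jack']}
```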
scrapers: Scrapers for building your own image databases
Stars: ✭ 46 (+109.09%)
shorter.recipes: A website dedicated to making recipes from any website easy to read.
Stars: ✭ 27 (+22.73%)
tech-seo-crawler: Build a small, three-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.
Stars: ✭ 57 (+159.09%)
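Here is a toy version of the crawl-and-index step. It assumes the small internet is given as an in-memory dictionary instead of live HTTP fetches, and it omits the rendering step. The crawler runs breadth-first from a seed URL and builds an inverted index that maps each word to the set of pages containing it. The function name, the `web` structure and the URLs are hypothetical and not taken from tech-seo-crawler.

```python
from collections import defaultdict, deque


def crawl_and_index(web, seed, max_pages=100):
    """Breadth-first crawl of `web` (URL -> (text, outgoing links)) plus an inverted index."""
    index = defaultdict(set)  # word -> URLs whose text contains it
    visited = set()
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in web:
            continue  # already crawled, or a dead link
        visited.add(url)
        text, links = web[url]
        for word in text.lower().split():
            index[word].add(url)
        for link in links:
            if link not in visited:
                queue.append(link)
    return visited, dict(index)


web = {
    "a.example/": ("Crawl render index", ["b.example/", "c.example/"]),
    "b.example/": ("Render pages", ["a.example/"]),
    "c.example/": ("Index pages", []),
}
visited, index = crawl_and_index(web, "a.example/")
print(sorted(index["pages"]))  # ['b.example/', 'c.example/']
```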
oversmash: Overwatch API library for player details and career stats
Stars: ✭ 42 (+90.91%)
mal-analysis: GitHub repo for MyAnimeList (MAL) analysis. Also links to the MAL dataset.
Stars: ✭ 31 (+40.91%)
etf4u: 📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations
Stars: ✭ 29 (+31.82%)
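Proportional mixing can be written as a weighted sum. Suppose ETF e makes up w_e of the portfolio and holds asset a at h_(e,a) percent. The combined allocation of a is then the sum over all ETFs of (w_e / Σw) · h_(e,a). The sketch below implements that formula. It is an assumption about what etf4u computes, not its code, and the ETF names, asset names and percentages are made up.

```python
from collections import defaultdict


def mix_etfs(holdings, weights):
    """Combine ETF holdings into one allocation, weighting each ETF by its portfolio share.

    holdings: ETF -> {asset: percent of that ETF}
    weights:  ETF -> portfolio weight (any scale; normalized below)
    """
    total_weight = sum(weights.values())
    combined = defaultdict(float)
    for etf, weight in weights.items():
        share = weight / total_weight
        for asset, pct in holdings[etf].items():
            combined[asset] += share * pct
    # Largest combined allocation first.
    return dict(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))


holdings = {
    "ETF_A": {"X": 10.0, "Y": 5.0},
    "ETF_B": {"X": 2.0, "Z": 8.0},
}
mixed = mix_etfs(holdings, {"ETF_A": 50, "ETF_B": 50})
print(mixed)  # {'X': 6.0, 'Z': 4.0, 'Y': 2.5}
```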
turtle: Instagram photo downloader
Stars: ✭ 15 (-31.82%)
covid19br-pub: A project for monitoring official publications related to COVID-19 in Brazil.
Stars: ✭ 12 (-45.45%)
RARBG-scraper: A RARBG scraper with Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (+72.73%)
gochanges: [ARCHIVED] Website changes tracker 🔍
Stars: ✭ 12 (-45.45%)
pystorm: Battle-tested Apache Storm Multi-Lang implementation for Python
Stars: ✭ 68 (+209.09%)
4cat: The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
Stars: ✭ 144 (+554.55%)
vaadin-dialog: High-quality web component for modal dialogs. Part of the Vaadin platform.
Stars: ✭ 15 (-31.82%)
auctus: Dataset search engine that discovers data from a variety of sources, profiles it, and allows advanced queries on the index
Stars: ✭ 34 (+54.55%)
core: The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+4945.45%)
info-bot: 🤖 A versatile Telegram bot
Stars: ✭ 37 (+68.18%)
dimmed: 👔 Dimmed color theme for Sublime Text 2/3
Stars: ✭ 18 (-18.18%)
crawlzone: Crawlzone is a fast asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (+218.18%)
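An asynchronous crawler overlaps many network waits instead of fetching pages one at a time. Here is a minimal Python `asyncio` sketch of the idea. Crawlzone itself is PHP, and this is not its API. A semaphore caps how many requests are in flight, and `asyncio.gather` runs the fetches concurrently and returns results in input order. The `fetch` coroutine is a hypothetical stand-in that sleeps instead of performing HTTP.

```python
import asyncio


async def fetch(url):
    # Hypothetical stand-in for an HTTP request; a real crawler would use an async HTTP client.
    await asyncio.sleep(0.01)
    return f"body of {url}"


async def crawl(urls, concurrency=3):
    """Fetch all URLs concurrently, with at most `concurrency` requests in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def worker(url):
        async with sem:  # wait for a free slot before "requesting"
            return url, await fetch(url)

    results = await asyncio.gather(*(worker(u) for u in urls))
    return dict(results)


pages = asyncio.run(crawl([f"https://example.com/{i}" for i in range(10)]))
print(len(pages))  # 10
```

The concurrency limit is the main design knob. Raising it shortens total crawl time, while keeping it low avoids overloading the target site.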
vaadin-board: Web component for creating flexible, responsive layouts and building nice-looking dashboards.
Stars: ✭ 17 (-22.73%)
Goirate: Pillaging the seven seas for torrents, pieces of eight and other bounty.
Stars: ✭ 20 (-9.09%)