ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+223.81%)
ioweb: Web Scraping Framework
Stars: ✭ 31 (+47.62%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs, including but not limited to those using headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+14919.05%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+147.62%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+80.95%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+142.86%)
crawling-framework: Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (+4.76%)
scrapy-fieldstats: A Scrapy extension to log item coverage when the spider shuts down
Stars: ✭ 17 (-19.05%)
Awesome Puppeteer: A curated list of awesome Puppeteer resources.
Stars: ✭ 1,728 (+8128.57%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+3657.14%)
Sasila: A flexible, friendly crawler framework
Stars: ✭ 286 (+1261.9%)
diffbot-php-client: [Deprecated, maintenance mode: please use the APIs directly] The official Diffbot client library
Stars: ✭ 53 (+152.38%)
go-scrapy: Web crawling and scraping framework for Golang
Stars: ✭ 17 (-19.05%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (+180.95%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+485.71%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+1219.05%)
Grawler: Grawler is a tool written in PHP with a web interface that automates the use of Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+366.67%)
pomp: Screen scraping and web crawling framework
Stars: ✭ 61 (+190.48%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+2071.43%)
Spidermon: Scrapy extension for monitoring spider execution.
Stars: ✭ 309 (+1371.43%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+1995.24%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+22933.33%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for .NET Core with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+376.19%)
socials: Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+76.19%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+201533.33%)
Linkedin Profile Scraper: LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+714.29%)
Colly: Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+73876.19%)
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (+128.57%)
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+842.86%)
Memorious: Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+1080.95%)
oversmash: Overwatch API library for player details and career stats
Stars: ✭ 42 (+100%)
trafilatura: Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+3285.71%)
scrapers: Scrapers for building your own image databases
Stars: ✭ 46 (+119.05%)
selectorlib: A library to read a YAML file with XPath or CSS selectors and use them to extract data from HTML pages
Stars: ✭ 53 (+152.38%)
Katastrophe: Command-line tool to download torrents
Stars: ✭ 85 (+304.76%)
tech-seo-crawler: Build a small, three-domain internet using GitHub Pages and Wikipedia, and construct a crawler to crawl, render, and index it.
Stars: ✭ 57 (+171.43%)
mal-analysis: GitHub repo for MyAnimeList analysis. Also links to the MAL dataset.
Stars: ✭ 31 (+47.62%)
asyncio-hn: Python (asyncio) wrapper for the Hacker News API
Stars: ✭ 27 (+28.57%)
etf4u: 📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations
Stars: ✭ 29 (+38.1%)
turtle: Instagram Photo Downloader
Stars: ✭ 15 (-28.57%)
shorter.recipes: A website dedicated to making recipes from any website easy to read.
Stars: ✭ 27 (+28.57%)
gochanges: [ARCHIVED] Website changes tracker 🔍
Stars: ✭ 12 (-42.86%)
core: The complete web scraping toolkit for PHP.
Stars: ✭ 1,110 (+5185.71%)
info-bot: 🤖 A Versatile Telegram Bot
Stars: ✭ 37 (+76.19%)
ksoup: Kotlin Wrapper for Jsoup
Stars: ✭ 59 (+180.95%)
puppeteer-botcheck: Bot detection tests for Puppeteer. Hide and seek!
Stars: ✭ 42 (+100%)
laravel-block-bots: Block crawlers and high-traffic users on your site by IP using Redis
Stars: ✭ 25 (+19.05%)
Goirate: Pillaging the seven seas for torrents, pieces of eight and other bounty.
Stars: ✭ 20 (-4.76%)
RARBG-scraper: With Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (+80.95%)