ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (-32%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+340%)
Sasila: A flexible, friendly crawler framework
Stars: ✭ 286 (+186%)
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+364%)
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+98%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+3977%)
Scrapy: Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+42243%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+537%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+689%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+4737%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (-62%)
Linkedin Profile Scraper: 🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+71%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+177%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (-41%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+23%)
Colly: Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+15435%)
scrapy-fieldstats: A Scrapy extension to log item field coverage when the spider shuts down
Stars: ✭ 17 (-83%)
policy-data-analyzer: Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-78%)
Rcrawler: An R web crawler and scraper
Stars: ✭ 274 (+174%)
memes-api: API for scraping common meme sites
Stars: ✭ 17 (-83%)
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-52%)
Spidy: The simple, easy-to-use command-line web crawler.
Stars: ✭ 257 (+157%)
Dotnetspider: DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient and fast high-level web crawling & scraping framework.
Stars: ✭ 3,233 (+3133%)
Apify Js: Apify SDK, the scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+3054%)
Vault: Swiss army knife for hackers
Stars: ✭ 346 (+246%)
Webster: A reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+264%)
Module Shop: A simple, cross-platform, modular e-commerce system built on .NET Core
Stars: ✭ 398 (+298%)
Simplcommerce: A simple, cross-platform, modularized e-commerce system built on .NET Core
Stars: ✭ 3,474 (+3374%)
Spidermon: Scrapy extension for monitoring spider execution.
Stars: ✭ 309 (+209%)
Scrapoxy: Scrapoxy hides your scraper behind a cloud. It starts a pool of proxies to send your requests. Now, you can crawl without thinking about blacklisting!
Stars: ✭ 1,322 (+1222%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+356%)
img-cli: An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL
Stars: ✭ 15 (-85%)
Linkedin: LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
Stars: ✭ 309 (+209%)
Geziyor: Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+1146%)
Grawler: Grawler is a tool written in PHP with a web interface that automates the use of Google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (-2%)
Scrapy Redis: Redis-based components for Scrapy.
Stars: ✭ 4,998 (+4898%)
Fbcrawl: A Facebook crawler
Stars: ✭ 536 (+436%)
Gazpacho: 🥫 The simple, fast, and modern web scraping library
Stars: ✭ 525 (+425%)
Scrapy Selenium: Scrapy middleware to handle JavaScript pages using Selenium
Stars: ✭ 550 (+450%)
Newcrawler: Free Web Scraping Tool with Java
Stars: ✭ 589 (+489%)
Icrawler: A multi-threaded crawler framework with many built-in image crawlers provided.
Stars: ✭ 629 (+529%)
Haipproxy: 💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+4893%)
Email Extractor: The main functionality is to extract all the emails from one or several URLs
Stars: ✭ 81 (-19%)
Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster.
Stars: ✭ 921 (+821%)
Configs: Public, free-to-use repository of digger configs for scraping and extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-63%)
Crawlab: Distributed web crawler admin platform for managing spiders regardless of language or framework.
Stars: ✭ 8,392 (+8292%)