ARGUSARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+300%)
scrapy-distributedA series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+123.53%)
DotnetcrawlerDotnetCrawler is a straightforward, lightweight web crawling/scrapying library for Entity Framework Core output based on dotnet core. This library designed like other strong crawler libraries like WebMagic and Scrapy but for enabling extandable your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+488.24%)
double-agentA test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+623.53%)
Pdf downloaderA Scrapy Spider for downloading PDF files from a webpage.
Stars: ✭ 18 (+5.88%)
InstaBotSimple and friendly Bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (+88.24%)
torchestratorSpin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (+88.24%)
socials👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (+117.65%)
scrapy-zyte-smartproxyZyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
Stars: ✭ 317 (+1764.71%)
Apify JsApify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+18452.94%)
diffbot-php-client[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+211.76%)
Scrapy SeleniumScrapy middleware to handle javascript pages using selenium
Stars: ✭ 550 (+3135.29%)
scrapy-wayback-machineA Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (+441.18%)
crawling-frameworkEasily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (+29.41%)
scrapy facebookerCollection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (+29.41%)
bots-zooNo description or website provided.
Stars: ✭ 59 (+247.06%)
wget-luaWget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+205.88%)
CrawlyCrawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+2488.24%)
ScrappleA framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+2629.41%)
Lulu[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+4541.18%)
FerretDeclarative web scraping
Stars: ✭ 4,837 (+28352.94%)
Scrapy ClusterThis Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Stars: ✭ 921 (+5317.65%)
RARBG-scraperWith Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (+123.53%)
GrawlerGrawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+476.47%)
ScrapyrtHTTP API for Scrapy spiders
Stars: ✭ 637 (+3647.06%)
SeleniumcrawlerAn example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Stars: ✭ 117 (+588.24%)
proxycrawl-pythonProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+200%)
zcrawlAn open source web crawling platform
Stars: ✭ 21 (+23.53%)
ScrapyScrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+248976.47%)
proxiProxy pool. Finds and checks proxies with rest api for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (+88.24%)
Awesome PuppeteerA curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+10064.71%)
Linkedin Profile Scraper🕵️♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+905.88%)
pompScreen scraping and web crawling framework
Stars: ✭ 61 (+258.82%)
go-scrapyWeb crawling and scraping framework for Golang
Stars: ✭ 17 (+0%)
policy-data-analyzerBuilding a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (+29.41%)
memes-apiAPI for scrapping common meme sites
Stars: ✭ 17 (+0%)
SpidermonScrapy Extension for monitoring spiders execution.
Stars: ✭ 309 (+1717.65%)
LinkedinLinkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy
Stars: ✭ 309 (+1717.65%)
DataflowkitExtract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+2582.35%)
Sasila一个灵活、友好的爬虫框架
Stars: ✭ 286 (+1582.35%)
CollyElegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+91282.35%)
Gopa[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+1529.41%)
Email ExtractorThe main functionality is to extract all the emails from one or several URLs - La funcionalidad principal es extraer todos los correos electrónicos de una o varias Url
Stars: ✭ 81 (+376.47%)
AntchAntch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+1064.71%)
MemoriousDistributed crawling framework for documents and structured data.
Stars: ✭ 248 (+1358.82%)
vietnam-ecommerce-crawlerCrawling the data from lazada, websosanh, compare.vn, cdiscount and cungmua with flexible configs
Stars: ✭ 28 (+64.71%)
itemadapterCommon interface for data container classes
Stars: ✭ 47 (+176.47%)
gochanges**[ARCHIVED]** website changes tracker 🔍
Stars: ✭ 12 (-29.41%)
crawlerpython爬虫项目集合
Stars: ✭ 29 (+70.59%)
auctusDataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
Stars: ✭ 34 (+100%)
xXx dead xXxb̶̡̪̬͒l̸̰̗̝̀ỏ̷̡̩g̴͇̑g̶̲̱̽͐i̵̹͗n̶̤̥͂̅̆g̴̮̾̅͜ ̷̧͎͆i̷̛͒͜͠n̸̥̺͒ ̶͚͚͊̿͜t̸̺͙̭̆̊̈́ḧ̶̟́̐e̸̱͔̟̓̓͝ ̶̨͔̾͛̑d̵̥̣̏ȧ̷̼̊r̷̰̝̥̅̌͝k̵̟̥̞̉̍͛
Stars: ✭ 19 (+11.76%)
asyncpy使用asyncio和aiohttp开发的轻量级异步协程web爬虫框架
Stars: ✭ 86 (+405.88%)