DotnetcrawlerDotnetCrawler is a straightforward, lightweight web crawling/scrapying library for Entity Framework Core output based on dotnet core. This library designed like other strong crawler libraries like WebMagic and Scrapy but for enabling extandable your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+47.06%)
zcrawlAn open source web crawling platform
Stars: ✭ 21 (-69.12%)
double-agentA test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+80.88%)
scrapy-fieldstatsA Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-75%)
iowebWeb Scraping Framework
Stars: ✭ 31 (-54.41%)
scrapy-distributedA series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (-44.12%)
Email ExtractorThe main functionality is to extract all the emails from one or several URLs - La funcionalidad principal es extraer todos los correos electrónicos de una o varias Url
Stars: ✭ 81 (+19.12%)
memes-apiAPI for scrapping common meme sites
Stars: ✭ 17 (-75%)
FerretDeclarative web scraping
Stars: ✭ 4,837 (+7013.24%)
Gazpacho🥫 The simple, fast, and modern web scraping library
Stars: ✭ 525 (+672.06%)
ConfigsPublic, free to use, repository with diggers configs for scraping / extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-45.59%)
Lulu[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+1060.29%)
Awesome PuppeteerA curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+2441.18%)
SeleniumcrawlerAn example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Stars: ✭ 117 (+72.06%)
AntchAntch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+191.18%)
GrawlerGrawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+44.12%)
CollyElegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+22745.59%)
socials👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (-45.59%)
bots-zooNo description or website provided.
Stars: ✭ 59 (-13.24%)
scrapy facebookerCollection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-67.65%)
diffbot-php-client[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-22.06%)
crawling-frameworkEasily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-67.65%)
ScrappleA framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+582.35%)
DataflowkitExtract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+570.59%)
CrawlyCrawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+547.06%)
InstaBotSimple and friendly Bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (-52.94%)
Scrapy ClusterThis Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Stars: ✭ 921 (+1254.41%)
browser-automation-apiBrowser automation API for repetitive web-based tasks, with a friendly user interface. You can use it to scrape content or do many other things like capture a screenshot, generate pdf, extract content or execute custom Puppeteer, Playwright functions.
Stars: ✭ 24 (-64.71%)
anime-scraper[partially working] Scrape and add anime episode stream URLs to uGet (Linux) or IDM (Windows) ~ Python3
Stars: ✭ 21 (-69.12%)
hk0weatherWeb scraper project to collect the useful Hong Kong weather data from HKO website
Stars: ✭ 49 (-27.94%)
policy-data-analyzerBuilding a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-67.65%)
scrapy-zyte-smartproxyZyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
Stars: ✭ 317 (+366.18%)
ScrapyScrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+62169.12%)
Linkedin Profile Scraper🕵️♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+151.47%)
MemoriousDistributed crawling framework for documents and structured data.
Stars: ✭ 248 (+264.71%)
pompScreen scraping and web crawling framework
Stars: ✭ 61 (-10.29%)
AutoscraperA Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+5895.59%)
scrapy-adminA django admin site for scrapy
Stars: ✭ 44 (-35.29%)
chesfCHeSF is the Chrome Headless Scraping Framework, a very very alpha code to scrape javascript intensive web pages
Stars: ✭ 18 (-73.53%)
logparserA tool for parsing Scrapy log files periodically and incrementally, extending the HTTP JSON API of Scrapyd.
Stars: ✭ 70 (+2.94%)
allitebooks.comDownload all the ebooks with indexed csv of "allitebooks.com"
Stars: ✭ 24 (-64.71%)
proxycrawl-pythonProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-25%)
gotorThis program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
Stars: ✭ 97 (+42.65%)
go-scrapyWeb crawling and scraping framework for Golang
Stars: ✭ 17 (-75%)
newspaperjsNews extraction and scraping. Article Parsing
Stars: ✭ 59 (-13.24%)
Mimo-CrawlerA web crawler that uses Firefox and js injection to interact with webpages and crawl their content, written in nodejs.
Stars: ✭ 22 (-67.65%)
wget-luaWget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-23.53%)
LinkedinLinkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy
Stars: ✭ 309 (+354.41%)
SpidermonScrapy Extension for monitoring spiders execution.
Stars: ✭ 309 (+354.41%)
RARBG-scraperWith Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (-44.12%)
torchestratorSpin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (-52.94%)
proxiProxy pool. Finds and checks proxies with rest api for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (-52.94%)