InstaBot: Simple and friendly bot for Instagram, using Selenium and Scrapy with Python.
Stars: ✭ 32 (+88.24%)
scrapy facebooker: Collection of Scrapy spiders that can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (+29.41%)
Scrapple: A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+2629.41%)
Seleniumcrawler: An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site
Stars: ✭ 117 (+588.24%)
Email Extractor: The main functionality is to extract all the emails from one or several URLs
Stars: ✭ 81 (+376.47%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+488.24%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+300%)
proxi: Proxy pool. Finds and checks proxies, with a REST API for querying results. Can find over 25k proxies in under 5 minutes.
Stars: ✭ 32 (+88.24%)
RARBG-scraper: With Selenium headless browsing and CAPTCHA solving
Stars: ✭ 38 (+123.53%)
double-agent: A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+623.53%)
scrapy-distributed: A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.
Stars: ✭ 38 (+123.53%)
policy-data-analyzer: Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (+29.41%)
scrapy-zyte-smartproxy: Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
Stars: ✭ 317 (+1764.71%)
Linkedin: LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
Stars: ✭ 309 (+1717.65%)
Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster.
Stars: ✭ 921 (+5317.65%)
scrapy-fieldstats: A Scrapy extension to log item coverage when the spider shuts down
Stars: ✭ 17 (+0%)
torchestrator: Spin up Tor containers and then proxy HTTP requests via these Tor instances
Stars: ✭ 32 (+88.24%)
image-collector: Download images from Google Image Search
Stars: ✭ 38 (+123.53%)
pythonSpider: 🕷️ Some Python spiders with BeautifulSoup or Scrapy
Stars: ✭ 28 (+64.71%)
naos: 📉 Uptime and error monitoring CLI
Stars: ✭ 30 (+76.47%)
kuwala: Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+2688.24%)
Zeiver: A scraper, downloader, and recorder for static open directories.
Stars: ✭ 14 (-17.65%)
AngleParse: HTML parsing and processing tool for PowerShell.
Stars: ✭ 35 (+105.88%)
restaurant-finder-featureReviews: Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying the market-basket model – Apriori algorithm – and NLP to user reviews).
Stars: ✭ 21 (+23.53%)
chirps: Twitter bot powering @arichduvet
Stars: ✭ 35 (+105.88%)
Scraper-Projects: 🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (+47.06%)
web-clipper: Easily download the main content of a web page in HTML, Markdown, and/or EPUB format from the command line.
Stars: ✭ 15 (-11.76%)
humanparser: Parse a human name string into salutation, first name, middle name, last name, suffix.
Stars: ✭ 78 (+358.82%)
top-github-scraper: Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (+135.29%)
dust: Archive web pages with all relevant assets or save as a single HTML file
Stars: ✭ 19 (+11.76%)
subscene scraper: Library to download subtitles from subscene.com
Stars: ✭ 14 (-17.65%)
XMQ-BackUp: Backup tool for Xiaomiquan (小密圈): groups, topics, images, and files.
Stars: ✭ 22 (+29.41%)
BOC FER Spider: Uses Scrapy to crawl foreign exchange rates from BOC (Bank of China)
Stars: ✭ 18 (+5.88%)
JustDownlink: A distributed movie search engine built with Scrapy, Elasticsearch, and Django
Stars: ✭ 28 (+64.71%)
PyLex: Perform lexical analysis on words, one word at a time.
Stars: ✭ 60 (+252.94%)
TorScrapper: A scraper made 100% in Python using BeautifulSoup and Tor. It can be used to scrape both normal and onion links. Happy Scraping :)
Stars: ✭ 24 (+41.18%)
pomp: Screen scraping and web crawling framework
Stars: ✭ 61 (+258.82%)
Captcha-Tools: All-in-one Python (and now Go!) module to help solve captchas with the Capmonster, 2captcha and Anticaptcha APIs!
Stars: ✭ 23 (+35.29%)
GPlayCrawler: No description or website provided.
Stars: ✭ 47 (+176.47%)
ferenda: Transform unstructured document collections to structured Linked Data
Stars: ✭ 22 (+29.41%)
scrapy-admin: A Django admin site for Scrapy
Stars: ✭ 44 (+158.82%)
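python-spider: Small Python crawler projects [continuously updated] [Biquge novel downloads, Tweet data scraping, weather lookup, NetEase Cloud Music reverse engineering, Tiantian Fund lookup, Weibo data scraping (cookie generation), Youdao Translate reverse engineering, login-free Qichacha crawler, Dianping SVG encryption cracking, Bilibili user crawler, login-free Lagou crawler, Ziroom rental font encryption, Zhihu Q&A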
Stars: ✭ 45 (+164.71%)
internet-affordability: 🌍 Dataset that shows Internet affordability by country (a shocking reality!)
Stars: ✭ 13 (-23.53%)