Crawlab: Distributed web crawler admin platform for managing spiders, regardless of language or framework.
Stars: ✭ 8,392 (+38045.45%)
crawlkit: A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.
Stars: ✭ 23 (+4.55%)
Xidel: Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Stars: ✭ 335 (+1422.73%)
OLX Scraper: 📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them to MongoDB (NoSQL).
Stars: ✭ 15 (-31.82%)
Polite: Be nice on the web
Stars: ✭ 253 (+1050%)
Spidy: The simple, easy to use command line web crawler.
Stars: ✭ 257 (+1068.18%)
bing-ip2hosts: bingip2hosts is a Bing.com web scraper that discovers websites by IP address
Stars: ✭ 99 (+350%)
bots-zoo: No description or website provided.
Stars: ✭ 59 (+168.18%)
Huginn: Create agents that monitor and act on your behalf. Your agents are standing by!
Stars: ✭ 33,694 (+153054.55%)
Linkedin Profile Scraper: 🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+677.27%)
wget-lua: Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+136.36%)
diffbot-php-client: [Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (+140.91%)
Singlefilez: Web Extension for Firefox/Chrome/MS Edge and CLI tool to save a faithful copy of an entire web page in a self-extracting HTML/ZIP polyglot file
Stars: ✭ 882 (+3909.09%)
flink-crawler: Continuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (+118.18%)
ARGUS: ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
Stars: ✭ 68 (+209.09%)
newsemble: API for fetching data from news websites.
Stars: ✭ 42 (+90.91%)
metacritic api: PHP Metacritic API - Mirrored by my GitLab
Stars: ✭ 31 (+40.91%)
Autoscraper: A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+18431.82%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+10250%)
Spidr: A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+2881.82%)
Lulu: [Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+3486.36%)
Youtube Projects: This repository contains all the code I use in my YouTube tutorials.
Stars: ✭ 144 (+554.55%)
ant: A web crawler for Go
Stars: ✭ 264 (+1100%)
Spam Bot 3000: Social media research and promotion, semi-autonomous CLI bot
Stars: ✭ 79 (+259.09%)
gotor: This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
Stars: ✭ 97 (+340.91%)
web-crawler: Python Web Crawler with Selenium and PhantomJS
Stars: ✭ 19 (-13.64%)
TrackPurchase: Scrape payment history from a variety of shopping platforms with just a few lines of code!
Stars: ✭ 19 (-13.64%)
Ferret: Declarative web scraping
Stars: ✭ 4,837 (+21886.36%)
Dotnetcrawler: DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+354.55%)
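Skycaiji: Skycaiji (蓝天采集器) is free crawler software for collecting and publishing data, developed with PHP + MySQL and deployable on cloud servers. It can collect almost any type of web page, integrates seamlessly with all kinds of CMS site builders, publishes data in real time without login, and runs fully automatically with no manual intervention. Among web big-data collection software, it is a fully cross-platform cloud crawler system.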
Stars: ✭ 1,514 (+6781.82%)
Gopa: [WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+1159.09%)
Colly: Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+70513.64%)
newspaperjs: News extraction and scraping. Article Parsing
Stars: ✭ 59 (+168.18%)
img-cli: An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL
Stars: ✭ 15 (-31.82%)
evine: Interactive CLI Web Crawler
Stars: ✭ 140 (+536.36%)
Antch: Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+800%)
Rcrawler: An R web crawler and scraper
Stars: ✭ 274 (+1145.45%)
Instagram-Scraper-2021: Scrape Instagram content and stories anonymously, using a new technique based on the HAR file (no token, no public API).
Stars: ✭ 57 (+159.09%)
Crawly: Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+1900%)
Goscraper: Golang pkg to quickly return a preview of a webpage (title/description/images)
Stars: ✭ 72 (+227.27%)
Scrapyrt: HTTP API for Scrapy spiders
Stars: ✭ 637 (+2795.45%)
Awesome Crawler: A collection of awesome web crawlers and spiders in different languages
Stars: ✭ 4,793 (+21686.36%)
Newspaper: News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+52377.27%)
Dataflowkit: Extract structured data from websites. Website scraping.
Stars: ✭ 456 (+1972.73%)
Linkedin scraper: A library that scrapes LinkedIn for user data
Stars: ✭ 413 (+1777.27%)
robotstxt: robots.txt file parsing and checking for R
Stars: ✭ 65 (+195.45%)
proxycrawl-python: ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (+131.82%)
supervised-machine-learning: This repo contains regression and classification projects. Examples: development of predictive models for comments on social media websites; building classifiers to predict outcomes in sports competitions; churn analysis; prediction of clicks on online ads; analysis of the opioid crisis and an analysis of retail store expansion strategies using…
Stars: ✭ 34 (+54.55%)
freeRep: Bypass repubblica.it and lastampa.it paywall
Stars: ✭ 34 (+54.55%)
impartus-downloader: Download Impartus lectures, convert to mkv for offline viewing.
Stars: ✭ 19 (-13.64%)
web-scraping-engine: A simple web scraping engine supporting concurrent and anonymous scraping
Stars: ✭ 27 (+22.73%)
ir: Project for automatically calculating income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, imposto de renda, finance, yahoo finance, acao, fii, etf, python, crawler, webscraping, calculadora ir
Stars: ✭ 120 (+445.45%)
stock-market-scraper: Scrapes historical stock market data from Yahoo Finance (https://finance.yahoo.com/)
Stars: ✭ 110 (+400%)
aliexscrape: Get AliExpress product details in JSON
Stars: ✭ 80 (+263.64%)
VideoRecognition-realtime-autotrainer-alerts: State-of-the-art real-time object detection using the YOLOv3 algorithm, augmented with a process that allows easy training of the classifier as a plug & play solution. Raises an alert if an item on an alert list is detected.
Stars: ✭ 36 (+63.64%)