CollyElegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+8984.8%)
CrawlyCrawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+157.31%)
bots-zooNo description or website provided.
Stars: ✭ 59 (-65.5%)
wget-luaWget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-69.59%)
Webstera reliable high-level web crawling & scraping framework for Node.js.
Stars: ✭ 364 (+112.87%)
GeziyorGeziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+628.65%)
Gopa[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Stars: ✭ 277 (+61.99%)
FerretDeclarative web scraping
Stars: ✭ 4,837 (+2728.65%)
Lulu[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+361.4%)
DotnetcrawlerDotnetCrawler is a straightforward, lightweight web crawling/scrapying library for Entity Framework Core output based on dotnet core. This library designed like other strong crawler libraries like WebMagic and Scrapy but for enabling extandable your custom requirements. Medium link : https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (-41.52%)
Jsonframe Cheeriosimple multi-level scraper json input/output for Cheerio
Stars: ✭ 196 (+14.62%)
Scrape Linkedin Selenium`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+39.77%)
double-agentA test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (-28.07%)
diffbot-php-client[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-69.01%)
papercutPapercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-91.23%)
arachnodHigh performance crawler for Nodejs
Stars: ✭ 17 (-90.06%)
Sasila一个灵活、友好的爬虫框架
Stars: ✭ 286 (+67.25%)
Querylist🕷️ The progressive PHP crawler framework! 优雅的渐进式PHP采集框架。
Stars: ✭ 2,392 (+1298.83%)
proxycrawl-pythonProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-70.18%)
Awesome PuppeteerA curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+910.53%)
SquidwarcSquidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
Stars: ✭ 125 (-26.9%)
Freshonions TorscraperFresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
Stars: ✭ 348 (+103.51%)
Ppspiderweb spider built by puppeteer, support task-queue and task-scheduling by decorators,support nedb / mongodb, support data visualization; 基于puppeteer的web爬虫框架,提供灵活的任务队列管理调度方案,提供便捷的数据保存方案(nedb/mongodb),提供数据可视化和用户交互的实现方案
Stars: ✭ 237 (+38.6%)
Goose ParserUniversal scrapping tool, which allows you to extract data using multiple environments
Stars: ✭ 211 (+23.39%)
Skycaiji蓝天采集器是一款免费的数据采集发布爬虫软件,采用php+mysql开发,可部署在云服务器,几乎能采集所有类型的网页,无缝对接各类CMS建站程序,免登录实时发布数据,全自动无需人工干预!是网页大数据采集软件中完全跨平台的云端爬虫系统
Stars: ✭ 1,514 (+785.38%)
socials👨👩👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (-78.36%)
ScrapyScrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+24661.99%)
GosintOSINT Swiss Army Knife
Stars: ✭ 401 (+134.5%)
NewspaperNews, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+6651.46%)
scrapy facebookerCollection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-87.13%)
scrapy-distributedA series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (-77.78%)
flink-crawlerContinuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-71.93%)
Awesome CrawlerA collection of awesome web crawler,spider in different languages
Stars: ✭ 4,793 (+2702.92%)
Apify JsApify SDK — The scalable web scraping and crawling library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.
Stars: ✭ 3,154 (+1744.44%)
ScrapedinLinkedIn Scraper (currently working 2020)
Stars: ✭ 453 (+164.91%)
FbcrawlA Facebook crawler
Stars: ✭ 536 (+213.45%)
Xcrawler快速、简洁且强大的PHP爬虫框架
Stars: ✭ 344 (+101.17%)
CrawlerA high performance web crawler in Elixir.
Stars: ✭ 781 (+356.73%)
AutoscraperA Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+2284.21%)
ScrapitScraping scripts for various websites.
Stars: ✭ 25 (-85.38%)
DataflowkitExtract structured data from web sites. Web sites scraping.
Stars: ✭ 456 (+166.67%)
LinkedinLinkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy
Stars: ✭ 309 (+80.7%)
SpidrA versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+283.63%)
ArachnidPowerful web scraping framework for Crystal
Stars: ✭ 68 (-60.23%)
AvbookAV 电影管理系统, avmoo , javbus , javlibrary 爬虫,线上 AV 影片图书馆,AV 磁力链接数据库,Japanese Adult Video Library,Adult Video Magnet Links - Japanese Adult Video Database
Stars: ✭ 8,133 (+4656.14%)
ScrapyrtHTTP API for Scrapy spiders
Stars: ✭ 637 (+272.51%)
JvppeteerHeadless Chrome For Java (Java 爬虫)
Stars: ✭ 193 (+12.87%)
AntchAntch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+15.79%)
ToapiEvery web site provides APIs.
Stars: ✭ 3,209 (+1776.61%)
NewcrawlerFree Web Scraping Tool with Java
Stars: ✭ 589 (+244.44%)
Awesome Python Primer自学入门 Python 优质中文资源索引,包含 书籍 / 文档 / 视频,适用于 爬虫 / Web / 数据分析 / 机器学习 方向
Stars: ✭ 57 (-66.67%)