A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

Stars: ✭ 38 (-26.92%)

Mutual labels: spider, scraping, crawling

fetchurls

A bash script to spider a site, follow links, and fetch URLs (with built-in filtering) into a generated text file.

Stars: ✭ 97 (+86.54%)

Mutual labels: spider, wget, crawl

Dataflowkit

Extract structured data from websites: a web scraping tool.

Stars: ✭ 456 (+776.92%)

Mutual labels: scraper, scraping, crawling

Ferret

Declarative web scraping

Stars: ✭ 4,837 (+9201.92%)

Mutual labels: scraper, scraping, crawling

Headless Chrome Crawler

Distributed crawler powered by Headless Chrome

Stars: ✭ 5,129 (+9763.46%)

Mutual labels: scraper, scraping, crawling

Grab Site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Stars: ✭ 680 (+1207.69%)

Mutual labels: spider, archiving, crawl

Zeiver

A scraper, downloader, and recorder for static open directories.

Stars: ✭ 14 (-73.08%)

Mutual labels: scraper, downloader, scraping

proxycrawl-python

ProxyCrawl Python library for scraping and crawling

Stars: ✭ 51 (-1.92%)

Mutual labels: scraper, scraping, crawling

bots-zoo

No description or website provided.

Stars: ✭ 59 (+13.46%)

Mutual labels: scraper, scraping, crawling

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+1869.23%)

Mutual labels: scraper, spider, scraping

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+2296.15%)

Mutual labels: scraper, spider, scraping

Anime Dl

Anime-dl is a command-line program to download anime from Crunchyroll and Funimation.

Stars: ✭ 190 (+265.38%)

Mutual labels: scraper, scraping

Jsonframe Cheerio

A simple multi-level scraper with JSON input/output for Cheerio.

Stars: ✭ 196 (+276.92%)

Mutual labels: scraper, scraping

Querylist

🕷️ The progressive PHP crawler framework! An elegant, progressive framework for scraping with PHP.

Stars: ✭ 2,392 (+4500%)

Mutual labels: scraper, spider

copycat

A PHP Scraping Class

Stars: ✭ 70 (+34.62%)

Mutual labels: scraper, scraping

Goose Parser

Universal scraping tool that lets you extract data using multiple environments.

Stars: ✭ 211 (+305.77%)

Mutual labels: scraper, scraping

Scrape Linkedin Selenium

`scrape_linkedin` is a Python package that lets you scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

Stars: ✭ 239 (+359.62%)

Mutual labels: scraper, scraping

BaiduSpider

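The project has moved to https://github.com/BaiduSpider/BaiduSpider! A crawler that scrapes Baidu search results. It currently supports Baidu web search, Baidu Images search, Baidu Zhidao search, Baidu Video search, Baidu News search, Baidu Wenku search, Baidu Jingyan search, and Baidu Baike search.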

Stars: ✭ 29 (-44.23%)

Mutual labels: spider, crawling

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distribution-friendly Golang crawler framework.

Stars: ✭ 190 (+265.38%)

Mutual labels: scraper, spider

Serpscrap

SEO Python scraper to extract data from major search engine result pages. For given keywords, it extracts the URL, title, snippet, rich snippet, and result type from the search results. It can also detect ads, take automated screenshots, and fetch the text content of URLs, either from the search results or from a list you provide. It is useful for SEO and business-related research tasks.

Stars: ✭ 153 (+194.23%)

Mutual labels: scraper, scraping

Annie

👾 Fast and simple video download library and CLI tool written in Go

Stars: ✭ 16,369 (+31378.85%)

Mutual labels: scraper, downloader

Scrapysharp

A revival of https://bitbucket.org/rflechner/scrapysharp

Stars: ✭ 226 (+334.62%)

Mutual labels: scraper, scraping

google-scraper

This class can retrieve search results from Google.

Stars: ✭ 33 (-36.54%)

Mutual labels: scraper, scraping

Phpscraper

PHP Scraper: a highly opinionated web interface for PHP

Stars: ✭ 148 (+184.62%)

Mutual labels: scraper, scraping

Website-downloader

💡 Download the complete source code of any website, including all assets (JavaScript files, stylesheets, images), using Node.js.

Stars: ✭ 615 (+1082.69%)

Mutual labels: scraper, downloader

gospider

⚡ Lightweight Golang spider framework

Stars: ✭ 183 (+251.92%)

Mutual labels: spider, crawl

scrape-github-trending

Tutorial for web scraping / crawling with Node.js.

Stars: ✭ 42 (-19.23%)

Mutual labels: scraping, crawling

Pahe.ph-Scraper

Pahe.ph [Pahe.in] Movies Website Scraper

Stars: ✭ 57 (+9.62%)

Mutual labels: scraper, scraping

gochanges

**[ARCHIVED]** Website change tracker 🔍

Stars: ✭ 12 (-76.92%)

Mutual labels: scraper, scraping

warcworker

A dockerized, queued high fidelity web archiver based on Squidwarc

Stars: ✭ 48 (-7.69%)

Mutual labels: archiving, webarchiving

gathertool

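gathertool is a Golang library for script-style development, intended to make writing programs for these scenarios more efficient. It includes a lightweight crawler library, API testing and stress-testing libraries, database operation libraries, and more.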

Stars: ✭ 36 (-30.77%)

Mutual labels: spider, crawl

turtle

Instagram Photo Downloader

Stars: ✭ 15 (-71.15%)

Mutual labels: downloader, scraping

stweet

Advanced Python library to scrape Twitter (tweets, users) through an unofficial API

Stars: ✭ 287 (+451.92%)

Mutual labels: scraper, crawl

socials

👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.

Stars: ✭ 37 (-28.85%)

Mutual labels: scraping, crawling

Udemycoursegrabber

Want to enroll in a Udemy course but don't have the money? Search no more! This Python program searches for your desired course across more than [insert big number here] websites and compares their last-updated dates. It returns the download link for the latest one, and you can also choose to see the others.

Stars: ✭ 137 (+163.46%)

Mutual labels: scraper, scraping

lezhin-comics-downloader

📥 Downloader for lezhin comics

Stars: ✭ 30 (-42.31%)

Mutual labels: scraper, downloader

double-agent

A test suite of common scraper detection techniques. See how detectable your scraper stack is.

Stars: ✭ 123 (+136.54%)

Mutual labels: scraping, crawling

scrapers

Scrapers for building your own image databases.