A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

Stars: ✭ 38 (+100%)

Mutual labels: spider, scrapy

photo-spider-scrapy

10 photo website spiders: Scrapy crawler code for 10 foreign stock-photo sites

Stars: ✭ 17 (-10.53%)

Mutual labels: spider, scrapy

Funpyspidersearchengine

Word2vec personalized search tailored to each user + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and external RESTful API) + Django 3.1.1 search

Stars: ✭ 782 (+4015.79%)

Mutual labels: spider, scrapy

newsemble

API for fetching data from news websites.

Stars: ✭ 42 (+121.05%)

Mutual labels: scraper, webscraping

hk0weather

Web scraper project that collects useful Hong Kong weather data from the HKO website

Stars: ✭ 49 (+157.89%)

Mutual labels: scrapy, webscraping

python-spider

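Small Python crawler projects [continuously updated] [Biquge novel downloads, Tweet data scraping, weather lookup, NetEase Cloud Music reverse engineering, Tiantian Fund lookup, Weibo data scraping (cookie generation), Youdao Translate reverse engineering, Qichacha login-free crawler, Dianping SVG encryption cracking, Bilibili user crawler, Lagou login-free crawler, Ziroom rental font encryption, Zhihu Q&A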

Stars: ✭ 45 (+136.84%)

Mutual labels: spider, scrapy

ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

Stars: ✭ 68 (+257.89%)

Mutual labels: scrapy, webscraping

PttImageSpider

PTT image downloader (fetches the images from an entire board and uses article titles as folder names) (uses Scrapy)

Stars: ✭ 16 (-15.79%)

Mutual labels: spider, scrapy

ip proxy pool

Dynamically generates spiders with Scrapy to crawl and check free proxy IPs on the internet.

Stars: ✭ 39 (+105.26%)

Mutual labels: spider, scrapy

Rcrawler

An R web crawler and scraper

Stars: ✭ 274 (+1342.11%)

Mutual labels: scraper, webscraping

Douban Crawler

A crawler for https://douban.com

Stars: ✭ 13 (-31.58%)

Mutual labels: spider, scrapy

Java Spider

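A hands-on Java crawler framework built as a secondary development on the webmagic framework. It can already crawl news content from Tencent, Sohu, and Toutiao (a separately integrated feature) and, combined with elasticsearch, implements automated crawling. It is in production use.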

Stars: ✭ 276 (+1352.63%)

Mutual labels: spider, scraper

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

Stars: ✭ 309 (+1526.32%)

Mutual labels: scraper, scrapy

Spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Stars: ✭ 656 (+3352.63%)

Mutual labels: spider, scraper

163Music

163music spider built with Scrapy.

Stars: ✭ 60 (+215.79%)

Mutual labels: spider, scrapy

Scrapy IPProxyPool

Free IP proxy pool. A plugin for the Scrapy crawler framework

Stars: ✭ 100 (+426.32%)

Mutual labels: spider, scrapy

devsearch

A web search engine built with Python which uses TF-IDF and PageRank to sort search results.

Stars: ✭ 52 (+173.68%)

Mutual labels: spider, scrapy

aliexscrape

Get Aliexpress product details in JSON

Stars: ✭ 80 (+321.05%)

Mutual labels: scraper, spider

Scrapyrt

HTTP API for Scrapy spiders

Stars: ✭ 637 (+3252.63%)

Mutual labels: scraper, scrapy

NScrapy

NScrapy is a cross-platform distributed spider framework for .NET Core that provides an easy way to write your own spider

Stars: ✭ 88 (+363.16%)

Mutual labels: spider, scrapy

Icrawler

A multi-threaded crawler framework with many built-in image crawlers.

Stars: ✭ 629 (+3210.53%)

Mutual labels: spider, scrapy

python-fxxk-spider

A collection of various free Python crawler projects

Stars: ✭ 184 (+868.42%)

Mutual labels: spider, scrapy

Scrapy-Spiders

A Scrapy-based code repository of data-collection crawlers

Stars: ✭ 34 (+78.95%)

Mutual labels: spider, scrapy

newspaperjs

News extraction and scraping. Article Parsing

Stars: ✭ 59 (+210.53%)

Mutual labels: scraper, webscraping

Python Spider

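Douban Movie Top 250, scraping Douyu JSON data and photos of beautiful women, Taobao, Youyuan, using CrawlSpider to scrape some basic profile information of matchmaking users on Hongniang, distributed crawling of Hongniang with storage in Redis, small crawler demos, Selenium, scraping Dmall, developing APIs with Django, scraping Youyuan profile information, simulated Zhihu login, simulated GitHub login, simulated Tuchong login, scraping whole-site data from the Dmall store, scraping the article history of WeChat official accounts, scraping articles shared in WeChat groups or by WeChat friends, using itchat to monitor articles shared by specified WeChat official accounts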

Stars: ✭ 615 (+3136.84%)

Mutual labels: spider, scrapy

V2EX Spider

V2EX crawler

Stars: ✭ 21 (+10.53%)

Mutual labels: spider, scrapy

Freshonions Torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

Stars: ✭ 348 (+1731.58%)

Mutual labels: spider, scraper

toutiao

Crawler for the Toutiao technology news API

Stars: ✭ 17 (-10.53%)

Mutual labels: spider, scrapy

douban-spider

Douban movie crawler based on the Scrapy framework

Stars: ✭ 25 (+31.58%)

Mutual labels: spider, scrapy

Instagram-Scraper-2021

Scrape Instagram content and stories anonymously, using a new technique based on the har file (No Token + No public API).

Stars: ✭ 57 (+200%)

Mutual labels: scraper, webscraping

arachnod

High-performance crawler for Node.js

Stars: ✭ 17 (-10.53%)

Mutual labels: scraper, spider

Happy Spiders

🔧 🔩 🔨 A curated collection of crawler-related tools, simulated-login techniques, proxy IPs, Scrapy template code, and more.

Stars: ✭ 261 (+1273.68%)

Mutual labels: spider, scrapy

Tieba spider

Baidu Tieba crawler (based on Scrapy and MySQL)

Stars: ✭ 257 (+1252.63%)

Mutual labels: spider, scrapy

Alltheplaces

A set of spiders and scrapers to extract location information from places that post their location on the internet.

Stars: ✭ 277 (+1357.89%)

Mutual labels: spider, scrapy

allitebooks.com

Download all the ebooks from "allitebooks.com", along with an indexed CSV

Stars: ✭ 24 (+26.32%)

Mutual labels: scrapy, webscraping

Xcrawler

A fast, concise, and powerful PHP crawler framework

Stars: ✭ 344 (+1710.53%)

Mutual labels: spider, scraper

Xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Stars: ✭ 335 (+1663.16%)

Mutual labels: scraper, webscraping

Autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

Stars: ✭ 4,077 (+21357.89%)

Mutual labels: scraper, webscraping

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (+2010.53%)

Mutual labels: spider, scraper

Advanced Web Scraping Tutorial

The Zipru scraper developed in the Advanced Web Scraping Tutorial.