vicety / Pixiv Crawler

Licence: gpl-3.0

A multi-purpose pixiv crawler built on the Scrapy framework

Programming Languages

python

139335 projects - #7 most used programming language

Labels

scrapy pixiv

Projects that are alternatives of or similar to Pixiv Crawler

Seeker

Seeker - another job board aggregator.

Stars: ✭ 16 (-65.22%)

Mutual labels: scrapy

Scrapy Azuresearch Crawler Samples

Scrapy as a Web Crawler for Azure Search Samples

Stars: ✭ 20 (-56.52%)

Mutual labels: scrapy

Pixivformuzeiplus

Muzei Live Wallpaper's Source for Pixiv

Stars: ✭ 38 (-17.39%)

Mutual labels: pixiv

Pixiv Shaft

Third-party Pixiv client for Android

Stars: ✭ 887 (+1828.26%)

Mutual labels: pixiv

Voyages Sncf Api

A scrapy spider that scraps times and prices from Voyages Sncf. It uses scrapyrt to provide an API interface.

Stars: ✭ 7 (-84.78%)

Mutual labels: scrapy

Place2live

Analysis of the characteristics of different countries

Stars: ✭ 30 (-34.78%)

Mutual labels: scrapy

Py3 scripts

Life is short, *****.

Stars: ✭ 5 (-89.13%)

Mutual labels: scrapy

Django Dynamic Scraper

Creating Scrapy scrapers via the Django admin interface

Stars: ✭ 1,024 (+2126.09%)

Mutual labels: scrapy

Pixivwallpaper

Get anime style wallpapers from daily Pixiv high ranking illustrations

Stars: ✭ 20 (-56.52%)

Mutual labels: pixiv

Pixivcrawleriii

A python3 crawler for crawling Pixiv ranking top and any illustrator all artworks

Stars: ✭ 38 (-17.39%)

Mutual labels: pixiv

Pdf downloader

A Scrapy Spider for downloading PDF files from a webpage.

Stars: ✭ 18 (-60.87%)

Mutual labels: scrapy

Scrapy Cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

Stars: ✭ 921 (+1902.17%)

Mutual labels: scrapy

Scrapymon

Simple Web UI for Scrapy spider management via Scrapyd

Stars: ✭ 35 (-23.91%)

Mutual labels: scrapy

Scrapy Finance

[OUTDATED] scrapy spiders to crawl the financial text data 📚 📜 pertinent to train word vectors 🚀

Stars: ✭ 17 (-63.04%)

Mutual labels: scrapy

Articlespider

Source code for the imooc (慕课网) Python distributed crawler course, maintained and updated long-term

Stars: ✭ 40 (-13.04%)

Mutual labels: scrapy

Cq Picsearcher Bot

🤖 Image-search bot based on saucenao / ascii2d / whatanime

Stars: ✭ 830 (+1704.35%)

Mutual labels: pixiv

Jspider

JSpider publishes the JS decryption method for at least one website every week. Stars are welcome. WeChat for discussion: 13298307816

Stars: ✭ 914 (+1886.96%)

Mutual labels: scrapy

Pixeval

A Strong, Fast and Flexible Pixiv Client based on .NET Core and WPF

Stars: ✭ 1,031 (+2141.3%)

Mutual labels: pixiv

Crawlab

Distributed web crawler admin platform for spider management, regardless of language or framework.

Stars: ✭ 8,392 (+18143.48%)

Mutual labels: scrapy

App comments spider

Crawls game comments from Baidu Tieba, TapTap, the App Store, and official Weibo accounts (based on redis_scrapy); filtering uses a Bloom filter.

Stars: ✭ 38 (-17.39%)

Mutual labels: scrapy

Pixiv-Crawler

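A crawler built on the Scrapy framework, developed on Windows 10 with 64-bit Python 3.6.2 and Scrapy 1.4.0.
Tested successfully on:
Ubuntu 16.04 with 64-bit Python 3.5.2
Arch Linux with 64-bit Python 3.6.2
Windows 10 with Python 3.6.2 and Python 3.5.2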

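Note

2018/11/21 update: Pixiv has changed its pages again. For personal reasons, the code will not be updated in the short term. Crawling by artist is currently known to be broken, but crawling by bookmarks and by search keyword still works.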

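Features

Export my bookmarks
Export an artist's works
Export search results
Export the daily ranking
All exports support filtering by image size
Configurable export location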
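
The size filtering mentioned above could look roughly like the sketch below. It is not the project's actual code: the `ImageInfo` class and the `passes_filter` function are hypothetical names, and treating the thresholds as inclusive lower bounds is an assumption. The threshold names mirror the `MIN_WIDTH`, `MIN_HEIGHT`, and `MIN_FAV` keys of settings.ini.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Hypothetical container for the values the filter needs."""
    width: int
    height: int
    bookmarks: int


def passes_filter(img, min_width=0, min_height=0, min_fav=0):
    """Return True if the image meets every minimum (inclusive, by assumption)."""
    return (
        img.width >= min_width
        and img.height >= min_height
        and img.bookmarks >= min_fav
    )
```

With all thresholds left at 0, as in the default settings, every image passes.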

Unfinished

Add some other illustration sites
Some details
Files from multi-image pages cannot be named yet

requirements

python
scrapy
requests
pillow
pypiwin32 // may be needed
imageio // needed when downloading GIFs
If anything else is missing, a plain pip install usually takes care of it

Usage

First configure settings.ini, then open cmd in the directory that contains main.py and run python main.py

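Settings file configuration

[PRJ]
/*
One of the following run modes:
COLLECTION  bookmarks
COLLECTION_PRIVATE  private bookmarks
ARTIST  an artist's works
SEARCH  search results
DAILY  daily ranking
*/
TARGET = COLLECTION
ACCOUNT =
PASSWORD =

[IMG]
MIN_WIDTH = 0	// image filter criteria
MIN_HEIGHT = 0
MIN_FAV = 0
STORE_PATH = ./images		// image storage directory; defaults to image under the project directory
R18 = False		// download R18 only
MULI_IMG_ENABLED = False	// whether to download multi-image sets

[ART]	// not subject to the bookmark-count limit in IMG
ID = 123456 // artist ID; separate multiple IDs with spaces

[SRH]
TAGS = TAG_A TAG_B ... // search terms

[DAILY] // not subject to the bookmark-count limit in IMG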
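
As an illustration of how a file with these sections and keys can be read, here is a minimal sketch using Python's standard `configparser`. It is not taken from the project: `load_settings` and `SAMPLE` are hypothetical, and the sample assumes the explanatory `//` and `/* */` comments have been left out of the real file, since `configparser` does not treat them as comments by default.

```python
import configparser

# Hypothetical sample with the same sections and keys as above, comments removed.
SAMPLE = """
[PRJ]
TARGET = COLLECTION
ACCOUNT =
PASSWORD =

[IMG]
MIN_WIDTH = 0
MIN_HEIGHT = 0
MIN_FAV = 0
STORE_PATH = ./images
R18 = False
MULI_IMG_ENABLED = False

[ART]
ID = 123456 234567

[SRH]
TAGS = TAG_A TAG_B
"""


def load_settings(text):
    """Parse INI text and return the values as plain Python types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "target": parser.get("PRJ", "TARGET"),
        "min_width": parser.getint("IMG", "MIN_WIDTH"),
        "min_height": parser.getint("IMG", "MIN_HEIGHT"),
        "min_fav": parser.getint("IMG", "MIN_FAV"),
        "store_path": parser.get("IMG", "STORE_PATH"),
        "r18": parser.getboolean("IMG", "R18"),
        "multi_img": parser.getboolean("IMG", "MULI_IMG_ENABLED"),
        # Space-separated lists, as described in the comments above.
        "artist_ids": parser.get("ART", "ID").split(),
        "tags": parser.get("SRH", "TAGS").split(),
    }
```

To read from disk instead, `parser.read("settings.ini", encoding="utf-8")` would replace the `read_string` call.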

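Other

If you cannot log in to pixiv in a browser, or crawling is slow, try editing your hosts file
Due to a Pixiv limit, search covers at most 1000 pages; you can narrow the search by adding a tag such as "1000users入り" (without the quotes)
Make sure the account's user language is set to Simplified Chinese
If you get an encoding error for the settings file, try saving settings.ini as UTF-8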

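Changelog

V1.2.3

Track what has been crawled in COLLECTION mode and skip images crawled before, so personal bookmarks can be updated quickly
Improved the file storage layout and log output
Added support for crawling private bookmarks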
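
One way to skip previously crawled images is to persist the set of seen illustration IDs between runs. The sketch below only illustrates that idea and is not the project's implementation: the `SeenStore` class, its JSON file format, and the file path are all assumptions.

```python
import json
import os


class SeenStore:
    """Hypothetical on-disk record of illustration IDs that were already crawled."""

    def __init__(self, path):
        self.path = path
        self.ids = set()
        # Load IDs from an earlier run, if the file exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.ids = set(json.load(f))

    def is_new(self, illust_id):
        return illust_id not in self.ids

    def mark(self, illust_id):
        self.ids.add(illust_id)

    def save(self):
        # Sets are not JSON-serializable, so store a sorted list.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.ids), f)
```

A crawl would call `is_new` before downloading, `mark` after a successful download, and `save` once at the end.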

V1.2.2

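Updated some data-fetching endpoints in response to Pixiv's page changes
GIF files are no longer supported because no endpoint could be found (if you find one, please let me know)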

V1.2.1

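Create the target directory automatically if it does not exist
Added multi-image set downloads and title scraping
Image metadata is scraped as well and stored as JSON
Cookies are stored to enable automatic login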
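
Creating the directory on demand and storing metadata as JSON can be sketched as follows. The function name `save_metadata`, the one-file-per-illustration layout, and the metadata fields are assumptions for illustration; the project's actual file layout is not documented here.

```python
import json
import os


def save_metadata(store_path, illust_id, info):
    """Write info as <store_path>/<illust_id>.json, creating the directory if needed."""
    os.makedirs(store_path, exist_ok=True)
    target = os.path.join(store_path, f"{illust_id}.json")
    with open(target, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Japanese or Chinese titles readable in the file.
        json.dump(info, f, ensure_ascii=False, indent=2)
    return target
```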

V1.2.0

Added daily ranking export
Added format checking for the settings file
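
A format check of this kind might look like the sketch below. The rules are guesses based on the settings description (five valid `TARGET` values, non-negative integer thresholds, an `ID` for ARTIST mode, `TAGS` for SEARCH mode); `check_settings` is a hypothetical helper that takes a flat dict of raw string values and is not the project's actual check.

```python
VALID_TARGETS = {"COLLECTION", "COLLECTION_PRIVATE", "ARTIST", "SEARCH", "DAILY"}


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means the settings look fine."""
    errors = []
    target = settings.get("TARGET", "")
    if target not in VALID_TARGETS:
        errors.append(f"TARGET must be one of {sorted(VALID_TARGETS)}, got {target!r}")
    for key in ("MIN_WIDTH", "MIN_HEIGHT", "MIN_FAV"):
        value = str(settings.get(key, "0"))
        if not value.isdigit():  # rejects negatives and non-numeric text
            errors.append(f"{key} must be a non-negative integer, got {value!r}")
    if target == "ARTIST" and not settings.get("ID", "").split():
        errors.append("ARTIST mode needs at least one artist ID")
    if target == "SEARCH" and not settings.get("TAGS", "").split():
        errors.append("SEARCH mode needs at least one tag")
    return errors
```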

V1.1

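Multiple artists can be added at once
Fixed a Japanese encoding problem in search
Changed the settings file structure so a default account and password can be configured
Fixed some problems in log output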

V1.0

Initial release

Finally, this is my first crawler and it is not very well written; any feedback or questions are welcome

Stars: ✭ 46