
QuantumLiu / ComicSpider

License: GPL-3.0
Crawler for original images from the desktop version of the dmzj (动漫之家) comics site

Programming Languages

Python
Batchfile


ComicSpider

The first open-source crawler for original comic images from the desktop version of the dmzj (动漫之家) comics site.

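Please respect copyright. This project is for hobbyist research only; commercial use is prohibited, and the right to pursue legal liability is reserved.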

Requirements

Python 3, git  
requests, PhantomJS, selenium  
Optional: pyinstaller, PyQt5 (for GUI)

Description

The first open-source crawler for original comic images from the desktop version of the dmzj website. It uses PhantomJS and selenium to get the page index for each chapter of a comic, then downloads and saves all pages to local files.
The crawler logic is implemented in comic.py. It is fully functional and supports incremental downloads, and you can build your own crawler programs on top of it.
We provide a console crawler, download_f.py, and a GUI crawler, comic_gui.py.
We also provide packaged win32/64 .exe programs.
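
To make the download step more concrete, here is a minimal sketch of saving one chapter's pages while skipping files that already exist, which is one simple way to get incremental downloads. It is not taken from comic.py: the names `save_pages` and `fetch_with_requests`, the zero-padded file naming, and the 30-second timeout are all assumptions made for illustration.

```python
import os


def fetch_with_requests(url):
    """Download one image and return its bytes (hypothetical helper)."""
    import requests  # third-party; listed under Requirements

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def save_pages(page_urls, chapter_dir, fetch=fetch_with_requests):
    """Save each page as NNN.jpg in chapter_dir and return the new paths.

    Pages whose file already exists are skipped, so running the function
    again only downloads what is missing.
    """
    os.makedirs(chapter_dir, exist_ok=True)
    saved = []
    for number, url in enumerate(page_urls, start=1):
        path = os.path.join(chapter_dir, f"{number:03d}.jpg")
        if os.path.exists(path):
            continue  # already downloaded in an earlier run
        data = fetch(url)
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved
```

Because `fetch` is a parameter, the function can be tried out with a stub instead of real network access.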

Usage

Install all dependencies.
In cmd/shell:
git clone https://github.com/QuantumLiu/ComicSpider.git
Download PhantomJS, unzip it, and copy phantomjs.exe to the same folder as the .py files, or add the folder containing phantomjs.exe to PATH.
If you want to use the binary package (Windows), download the latest release.
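
If you are unsure whether PhantomJS is set up in one of the two ways described above, a quick check along the following lines may help. This is only a sketch: `find_phantomjs` is a hypothetical helper, not part of the project, and it simply looks in a given folder first and then on PATH.

```python
import os
import shutil


def find_phantomjs(script_dir="."):
    """Return a path to the PhantomJS executable, or None if not found."""
    # Option 1: the executable sits next to the .py files.
    for name in ("phantomjs.exe", "phantomjs"):
        candidate = os.path.join(script_dir, name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    # Option 2: the executable is reachable through PATH.
    return shutil.which("phantomjs")


if __name__ == "__main__":
    location = find_phantomjs()
    print(location if location else "PhantomJS not found")
```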

GUI version:

From source:
python comic_gui.py
Or double-click comic_gui.exe.
Enter the URL of the comic you want to download.
Click 预览 (Preview) to see the comic's cover and information; this also fills in an automatically generated save directory.
You can choose whether to use multi-threading.
Type a directory or click 选择目录 (Choose directory) to set the save directory.
Click 爬取 (Crawl) to start crawling.

Console version:

Create a text file in ComicSpider/ and write the URLs of the comics you want to download.
For example, write the following URLs in url.txt:
http://manhua.dmzj.com/dcyuzhouchongsheng/
http://manhua.dmzj.com/sanweiyitiv2/

The program will then download those two comics.
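
As a rough illustration of the input format, the sketch below reads such a file. It assumes one URL per line and ignores blank lines, which matches the example above but is not confirmed by the project; `read_urls` is a hypothetical helper, not a function from download_f.py.

```python
def read_urls(path="./url.txt"):
    """Return the non-empty lines of the URL file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    for url in read_urls():
        print(url)
```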
Then in cmd/shell:
cd ComicSpider
python download_f.py url.txt 1
There are two optional arguments:
The first sets the path of the URL text file; the default is './url.txt'.
The last sets whether to use multiple threads: '1' means 'True', anything else means 'False'. The default is 'False'.
Results:
As you can see, the program creates one folder for the comic. Inside it is one folder per chapter, and each chapter folder holds the .jpg files for all pages of that chapter.
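
The argument rules above can be sketched in a few lines. This is an illustration of the described behaviour, not the code in download_f.py; `parse_args` is a hypothetical name, and the sketch assumes the two arguments are positional in the order shown.

```python
import sys


def parse_args(argv):
    """Return (url_file, multi_thread) from a command-line argument list."""
    # First argument: path of the URL text file, default './url.txt'.
    url_file = argv[1] if len(argv) > 1 else "./url.txt"
    # Second argument: only the exact string '1' turns multi-threading on.
    multi_thread = len(argv) > 2 and argv[2] == "1"
    return url_file, multi_thread


if __name__ == "__main__":
    print(parse_args(sys.argv))
```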
Or use the packaged binary program:
Double-click comicspider_console.exe to run with the default arguments, or in cmd/shell/.bat:
comicspider_console <your file> <multi-threads flag>
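
For readers curious how a multi-threads flag like this might be wired up, one common approach in Python is a thread pool. The sketch below is an assumption about a possible design, not the project's implementation; `crawl_all`, `crawl_one`, and `max_workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(urls, crawl_one, multi_thread=False, max_workers=4):
    """Apply crawl_one to every URL, sequentially or in a thread pool."""
    if not multi_thread:
        return [crawl_one(url) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(crawl_one, urls))
```

Threads suit this kind of work because downloading is mostly waiting on the network rather than using the CPU.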

Packaging

Requires pyinstaller.
Run make.bat.

Future

More configurable parameters
Non-blocking image display
WeChat extension based on itchat
