
OhYee / documentDownloader

Licence: MIT license
Download documents from book118 for free

Programming languages: Python, Shell


Document Downloader


Downloads PDF documents from book118.

Approach

  1. Crawl the document page to collect the image links
  2. Download the images
  3. Combine the images into a PDF file
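Step 2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's code: the page-image URLs are assumed to be already scraped, `fetch` is a hypothetical callable that returns the bytes at a URL (for example a thin wrapper around `requests.get(url).content`), and the cache layout of one file per page index is also an assumption. It mirrors the thread-count and force-redownload options listed under Parameters.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def download_pages(
    urls: List[str],
    cache_dir: str,
    fetch: Callable[[str], bytes],
    threads: int = 10,
    force: bool = False,
) -> List[str]:
    """Download page images concurrently; return their paths in page order.

    `fetch` is a hypothetical callable mapping a URL to image bytes.
    Pages already cached in `cache_dir` are reused unless `force` is True.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def fetch_one(index_url):
        index, url = index_url
        # Hypothetical cache layout: one file per page, named by page index.
        path = os.path.join(cache_dir, f"{index:04d}.img")
        if force or not os.path.exists(path):
            data = fetch(url)
            with open(path, "wb") as f:
                f.write(data)
        return path

    # executor.map yields results in input order, so the pages stay in
    # sequence even though individual downloads finish out of order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch_one, enumerate(urls)))
```

Keeping the cached files on disk is what lets a later run rebuild the PDF without downloading again; the returned, ordered path list is what step 3 would feed to an image-to-PDF library such as Pillow or reportlab.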

Related article: Using a crawler to download PDF files from book118 for free

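Parameters

| Option | Description |
| --- | --- |
| -h, --help | Show help |
| -u, --url | URL of the web page of the document to download |
| -o, --output | Output file name; defaults to the document title plus .pdf |
| -p, --proxy | Proxy address to use (defaults to the values of the HTTP_PROXY and HTTPS_PROXY environment variables); pass -p '' to force no proxy |
| -f, --force | Force a fresh download instead of using the cache |
| -t, --thread | Number of threads to use; defaults to 10 |
| -s, --safe | Enable this if the server refuses your requests; it forces a single thread and lengthens the interval between requests and downloads |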
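The options above map naturally onto Python's standard argparse module. The sketch below is a hypothetical reconstruction for illustration only; the project's actual parser may differ. It encodes the documented defaults (10 threads, proxy taken from HTTP_PROXY/HTTPS_PROXY unless -p is given, -p '' disabling the proxy, -s forcing a single thread). The helpers `resolve_proxy` and `effective_threads`, the use of `None` for "use the document title" in -o, and treating -u as mandatory are all assumptions.

```python
import argparse
import os
from typing import Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented command-line options.
    p = argparse.ArgumentParser(prog="documentDownloader")
    # Assumption: -u is mandatory (argparse still lets -h run without it).
    p.add_argument("-u", "--url", required=True, help="page URL of the document")
    p.add_argument("-o", "--output", default=None,
                   help="output file name; None means '<document title>.pdf'")
    p.add_argument("-p", "--proxy", default=None,
                   help="proxy address; pass '' to disable the proxy")
    p.add_argument("-f", "--force", action="store_true",
                   help="re-download instead of using the cache")
    p.add_argument("-t", "--thread", type=int, default=10,
                   help="number of download threads")
    p.add_argument("-s", "--safe", action="store_true",
                   help="single thread with longer request intervals")
    return p


def resolve_proxy(cli_value: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper returning a scheme -> proxy mapping.

    None  -> fall back to HTTP_PROXY / HTTPS_PROXY from the environment
    ''    -> empty mapping, meaning "no proxy"
    other -> use the given address for both schemes
    """
    if cli_value is None:
        proxies = {}
        for scheme, var in (("http", "HTTP_PROXY"), ("https", "HTTPS_PROXY")):
            value = os.environ.get(var)
            if value:
                proxies[scheme] = value
        return proxies
    if cli_value == "":
        return {}
    return {"http": cli_value, "https": cli_value}


def effective_threads(args: argparse.Namespace) -> int:
    # -s/--safe forces single-threaded downloading.
    return 1 if args.safe else args.thread
```

Note that an empty mapping only means "no proxy" if the HTTP client is also told to ignore environment variables; with requests, for example, that is done by setting `session.trust_env = False`.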

Usage

Using the package published on PyPI

python3 -m pip install documentDownloader

After installation, the documentDownloader command is available directly.

For example: documentDownloader -u https://max.book118.com/html/2020/0109/5301014320002213.shtm -o '单身人群专题研究报告-2019' -p http://127.0.0.1:1080 -f -t 20

Running main.py directly from the source

Clone this repository, or download a version from the Releases page.

  1. Install Python 3
  2. Install the dependencies (Pillow, reportlab, requests): python -m pip install -r requirements.txt
  3. Run it with python3 main.py

For example: python main.py -u https://max.book118.com/html/2020/0109/5301014320002213.shtm -o '单身人群专题研究报告-2019' -p http://127.0.0.1:1080 -f -t 20

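For learning about web crawlers and related topics only. Please support legitimate copies of books.
(Although plenty of the PDFs on book118 are pirated too.)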

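Contributors

Changelog

  • 2019-01-29: Book118 changed its website; updated the corresponding code. @JodeZer
  • 2020-01-09: Refactored the code; added multithreaded downloading for speed, proxy support, building the PDF directly from an existing cache, and automatic detection of image sizes when generating the PDF. @OhYee
  • 2020-05-25: Published to PyPI.
  • 2021-10-18: Book118 changed its website; updated part of the code. The exported PDF is now named after the document title by default; added a notice for documents whose full text cannot be previewed for free; set the request interval to 2 seconds (in testing, intervals shorter than 2 seconds were likely to return empty URLs); added a "slow download" option to avoid being refused by the server for downloading too fast. @alxt17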
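A minimum gap of 2 seconds between requests, as described in the 2021-10-18 entry, can be enforced with a small thread-safe throttle shared by all download threads. This is a sketch of the idea, not the project's implementation: the `Throttle` class is hypothetical, and its clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import threading
import time
from typing import Callable, Optional


class Throttle:
    """Ensure at least `interval` seconds between consecutive requests.

    Hypothetical helper; share one instance across all download threads.
    """

    def __init__(self, interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        # Earliest time the next request may start; None before the first call.
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        # Holding the lock while sleeping serializes callers, so requests
        # stay spaced out even when many threads call wait() at once.
        with self._lock:
            now = self._clock()
            if self._next_allowed is not None and now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Each worker would call `throttle.wait()` just before issuing a request; a safe mode could simply construct the throttle with a longer interval and run a single thread.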