
gagayuan / Runoob Pdf

Crawls the Runoob (菜鸟教程) tutorial site and converts it to PDF (Python crawler using Chrome)

Programming Languages

python
python3

Projects that are alternatives to or similar to Runoob Pdf

Iclr2019 Openreviewdata
Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Stars: ✭ 376 (-12.56%)
Mutual labels:  crawler
Mmjpg
👩 Crawler for photo sets of beautiful women (part 1)
Stars: ✭ 398 (-7.44%)
Mutual labels:  crawler
Printpdf
An easy-to-use library for writing PDF in Rust
Stars: ✭ 404 (-6.05%)
Mutual labels:  pdf-generation
Signature algorithm
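Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covered: Ziroom, Xiaohongshu, Danke Apartment, luckin coffee, bangkokair (Bangkok Airways)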
Stars: ✭ 380 (-11.63%)
Mutual labels:  crawler
Pdfjs
A Portable Document Format (PDF) generation library targeting both the server- and client-side.
Stars: ✭ 395 (-8.14%)
Mutual labels:  pdf-generation
Newpipeextractor
Core part of NewPipe
Stars: ✭ 400 (-6.98%)
Mutual labels:  crawler
Spider Flow
A new-generation crawler platform: define the crawl flow graphically and build a crawler without writing any code.
Stars: ✭ 365 (-15.12%)
Mutual labels:  crawler
Comicbook
This project is no longer maintained; for details, join the group at https://t.me/onecomicbook
Stars: ✭ 429 (-0.23%)
Mutual labels:  crawler
Snappy
PHP library allowing thumbnail, snapshot or PDF generation from a url or a html page. Wrapper for wkhtmltopdf/wkhtmltoimage
Stars: ✭ 3,986 (+826.98%)
Mutual labels:  pdf-generation
Opensearchserver
Open-source Enterprise Grade Search Engine Software
Stars: ✭ 408 (-5.12%)
Mutual labels:  crawler
Bilili
🍻 bilibili video (including bangumi) and danmaku downloader
Stars: ✭ 379 (-11.86%)
Mutual labels:  crawler
Github Markdown Toc.go
Easy TOC creation for GitHub README.md (in go)
Stars: ✭ 387 (-10%)
Mutual labels:  toc
Rst2pdf
Use a text editor. Make a PDF.
Stars: ✭ 404 (-6.05%)
Mutual labels:  pdf-generation
Md To Pdf
Hackable CLI tool for converting Markdown files to PDF using Node.js and headless Chrome.
Stars: ✭ 374 (-13.02%)
Mutual labels:  pdf-generation
Dotcommon
What do people have in their dotfiles?
Stars: ✭ 418 (-2.79%)
Mutual labels:  crawler
Netease Music Cracker
🎵 Converts downloadable NetEase Cloud Music cache files into MP3 files
Stars: ✭ 373 (-13.26%)
Mutual labels:  crawler
Gosint
OSINT Swiss Army Knife
Stars: ✭ 401 (-6.74%)
Mutual labels:  crawler
Vim Markdown Toc
A vim 7.4+ plugin to generate table of contents for Markdown files.
Stars: ✭ 427 (-0.7%)
Mutual labels:  toc
Iclr2020 Openreviewdata
Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.
Stars: ✭ 426 (-0.93%)
Mutual labels:  crawler
Contents
Table of contents generator.
Stars: ✭ 404 (-6.05%)
Mutual labels:  toc

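Features

This script downloads the Runoob tutorials as PDF files that learners can print or use for offline study. The PDFs have already been downloaded to the runoob folder. To download them to your own machine, run python3 runoob_crawl.py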
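As a rough illustration of the crawling step, the sketch below collects chapter links from a tutorial index page using only the standard library. It assumes, hypothetically, that chapter links are ordinary <a href> tags pointing at .html pages on the same host; runoob_crawl.py may locate them differently, and the names LinkCollector and collect_chapter_links are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_chapter_links(index_html, base_url):
    """Hypothetical helper: absolute, de-duplicated .html links on base_url's host.

    Assumes chapter pages end in .html; #fragments are dropped before comparing.
    """
    parser = LinkCollector()
    parser.feed(index_html)
    host = urlparse(base_url).netloc
    seen = set()
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the index page, then strip any #fragment.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.endswith(".html") and urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

In practice index_html would be the fetched HTML of a tutorial's first page (for example via urllib.request.urlopen(...).read().decode("utf-8")), and each returned URL would then be saved and printed to PDF in order.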

You can customize the styling inside the PDF

Set the HTML font, width, and styles in clean.js; the page is then saved to PDF.
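The project does this restyling in clean.js. As an alternative sketch in Python (not the project's method), the same effect can be approximated by injecting a <style> block into each saved HTML file before it is printed. The CSS values and the function name inject_print_css are hypothetical placeholders.

```python
import re

# Hypothetical print styles; the project's real values live in clean.js.
PRINT_CSS = """
body { font-family: sans-serif; font-size: 14px; max-width: 760px; margin: 0 auto; }
"""


def inject_print_css(html, css=PRINT_CSS):
    """Insert a <style> block just before </head>, or prepend it if there is no head."""
    style_tag = "<style>" + css + "</style>"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return style_tag + html
    return html[: match.start()] + style_tag + html[match.start():]
```

Putting the style at the end of <head> means it is parsed after the page's own stylesheets, so for rules of equal specificity it wins.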

Errors you may see at runtime:

ERROR:gpu_process_transport_factory.cc(967)] Lost UI shared context: this is a minor bug inside Chrome itself and has been fixed in newer versions.

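Some attempts at converting HTML pages to PDF:

  • Selenium can only produce a PDF made of a single long image, which is far from ideal.
  • PhantomJS can produce PDFs with selectable text, but it cannot split pages, and the PDF height is hard to set.
  • google-chrome --print-to-pdf works very well for saving PDFs, and the output is paginated automatically.
  • merge_pdf_with_toc.py merges the PDFs and can add a TOC, which makes it very powerful (adapted from the work of an expert abroad).
  • Thanks to @flyfreeme for the tip: the content was disappearing because of jQuery, and adding the line sed -i '/<script.*jquery.*<\/script>/d' ./full_page/$page_pr.html fixed it.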
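The google-chrome --print-to-pdf approach above can be driven from Python with subprocess. This is a minimal sketch, not the project's code: --headless, --disable-gpu and --print-to-pdf are standard Chrome switches, while the function names and the google-chrome binary name are assumptions (the executable is named or located differently on some platforms).

```python
import subprocess
from pathlib import Path


def build_chrome_cmd(html_path, pdf_path, chrome="google-chrome"):
    """Build a headless Chrome command that prints a local HTML file to PDF."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        # Chrome resolves paths relative to its own working dir, so pass absolutes.
        "--print-to-pdf=" + str(Path(pdf_path).resolve()),
        Path(html_path).resolve().as_uri(),  # file:// URL of the saved page
    ]


def html_to_pdf(html_path, pdf_path, chrome="google-chrome", timeout=120):
    """Run Chrome; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_chrome_cmd(html_path, pdf_path, chrome), check=True, timeout=timeout)
```

Keeping the command construction separate from the call makes it easy to print or log the exact command when a page fails to convert.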
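When per-chapter PDFs are merged into one file, each TOC entry has to point at the page where its chapter starts in the merged document. The sketch below shows only that bookkeeping, assuming chapters are merged in list order; it is not taken from merge_pdf_with_toc.py, and toc_offsets is a hypothetical name.

```python
from itertools import accumulate


def toc_offsets(chapters):
    """Return (title, start_page) pairs for chapters merged in order.

    chapters: list of (title, page_count). Start pages are 0-based indexes
    into the merged document: each chapter starts after all earlier pages.
    """
    counts = [count for _, count in chapters]
    starts = [0] + list(accumulate(counts))[:-1]
    return [(title, start) for (title, _), start in zip(chapters, starts)]
```

For example, chapters of 3, 5 and 2 pages start at pages 0, 3 and 8. With pypdf, each pair could then be passed to PdfWriter.add_outline_item(title, start) after appending the chapter PDFs in the same order.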
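The line-by-line deletion that the sed command performs can also be done in Python. This sketch assumes the intent is to drop any line holding a <script> tag that mentions jquery and closes on that same line (case-sensitive, as sed is by default); the name strip_jquery_lines is hypothetical.

```python
import re

# A <script> tag that mentions jquery and closes on the same line.
JQUERY_LINE = re.compile(r"<script.*jquery.*</script>")


def strip_jquery_lines(html):
    """Drop every line matching JQUERY_LINE, like sed's /pattern/d."""
    kept = [line for line in html.splitlines(keepends=True) if not JQUERY_LINE.search(line)]
    return "".join(kept)
```

Like the sed version, this does not remove a script tag that spans several lines; that case would need an HTML parser instead.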