gagayuan / Runoob Pdf

Crawls the Runoob (菜鸟教程) tutorial site and converts it to PDF (python_crawer_by_chrome)

Programming Languages

python

139335 projects - #7 most used programming language

python3

1442 projects

Labels

crawler pdf-generation toc

Projects that are alternatives of or similar to Runoob Pdf

Iclr2019 Openreviewdata

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

Stars: ✭ 376 (-12.56%)

Mutual labels: crawler

Mmjpg

👩 Crawler for model photo-set galleries (part 1)

Stars: ✭ 398 (-7.44%)

Mutual labels: crawler

Printpdf

An easy-to-use library for writing PDF in Rust

Stars: ✭ 404 (-6.05%)

Mutual labels: pdf-generation

Signature algorithm

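Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee (瑞幸咖啡), bangkokair (曼谷航空)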

Stars: ✭ 380 (-11.63%)

Mutual labels: crawler

Pdfjs

A Portable Document Format (PDF) generation library targeting both the server- and client-side.

Stars: ✭ 395 (-8.14%)

Mutual labels: pdf-generation

Newpipeextractor

Core part of NewPipe

Stars: ✭ 400 (-6.98%)

Mutual labels: crawler

Spider Flow

A next-generation crawler platform that defines crawl flows graphically, so a crawler can be built without writing code.

Stars: ✭ 365 (-15.12%)

Mutual labels: crawler

Comicbook

This project is no longer maintained; join the group for details: https://t.me/onecomicbook

Stars: ✭ 429 (-0.23%)

Mutual labels: crawler

Snappy

PHP library allowing thumbnail, snapshot or PDF generation from a url or a html page. Wrapper for wkhtmltopdf/wkhtmltoimage

Stars: ✭ 3,986 (+826.98%)

Mutual labels: pdf-generation

Opensearchserver

Open-source Enterprise Grade Search Engine Software

Stars: ✭ 408 (-5.12%)

Mutual labels: crawler

Bilili

🍻 bilibili video (including bangumi) and danmaku downloader

Stars: ✭ 379 (-11.86%)

Mutual labels: crawler

Github Markdown Toc.go

Easy TOC creation for GitHub README.md (in go)

Stars: ✭ 387 (-10%)

Mutual labels: toc

Rst2pdf

Use a text editor. Make a PDF.

Stars: ✭ 404 (-6.05%)

Mutual labels: pdf-generation

Md To Pdf

Hackable CLI tool for converting Markdown files to PDF using Node.js and headless Chrome.

Stars: ✭ 374 (-13.02%)

Mutual labels: pdf-generation

Dotcommon

What do people have in their dotfiles?

Stars: ✭ 418 (-2.79%)

Mutual labels: crawler

Netease Music Cracker

🎵 Converts the cache files of downloadable NetEase Cloud Music tracks into MP3 files

Stars: ✭ 373 (-13.26%)

Mutual labels: crawler

Gosint

OSINT Swiss Army Knife

Stars: ✭ 401 (-6.74%)

Mutual labels: crawler

Vim Markdown Toc

A vim 7.4+ plugin to generate table of contents for Markdown files.

Stars: ✭ 427 (-0.7%)

Mutual labels: toc

Iclr2020 Openreviewdata

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

Stars: ✭ 426 (-0.93%)

Mutual labels: crawler

Contents

Table of contents generator.

Stars: ✭ 404 (-6.05%)

Mutual labels: toc


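Features

This script downloads the runoob tutorials as PDF files, which learners can print or study offline. The PDF files have already been downloaded into the runoob folder. To download them to your own machine, run python3 runoob_crawl.py

You can customize the styling inside the PDF

In clean.js, set the font, width, and styles of the HTML before it is saved to PDF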
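The idea of restyling a page before printing it can be sketched in Python. This is not the repository's clean.js, whose contents are not shown here; the function name `inject_style` and its parameters are hypothetical, and the CSS properties are only examples of what could be adjusted.

```python
import re

# Matches the closing head tag regardless of letter case.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_style(html: str, font_family: str = "serif", max_width: str = "800px") -> str:
    """Return html with a <style> block placed just before </head>.

    If the page has no </head>, the style block is prepended instead.
    """
    css = (
        "<style>"
        f"body {{ font-family: {font_family}; max-width: {max_width}; margin: 0 auto; }}"
        "</style>"
    )
    match = HEAD_CLOSE.search(html)
    if match is None:
        return css + html
    return html[: match.start()] + css + html[match.start():]
```

Calling `inject_style(page, "Arial", "600px")` on a saved page would return a copy whose body uses that font and width when rendered.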

Errors you may see at run time:

ERROR:gpu_process_transport_factory.cc(967)] Lost UI shared context
This is a minor bug in Chrome itself and has been fixed in newer versions.

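Approaches tried for converting HTML pages to PDF:

With selenium, the only result is a PDF made of one long image, which is far from ideal.
With phantomjs, the PDF has selectable text, but it cannot be paginated and its height is hard to set.
google-chrome --print-to-pdf works very well for saving PDFs, and the PDF is paginated automatically.
merge_pdf_with_toc.py merges the PDFs and can add a TOC, which is very powerful. (Adapted from the work of a developer abroad.)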
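The google-chrome --print-to-pdf approach above can be driven from Python roughly as follows. This is a minimal sketch, not the code in runoob_crawl.py: the helper names are hypothetical, the `--headless` and `--disable-gpu` flags are additions assumed here for running without a display, and the binary name `google-chrome` may differ on your system.

```python
import subprocess
from pathlib import Path


def build_print_command(html_path, pdf_path, chrome="google-chrome"):
    """Build the Chrome command line that prints one local HTML file to PDF."""
    # Chrome expects a URL, so the local file is passed as a file:// URI.
    source = Path(html_path).resolve().as_uri()
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={pdf_path}", source]


def print_to_pdf(html_path, pdf_path, chrome="google-chrome"):
    """Run Chrome and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_print_command(html_path, pdf_path, chrome), check=True)
```

Keeping the command construction separate from the call that launches Chrome makes it possible to inspect or log the exact command before running it over many pages.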
Thanks to @flyfreeme for the reminder: the disappearing content was caused by jquery. Adding the line sed -i '/<script.jquery.</script>/d' ./full_page/$page_pr.html fixed it.
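The same cleanup can be sketched in Python. The sed pattern as printed above may have lost characters when the page was rendered, so the regular expression below is a guess at its intent (drop every line containing a script tag that mentions jquery), not a transcription; the function name is hypothetical.

```python
import re

# Assumed intent: a single line holding a <script ...jquery...></script> tag.
JQUERY_SCRIPT_LINE = re.compile(r"<script.*jquery.*</script>", re.IGNORECASE)


def drop_jquery_lines(html: str) -> str:
    """Return html without the lines that load jquery via a script tag."""
    kept = [
        line
        for line in html.splitlines(keepends=True)
        if not JQUERY_SCRIPT_LINE.search(line)
    ]
    return "".join(kept)
```

Like sed's `d` command, this removes whole lines, so a script tag that spans several lines would not be caught.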


Stars: ✭ 430
