zTrix / Webpage2html

Licence: other
save/convert web pages to a standalone editable html file for offline archive/view/edit/play/whatever

Webpage2html

Webpage2html: Save web page to a single html file

This is a simple script to save a web page to a single html file. No mhtml or pdf stuff, no xxx_files directory, just one single readable and editable html file.

The basic idea is to inline all CSS and JavaScript files directly into the HTML, and to embed image data as base64 data URIs.
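As a rough illustration of the data URI part of that idea, here is a minimal sketch using only the Python standard library. The helper name `to_data_uri` and the sample bytes are hypothetical and are not taken from webpage2html itself; the script's actual implementation may differ.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode raw bytes as a base64 data URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the type is unknown
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Hypothetical image bytes (just the PNG signature); a real run would read
# the full file from disk or fetch it over the network.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "logo.png")

# The URI replaces the original src value, so the image travels inside the HTML.
img_tag = f'<img src="{uri}">'
print(img_tag)
```

The same substitution applies to any attribute that points at a binary resource, at the cost of roughly a one-third size increase from base64 encoding.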

Usage and Example

Save a web page directly from its URL (recommended):

$ python webpage2html.py https://www.google.com > google.html

Or save the web page first with a browser such as Chrome, which produces something.html with a something_files directory beside it, then convert the saved file:

$ python webpage2html.py /path/to/something.html > something_single.html

Note that the second method may not always work as expected: the page may reference URLs such as //ssl.gstatic.com/gb/images/v1_c69d5271.png (from the Google index page) whose files are missing from the Google_files directory saved by the browser.
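The difference between the two methods can be seen with the standard library's `urljoin`. This is only a sketch of the underlying URL behaviour, not webpage2html's own resolution code; the base values are examples.

```python
from urllib.parse import urljoin

ref = "//ssl.gstatic.com/gb/images/v1_c69d5271.png"

# Resolved against the page's original URL, the reference inherits its scheme
# and becomes a complete, fetchable address.
from_url = urljoin("https://www.google.com/", ref)

# Resolved against a local file path there is no scheme to inherit, so the
# reference stays protocol-relative and does not point at anything on disk.
from_file = urljoin("/path/to/something.html", ref)

print(from_url)
print(from_file)
```

This is why saving directly from the URL is the more reliable route: the original address supplies the context needed to complete such references.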

Enable JavaScript with -s, for example to save the 2048 game page into a single HTML file for offline play:

$ python webpage2html.py -s http://gabrielecirulli.github.io/2048/ > 2048.html
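For a page like this to work offline, each external script has to end up inside the HTML itself. The sketch below shows one simplified way to do that with a regular expression; `inline_scripts` and the `sources` mapping are hypothetical names for illustration, and the real script relies on BeautifulSoup rather than regex matching.

```python
import re

# Matches an empty <script> element that loads its code through src="...".
SCRIPT_SRC = re.compile(
    r'<script\b[^>]*\bsrc="([^"]+)"[^>]*>\s*</script>',
    re.IGNORECASE,
)


def inline_scripts(html: str, sources: dict) -> str:
    """Replace <script src="..."> elements with inline copies of their code."""

    def replace(match):
        body = sources.get(match.group(1))
        if body is None:
            return match.group(0)  # leave scripts we could not fetch untouched
        # A literal "</script>" inside the code would close the element early.
        body = body.replace("</script>", "<\\/script>")
        return "<script>" + body + "</script>"

    return SCRIPT_SRC.sub(replace, html)


# Hypothetical page and already-downloaded script contents.
page = '<body><script src="js/app.js"></script></body>'
result = inline_scripts(page, {"js/app.js": "console.log('2048');"})
print(result)
```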

Dependency

BeautifulSoup4, lxml, requests, termcolor (optional)

$ pip install -r requirements.txt

Or install them manually:

$ pip install lxml BeautifulSoup4 requests termcolor

I have tried the default HTMLParser and html5lib as backend parsers for BeautifulSoup, but both are buggy. HTMLParser handles self-closing tags (like <br> and <meta>) incorrectly: it waits for a closing tag for <br>, so if the HTML contains too many <br> tags, BeautifulSoup fails with RuntimeError: maximum recursion depth exceeded. html5lib encodes already-encoded HTML entities a second time, turning &lt; into &amp;lt;, which is definitely unacceptable. I have tested many cases and lxml handled all of them correctly, so lxml is the parser used now.
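To make the double-encoding symptom concrete, the snippet below reproduces it with the standard library's `html` module. It does not use html5lib or BeautifulSoup; it only shows what encoding an already-encoded entity looks like and why it corrupts the page.

```python
import html

original = "&lt;"  # an already-encoded "<" as it appears in the page source

# Escaping an already-escaped entity encodes its ampersand a second time,
# so a browser would display the literal text "&lt;" instead of "<".
double_encoded = html.escape(original)

# Decoding first and then encoding exactly once leaves the entity unchanged.
round_trip = html.escape(html.unescape(original))

print(double_encoded)
print(round_trip)
```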

The termcolor package is optional and only adds colored log output.

Unsupported Cases

Browser-side LESS compiling

The page embeds LESS stylesheets directly and uses less.js to compile them in the browser. I have not yet found a way to embed the LESS code into the generated HTML so that it still works.

<link rel="stylesheet/less" type="text/css" href="http://dghubble.com/blog/theme/css/style.less">
<script src="http://dghubble.com/blog/theme/js/less-1.5.0.min.js" type="text/javascript"></script>

srcset attribute in img tag (html5)

Currently srcset is discarded.
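Supporting srcset would first require splitting the attribute into its candidate images. The following is a simplified sketch of that step, not something the script currently does; `parse_srcset` is a hypothetical helper, and the naive comma split would break on URLs that themselves contain commas (such as data URIs).

```python
def parse_srcset(srcset: str):
    """Split a srcset value into (url, descriptor) pairs."""
    candidates = []
    for part in srcset.split(","):
        fields = part.split()
        if not fields:
            continue  # skip empty entries such as a trailing comma
        url = fields[0]
        # A candidate without a descriptor is treated as 1x.
        descriptor = fields[1] if len(fields) > 1 else "1x"
        candidates.append((url, descriptor))
    return candidates


# Hypothetical attribute value.
pairs = parse_srcset("small.jpg 480w, large.jpg 1080w")
print(pairs)
```

Each candidate URL could then be embedded the same way as a plain src value, although inlining every candidate would multiply the size of the output file.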

Contributors

  1. lukin.a.i submitted a patch to fix the unrecognised CSS link (rel=stylesheet) issue
  2. Gruber.
  3. Java port of this project. https://github.com/cedricblondeau/webpage2html-java
  4. https://github.com/presto8

License

webpage2html uses the SATA License (Star And Thank Author License), so you have to star this project before using it. Read the license carefully.