pdf2htmlEX / Pdf2htmlex

Licence: other
Convert PDF to HTML without losing text or format.

pdf2htmlEX

Differences from upstream pdf2htmlEX:

This is my branch of pdf2htmlEX. It aims to enable open collaboration and help keep the project active. A number of changes and improvements have been incorporated from other forks:

  • Many bug fixes, mostly for edge cases
  • Integration of the latest Cairo code
  • Out-of-source building
  • Rewritten handling of obscured and partially obscured text, which is now much more accurate
  • Some support for transparent text
  • Improved DPI settings: the DPI is clamped to ensure the output graphic isn't too big

--correct-text-visibility decides whether each character is visible by tracking 4 sample points per character (currently the 4 corners of the character's bounding box, inset slightly). It now has two modes:

  • 1: fully occluded text is handled (i.e. it is not put into the HTML layer).
  • 2: partially occluded text is handled.

The default is now "1", so fully occluded text should no longer show through. If "2" is selected, a partially occluded character is drawn in the background layer instead. In this case, the rendered DPI of the page is automatically increased to --covered-text-dpi (default: 300) to reduce the impact of rasterized text.
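The decision described above can be sketched in Python. This is only an illustration of the rule, not the code pdf2htmlEX runs: the names `Layer`, `classify_char` and `page_dpi` are hypothetical. It also assumes that mode 1 leaves partially occluded characters in the HTML layer, that mode 2 treats fully occluded characters the same way mode 1 does, and that "increased" means the DPI is never lowered.

```python
from enum import Enum


class Layer(Enum):
    HTML = "html"                    # emitted as native HTML text
    BACKGROUND = "background"        # drawn (rasterized) in the background layer
    NOT_IN_HTML = "not-in-html"      # fully occluded: kept out of the HTML layer


def classify_char(samples_visible, mode=1):
    """Pick a layer for one character from its 4 sample-point results.

    samples_visible: 4 booleans, one per inset bounding-box corner.
    mode: 1 or 2, mirroring --correct-text-visibility.
    """
    if len(samples_visible) != 4:
        raise ValueError("expected exactly 4 sample points")
    if mode not in (1, 2):
        raise ValueError("mode must be 1 or 2")

    visible = sum(bool(v) for v in samples_visible)
    if visible == 4:
        return Layer.HTML
    if visible == 0:
        # Assumption: mode 2 handles fully occluded text like mode 1.
        return Layer.NOT_IN_HTML
    # Partially occluded. Assumption: mode 1 leaves it in the HTML layer.
    return Layer.BACKGROUND if mode == 2 else Layer.HTML


def page_dpi(base_dpi, layers, covered_text_dpi=300):
    """Raise the page DPI when any character was sent to the background."""
    if any(layer is Layer.BACKGROUND for layer in layers):
        return max(base_dpi, covered_text_dpi)
    return base_dpi
```

For example, a character with two hidden corners stays in the HTML layer in mode 1 but moves to the background in mode 2, which in turn raises the page DPI to the --covered-text-dpi value.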

For maximum accuracy I strongly recommend using the output options: --font-size-multiplier 1 --zoom 25. This circumvents rounding errors inside web browsers. Because the page is then rendered at 25 times its normal size, you will have to scale down the resulting HTML page using an appropriate "scale" transform.
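As a minimal sketch of that workflow, the Python helpers below build the recommended command line and a matching CSS rule. The helper names are hypothetical, the scale factor of 1/zoom is an assumption about what "appropriate" means here, and the `#page-container` selector is also an assumption that you should check against your generated HTML.

```python
def build_command(pdf_path, zoom=25, font_size_multiplier=1):
    """Return the argv list for the recommended high-accuracy options."""
    return [
        "pdf2htmlEX",
        "--font-size-multiplier", str(font_size_multiplier),
        "--zoom", str(zoom),
        pdf_path,
    ]


def css_scale_rule(zoom=25, selector="#page-container"):
    """Return a CSS rule that shrinks the zoomed page back down.

    Assumption: scaling by 1/zoom undoes the --zoom factor.
    The selector is a guess; adjust it to the generated markup.
    """
    factor = 1 / zoom
    return (
        f"{selector} {{ transform: scale({factor:g}); "
        f"transform-origin: 0 0; }}"
    )
```

The list returned by `build_command("paper.pdf")` can be passed to `subprocess.run(..., check=True)`, and the rule from `css_scale_rule()` can be added to a stylesheet loaded by the output page.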

If you are concerned about the file size of the resulting HTML, I recommend patching fontforge to prevent it from writing the current time into the dumped fonts, and then post-processing the pdf2htmlEX output to remove duplicate files. There will usually be many duplicate background images and fonts.
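One possible way to do the post-processing step is to group output files by content hash. The sketch below only reports byte-identical files. The function name `find_duplicates` is hypothetical, and it assumes the output was split into separate files in one directory. Fonts will only hash identically once the embedded timestamp is gone, which is the reason for the fontforge patch.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(output_dir):
    """Group byte-identical files under output_dir.

    Returns a list of groups; each group is a sorted list of Paths
    whose contents have the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(output_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Before deleting anything, every reference to a duplicate in the generated HTML and CSS has to be rewritten to point at the copy you keep. That step depends on your output layout and is left out here.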

A beautiful demo is worth a thousand words

  • Bible de Genève, 1564 (fonts and typography): HTML / PDF
  • Cheat Sheet (math formulas): HTML / PDF
  • Scientific Paper (text and figures): HTML / PDF
  • Full Circle Magazine (read while downloading): HTML / PDF
  • Git Manual (CJK support): HTML / PDF

pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies. Academic papers with lots of formulas and figures? Magazines with complicated layouts? No problem!

pdf2htmlEX is also an online publishing tool which is flexible for many different use cases.

Learn more about who should use pdf2htmlEX and why.

Features

  • Native HTML text with precise font and location.
  • Flexible output: all-in-one HTML or on-demand page loading (needs JavaScript).
  • Moderate file size, sometimes even smaller than the PDF.
  • Support for links, outlines (bookmarks), printing, SVG backgrounds, Type 3 fonts and more.

Compare to others

Portals

LICENSE

pdf2htmlEX, as a whole package, is licensed under GPLv3+. Some resource files are released under more relaxed licenses; read LICENSE for more details.

Acknowledgements

pdf2htmlEX is made possible thanks to the following projects:

Testing Powered By SauceLabs

pdf2htmlEX is inspired by the following projects:

  • pdftohtml from poppler
  • MuPDF
  • PDF.js
  • Crocodoc
  • Google Doc

Special Thanks

  • Hongliang Tian
  • Wanmin Liu