
ashutoshvarma / pyxpdf

Licence: other
Fast and memory-efficient Python PDF Parser based on xpdf sources

Programming Languages

Cython, Python, C++, Makefile, Shell

Projects that are alternatives of or similar to pyxpdf

pdf2html
pdf2html is a module which helps to convert PDF file to HTML pages using Apache Tika. This module also helps to generate thumbnail image for PDF file using Apache PDFBox.
Stars: ✭ 55 (+111.54%)
Mutual labels:  pdf-converter, pdftohtml
pdftron-android-samples
PDFTron Android Samples
Stars: ✭ 30 (+15.38%)
Mutual labels:  pdf-converter, pdftohtml
press-ready
🚀 Make your PDF press-ready PDF/X-1a.
Stars: ✭ 56 (+115.38%)
Mutual labels:  pdf-converter, xpdf
WeReadScan
A crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs
Stars: ✭ 273 (+950%)
Mutual labels:  pdf-converter
Office2PDF
Batch-convert Office files (Word, Excel, PPT) to PDF; well documented and works well for the author's own use
Stars: ✭ 114 (+338.46%)
Mutual labels:  pdf-converter
covid19-kerala-api-deprecated
Deprecated - A fast API service for retrieving day to day stats about Coronavirus(COVID-19, SARS-CoV-2) outbreak in Kerala(India).
Stars: ✭ 14 (-46.15%)
Mutual labels:  pdftotext
sypht-golang-client
A Golang client for the Sypht API
Stars: ✭ 33 (+26.92%)
Mutual labels:  pdf-parser
Docotic.Pdf.Samples
C# and VB.NET samples for Docotic.Pdf library
Stars: ✭ 52 (+100%)
Mutual labels:  pdf-parser
chromic pdf
Convenient HTML to PDF/A rendering library for Elixir based on Chrome & Ghostscript
Stars: ✭ 196 (+653.85%)
Mutual labels:  pdf-converter
pdf-to-text
Read PDF files in JavaScript
Stars: ✭ 62 (+138.46%)
Mutual labels:  pdftotext
Android-XML-to-PDF-Generator
A library for easily converting XML to PDF using the Step Builder pattern
Stars: ✭ 140 (+438.46%)
Mutual labels:  pdf-converter
scipdf parser
Python PDF parser for scientific publications
Stars: ✭ 76 (+192.31%)
Mutual labels:  pdf-parser
PdfToImage
Convert PDF to JPG in C# (using PdfiumViewer)
Stars: ✭ 23 (-11.54%)
Mutual labels:  pdf-converter
pdf2jpg
Utility to convert PDF into JPG files
Stars: ✭ 39 (+50%)
Mutual labels:  pdf-converter
php-chrome-html2pdf
A PHP library for converting HTML to PDF using Google Chrome
Stars: ✭ 53 (+103.85%)
Mutual labels:  pdf-converter
hpdft
Tools to poke at PDFs using Haskell
Stars: ✭ 36 (+38.46%)
Mutual labels:  pdf-parser
document-conversion-nodejs
DEPRECATED: Please use https://github.com/watson-developer-cloud/discovery-nodejs
Stars: ✭ 27 (+3.85%)
Mutual labels:  pdf-converter
PDFParser
Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser
Stars: ✭ 25 (-3.85%)
Mutual labels:  pdf-parser
linkedin-pdf-resume-parser
Parse LinkedIn PDF Resume and extract out name, email, education and work experiences.
Stars: ✭ 22 (-15.38%)
Mutual labels:  pdf-parser
gotenberg-js-client
A simple JS/TS client for interacting with a Gotenberg API
Stars: ✭ 90 (+246.15%)
Mutual labels:  pdf-converter

pyxpdf

pyxpdf is a fast and memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.


Features

  • Almost 20 times faster than pure-Python PDF parsers (see Speed Comparison)
  • Extract text while preserving the original document layout as closely as possible
  • Support for almost all PDF encodings, CMaps and predefined CMaps
  • Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks, along with their BBox
  • Render PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes
  • No explicit dependencies (except optional ones, see Installation)
  • Thread safe
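
To illustrate how the thread-safety claim might be used in practice, here is a minimal sketch that extracts text from several PDFs in a thread pool. The helper names `pyxpdf_text` and `extract_many` are hypothetical and not part of pyxpdf. The sketch assumes pyxpdf exposes a `Document` class that accepts a file path and provides a `text()` method; check the project documentation for the exact API of your installed version.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def pyxpdf_text(path: str) -> str:
    """Return all text from one PDF (hypothetical helper).

    Assumption: pyxpdf provides Document(path) with a text() method.
    """
    # Imported lazily so this module loads even when pyxpdf is not installed.
    from pyxpdf import Document

    return Document(path).text()


def extract_many(
    paths: Iterable[str],
    extract: Callable[[str], str] = pyxpdf_text,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run `extract` on every path in a thread pool and map path -> text."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its text.
        texts = list(pool.map(extract, paths))
    return dict(zip(paths, texts))
```

Passing the extractor as a parameter keeps the threading logic independent of the parser, so the same helper can be exercised with a stub function or with a different backend. Each worker opens its own document here; whether a single `Document` object can be shared across threads is not stated in the feature list.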

More Information

License

pyxpdf is licensed under the GNU General Public License (GPL), version 2 or 3. See the LICENSE file.

Credits
