jalan / Pdftotext

Licence: MIT
Simple PDF text extraction

Programming Languages

python

Projects that are alternatives of or similar to Pdftotext

Pdfjs
A Portable Document Format (PDF) generation library targeting both the server- and client-side.
Stars: ✭ 395 (-11.24%)
Mutual labels:  pdf
Sonatamediabundle
Symfony SonataMediaBundle
Stars: ✭ 415 (-6.74%)
Mutual labels:  pdf
Storybook Addon Designs
A Storybook addon that embeds Figma, websites, PDF or images in the addon panel.
Stars: ✭ 441 (-0.9%)
Mutual labels:  pdf
Common.utility
Various helper classes
Stars: ✭ 4,101 (+821.57%)
Mutual labels:  pdf
Printpdf
An easy-to-use library for writing PDF in Rust
Stars: ✭ 404 (-9.21%)
Mutual labels:  pdf
Pdfh5
PDF preview plugin for web, H5, and mobile
Stars: ✭ 423 (-4.94%)
Mutual labels:  pdf
Xournalpp
Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.
Stars: ✭ 5,353 (+1102.92%)
Mutual labels:  pdf
Pdfvuer
A PDF viewer for Vue using Mozilla's PDF.js
Stars: ✭ 443 (-0.45%)
Mutual labels:  pdf
Tabulizer
Bindings for Tabula PDF Table Extractor Library
Stars: ✭ 413 (-7.19%)
Mutual labels:  pdf
One File Pdf
A minimalist Go PDF writer in 1982 lines. Draws text, images and shapes. Helps understand the PDF format. Used in production for reports.
Stars: ✭ 429 (-3.6%)
Mutual labels:  pdf
Crowbook
Converts books written in Markdown to HTML, LaTeX/PDF and EPUB
Stars: ✭ 399 (-10.34%)
Mutual labels:  pdf
Serverless Libreoffice
Run LibreOffice in AWS Lambda to create PDFs & convert documents
Stars: ✭ 410 (-7.87%)
Mutual labels:  pdf
Ocrmypdf
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Stars: ✭ 5,549 (+1146.97%)
Mutual labels:  pdf
Pdfpig
Read and extract text and other content from PDFs in C# (port of PdfBox)
Stars: ✭ 391 (-12.13%)
Mutual labels:  pdf
Prawn
Fast, Nimble PDF Writer for Ruby
Stars: ✭ 4,266 (+858.65%)
Mutual labels:  pdf
Ptshowcaseviewcontroller
An initial implementation of a "showcase" view (controller) for iOS apps. Visualizes images, videos and PDF files beautifully! (by @pittleorg) [meta: image, photo, video, document, pdf, album, gallery, showcase, iOS, iPhone, iPad, component, library, viewer]
Stars: ✭ 395 (-11.24%)
Mutual labels:  pdf
Document Viewer
Document Viewer is a highly customizable document viewer for Android.
Stars: ✭ 415 (-6.74%)
Mutual labels:  pdf
Tppdf
TPPDF is a simple-to-use PDF builder for iOS
Stars: ✭ 444 (-0.22%)
Mutual labels:  pdf
Govips
A lightning fast image processing and resizing library for Go
Stars: ✭ 442 (-0.67%)
Mutual labels:  pdf
Puppetron
Puppeteer (Headless Chrome Node API)-based rendering solution.
Stars: ✭ 429 (-3.6%)
Mutual labels:  pdf

pdftotext

Simple PDF text extraction

import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
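
Because a loaded PDF behaves like a sequence of page strings, ordinary Python string handling applies to it. The following is a minimal sketch, not part of the library: `find_pages` and `search_pdf` are hypothetical helper names, and only the `pdftotext.PDF(f)` and `pdftotext.PDF(f, password)` calls shown above are assumed to exist.

```python
def find_pages(pages, needle):
    """Return zero-based indexes of pages whose text contains needle, ignoring case."""
    needle = needle.lower()
    return [i for i, text in enumerate(pages) if needle in text.lower()]


def search_pdf(path, needle, password=None):
    """Open a PDF with pdftotext and search its pages (hypothetical helper)."""
    import pdftotext  # imported lazily so find_pages can be used without it

    with open(path, "rb") as f:
        # Pass the password only when one was given, mirroring the calls above.
        pdf = pdftotext.PDF(f, password) if password else pdftotext.PDF(f)
    return find_pages(pdf, needle)
```

For example, `search_pdf("lorem_ipsum.pdf", "ipsum")` would return the indexes of the pages that mention "ipsum", which can then be read with `pdf[i]` as shown above.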

OS Dependencies

These instructions assume you're using Python 3 on a recent OS. Package names may differ for Python 2 or for an older OS.

Debian, Ubuntu, and friends

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel

macOS

brew install pkg-config poppler python
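
On the Unix-like systems above, a quick check that the Poppler C++ development files are visible can save a failed build later. This is a rough sketch under one assumption not stated in this README: that the build finds Poppler through `pkg-config` under the module name `poppler-cpp`. The function name `poppler_cpp_available` is made up for illustration.

```python
import shutil
import subprocess


def poppler_cpp_available():
    """Return True if pkg-config is on PATH and reports the poppler-cpp module."""
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        return False
    # `pkg-config --exists NAME` exits with status 0 when the module is found.
    result = subprocess.run([pkg_config, "--exists", "poppler-cpp"])
    return result.returncode == 0


if __name__ == "__main__":
    print("poppler-cpp found" if poppler_cpp_available() else "poppler-cpp not found")
```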

Windows

Currently tested only when using conda:

  • Install the Microsoft Visual C++ Build Tools
  • Install poppler through conda:
    conda install -c conda-forge poppler
    

Install

pip install pdftotext
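
To confirm that the package landed in the interpreter you intend to use, one option is to ask Python whether the module can be found. This is a small sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pdftotext installed:", is_installed("pdftotext"))
```

If this prints `False` right after a successful `pip install`, `pip` and `python` probably belong to different environments.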