jalan / Pdftotext
Licence: MIT
Simple PDF text extraction
Stars: ✭ 445
Programming Languages
python
Projects that are alternatives of or similar to Pdftotext
Pdfjs
A Portable Document Format (PDF) generation library targeting both the server- and client-side.
Stars: ✭ 395 (-11.24%)
Mutual labels: pdf
Storybook Addon Designs
A Storybook addon that embeds Figma, websites, PDFs, or images in the addon panel.
Stars: ✭ 441 (-0.9%)
Mutual labels: pdf
Xournalpp
Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.
Stars: ✭ 5,353 (+1102.92%)
Mutual labels: pdf
One File Pdf
A minimalist Go PDF writer in 1982 lines. Draws text, images and shapes. Helps understand the PDF format. Used in production for reports.
Stars: ✭ 429 (-3.6%)
Mutual labels: pdf
Crowbook
Converts books written in Markdown to HTML, LaTeX/PDF and EPUB
Stars: ✭ 399 (-10.34%)
Mutual labels: pdf
Serverless Libreoffice
Run LibreOffice in AWS Lambda to create PDFs & convert documents
Stars: ✭ 410 (-7.87%)
Mutual labels: pdf
Ocrmypdf
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Stars: ✭ 5,549 (+1146.97%)
Mutual labels: pdf
Pdfpig
Read and extract text and other content from PDFs in C# (port of PDFBox)
Stars: ✭ 391 (-12.13%)
Mutual labels: pdf
Ptshowcaseviewcontroller
An initial implementation of a "showcase" view (controller) for iOS apps. Visualizes images, videos, and PDF files beautifully (by @pittleorg).
Stars: ✭ 395 (-11.24%)
Mutual labels: pdf
Document Viewer
Document Viewer is a highly customizable document viewer for Android.
Stars: ✭ 415 (-6.74%)
Mutual labels: pdf
Govips
A lightning-fast image processing and resizing library for Go
Stars: ✭ 442 (-0.67%)
Mutual labels: pdf
Puppetron
Puppeteer (Headless Chrome Node API)-based rendering solution.
Stars: ✭ 429 (-3.6%)
Mutual labels: pdf
pdftotext
Simple PDF text extraction
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
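Because a `pdftotext.PDF` object behaves like a sequence of page strings (it supports `len()`, indexing, and iteration as in the usage above), ordinary Python can be layered on top of it. The sketch below is a hypothetical convenience wrapper, not part of the library: `load_pages` and `pages_containing` are illustrative names, and `load_pages` only uses the two constructor forms shown above. The search helper works on any iterable of strings, so it can be tried without a PDF at hand.

```python
from typing import Iterable, List, Optional


def load_pages(path: str, password: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return the text of every page in the PDF at `path`."""
    # Imported inside the function so the pure helper below works without pdftotext.
    import pdftotext

    with open(path, "rb") as f:
        # Use only the constructor forms from the usage example above.
        pdf = pdftotext.PDF(f, password) if password else pdftotext.PDF(f)
    # As in the usage example, the PDF object is still usable after the file closes.
    return list(pdf)


def pages_containing(pages: Iterable[str], term: str) -> List[int]:
    """Hypothetical helper: 1-based numbers of pages whose text contains `term`.

    The match is case-insensitive (via str.casefold).
    """
    needle = term.casefold()
    return [number for number, text in enumerate(pages, start=1) if needle in text.casefold()]


# Example usage (assumes a local lorem_ipsum.pdf):
# pages = load_pages("lorem_ipsum.pdf")
# print(pages_containing(pages, "lorem"))
```

Keeping the search separate from the loading step means the same function also works directly on a `pdftotext.PDF` object, since it only iterates over page strings.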
OS Dependencies
These instructions assume you're using Python 3 on a recent OS. Package names may differ for Python 2 or for an older OS.
Debian, Ubuntu, and friends
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
Fedora, Red Hat, and friends
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
macOS
brew install pkg-config poppler python
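On Linux and macOS, the packages above include pkg-config alongside poppler's C++ bindings. Assuming the build locates those bindings through a pkg-config module named `poppler-cpp` (an assumption based on the `poppler-cpp` package names above), a quick way to confirm the prerequisites before running `pip` is to ask pkg-config for that module's version. `poppler_cpp_version` is a hypothetical helper written for this check.

```python
import shutil
import subprocess
from typing import Optional


def poppler_cpp_version() -> Optional[str]:
    """Hypothetical check: return the poppler-cpp version pkg-config reports, or None."""
    # Without pkg-config on PATH there is nothing to query.
    if shutil.which("pkg-config") is None:
        return None
    # `pkg-config --modversion <module>` prints the module's version, or fails if unknown.
    result = subprocess.run(
        ["pkg-config", "--modversion", "poppler-cpp"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = poppler_cpp_version()
    print(f"poppler-cpp {version}" if version else "poppler-cpp not found via pkg-config")
```

A `None` result usually means either pkg-config or the poppler C++ development package is missing.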
Windows
Currently tested only when using conda:
- Install the Microsoft Visual C++ Build Tools
- Install poppler through conda:
conda install -c conda-forge poppler
Install
pip install pdftotext
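After installing, actually importing the module is a more thorough check than looking for it on disk, because it also surfaces problems loading the compiled extension, for example a missing poppler shared library. The function below is a hypothetical sketch of such a check; both "not installed" and "installed but not loadable" show up as `ImportError`.

```python
from typing import Optional


def pdftotext_import_error() -> Optional[str]:
    """Hypothetical check: None if `import pdftotext` succeeds, else the error message."""
    try:
        import pdftotext  # noqa: F401  (imported only to verify it loads)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return str(exc)
    return None


if __name__ == "__main__":
    error = pdftotext_import_error()
    print("pdftotext imported successfully" if error is None else f"import failed: {error}")
```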