
Xournalpp

Xournal++ is handwriting note-taking software with PDF annotation support. It is written in C++ with GTK3, runs on Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10, and supports pen input from devices such as Wacom tablets.

Stars: ✭ 5,353 (+1102.92%)

Mutual labels: pdf

Pdfvuer

A PDF viewer for Vue using Mozilla's PDF.js

Stars: ✭ 443 (-0.45%)

Mutual labels: pdf

Tabulizer

Bindings for Tabula PDF Table Extractor Library

Stars: ✭ 413 (-7.19%)

Mutual labels: pdf

One File Pdf

A minimalist Go PDF writer in 1982 lines. Draws text, images and shapes. Helps understand the PDF format. Used in production for reports.

Stars: ✭ 429 (-3.6%)

Mutual labels: pdf

Crowbook

Converts books written in Markdown to HTML, LaTeX/PDF and EPUB

Stars: ✭ 399 (-10.34%)

Mutual labels: pdf

Serverless Libreoffice

Run LibreOffice in AWS Lambda to create PDFs & convert documents

Stars: ✭ 410 (-7.87%)

Mutual labels: pdf

Ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Stars: ✭ 5,549 (+1146.97%)

Mutual labels: pdf

Pdfpig

Read and extract text and other content from PDFs in C# (port of PDFBox)

Stars: ✭ 391 (-12.13%)

Mutual labels: pdf

Prawn

Fast, Nimble PDF Writer for Ruby

Stars: ✭ 4,266 (+858.65%)

Mutual labels: pdf

Ptshowcaseviewcontroller

An initial implementation of a "showcase" view (controller) for iOS apps that visualizes images, videos and PDF files beautifully (by @pittleorg).

Stars: ✭ 395 (-11.24%)

Mutual labels: pdf

Document Viewer

Document Viewer is a highly customizable document viewer for Android.

Stars: ✭ 415 (-6.74%)

Mutual labels: pdf

Tppdf

TPPDF is a simple-to-use PDF builder for iOS

Stars: ✭ 444 (-0.22%)

Mutual labels: pdf

Govips

A lightning-fast image processing and resizing library for Go

Stars: ✭ 442 (-0.67%)

Mutual labels: pdf

Puppetron

Puppeteer (Headless Chrome Node API)-based rendering solution.

Stars: ✭ 429 (-3.6%)

Mutual labels: pdf


pdftotext

Simple PDF text extraction

```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
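Since a `pdftotext.PDF` object supports `len()`, iteration and indexing like a sequence of page strings, small helpers can be written against any sequence of strings. The sketch below finds which pages mention a search term. The function name `pages_containing` is hypothetical (it is not part of pdftotext), and it assumes a simple case-insensitive substring match is good enough.

```python
from typing import List, Sequence


def pages_containing(pages: Sequence[str], term: str) -> List[int]:
    """Return the 0-based indices of pages whose text contains `term`.

    Hypothetical helper, not part of pdftotext. `pages` can be a
    pdftotext.PDF object or any other sequence of page strings.
    Matching is a case-insensitive substring search via casefold().
    """
    needle = term.casefold()
    return [i for i, text in enumerate(pages) if needle in text.casefold()]


if __name__ == "__main__":
    # Stand-in for pdftotext.PDF(f): a plain list of page strings.
    sample_pages = ["Lorem ipsum dolor", "sit amet", "Dolor sit amet"]
    print(pages_containing(sample_pages, "dolor"))  # [0, 2]
```

With a real file, passing `pdftotext.PDF(f)` instead of the list should work the same way, because the helper only iterates over the pages.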

OS Dependencies

These instructions assume you're using Python 3 on a recent OS. Package names may differ for Python 2 or for an older OS.

Debian, Ubuntu, and friends

```
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
```

Fedora, Red Hat, and friends

```
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
```

macOS

```
brew install pkg-config poppler python
```

Windows

Currently tested only with conda:

1. Install the Microsoft Visual C++ Build Tools.
2. Install poppler through conda:
   ```
   conda install -c conda-forge poppler
   ```
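The per-OS choices in the OS Dependencies section can be condensed into a small script that prints the matching command. This is a hypothetical sketch, not something pdftotext ships: the names `dependency_command` and `linux_distro_ids` are made up, distro detection assumes a standard `/etc/os-release` file with `ID` and `ID_LIKE` keys, and on Windows it only covers the conda step (the Visual C++ Build Tools still have to be installed separately).

```python
import platform
from typing import Dict, Iterable, Optional, Set

# Commands copied from the OS Dependencies section.
COMMANDS: Dict[str, str] = {
    "debian": "sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev",
    "fedora": "sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel",
    "macos": "brew install pkg-config poppler python",
    "windows": "conda install -c conda-forge poppler",
}


def linux_distro_ids(path: str = "/etc/os-release") -> Set[str]:
    """Collect lowercase distro IDs from the ID and ID_LIKE keys (assumed format)."""
    ids: Set[str] = set()
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                key, sep, value = line.strip().partition("=")
                if sep and key in ("ID", "ID_LIKE"):
                    # ID_LIKE may hold several space-separated, quoted IDs.
                    ids.update(value.strip("\"'").lower().split())
    except OSError:
        pass  # file missing or unreadable: no IDs detected
    return ids


def dependency_command(system: str, distro_ids: Iterable[str] = ()) -> Optional[str]:
    """Map platform.system() output (plus Linux distro IDs) to an install command."""
    ids = {i.lower() for i in distro_ids}
    if system == "Linux":
        if ids & {"debian", "ubuntu"}:
            return COMMANDS["debian"]
        if ids & {"fedora", "rhel", "centos"}:
            return COMMANDS["fedora"]
        return None  # other distros: package names differ
    if system == "Darwin":
        return COMMANDS["macos"]
    if system == "Windows":
        return COMMANDS["windows"]
    return None


if __name__ == "__main__":
    system = platform.system()
    ids = linux_distro_ids() if system == "Linux" else set()
    print(dependency_command(system, ids) or "No command listed for this OS.")
```

Returning `None` for unlisted distros, instead of guessing a package name, keeps the script limited to what the instructions actually cover.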

Install

```
pip install pdftotext
```
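Because `pip install pdftotext` compiles an extension against poppler, a failed install often means a missing development package. The sketch below checks, before installing, whether `pkg-config` can see the `poppler-cpp` module that the Debian and Fedora packages provide. It assumes the build locates poppler through pkg-config, and the function name `poppler_cpp_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def poppler_cpp_version() -> Optional[str]:
    """Return the poppler-cpp version reported by pkg-config, or None.

    Hypothetical pre-install check, not part of pdftotext. Assumes the
    build looks up poppler via pkg-config's `poppler-cpp` module.
    """
    pkg_config = shutil.which("pkg-config")
    if pkg_config is None:
        return None  # pkg-config itself is not installed or not on PATH
    result = subprocess.run(
        [pkg_config, "--modversion", "poppler-cpp"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None  # pkg-config ran but could not find poppler-cpp
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = poppler_cpp_version()
    if version:
        print(f"poppler-cpp {version} found; the build is likely to succeed")
    else:
        print("poppler-cpp not found via pkg-config; install the OS dependencies first")
```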


Stars: ✭ 445
