gkovacs / Pdfocr
Licence: MIT
Adds text to PDF files using the cuneiform OCR software
Stars: ✭ 287
Programming Languages
ruby
Projects that are alternatives to or similar to Pdfocr
Mybox
Easy tools of document, image, file, network, location, color, and media.
Stars: ✭ 45 (-84.32%)
Mutual labels: pdf, ocr
Scanbot Sdk Example Android
Document scanning SDK example apps for the Scanbot SDK for Android.
Stars: ✭ 67 (-76.66%)
Mutual labels: pdf, ocr
Papermerge
Open Source Document Management System for Digital Archives (Scanned Documents)
Stars: ✭ 1,177 (+310.1%)
Mutual labels: pdf, ocr
Paperwork
Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab
Stars: ✭ 2,392 (+733.45%)
Mutual labels: pdf, ocr
Ocrmypdf
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Stars: ✭ 5,549 (+1833.45%)
Mutual labels: pdf, ocr
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-67.25%)
Mutual labels: pdf, ocr
Docspell
Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
Stars: ✭ 303 (+5.57%)
Mutual labels: pdf, ocr
Open Semantic Etl
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
Stars: ✭ 165 (-42.51%)
Mutual labels: pdf, ocr
Pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Stars: ✭ 1,969 (+586.06%)
Mutual labels: pdf, ocr
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (-44.6%)
Mutual labels: pdf, ocr
Open Paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
Stars: ✭ 2,538 (+784.32%)
Mutual labels: pdf, ocr
Mayan Edms
Free Open Source Document Management System (mirror, no pull request or issues)
Stars: ✭ 226 (-21.25%)
Mutual labels: pdf, ocr
Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
Stars: ✭ 2,736 (+853.31%)
Mutual labels: pdf, ocr
Pdftilecut
pdftilecut lets you subdivide PDF pages into smaller pages so you can print them on small form printers.
Stars: ✭ 258 (-10.1%)
Mutual labels: pdf
Cloud Reports
Scans your AWS cloud resources and generates reports. Check out free hosted version:
Stars: ✭ 255 (-11.15%)
Mutual labels: pdf
Attention ocr.pytorch
This repository implements an encoder-decoder model with attention for OCR
Stars: ✭ 278 (-3.14%)
Mutual labels: ocr
pdfocr
pdfocr adds an OCR text layer to scanned PDF files, allowing them to be searched. It requires Ruby 1.8.7 or later and uses ocropus, cuneiform, or tesseract to perform OCR.
Using
To use, run:
pdfocr -i input.pdf -o output.pdf
For more details, see the manpage.
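To process several files, the command above can be wrapped in a small Ruby script. The following is only a sketch: the helper names (pdfocr_command, ocr_output_path, ocr_batch) and the "-ocr.pdf" output naming are assumptions made for illustration and are not part of pdfocr. It uses only the -i and -o options shown above.

```ruby
# Hypothetical helper: builds the argument list for one pdfocr run.
# Only the -i and -o options from the usage line above are used.
def pdfocr_command(input, output)
  ["pdfocr", "-i", input, "-o", output]
end

# Hypothetical naming scheme (an assumption, not a pdfocr convention):
# "scan.pdf" becomes "scan-ocr.pdf" in the same directory.
def ocr_output_path(input)
  dir = File.dirname(input)
  base = File.basename(input, ".pdf")
  File.join(dir, "#{base}-ocr.pdf")
end

# Runs pdfocr once per input file and returns the inputs that succeeded.
# Passing the arguments as a list avoids shell interpretation of file names.
def ocr_batch(inputs)
  inputs.select do |input|
    system(*pdfocr_command(input, ocr_output_path(input)))
  end
end
```

Calling ocr_batch(ARGV) at the end of such a script would process every file named on the command line; inputs for which pdfocr fails or cannot be started are simply left out of the returned list.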
Dependencies
pdfocr requires tesseract and hocr2pdf. These can be provided by installing the packages tesseract-ocr, tesseract-ocr-eng (or other languages you need), and exactimage from your distribution.
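Before running pdfocr, it can help to confirm that both programs are on the PATH. The sketch below is an illustration under that assumption, not code from pdfocr itself; the names find_executable, missing_tools, and REQUIRED_TOOLS are hypothetical.

```ruby
# Tools named in the dependency list above.
REQUIRED_TOOLS = ["tesseract", "hocr2pdf"]

# Returns the full path of an executable found in path_env, or nil.
# Hypothetical helper; it only inspects the directories listed in PATH.
def find_executable(name, path_env = ENV["PATH"].to_s)
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the names of the tools that could not be found.
def missing_tools(tools = REQUIRED_TOOLS, path_env = ENV["PATH"].to_s)
  tools.reject { |tool| find_executable(tool, path_env) }
end
```

An empty array from missing_tools means both programs were found; otherwise the returned names indicate which packages still need to be installed.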
Credits
pdfocr was written by Geza Kovacs.
pdfocr is hosted at http://github.com/gkovacs/pdfocr
Christian Pietsch added tesseract support.