Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database

Stars: ✭ 165 (-45.54%)

Mutual labels: pdf, ocr

Ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Stars: ✭ 5,549 (+1731.35%)

Mutual labels: pdf, ocr

Lambda Text Extractor

AWS Lambda functions to extract text from various binary formats.

Stars: ✭ 159 (-47.52%)

Mutual labels: pdf, ocr

Open Paperless

Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)

Stars: ✭ 2,538 (+737.62%)

Mutual labels: pdf, ocr

Ambar

🔍 Ambar: Document Search Engine

Stars: ✭ 1,829 (+503.63%)

Mutual labels: pdf, ocr

Pdfocr

Adds text to PDF files using the cuneiform OCR software

Stars: ✭ 287 (-5.28%)

Mutual labels: pdf, ocr

Mayan Edms

Repository mirror of GitLab: https://gitlab.com/mayan-edms/mayan-edms Please use the upstream repository for issues and pull requests.

Stars: ✭ 398 (+31.35%)

Mutual labels: document-management, ocr

Remarks

Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG

Stars: ✭ 94 (-68.98%)

Mutual labels: pdf, ocr

Lodestone

Personal Document Archiving (DMS, EDMS for Personal/Home Office use)

Stars: ✭ 426 (+40.59%)

Mutual labels: document-management, ocr

Pdftabextract

A set of tools for extracting tables from PDF files, helping to do data mining on (OCR-processed) scanned documents.

Stars: ✭ 1,969 (+549.83%)

Mutual labels: pdf, ocr

i-librarian-free

I, Librarian - open-source version of a PDF managing SaaS.

Stars: ✭ 110 (-63.7%)

Mutual labels: ocr, document-management

FileBasedMiniDMS

This PHP script sorts your documents (using hardlinks) into subfolders based on the hashtags it finds in your documents' filenames.

Stars: ✭ 35 (-88.45%)

Mutual labels: ocr, document-management

idcardocr

Recognition of second-generation resident ID card information in offline environments

Stars: ✭ 358 (+18.15%)

Mutual labels: ocr

Attention ocr.pytorch

This repository implements an encoder-decoder model with attention for OCR

Stars: ✭ 278 (-8.25%)

Mutual labels: ocr

meltsub

Convert hardsub to softsub

Stars: ✭ 19 (-93.73%)

Mutual labels: ocr

smart-docs-parser

An OCR based document parser to extract information from identity document images

Stars: ✭ 14 (-95.38%)

Mutual labels: ocr

Redux Offline Docs

Redux documentation in PDF, ePub and MOBI formats for offline reading.

Stars: ✭ 292 (-3.63%)

Mutual labels: pdf

Starter Book

A book starter to kickstart your writing journey 🎉

Stars: ✭ 277 (-8.58%)

Mutual labels: pdf

CTC-OCR

A TensorFlow implementation of a hybrid CNN-LSTM model with CTC loss for the OCR problem

Stars: ✭ 27 (-91.09%)

Mutual labels: ocr

MillionHeros

Android live quiz assistant; supports all quiz apps, including 百万英雄, 百万赢家, 冲顶大会, and 芝士超人

Stars: ✭ 23 (-92.41%)

Mutual labels: ocr

Boxable

Boxable is a library that can be used to easily create tables in pdf documents.

Stars: ✭ 253 (-16.5%)

Mutual labels: pdf

Libmergepdf

PHP library for merging multiple PDFs

Stars: ✭ 282 (-6.93%)

Mutual labels: pdf

attentionocr

Attention OCR in Tensorflow 2.0

Stars: ✭ 45 (-85.15%)

Mutual labels: ocr

Pdf

Data science courseware and materials

Stars: ✭ 293 (-3.3%)

Mutual labels: pdf

BasicArabicOCR

A very basic Arabic OCR based on tesseract OCR engine written in Java.

Stars: ✭ 19 (-93.73%)

Mutual labels: ocr

Quickbill

Create unlimited invoices for free.

Stars: ✭ 278 (-8.25%)

Mutual labels: pdf

breach-protocol-autosolver

Solve breach protocol minigame in second(s). Windows/Linux/GeForce Now/Google Stadia. Every language.

Stars: ✭ 28 (-90.76%)

Mutual labels: ocr

Notebook As Pdf

Save Jupyter Notebooks as PDF

Stars: ✭ 290 (-4.29%)

Mutual labels: pdf

namsel

An OCR application focused on machine-print Tibetan text

Stars: ✭ 22 (-92.74%)

Mutual labels: ocr

Hummusrecipe

A powerful PDF tool for NodeJS based on HummusJS.

Stars: ✭ 274 (-9.57%)

Mutual labels: pdf

Python Automation Scripts

Simple yet powerful automation scripts.

Stars: ✭ 292 (-3.63%)

Mutual labels: pdf

solr-ocrpayload-plugin

Efficient indexing and retrieval of OCR bounding boxes in Solr

Stars: ✭ 22 (-92.74%)

Mutual labels: ocr

Thinreports Generator

Report Generator for Ruby

Stars: ✭ 268 (-11.55%)

Mutual labels: pdf

car-OCR

License plate recognition system based on machine learning and OCR @fujunhao

Stars: ✭ 39 (-87.13%)

Mutual labels: ocr

ocromore

Process, enhance, and evaluate multiple OCR outputs.

Stars: ✭ 16 (-94.72%)

Mutual labels: ocr

Pdf

Rust library to read, manipulate and write PDF files.

Stars: ✭ 265 (-12.54%)

Mutual labels: pdf

ScreenAccess

Anti-recoil system with built-in weapon type recognition based on OCR; currently supports the following games: Apex Legends

Stars: ✭ 41 (-86.47%)

Mutual labels: ocr

ocr

Simple app to extract text from pictures using Tesseract

Stars: ✭ 98 (-67.66%)

Mutual labels: ocr

Tucl

The first-ever paper on the Unix shell written by Ken Thompson in 1976 scanned, transcribed, and redistributed with permission

Stars: ✭ 303 (+0%)

Mutual labels: pdf

Invoices

Generate PDF invoices for your customers in laravel

Stars: ✭ 298 (-1.65%)

Mutual labels: pdf

Camelot

Camelot: PDF Table Extraction for Humans

Stars: ✭ 3,150 (+939.6%)

Mutual labels: pdf

Ionic Ocr Example

📷 Simple Ionic app using ocrad.js

Stars: ✭ 263 (-13.2%)

Mutual labels: ocr

pdf2xml-viewer

A simple viewer and inspection tool for text boxes in PDF documents

Stars: ✭ 82 (-72.94%)

Mutual labels: ocr

PRLib

Pre-Recognition Library - library with algorithms for improving OCR quality.

Stars: ✭ 22 (-92.74%)

Mutual labels: ocr

Deck

Slide Decks

Stars: ✭ 261 (-13.86%)

Mutual labels: pdf

tesseract-server

A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizing the power of Google Tesseract.

Stars: ✭ 15 (-95.05%)

Mutual labels: ocr

staff identity card ocr project

Staff Identity Card OCR Project

Stars: ✭ 15 (-95.05%)

Mutual labels: ocr

Rplos

R client for the PLoS Journals API

Stars: ✭ 289 (-4.62%)

Mutual labels: pdf

Tableexport

tableExport (exports tables to files; supports json, csv, txt, xml, word, excel, image, pdf)