OCR-D / ocrd_anybaseocr

License: Apache-2.0
DFKI Layout Detection for OCR-D

Document Preprocessing and Segmentation

Tools to preprocess and segment scanned images for OCR-D

Installing

Requires Python >= 3.6.

  1. Create a new venv unless you already have one:

     python3 -m venv venv

  2. Activate the venv:

     source venv/bin/activate

  3. To install from source, make sure GNU make is available, then run:

     make install

     Alternatively, prebuilt packages are available on PyPI:

     pip install ocrd_anybaseocr

(This will install both PyTorch and TensorFlow, along with their dependencies.)

Tools

All tools (also called processors) follow the OCR-D CLI specification, which looks roughly like this:

ocrd-<processor-name> [-m <path to METS input file>] -I <input group> -O <output group> [-p <path to parameter file>]* [-P <param name> <param value>]*
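
As a rough illustration of that pattern, the Python sketch below assembles such a command line as an argument list. The helper name `build_ocrd_command` is hypothetical and not part of ocrd_anybaseocr or OCR-D core; it only mirrors the option layout shown above and does not run anything. Encoding non-string values as JSON is an assumption, chosen to match values such as `true` or `5.0` in the examples in this document.

```python
import json
import shlex


def build_ocrd_command(processor, input_group, output_group, mets=None, params=None):
    """Assemble an argv list following the OCR-D CLI pattern.

    Hypothetical helper for illustration only; it does not check
    parameter names against any processor.
    """
    argv = [f"ocrd-{processor}"]
    if mets is not None:
        argv += ["-m", mets]
    argv += ["-I", input_group, "-O", output_group]
    for name, value in (params or {}).items():
        # Strings are passed through unchanged; other values are
        # JSON-encoded (assumption), e.g. True -> "true", 0.3 -> "0.3".
        text = value if isinstance(value, str) else json.dumps(value)
        argv += ["-P", name, text]
    return argv


cmd = build_ocrd_command(
    "anybaseocr-binarize",
    "OCR-D-IMG",
    "OCR-D-BIN",
    params={"operation_level": "line", "threshold": 0.3},
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

A list like `cmd` could be handed to `subprocess.run(cmd, check=True)` once the processor is installed and the current directory is an OCR-D workspace.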

Binarizer

Method Behaviour

For each page (or sub-segment), this processor takes a scanned color or grayscale document image as input and computes a binarized (black-and-white) image.

Implemented via rule-based methods (percentile-based adaptive background estimation in Ocrolib).
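
To give a feel for the percentile idea, here is a heavily simplified sketch in plain Python. It is not the Ocrolib implementation: it estimates one global black level and one global white level from percentiles, whereas the processor estimates the background adaptively. The function names, the percentile choices (5 and 90) and the 0.5 threshold are assumptions made for this illustration.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence (pct in 0..100)."""
    ordered = sorted(values)
    rank = round(pct / 100 * (len(ordered) - 1))
    rank = max(0, min(len(ordered) - 1, rank))
    return ordered[rank]


def binarize(gray, lo_pct=5, hi_pct=90, threshold=0.5):
    """Binarize a grayscale image given as rows of floats in [0, 1].

    Returns rows of 0 (black) and 1 (white). Global simplification:
    a single black/white level pair is used for the whole image.
    """
    pixels = [v for row in gray for v in row]
    lo = percentile(pixels, lo_pct)   # estimated ink (black) level
    hi = percentile(pixels, hi_pct)   # estimated paper (white) level
    span = (hi - lo) or 1.0           # avoid division by zero on flat images
    result = []
    for row in gray:
        out_row = []
        for v in row:
            # Stretch to [0, 1] relative to the estimated levels, then threshold.
            normalized = min(1.0, max(0.0, (v - lo) / span))
            out_row.append(1 if normalized > threshold else 0)
        result.append(out_row)
    return result


# Tiny synthetic "page": light paper (~0.8) with a few dark ink pixels (~0.3).
page = [
    [0.80, 0.82, 0.79, 0.81],
    [0.80, 0.30, 0.28, 0.81],
    [0.79, 0.31, 0.80, 0.82],
]
for row in binarize(page):
    print(row)
```

Because the levels are taken from percentiles rather than from the minimum and maximum, a few extreme pixels (specks, scanner glare) have little effect on where the threshold ends up.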

Example

ocrd-anybaseocr-binarize -I OCR-D-IMG -O OCR-D-BIN -P operation_level line -P threshold 0.3

Deskewer

Method Behaviour

For each page (or sub-segment), this processor takes a document image as input and computes its skew angle. It also annotates a deskewed image.

The input images have to be binarized for this module to work.

Implemented via rule-based methods (binary projection profile entropy maximization in Ocrolib).

Example

ocrd-anybaseocr-deskew -I OCR-D-BIN -O OCR-D-DESKEW -P maxskew 5.0 -P skewsteps 20 -P operation_level page

Cropper

Method Behaviour

For each page, this processor takes a document image as input and computes the border around the page content area (i.e. removes textual noise as well as any other noise around the page frame). It also annotates a cropped image.

The input image need not be binarized, but should be deskewed for the module to work optimally.

Implemented via rule-based methods (gradient-based line segment detection and morphology-based textline detection).

Example

ocrd-anybaseocr-crop -I OCR-D-DESKEW -O OCR-D-CROP -P rulerAreaMax 0 -P marginLeft 0.1

Dewarper

Method Behaviour

For each page, this processor takes a document image as input and computes a morphed image in which curved text lines are straightened.

The input image has to be binarized for the module to work, and should be cropped and deskewed for optimal quality.

Implemented via data-driven methods (neural GAN conditional image model trained with pix2pixHD/PyTorch).

Models

ocrd resmgr download ocrd-anybaseocr-dewarp '*'

Example

ocrd-anybaseocr-dewarp -I OCR-D-CROP -O OCR-D-DEWARP -P resize_mode none -P gpu_id -1

Text/Non-Text Segmenter

Method Behaviour

For each page, this processor takes a document image as an input and computes two images, separating the text and non-text parts.

The input image has to be binarized for the module to work, and should be cropped and deskewed for optimal quality.

Implemented via data-driven methods (neural pixel classifier model trained with TensorFlow/Keras).

Models

ocrd resmgr download ocrd-anybaseocr-tiseg '*'

Example

ocrd-anybaseocr-tiseg -I OCR-D-DEWARP -O OCR-D-TISEG -P use_deeplr true

Block Segmenter

Method Behaviour

For each page, this processor takes the raw document image as an input and computes a text region segmentation for it (distinguishing various types of text blocks).

The input image need not be binarized, but should be deskewed for the module to work optimally.

Implemented via data-driven methods (neural Mask-RCNN instance segmentation model trained with TensorFlow/Keras).

Models

ocrd resmgr download ocrd-anybaseocr-block-segmentation '*'

Example

ocrd-anybaseocr-block-segmentation -I OCR-D-TISEG -O OCR-D-BLOCK -P active_classes '["page-number", "paragraph", "heading", "drop-capital", "marginalia", "caption"]' -P min_confidence 0.8 -P post_process true
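
With several parameters, a parameter file passed via `-p` may be easier to maintain than repeated `-P` options. The sketch below writes one using Python's `json` module, assuming the usual OCR-D convention that a parameter file is a single JSON object. The parameter names and values are taken from the example above; the file name and location are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Same parameters as in the -P example, expressed as native Python values.
params = {
    "active_classes": [
        "page-number",
        "paragraph",
        "heading",
        "drop-capital",
        "marginalia",
        "caption",
    ],
    "min_confidence": 0.8,
    "post_process": True,
}

# Hypothetical location: a fresh temporary directory, used only for this sketch.
param_file = Path(tempfile.mkdtemp()) / "block-segmentation-params.json"
param_file.write_text(json.dumps(params, indent=2), encoding="utf-8")
print(param_file)
```

The processor could then be called with `-p <path printed above>` in place of the three `-P` options.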

Textline Segmenter

Method Behaviour

For each page (or region), this processor takes a cropped document image as an input and computes a textline segmentation for it.

The input image should be binarized and deskewed for the module to work.

Implemented via rule-based methods (gradient- and morphology-based line estimation in Ocrolib).

Example

ocrd-anybaseocr-textline -I OCR-D-BLOCK -O OCR-D-LINE -P operation_level region

Document Analyser

Method Behaviour

For the whole document, this processor takes all the cropped page images and their corresponding text regions as input and computes the logical structure (page types and sections).

The input images should be binarized and segmented for this module to work.

Implemented via data-driven methods (neural Inception-V3 image classification model trained with TensorFlow/Keras).

Models

ocrd resmgr download ocrd-anybaseocr-layout-analysis '*'

Example

ocrd-anybaseocr-layout-analysis -I OCR-D-LINE -O OCR-D-STRUCT
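
Read in order, the examples in this section form one workflow: each processor reads the file group written by the previous one. The Python sketch below lists those steps and checks that the groups line up. The processor names and file groups are copied from the examples; the helper `check_chain` is hypothetical, and processor-specific `-P` options are left out for brevity.

```python
# (processor, input file group, output file group), as used in the examples
STEPS = [
    ("ocrd-anybaseocr-binarize", "OCR-D-IMG", "OCR-D-BIN"),
    ("ocrd-anybaseocr-deskew", "OCR-D-BIN", "OCR-D-DESKEW"),
    ("ocrd-anybaseocr-crop", "OCR-D-DESKEW", "OCR-D-CROP"),
    ("ocrd-anybaseocr-dewarp", "OCR-D-CROP", "OCR-D-DEWARP"),
    ("ocrd-anybaseocr-tiseg", "OCR-D-DEWARP", "OCR-D-TISEG"),
    ("ocrd-anybaseocr-block-segmentation", "OCR-D-TISEG", "OCR-D-BLOCK"),
    ("ocrd-anybaseocr-textline", "OCR-D-BLOCK", "OCR-D-LINE"),
    ("ocrd-anybaseocr-layout-analysis", "OCR-D-LINE", "OCR-D-STRUCT"),
]


def check_chain(steps):
    """Return (producer, consumer) pairs whose file groups do not line up."""
    broken = []
    for (prev, _, prev_out), (nxt, nxt_in, _) in zip(steps, steps[1:]):
        if prev_out != nxt_in:
            broken.append((prev, nxt))
    return broken


# An empty list means every step consumes what the previous step produced.
print(check_chain(STEPS))

# Print the bare command lines in workflow order.
for processor, in_grp, out_grp in STEPS:
    print(f"{processor} -I {in_grp} -O {out_grp}")
```

A check like this can catch a mistyped file group before a long-running workflow is started.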

Testing

To test the tools under realistic conditions (on OCR-D workspaces), download OCR-D/assets. In particular, the code is tested with the dfki-testdata dataset.

To download the data:

make assets

To run module tests:

make test

To run processor/workflow tests:

make cli-test

License

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.