robin

robin is a RObust document image BINarization tool, written in Python.

  • robin - fast document image binarization tool;
  • metrics - script for measuring the quality of binarization;
  • dataset - links to the DIBCO 2009-2018, Palm Leaf Manuscript and my own datasets with original and ground-truth images, plus scripts for creating training data from the datasets and downloading images from STSL;
  • articles - selected binarization articles that helped me a lot;
  • weights - pretrained weights for robin;

Tech

robin uses a number of open source projects to work properly:

  • Keras - high-level neural networks API;
  • Tensorflow - open-source machine-learning framework;
  • OpenCV - a library of programming functions mainly aimed at real-time computer vision;
  • Augmentor - a collection of augmentation algorithms;

Installation

robin requires Python v3.5+ to run.

Get robin, install the dependencies from requirements.txt, download the datasets and weights, and you are ready to binarize documents!

$ git clone https://github.com/masyagin1998/robin.git
$ cd robin
$ pip install -r requirements.txt

HowTo

Robin

robin consists of two main files: src/unet/train.py, which generates weights for the U-net model from 128x128 pairs of original and ground-truth images, and src/unet/binarize.py, which binarizes a group of input document images. The model works with 128x128 images, so the binarization tool first splits the input images into 128x128 pieces. You can easily rewrite the code for a different U-net input size, but research shows that 128x128 is the best size.
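
The splitting step itself is simple. Below is a minimal sketch of the idea, using OpenCV and NumPy with a hypothetical input file name and white padding; it illustrates the approach, not robin's actual code:

import cv2
import numpy as np

def split_into_tiles(path, tile=128):
    # Read a grayscale image, pad it with white to a multiple of `tile`,
    # and cut it into tile x tile pieces in row-major order.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    pad_h = (tile - img.shape[0] % tile) % tile
    pad_w = (tile - img.shape[1] % tile) % tile
    img = np.pad(img, ((0, pad_h), (0, pad_w)), mode="constant", constant_values=255)
    tiles = [img[y:y + tile, x:x + tile]
             for y in range(0, img.shape[0], tile)
             for x in range(0, img.shape[1], tile)]
    return tiles, img.shape  # the padded shape is needed to stitch predictions back together

tiles, padded_shape = split_into_tiles("document_in.png")  # hypothetical file name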

Metrics

You should know how good your binarization tool is, so I made a script that automates the calculation of the four DIBCO metrics: F-measure, pseudo F-measure, PSNR and DRD: src/metrics/metrics.py. Unfortunately it requires two DIBCO tools, weights.exe and metrics.exe, which can only be run on Windows (I tried to run them on Linux with Wine, but couldn't, because one of their dependencies is the MATLAB MCR 9.0 executable).
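
If you only need a rough sanity check on Linux, the two standard metrics are easy to compute yourself. The sketch below is my own approximation, not the official DIBCO evaluation; it assumes black text on a white background and hypothetical file names:

import cv2
import numpy as np

def fmeasure_and_psnr(gt_path, out_path):
    # Convert both images to boolean text masks (text = black, i.e. pixel value < 128).
    gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE) < 128
    out = cv2.imread(out_path, cv2.IMREAD_GRAYSCALE) < 128
    tp = np.logical_and(out, gt).sum()
    precision = tp / max(out.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    fmeasure = 2 * precision * recall / max(precision + recall, 1e-9)
    mse = np.mean((gt.astype(np.float64) - out.astype(np.float64)) ** 2)
    psnr = 10 * np.log10(1.0 / max(mse, 1e-12))  # pixel values are 0/1 here, so MAX = 1
    return fmeasure, psnr

print(fmeasure_and_psnr("1_gt.png", "1_out.png"))  # hypothetical file names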

Dataset

It is really hard to find a good document binarization dataset (DBD), so here I give links to the datasets below, marked up in a single convenient format. All input image names match the [\d]*_in.png regexp, and all ground-truth image names match the [\d]*_gt.png regexp (a quick pairing check is sketched after the list).

  • DIBCO - 2009 - 2018 competition datasets;
  • Palm Leaf Manuscript - Palm Leaf Manuscript dataset from the ICFHR 2016 competition;
  • Borders - a small dataset containing difficult text boundaries. It can be used together with the bigger DIBCO or Palm Leaf Manuscript images;
  • Improved LRDE - the LRDE 2013 magazines dataset. I improved its ground truths for easier use;
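
As mentioned above, a quick way to check that a downloaded dataset really follows this naming convention is to compare the ids of the input and ground-truth files. This small sketch assumes all files of one dataset live in a single directory (the directory name is hypothetical):

import os
import re

def check_pairs(directory):
    # Collect the numeric ids of *_in.png and *_gt.png files and report unmatched ones.
    names = os.listdir(directory)
    ins = {f[:-len("_in.png")] for f in names if re.fullmatch(r"\d+_in\.png", f)}
    gts = {f[:-len("_gt.png")] for f in names if re.fullmatch(r"\d+_gt\.png", f)}
    return sorted(ins - gts), sorted(gts - ins)

missing_gt, missing_in = check_pairs("dibco")
print("inputs without ground truth:", missing_gt)
print("ground truths without input:", missing_in)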

I also provide two simple scripts, src/dataset/dataset.py and src/dataset/stsl-download.py. The first quickly generates train-validation-test data from the provided datasets; the second can be used to get interesting training data from the Trinity-Sergius Lavra official site. The expected workflow is to train robin on a marked-up dataset, create a new dataset with stsl-download.py and binarize.py, correct the generated ground truths, and train robin again with these new pairs of input and ground-truth images.
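
For illustration, here is a minimal sketch of such a train-validation-test split (directory names are hypothetical; dataset.py is the real and more complete implementation):

import glob
import os
import random
import shutil

def split_dataset(src, dst, ratios=(0.8, 0.1, 0.1)):
    # Pair every *_in.png with its *_gt.png and copy the pairs into train/valid/test folders.
    pairs = [(p, p.replace("_in.png", "_gt.png"))
             for p in sorted(glob.glob(os.path.join(src, "*_in.png")))
             if os.path.exists(p.replace("_in.png", "_gt.png"))]
    random.shuffle(pairs)
    n = len(pairs)
    bounds = [0, int(ratios[0] * n), int((ratios[0] + ratios[1]) * n), n]
    for name, lo, hi in zip(("train", "valid", "test"), bounds, bounds[1:]):
        folder = os.path.join(dst, name)
        os.makedirs(folder, exist_ok=True)
        for in_path, gt_path in pairs[lo:hi]:
            shutil.copy(in_path, folder)
            shutil.copy(gt_path, folder)

split_dataset("dibco", "data")  # hypothetical source and destination directories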

Articles

While working on robin I read a number of scientific articles. Here are links to all of them.

  • DIBCO - 2009 - 2018 competition articles;
  • DIBCO metrics - articles about the two non-standard DIBCO metrics, pseudo F-Measure and DRD (PSNR and F-Measure are really easy to find on the Web);
  • U-net - articles about U-net convolutional network architecture;
  • CTPN - articles about CTPN, a fast neural network for finding text in images (robin doesn't use it, but it is great and I started my research with it);
  • ZF_UNET_224 - I think this is the best U-net implementation in the world;

Weights

Training a neural network is not cheap: you need a powerful GPU and CPU, so I provide some pretrained weights (for training I used two setups: an Nvidia 1050 Ti 4 GB + Intel Core i7-7700HQ + 8 GB RAM, and an Nvidia 1080 Ti SLI + Intel Xeon E2650 + 128 GB RAM).

  • Base - weights after training the network on the DIBCO and Borders data for 256 epochs with batch size 128 and augmentation enabled. THEY ARE TRAINED FOR A4 300 DPI IMAGES, so your input data must have a comparable resolution;

Examples of work

  • Old Orthodox document: original image and binarized result;
  • Checkered sheet: original image and binarized result;
  • Old evidence: original image and binarized result;
  • Magazine with pictures and bright text on a dark background: original image and binarized result;

Bugs

  • Keras has some problems with parallel data augmentation: it creates too many processes. I hope it will be fixed soon, but for now it is better to keep the --extraprocesses flag at its default value of zero;

Many thanks to:

  • Igor Vishnyakov and Mikhail Pinchukov - my scientific advisors;
  • Chen Jian - DIBCO 2017 article finder;
