githubharald / Simplehtr

Licence: MIT
Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Handwritten Text Recognition with TensorFlow

  • Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows
  • Update 2020: code is compatible with TF2

Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in images of segmented words, as shown in the illustration below. About three quarters of the words in the validation set are recognized correctly, and the character error rate is around 10%.
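
The two metrics quoted above can be made concrete with a small sketch. It assumes the common definitions: character error rate is the edit distance between recognized and ground-truth text divided by the number of ground-truth characters, and word accuracy is the fraction of exact matches. The function names are illustrative and are not taken from the repository.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def evaluate(recognized, ground_truth):
    """Return (character error rate, word accuracy) over paired word lists."""
    pairs = list(zip(recognized, ground_truth))
    char_errors = sum(edit_distance(r, g) for r, g in pairs)
    total_chars = sum(len(g) for _, g in pairs)
    correct_words = sum(r == g for r, g in pairs)
    return char_errors / total_chars, correct_words / len(pairs)
```

For example, evaluate(["Hello", "wold"], ["Hello", "world"]) counts one character error over ten ground-truth characters and one correct word out of two.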

[Illustration: htr]

Run demo

Download the model trained on the IAM dataset. Put the contents of the downloaded file model.zip into the model directory of the repository. Afterwards, go to the src directory and run python main.py. The input image and the expected output are shown below.

[Illustration: test]

> python main.py
Init with stored values from ../model/snapshot-39
Recognized: "Hello"
Probability: 0.42098119854927063

Command line arguments

  • --train: train the NN on 95% of the dataset samples and validate on the remaining 5%
  • --validate: validate the trained NN
  • --decoder: select from CTC decoders "bestpath", "beamsearch", and "wordbeamsearch". Defaults to "bestpath". For option "wordbeamsearch" see details below
  • --batch_size: batch size
  • --data_dir: directory containing IAM dataset (with subdirectories img and gt)
  • --fast: use LMDB to load images (faster than loading image files from disk)
  • --dump: dumps the output of the NN to CSV file(s) saved in the dump folder. Can be used as input for the CTCDecoder

If neither --train nor --validate is specified, the NN infers the text from the test image (data/test.png).
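
As a rough sketch, the options above could be declared with argparse as follows. This is not the repository's actual code: the default batch size, the help texts, and the helper names build_parser and select_mode are assumptions, and giving --train precedence when both mode flags are set is a choice made only for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the command line options listed above."""
    parser = argparse.ArgumentParser(description="Handwritten text recognition")
    parser.add_argument("--train", action="store_true", help="train the NN")
    parser.add_argument("--validate", action="store_true", help="validate the trained NN")
    parser.add_argument("--decoder", default="bestpath",
                        choices=["bestpath", "beamsearch", "wordbeamsearch"],
                        help="CTC decoder")
    # The default of 100 is an assumption, not a documented value.
    parser.add_argument("--batch_size", type=int, default=100, help="batch size")
    parser.add_argument("--data_dir", help="directory containing the IAM dataset")
    parser.add_argument("--fast", action="store_true", help="load images from LMDB")
    parser.add_argument("--dump", action="store_true", help="dump NN output to CSV")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Fall back to inference when neither --train nor --validate is given."""
    if args.train:
        return "train"
    if args.validate:
        return "validate"
    return "infer"
```

Calling select_mode(build_parser().parse_args([])) returns "infer", which mirrors the fallback described above.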

Integrate word beam search decoding

The word beam search decoder can be used instead of the two decoders shipped with TF. Words are constrained to those contained in a dictionary, but arbitrary non-word character strings (numbers, punctuation marks) can still be recognized. The following illustration shows a sample for which word beam search is able to recognize the correct text, while the other decoders fail.

[Illustration: decoder_comparison]

Follow these instructions to integrate word beam search decoding:

  1. Clone repository CTCWordBeamSearch
  2. Compile and install by running pip install . at the root level of the CTCWordBeamSearch repository
  3. Specify the command line option --decoder wordbeamsearch when executing main.py to actually use the decoder

The dictionary is created automatically in training and validation mode from all words contained in the IAM dataset (that is, including words from the validation set) and is saved to the file data/corpus.txt. The manually created list of word-characters can be found in the file model/wordCharList.txt. The beam width is set to 50 to match the beam width of vanilla beam search decoding.
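
To illustrate how a dictionary can be derived from ground-truth texts and a list of word-characters, here is a minimal sketch. It assumes that a word is a maximal run of word-characters and that the corpus is stored as space-separated words; the exact format of data/corpus.txt and the function name build_corpus are assumptions, not taken from the repository.

```python
import re


def build_corpus(ground_truth_texts, word_chars):
    """Collect the unique words found in the texts, in order of first appearance."""
    # A word is a maximal run of characters from word_chars.
    pattern = re.compile("[" + re.escape(word_chars) + "]+")
    seen = {}
    for text in ground_truth_texts:
        for word in pattern.findall(text):
            seen.setdefault(word, None)  # dicts preserve insertion order
    return " ".join(seen)
```

Strings that contain no word-characters, such as numbers or punctuation marks, contribute nothing to the dictionary, which is consistent with the decoder recognizing them without a dictionary constraint.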

Train model with IAM dataset

Follow these instructions to get the IAM dataset:

  • Register for free at this website
  • Download words/words.tgz
  • Download ascii/words.txt
  • Create a directory for the dataset on your disk, and create two subdirectories: img and gt
  • Put words.txt into the gt directory
  • Put the content (directories a01, a02, ...) of words.tgz into the img directory
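
Before starting a long training run, it can help to verify the directory layout described above. The following sketch checks for gt/words.txt and for at least one subdirectory in img; the function name check_iam_layout and the wording of the messages are hypothetical.

```python
from pathlib import Path


def check_iam_layout(data_dir):
    """Return a list of problems found in the expected IAM directory layout."""
    root = Path(data_dir)
    problems = []
    if not (root / "gt" / "words.txt").is_file():
        problems.append("missing gt/words.txt")
    img_dir = root / "img"
    if not img_dir.is_dir():
        problems.append("missing img directory")
    elif not any(p.is_dir() for p in img_dir.iterdir()):
        problems.append("img contains no subdirectories such as a01, a02, ...")
    return problems
```

An empty list means the layout matches the steps above; the check does not inspect the contents of words.txt or of the image files.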

Start the training

  • Delete files from model directory if you want to train from scratch
  • Go to the src directory and execute python main.py --train --data_dir path/to/IAM
  • Training stops after a fixed number of epochs without improvement
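
The stopping rule in the last step is commonly called early stopping. A minimal sketch of the idea follows; train_epoch and validate are hypothetical callables, the patience value is an assumption, and the repository's actual training loop may differ.

```python
def train_with_early_stopping(train_epoch, validate, patience=5):
    """Train until the validation error has not improved for `patience` epochs.

    `validate` returns an error value where lower is better.
    Returns (best_error, epochs_run).
    """
    best_error = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    while epochs_without_improvement < patience:
        epoch += 1
        train_epoch()
        error = validate()
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0  # a real trainer would save the model here
        else:
            epochs_without_improvement += 1
    return best_error, epoch
```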

Fast image loading

Loading and decoding the PNG image files from disk is the bottleneck, even when using only a small GPU. The LMDB database is used to speed up image loading:

  • Go to the src directory and run python createLMDB.py --data_dir path/to/IAM with the IAM data directory specified
  • A subfolder lmdb is created in the IAM data directory containing the LMDB files
  • When training the model, add the command line option --fast

The dataset should be located on an SSD. With the --fast option and a GTX 1050 Ti, training takes around 3h with a batch size of 500.

Information about model

The model is a stripped-down version of the HTR system I implemented for my thesis. What remains is what I consider the bare minimum needed to recognize text with acceptable accuracy. It consists of 5 CNN layers, 2 RNN (LSTM) layers, and the CTC loss and decoding layer. The illustration below gives an overview of the NN (green: operations, pink: data flowing through the NN). A short description follows:

  • The input image is a gray-value image and has a size of 128x32
  • 5 CNN layers map the input image to a feature sequence of size 32x256
  • 2 LSTM layers with 256 units propagate information through the sequence and map the sequence to a matrix of size 32x80. Each matrix-element represents a score for one of the 80 characters at one of the 32 time-steps
  • The CTC layer either calculates the loss value given the matrix and the ground-truth text (when training), or it decodes the matrix to the final text with best path decoding or beam search decoding (when inferring)

[Illustration: nn_overview]
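
Best path decoding, mentioned in the last step, can be sketched in a few lines of plain Python: pick the highest-scoring label at each time-step, collapse repeated labels, then remove the CTC blank. The sketch assumes the blank is the last column of the matrix and uses a toy alphabet; it illustrates the technique and is not the TF implementation used by the model.

```python
def best_path_decode(matrix, chars):
    """Decode a T x (len(chars) + 1) score matrix with CTC best path decoding.

    The last column is assumed to hold the score of the CTC blank label.
    """
    blank = len(chars)
    # Step 1: pick the highest-scoring label at each time-step.
    best = [max(range(len(row)), key=row.__getitem__) for row in matrix]
    # Steps 2 and 3: collapse repeated labels and drop blanks.
    decoded = []
    prev = None
    for label in best:
        if label != prev and label != blank:
            decoded.append(chars[label])
        prev = label
    return "".join(decoded)
```

A blank between two identical labels keeps them apart, so the label sequence a, a, blank, a, b, b, blank decodes to "aab" rather than "ab".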
