
georgeretsi / HTR-ctc

Licence: MIT License
Pytorch implementation of HTR on IAM dataset (word or line level + CTC loss)

Programming Languages

python

Projects that are alternatives of or similar to HTR-ctc

OCR
Optical character recognition Using Deep Learning
Stars: ✭ 25 (+66.67%)
Mutual labels:  lstm, ctc-loss
CRNN-OCR-lite
Lightweight CRNN for OCR (including handwritten text) with depthwise separable convolutions and spatial transformer module [keras+tf]
Stars: ✭ 130 (+766.67%)
Mutual labels:  handwritten-text-recognition, ctc-loss
question-pair
A siamese LSTM to detect sentence/question pairs.
Stars: ✭ 25 (+66.67%)
Mutual labels:  lstm
Stock-Prediction
LSTM RNN for sentiment-based stock prediction
Stars: ✭ 50 (+233.33%)
Mutual labels:  lstm
lstm-crf-tagging
No description or website provided.
Stars: ✭ 13 (-13.33%)
Mutual labels:  lstm
Machine-Learning
The projects I do in Machine Learning with PyTorch, keras, Tensorflow, scikit learn and Python.
Stars: ✭ 54 (+260%)
Mutual labels:  lstm
Deep-Learning-for-Expression-Recognition-in-Image-Sequences
The project uses state of the art deep learning on collected data for automatic analysis of emotions.
Stars: ✭ 26 (+73.33%)
Mutual labels:  lstm
Manhattan-LSTM
Keras and PyTorch implementations of the MaLSTM model for computing Semantic Similarity.
Stars: ✭ 28 (+86.67%)
Mutual labels:  lstm
battery-rul-estimation
Remaining Useful Life (RUL) estimation of Lithium-ion batteries using deep LSTMs
Stars: ✭ 25 (+66.67%)
Mutual labels:  lstm
rnn2d
CPU and GPU implementations of some 2D RNN layers
Stars: ✭ 26 (+73.33%)
Mutual labels:  lstm
CS231n
My solutions for Assignments of CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 30 (+100%)
Mutual labels:  lstm
medical-diagnosis-cnn-rnn-rcnn
Disease diagnosis from patient descriptions, implemented with RNN, CNN, and RCNN models respectively
Stars: ✭ 39 (+160%)
Mutual labels:  lstm
deep-improvisation
Easy-to-use Deep LSTM Neural Network to generate song sounds like containing improvisation.
Stars: ✭ 53 (+253.33%)
Mutual labels:  lstm
Sequence-Models-coursera
Sequence Models by Andrew Ng on Coursera. Programming Assignments and Quiz Solutions.
Stars: ✭ 53 (+253.33%)
Mutual labels:  lstm
dts
A Keras library for multi-step time-series forecasting.
Stars: ✭ 130 (+766.67%)
Mutual labels:  lstm
Persian-Sentiment-Analyzer
Persian sentiment analysis ( آناکاوی سهش های فارسی | تحلیل احساسات فارسی )
Stars: ✭ 30 (+100%)
Mutual labels:  lstm
MogrifierLSTM
A quick walk-through of the innards of LSTMs and a naive implementation of the Mogrifier LSTM paper in PyTorch
Stars: ✭ 58 (+286.67%)
Mutual labels:  lstm
dhs summit 2019 image captioning
Image captioning using attention models
Stars: ✭ 34 (+126.67%)
Mutual labels:  lstm
Gradient-Samples
Samples for TensorFlow binding for .NET by Lost Tech
Stars: ✭ 53 (+253.33%)
Mutual labels:  lstm
autonomio
Core functionality for the Autonomio augmented intelligence workbench.
Stars: ✭ 27 (+80%)
Mutual labels:  lstm

HTR-ctc

PyTorch implementation of Handwritten Text Recognition (HTR) using CTC loss on the IAM dataset.

Selected Features:

  • The dataset is saved in a '.pt' file after the initial preprocessing, so that later runs load faster.
  • The loader handles both word-level and line-level segmentation (change the loader parameters in train_htr.py).
    E.g. IAMLoader('train', level='line', fixed_size=(128, None)) or IAMLoader('train', level='word', fixed_size=(128, None))
  • Image resizing is configured through the loader's fixed_size argument, given as (height, width). If the width is None, the resize operation keeps the aspect ratio and scales the image to the specified height (e.g. 128). This produces images of different widths, which cannot be stacked into a fixed-size batch. To work around this, the network is updated once every K single-image passes (e.g. batch_size = 1 and iter_size = 16 in train_code/config.py). If a fixed size is selected for both dimensions, e.g. IAMLoader('train', level='line', fixed_size=(128, 1024)), a regular batch size can be used (e.g. batch_size = 16 and iter_size = 1).
  • The model architecture can be modified through the cnn_cfg and rnn_cfg variables in train_code/config.py. The CNN consists of multiple stacks of ResBlocks, and the default setting cnn_cfg = [(2, 32), 'M', (4, 64), 'M', (6, 128), 'M', (2, 256)] is interpreted as follows: the first stack consists of 2 ResBlocks with 32 output channels, the second of 4 ResBlocks with 64 output channels, and so on. Each 'M' denotes a max-pooling operation with kernel size and stride both equal to 2. The CNN backbone is topped by an RNN head, which produces the final character predictions. The recurrent network is a bidirectional LSTM whose basic configuration is given by rnn_cfg. The default setting rnn_cfg = (256, 1) corresponds to a single-layer LSTM with a hidden size of 256.
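
To make the configuration format concrete, here is a minimal sketch that reads the default cnn_cfg and rnn_cfg values quoted above. The helper describe_cnn is hypothetical and not part of the repository. It counts only the 'M' entries, so any other strided layers the actual model might contain are not reflected in the downsampling factor.

```python
# Default values quoted in the README.
cnn_cfg = [(2, 32), 'M', (4, 64), 'M', (6, 128), 'M', (2, 256)]
rnn_cfg = (256, 1)  # (hidden size, number of LSTM layers)


def describe_cnn(cfg):
    """Hypothetical helper: split a cnn_cfg-style list into its parts.

    Returns (stacks, downsampling), where stacks is a list of
    (num_resblocks, out_channels) tuples and downsampling is the total
    factor contributed by the 'M' entries (max-pool, kernel 2, stride 2).
    """
    stacks, downsampling = [], 1
    for entry in cfg:
        if entry == 'M':
            downsampling *= 2
        else:
            num_blocks, out_channels = entry
            stacks.append((num_blocks, out_channels))
    return stacks, downsampling


stacks, factor = describe_cnn(cnn_cfg)
for i, (num_blocks, out_channels) in enumerate(stacks, start=1):
    print(f"stack {i}: {num_blocks} ResBlocks, {out_channels} output channels")

# Assumption: only the max-pooling layers reduce the spatial resolution.
print(f"downsampling from max-pooling: x{factor}")
print(f"feature-map height for a 128-pixel-high input: {128 // factor}")

hidden_size, num_layers = rnn_cfg
print(f"LSTM: {num_layers} layer(s), hidden size {hidden_size}")
```

Under that assumption, the three 'M' entries reduce a 128-pixel-high input by a factor of 8 before the features reach the recurrent head.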

Example:
python train_htr.py -lr 1e-3 -gpu 0
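
The update scheme used for variable-width images (batch_size = 1 with iter_size = 16) is a form of gradient accumulation. The sketch below shows the general pattern only: the tiny model, the MSE loss, and the random data are stand-ins chosen to keep it self-contained, and the 1/iter_size loss scaling is a common convention that is assumed here rather than taken from the repository.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

iter_size = 4  # the README's example uses iter_size = 16 with batch_size = 1

# Stand-in model (not the HTR network): the convolution accepts any input
# width, and global average pooling yields a fixed-size output.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy single-image "batches" with a fixed height and varying widths.
widths = [40, 72, 56, 96, 64, 48, 80, 88]
samples = [(torch.randn(1, 1, 32, w), torch.randn(1, 1)) for w in widths]

updates = 0
optimizer.zero_grad()
for step, (image, target) in enumerate(samples, start=1):
    loss = nn.functional.mse_loss(model(image), target)
    # Assumed scaling: makes the accumulated gradient the mean over
    # iter_size samples instead of their sum.
    (loss / iter_size).backward()
    if step % iter_size == 0:
        optimizer.step()       # one weight update per iter_size images
        optimizer.zero_grad()
        updates += 1

print(f"{len(samples)} single-image passes, {updates} optimizer updates")
```

With 8 images and iter_size = 4 the loop performs 2 weight updates, each based on gradients from 4 images of different widths.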

Note: The local paths to the IAM dataset (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database) are hardcoded in iam_data_loader/iam_config.py and need to be edited to match your setup.

Developed with PyTorch 0.4.1 and the warpctc_pytorch library (https://github.com/SeanNaren/warp-ctc).
A newer version that uses the built-in CTC loss of PyTorch (>1.0) is planned.
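
For readers who want to try the built-in loss in the meantime, the following is a minimal sketch of calling torch.nn.CTCLoss, which is available in PyTorch 1.0 and later. It is not the repository's code: the tensor sizes, the random stand-in for the network output, and the choice of index 0 as the blank label are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes: T time steps, N sequences, C classes (index 0 = blank).
T, N, C = 50, 2, 28

# Random stand-in for the recurrent head's output, shaped (T, N, C).
logits = torch.randn(T, N, C, requires_grad=True)
# The built-in loss expects log-probabilities over the class dimension.
log_probs = logits.log_softmax(dim=2)

# Two label sequences of lengths 5 and 3, padded to a common length.
# Values beyond each target length are ignored by the loss.
targets = torch.tensor([[3, 1, 20, 8, 5],
                        [2, 15, 4, 0, 0]], dtype=torch.long)
target_lengths = torch.tensor([5, 3], dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, reduction='mean')
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()

print(loss.item())
```

Passing input_lengths per sequence is what lets lines of different widths share a padded batch, since each sequence's valid number of time steps is stated explicitly.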
