Sid2697 / Word-recognition-EmbedNet-CAB

Licence: MIT license
Code implementation for our ICPR 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"

Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

Project page | Paper | Video

This repository contains code for the paper

"Improving Word Recognition using Multiple Hypotheses and Deep Embeddings" by Siddhant Bansal, Praveen Krishnan, and C.V. Jawahar, published in ICPR 2020.

Abstract

We propose a novel scheme for improving the word recognition accuracy using word image embeddings. We use a trained text recognizer, which can predict multiple text hypotheses for a given word image. Our fusion scheme improves the recognition process by utilizing the word image and text embeddings obtained from a trained word image embedding network. We propose EmbedNet, which is trained using a triplet loss for learning a suitable embedding space where the embedding of the word image lies closer to the embedding of the corresponding text transcription. The updated embedding space thus helps in choosing the correct prediction with higher confidence. To further improve the accuracy, we propose a plug-and-play module called Confidence based Accuracy Booster (CAB). The CAB module takes in the confidence scores obtained from the text recognizer and Euclidean distances between the embeddings to generate an updated distance vector. The updated distance vector has lower distance values for the correct words and higher distance values for the incorrect words. We rigorously evaluate our proposed method on a collection of books in the Hindi language. Our method achieves an absolute improvement of around 10% in terms of word recognition accuracy.

Word Recognition Results

[Figure: word recognition results]

Usage

Cloning the repository

git clone https://github.com/Sid2697/Word-recognition-EmbedNet-CAB.git
cd Word-recognition-EmbedNet-CAB

Install Pre-requisites

  • Python == 3.7
  • PyTorch
  • Scikit-learn
  • NumPy
  • tqdm

A requirements.txt file is provided for installing the Python dependencies:

pip install -r requirements.txt

Generating deep embeddings

The deep embeddings used in this work are generated using the End2End network proposed in:

Krishnan, P., Dutta, K., Jawahar, C.V.: Word spotting and recognition using deep embedding. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). pp. 1–6 (April 2018). https://doi.org/10.1109/DAS.2018.70

Deep embeddings of the word images and their text, for testing this repository, are provided in the embeddings folder. The code also requires text files that describe the embeddings, with one entry per line in the following format:

<img1-path><space><text1-string><space><dummyInt><space>1
<img2-path><space><text2-string><space><dummyInt><space>1
...
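
The format above can be read with a few lines of Python. This is a minimal sketch, not code from the repository: the names EmbeddingEntry, parse_info_line, and parse_info_file are hypothetical, and it assumes that fields are separated by single spaces, that each text string is a single word without spaces, and that the files are UTF-8 encoded.

```python
from typing import List, NamedTuple


class EmbeddingEntry(NamedTuple):
    """One line of a text information file (hypothetical container)."""
    image_path: str
    text: str
    dummy_int: int
    flag: int  # the trailing field, shown as 1 in the format above


def parse_info_line(line: str) -> EmbeddingEntry:
    # Split from the right so only the last three spaces act as separators;
    # an image path that itself contains spaces then stays in one piece.
    image_path, text, dummy_int, flag = line.rstrip("\n").rsplit(" ", 3)
    return EmbeddingEntry(image_path, text, int(dummy_int), int(flag))


def parse_info_file(path: str) -> List[EmbeddingEntry]:
    # Assumption: the files are UTF-8 encoded (the text is Hindi).
    with open(path, encoding="utf-8") as handle:
        return [parse_info_line(line) for line in handle if line.strip()]
```

Calling parse_info_file on one of the sample files in gen_files should then return one entry per line, provided the assumptions above hold for those files.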

To generate embeddings, refer to https://github.com/kris314/hwnet.

To make the code in this repository easier to explore, sample text files and embeddings are provided in gen_files and embeddings, respectively.

The original dataset used in this work will be released by CVIT soon.

Performing word recognition (using a pre-trained EmbedNet)

Pre-trained EmbedNet models are saved in the models folder.

To run baseline word recognition, use the command:

python src/word_rec_EmbedNet.py

To run word recognition with confidence scores, use the command:

python src/word_rec_EmbedNet.py --use_confidence

To run word recognition using a pre-trained EmbedNet, use the command:

python src/word_rec_EmbedNet.py --use_confidence --use_model --hidden_layers 1024

To run word recognition using a pre-trained EmbedNet and the CAB module, use the command:

python src/word_rec_EmbedNet.py --use_confidence --use_model --hidden_layers 1024 --cab
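
To compare the four configurations in one go, the commands above can be assembled and run from a small driver script. This is only a sketch: build_commands and run_all are hypothetical helper names, not part of the repository, and the script assumes it is started from the repository root so that the relative script path resolves.

```python
import subprocess
import sys
from typing import Dict, List

# Path taken from the commands above, relative to the repository root.
SCRIPT = "src/word_rec_EmbedNet.py"


def build_commands(python: str = sys.executable) -> Dict[str, List[str]]:
    """Return the four command lines, each one extending the previous."""
    baseline = [python, SCRIPT]
    confidence = baseline + ["--use_confidence"]
    embednet = confidence + ["--use_model", "--hidden_layers", "1024"]
    return {
        "baseline": baseline,
        "confidence": confidence,
        "embednet": embednet,
        "embednet_cab": embednet + ["--cab"],
    }


def run_all() -> None:
    """Run every configuration in turn, stopping at the first failure."""
    for name, command in build_commands().items():
        print(f"== {name}: {' '.join(command)}")
        subprocess.run(command, check=True)
```

Calling run_all() executes the configurations in order; because check=True is set, a failing run raises an error instead of being silently skipped.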

Other arguments for the word recognition experiment are:

--image_embeds
--topk_embeds
--image_file
--predictions_file
--use_confidence
--cab
--cab_alpha
--cab_beta
--in_features
--out_features
--hidden_layers
--model_path
--testing
--test_split
--k
  • image_embeds: path to the image embeddings
  • topk_embeds: path to the embeddings of the TopK predictions
  • image_file: path to the text information file for the images
  • predictions_file: path to the text information file for the TopK predictions
  • use_confidence: if set, the confidence score is used for re-ranking the predictions
  • cab: if set, the CAB module is used to improve the word recognition accuracy
  • cab_alpha: hyper-parameter alpha of the CAB module
  • cab_beta: hyper-parameter beta of the CAB module
  • in_features: size of the input to EmbedNet
  • out_features: size of the output of EmbedNet
  • hidden_layers: list of input sizes of the hidden layers
  • model_path: path to the pre-trained model to be used for testing
  • testing: if set, only the test set is used for evaluation
  • test_split: split used for testing the trained EmbedNet on unseen data
  • k: total number of predictions to test on (max 20)
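
As the abstract describes, recognition comes down to comparing the embedding of a word image with the embeddings of the recognizer's text hypotheses. The sketch below illustrates that idea in plain Python under two assumptions: that the hypothesis with the smallest Euclidean distance is the one selected, and that k simply limits how many hypotheses are considered. The function names are hypothetical, and neither the confidence re-ranking nor the CAB module is modelled here.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_hypothesis(
    image_embed: Sequence[float],
    hypothesis_embeds: Sequence[Sequence[float]],
    k: int = 20,
) -> int:
    """Return the index of the closest of the first k hypotheses."""
    candidates = hypothesis_embeds[:k]
    if not candidates:
        raise ValueError("at least one hypothesis embedding is required")
    distances = [euclidean(image_embed, h) for h in candidates]
    # Index of the smallest distance; ties resolve to the earliest index.
    return min(range(len(distances)), key=distances.__getitem__)
```

For an image embedding [0.0, 0.0] and hypothesis embeddings [[3.0, 4.0], [1.0, 0.0], [0.0, 0.5]], the distances are 5.0, 1.0, and 0.5, so the function returns index 2; with k=2 the third hypothesis is never considered and index 1 is returned.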

Training EmbedNet

Prepare the text files and embeddings as described in Generating deep embeddings. See the files in the gen_files folder for examples of the text files. Once the embeddings are prepared, run the following command:

python src/EmbedNet_train.py --model_name provide_a_name_of_your_choice

For a quick demonstration, you can run the following command:

python src/EmbedNet_train.py --model_name temp

This will start training an EmbedNet for 1000 epochs and save the models in trained/EmbedNet_models.

Other arguments for EmbedNet_train.py are:

--base_dir
--model_dir
--train_percentage
--epochs
--lr
--batch
--model_name
--margin
--hidden_layers
--gpu_id
--image_embeds
--topk_embeds
--image_file
--predictions_file
  • base_dir: path to the directory for saving models
  • model_dir: name of the folder for saving trained models
  • train_percentage: percentage of the data to use for training
  • epochs: number of epochs to train for
  • lr: learning rate
  • batch: batch size
  • model_name: name under which the model is saved
  • margin: triplet loss margin
  • hidden_layers: list of input sizes of the hidden layers
  • gpu_id: which GPU to use
  • image_embeds: path to the image embeddings
  • topk_embeds: path to the embeddings of the TopK predictions
  • image_file: path to the text information file for the images
  • predictions_file: path to the text information file for the TopK predictions
License and Citation

The software is licensed under the MIT License. If you find this work useful, please cite the following paper:

@misc{bansal2020fused,
      title={Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval}, 
      author={Siddhant Bansal and Praveen Krishnan and C. V. Jawahar},
      year={2020},
      eprint={2007.00166},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any queries, contact Siddhant Bansal.
