
Media-Smart / Vedastr

License: Apache-2.0
A scene text recognition toolbox based on PyTorch


Projects that are alternatives of or similar to Vedastr

Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (-60%)
Mutual labels:  ocr, text-recognition, transformer
EverTranslator
Translate text anytime and everywhere, even when you are gaming!
Stars: ✭ 59 (-79.66%)
Mutual labels:  ocr, text-recognition, ocr-recognition
Awesome Deep Text Detection Recognition
A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.
Stars: ✭ 2,282 (+686.9%)
Mutual labels:  ocr-recognition, ocr, text-recognition
Transformer-ocr
Handwritten text recognition using transformers.
Stars: ✭ 92 (-68.28%)
Mutual labels:  ocr, transformer, ocr-recognition
Deep Text Recognition Benchmark
Text recognition (optical character recognition) with deep learning methods.
Stars: ✭ 2,665 (+818.97%)
Mutual labels:  ocr-recognition, ocr, text-recognition
NLP-image-to-text
code to extract text from images
Stars: ✭ 28 (-90.34%)
Mutual labels:  ocr, text-recognition
nimtesseract
A Tesseract OCR wrapper for Nim
Stars: ✭ 23 (-92.07%)
Mutual labels:  ocr, ocr-recognition
LoL-TFT-Champion-Masking
League Of Legends - Teamfight Tactics Champion Masking
Stars: ✭ 23 (-92.07%)
Mutual labels:  ocr, ocr-recognition
Word-recognition-EmbedNet-CAB
Code implementation for our ICPR, 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"
Stars: ✭ 19 (-93.45%)
Mutual labels:  text-recognition, ocr-recognition
Multi-Type-TD-TSR
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:
Stars: ✭ 174 (-40%)
Mutual labels:  ocr, ocr-recognition
insightocr
MXNet OCR implementation. Including text recognition and detection.
Stars: ✭ 100 (-65.52%)
Mutual labels:  ocr, text-recognition
IdCardRecognition
Android ID card recognition based on OCR.
Stars: ✭ 35 (-87.93%)
Mutual labels:  ocr, ocr-recognition
Meta-SelfLearning
Meta Self-learning for Multi-Source Domain Adaptation: A Benchmark
Stars: ✭ 157 (-45.86%)
Mutual labels:  text-recognition, ocr-recognition
CRNN
Convolutional recurrent neural network for scene text recognition or OCR in Keras
Stars: ✭ 96 (-66.9%)
Mutual labels:  ocr, text-recognition
lego-mindstorms-51515-jetson-nano
Combines the LEGO Mindstorms 51515 with the NVIDIA Jetson Nano
Stars: ✭ 31 (-89.31%)
Mutual labels:  ocr, text-recognition
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Stars: ✭ 1,566 (+440%)
Mutual labels:  ocr, transformer
python-ocr-example
The code for the blogpost A Python Approach to Character Recognition
Stars: ✭ 54 (-81.38%)
Mutual labels:  ocr, ocr-recognition
Android-Text-Scanner
Read text and numbers with android camera OCR
Stars: ✭ 27 (-90.69%)
Mutual labels:  ocr, ocr-recognition
VehicleInfoOCR
Use your camera to read number plates and obtain vehicle details. Simple, ad-free and faster alternative to existing playstore apps
Stars: ✭ 35 (-87.93%)
Mutual labels:  ocr, ocr-recognition
ID-Card-Passport-Recognition-SDK-Android
On-Device ID Card & Passport & Driver License Recognition SDK for Android
Stars: ✭ 223 (-23.1%)
Mutual labels:  ocr, ocr-recognition

Introduction

vedastr is an open source scene text recognition toolbox based on PyTorch. It is designed to be flexible in order to support rapid implementation and evaluation of scene text recognition tasks.

Features

  • Modular design
    We decompose the scene text recognition framework into different components, and one can easily construct a customized scene text recognition framework by combining different modules (see the sketch after this list).

  • Flexibility
    vedastr is flexible enough that the components within a module can be changed easily.

  • Module expansibility
    It is easy to integrate a new module into the vedastr project.

  • Support of multiple frameworks
    The toolbox supports several popular scene text recognition frameworks, e.g., CRNN, TPS-ResNet-BiLSTM-Attention, Transformer, etc.

  • Good performance
    We re-implement the best model in deep-text-recognition-benchmark and achieve better average accuracy. In addition, we implement a simple baseline (ResNet-FC) whose performance is acceptable.
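
Below is a minimal sketch of the modular design idea above. The module names and keys are illustrative assumptions, not vedastr's actual configuration schema; the real layouts live in the files under configs/.

# A hypothetical, simplified config describing a recognition pipeline as a
# composition of interchangeable modules. Keys and type names are assumptions
# for illustration only.
model = dict(
    transform=dict(type='TPS'),                    # optional spatial rectification
    backbone=dict(type='ResNet'),                  # visual feature extractor
    sequence=dict(type='BiLSTM', hidden_size=256), # sequence modeling
    head=dict(type='Attention', num_classes=37),   # prediction head
)

# Swapping one component (e.g., Attention -> CTC head) changes the framework
# without touching the rest of the pipeline.
model['head'] = dict(type='CTC', num_classes=37)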

License

This project is released under Apache 2.0 license.

Benchmark and model zoo

Note:

| Model | Case sensitive | IIIT5k_3000 | SVT | IC03_867 | IC13_1015 | IC15_2077 | SVTP | CUTE80 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-CTC | False | 87.97 | 84.54 | 90.54 | 88.28 | 67.99 | 72.71 | 77.08 | 81.58 |
| ResNet-FC | False | 88.80 | 88.41 | 92.85 | 90.34 | 72.32 | 79.38 | 76.74 | 84.24 |
| TPS-ResNet-BiLSTM-Attention | False | 90.93 | 88.72 | 93.89 | 92.12 | 76.41 | 80.31 | 79.51 | 86.49 |
| Small-SATRN | False | 91.97 | 88.10 | 94.81 | 93.50 | 75.64 | 83.88 | 80.90 | 87.19 |

TPS: Spatial transformer network.

Small-SATRN: the model from "On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention"; its training phase is case sensitive while its testing phase is case insensitive.

AVERAGE: average accuracy over all test datasets.

CASE SENSITIVE: if True, the output is case sensitive and also contains common symbols; if False, the output is case insensitive and contains only digits and letters.
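
As an illustration of the two settings, the snippet below builds the kind of character sets typically used for these benchmarks; the exact alphabets vedastr uses are defined in its configs, so treat these strings as assumptions.

import string

# Case-insensitive setting: digits plus lowercase letters (36 symbols).
charset_insensitive = string.digits + string.ascii_lowercase

# Case-sensitive setting: digits, both letter cases and common symbols (94 symbols).
charset_sensitive = string.digits + string.ascii_letters + string.punctuation

print(len(charset_insensitive), len(charset_sensitive))  # 36 94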

Installation

Requirements

  • Linux
  • Python 3.6+
  • PyTorch 1.4.0 or higher
  • CUDA 9.0 or higher

We have tested the following versions of OS and software:

  • OS: Ubuntu 16.04.6 LTS
  • CUDA: 10.2
  • Python 3.6.9
  • PyTorch: 1.5.1

Install vedastr

  1. Create a conda virtual environment and activate it.
conda create -n vedastr python=3.6 -y
conda activate vedastr
  2. Install PyTorch and torchvision following the official instructions, e.g.,
conda install pytorch torchvision -c pytorch
  3. Clone the vedastr repository.
git clone https://github.com/Media-Smart/vedastr.git
cd vedastr
vedastr_root=${PWD}
  4. Install dependencies.
pip install -r requirements.txt
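
As a quick sanity check after installation (not part of the official steps), you can confirm that the installed PyTorch build sees your GPU:

import torch

print(torch.__version__)          # expected: 1.4.0 or higher
print(torch.cuda.is_available())  # expected: True on a CUDA-enabled machine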

Prepare data

  1. Download LMDB data from deep-text-recognition-benchmark, which contains training, validation and evaluation data. Note: we use the ST dataset released by ASTER.

  2. Make a data directory as follows:

cd ${vedastr_root}
mkdir ${vedastr_root}/data
  3. Put the downloaded LMDB data into the data directory; the structure of the data directory will then look as follows:
data
└── data_lmdb_release
    ├── evaluation
    ├── training
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
    └── validation
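
After extracting the archives, a small script like the sketch below can confirm that each training/validation split is a readable LMDB and report its sample count. It assumes the num-samples key used by the deep-text-recognition-benchmark LMDBs; adjust the split paths if your layout differs.

import os

import lmdb  # pip install lmdb

root = 'data/data_lmdb_release'
splits = ['training/MJ/MJ_train', 'training/MJ/MJ_valid',
          'training/MJ/MJ_test', 'training/ST', 'validation']
for split in splits:
    path = os.path.join(root, split)
    if not os.path.isdir(path):
        print(f'missing: {path}')
        continue
    # Open read-only without locking, which is enough for inspection.
    env = lmdb.open(path, readonly=True, lock=False, readahead=False, meminit=False)
    with env.begin() as txn:
        num = txn.get(b'num-samples')
        print(split, int(num) if num else 'no num-samples key')
    env.close()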

Train

  1. Config

Modify the configuration as needed in a config file such as configs/tps_resnet_bilstm_attn.py.

  2. Run
python tools/train.py configs/tps_resnet_bilstm_attn.py 

Snapshots and logs will be generated at vedastr/workdir by default.
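
If you want to launch several runs back to back (for example to compare configs), a small wrapper like the sketch below simply calls tools/train.py once per config; the listed configs are examples from the configs/ directory.

import subprocess

configs = [
    'configs/tps_resnet_bilstm_attn.py',
    'configs/resnet_ctc.py',
]
for cfg in configs:
    # Equivalent to running `python tools/train.py <config>` from the repo root.
    subprocess.run(['python', 'tools/train.py', cfg], check=True)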

Test

  1. Config

Modify the configuration as needed in a config file such as configs/tps_resnet_bilstm_attn.py.

  2. Run
python tools/test.py configs/tps_resnet_bilstm_attn.py checkpoint_path

Inference

  1. Run
python tools/inference.py configs/tps_resnet_bilstm_attn.py checkpoint_path img_path
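
tools/inference.py takes a single image path, so to recognize a whole folder of cropped word images you can loop over it as sketched below; demo_images/ and checkpoint_path are placeholders for your own image folder and trained checkpoint.

import glob
import subprocess

config = 'configs/tps_resnet_bilstm_attn.py'
checkpoint = 'checkpoint_path'  # replace with your trained checkpoint
for img in sorted(glob.glob('demo_images/*.jpg')):
    subprocess.run(['python', 'tools/inference.py', config, checkpoint, img],
                   check=True)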

Deploy

  1. Install volksdep following the official instructions

  2. Benchmark (optional)

python tools/deploy/benchmark.py configs/resnet_ctc.py checkpoint_path image_file_path --calibration_images image_folder_path

More available arguments are detailed in tools/deploy/benchmark.py.

The result of resnet_ctc is as follows (test device: Jetson AGX Xavier, CUDA: 10.2):

| Framework | Version | Input shape | Data type | Throughput (FPS) | Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| PyTorch | 1.5.0 | (1, 1, 32, 100) | fp32 | 64 | 15.81 |
| TensorRT | 7.1.0.16 | (1, 1, 32, 100) | fp32 | 109 | 9.66 |
| PyTorch | 1.5.0 | (1, 1, 32, 100) | fp16 | 113 | 10.75 |
| TensorRT | 7.1.0.16 | (1, 1, 32, 100) | fp16 | 308 | 3.55 |
| TensorRT | 7.1.0.16 | (1, 1, 32, 100) | int8 (entropy_2) | 449 | 2.38 |
  3. Export the model as ONNX or TensorRT engine format
python tools/deploy/export.py configs/resnet_ctc.py checkpoint_path image_file_path out_model_path

More available arguments are detailed in tools/deploy/export.py.
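
After exporting to ONNX, a quick way to check the model is to load it with onnxruntime and feed a dummy tensor. The (1, 1, 32, 100) fp32 input below is taken from the benchmark table above; whether your exported model expects exactly this shape is an assumption, so adjust it to match your config.

import numpy as np

import onnxruntime as ort  # pip install onnxruntime

# 'out_model_path.onnx' stands in for the path produced by the export step.
sess = ort.InferenceSession('out_model_path.onnx', providers=['CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name

# Dummy grayscale input of shape (batch, channel, height, width).
dummy = np.random.randn(1, 1, 32, 100).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print([o.shape for o in outputs])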

  4. Inference SDK

You can refer to FlexInfer for details.

Contact

This repository is currently maintained by Jun Sun (@ChaseMonsterAway), Hongxiang Cai (@hxcai) and Yichao Xiong (@mileistone).
