fh2019ustc / DocTr

Licence: MIT license
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.

Programming Languages: Python, Matlab

Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet Benchmark dataset.

Good news! A comprehensive list of Awesome Document Image Rectification methods is available.

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Questions and discussion are welcome!

Training

DocTr consists of two main components: a geometric unwarping transformer (GeoTr) and an illumination correction transformer (IllTr).

  • For geometric unwarping, we train the GeoTr network using the Doc3D dataset.
  • For illumination correction, we train the IllTr network using the DRIC dataset.

Inference

  1. Download the pretrained models from Google Drive or Baidu Cloud, and place them in $ROOT/model_pretrained/.
  2. Put the distorted images in $ROOT/distorted/.
  3. Geometric unwarping. The rectified images are saved in $ROOT/geo_rec/ by default.
    python inference.py
    
  4. Geometric unwarping and illumination rectification. The rectified images are saved in $ROOT/ill_rec/ by default.
    python inference.py --ill_rec True
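
The steps above can also be driven from a small Python wrapper. The following is a minimal sketch, not part of the repository: the helper names (build_inference_command, missing_inputs, run_inference) are hypothetical, and it assumes only what the steps state, namely that inference.py is run from $ROOT, that it accepts --ill_rec True, and that $ROOT/model_pretrained/ and $ROOT/distorted/ must be populated first.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_inference_command(ill_rec: bool = False) -> List[str]:
    """Return the command line for inference.py as written in the steps above."""
    cmd = [sys.executable, "inference.py"]
    if ill_rec:
        # Flag and value copied from step 4; no other flags are assumed.
        cmd += ["--ill_rec", "True"]
    return cmd


def missing_inputs(root: Path) -> List[str]:
    """List the documented input folders that are missing or empty under root."""
    problems = []
    for name in ("model_pretrained", "distorted"):
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    return problems


def run_inference(root: Path, ill_rec: bool = False) -> None:
    """Check the folder layout, then run inference.py from the repository root."""
    problems = missing_inputs(root)
    if problems:
        raise FileNotFoundError(f"missing or empty under {root}: {problems}")
    subprocess.run(build_inference_command(ill_rec), cwd=root, check=True)
```

For example, run_inference(Path("."), ill_rec=True) corresponds to step 4 when called from the repository root. Checking the folders first turns a missing download into an explicit error before the script starts.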
    

Evaluation

  • In the DocUNet Benchmark, the distorted images '64_1.png' and '64_2.png' are rotated by 180 degrees and therefore do not match the GT documents. Most existing works ignore this. Please check these two images before evaluation.
  • We use the same evaluation code for MS-SSIM and LD as the DocUNet Benchmark dataset, run on Matlab 2019a. Please take your Matlab version into account when comparing scores. We provide our Matlab interface file at $ROOT/ssim_ld_eval.m.
  • The indices of the 30 documents (60 images) of the DocUNet Benchmark used for our OCR evaluation are listed in $ROOT/ocr_img.txt (Setting 1). Please refer to DewarpNet for the indices of the 25 documents (50 images) of the DocUNet Benchmark used for their OCR evaluation (Setting 2). We provide the OCR evaluation code at $ROOT/OCR_eval.py. The pytesseract version is 0.3.8, and the Tesseract version is the recent 5.0.1.20220118.
  • Use the rectified images available from Google Drive or Baidu Cloud to reproduce the quantitative performance on the DocUNet Benchmark reported in the paper and for further comparison. We show the performance results of DocTr in the following table. For the performance of other methods, please refer to DocScanner.
Method   MS-SSIM   LD     ED (Setting 1)   CER (Setting 1)   ED (Setting 2)   CER (Setting 2)
GeoTr    0.5105    7.76   464.83           0.1746            724.84           0.1832
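
To make the ED and CER columns concrete, here is a minimal sketch of how such OCR metrics are commonly computed. It assumes ED is the Levenshtein edit distance between the text recognized from a rectified image and a reference string, and that CER is that distance divided by the reference length. The function names are hypothetical, and this is an illustration under those assumptions, not the code in $ROOT/OCR_eval.py.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    # previous[j] holds the distance between reference[:i-1] and hypothesis[:j].
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # turning reference[:i] into "" takes i deletions
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under these definitions, a dataset-level ED or CER would be the mean over the evaluated images, which is also an assumption here.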

Citation

If you find this code useful for your research, please cite the following BibTeX entries.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}
@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}

Acknowledgement

The code is largely based on DocUNet, DewarpNet, and DocProj. Thanks to the authors for their wonderful work.
