
wenwenyu / MASTER-pytorch

License: MIT
Code for the paper "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021)

Programming Languages

  • Python
  • Shell

Projects that are alternatives to or similar to MASTER-pytorch

query-selector
Long-Term Series Forecasting with Query Selector – Efficient Model of Sparse Attention
Stars: ✭ 63 (-76.05%)
Mutual labels:  transformer, self-attention
R-MeN
Transformer-based Memory Networks for Knowledge Graph Embeddings (ACL 2020) (Pytorch and Tensorflow)
Stars: ✭ 74 (-71.86%)
Mutual labels:  transformer, self-attention
CrabNet
Predict materials properties using only the composition information!
Stars: ✭ 57 (-78.33%)
Mutual labels:  transformer, self-attention
seq2seq-pytorch
Sequence to Sequence Models in PyTorch
Stars: ✭ 41 (-84.41%)
Mutual labels:  transformer, self-attention
Walk-Transformer
From Random Walks to Transformer for Learning Node Embeddings (ECML-PKDD 2020) (In Pytorch and Tensorflow)
Stars: ✭ 26 (-90.11%)
Mutual labels:  transformer, self-attention
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+1199.62%)
Mutual labels:  transformer
Insight
Repository for Project Insight: NLP as a Service
Stars: ✭ 246 (-6.46%)
Mutual labels:  transformer
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+1209.13%)
Mutual labels:  transformer
Gpt2 Newstitle
Chinese news title generation project based on GPT2, with extremely detailed code comments.
Stars: ✭ 235 (-10.65%)
Mutual labels:  transformer
SegSwap
(CVPRW 2022) Learning Co-segmentation by Segment Swapping for Retrieval and Discovery
Stars: ✭ 46 (-82.51%)
Mutual labels:  transformer
transformer
Build English-Vietnamese machine translation with ProtonX Transformer. :D
Stars: ✭ 41 (-84.41%)
Mutual labels:  transformer
Ner Bert Pytorch
PyTorch solution of the named entity recognition task using Google AI's pre-trained BERT model.
Stars: ✭ 249 (-5.32%)
Mutual labels:  transformer
Transformers-RL
An easy PyTorch implementation of "Stabilizing Transformers for Reinforcement Learning"
Stars: ✭ 107 (-59.32%)
Mutual labels:  transformer
densecap
Dense video captioning in PyTorch
Stars: ✭ 37 (-85.93%)
Mutual labels:  transformer
nested-transformer
Nested Hierarchical Transformer https://arxiv.org/pdf/2105.12723.pdf
Stars: ✭ 174 (-33.84%)
Mutual labels:  transformer
Relational Rnn Pytorch
An implementation of DeepMind's Relational Recurrent Neural Networks in PyTorch.
Stars: ✭ 236 (-10.27%)
Mutual labels:  transformer
BMT
Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)
Stars: ✭ 192 (-27%)
Mutual labels:  transformer
SSE-PT
Code and datasets for the RecSys'20 paper "SSE-PT: Sequential Recommendation Via Personalized Transformer" and the NeurIPS'19 paper "Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers"
Stars: ✭ 103 (-60.84%)
Mutual labels:  transformer
VT-UNet
[MICCAI2022] This is an official PyTorch implementation for A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation
Stars: ✭ 151 (-42.59%)
Mutual labels:  transformer
awesome-transformer-search
A curated list of awesome resources combining Transformers with Neural Architecture Search
Stars: ✭ 194 (-26.24%)
Mutual labels:  transformer

MASTER-PyTorch

PyTorch reimplementation of "MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" (Pattern Recognition 2021). This project differs from our original implementation, which was built on the company's private codebase, FastOCR. A TensorFlow reimplementation with almost identical performance is available in the MASTER-TF repository. (P.S. The logo is inspired by Master Oogway in Kung Fu Panda.)

News

  • 2021/07: MASTER-mmocr, a reimplementation of MASTER in mmocr. @Jiaquan Ye
  • 2021/07: TableMASTER-mmocr, the 2nd-place solution of the ICDAR 2021 Competition on Scientific Literature Parsing, Task B, based on MASTER. @Jiaquan Ye
  • 2021/07: A talk (in Chinese) can be found here.
  • 2021/05: Savior, which aims to provide a simple, lightweight, fast, integrated, pipelined deployment framework for RPA, now integrates MASTER for captcha recognition. @Tao Luo
  • 2021/04: Slides can be found here.

Honors based on MASTER


Introduction

MASTER is a self-attention-based scene text recognizer that (1) encodes not only the input-output attention but also learns self-attention, which models the feature-feature and target-target relationships inside the encoder and decoder, (2) learns an intermediate representation that is more powerful and more robust to spatial distortion, and (3) offers better training and evaluation efficiency. The overall architecture is shown below.

Requirements

  • python==3.6
  • torchvision==0.6.1
  • pandas==1.0.5
  • torch==1.5.1
  • numpy==1.16.4
  • tqdm==4.47.0
  • Distance==0.1.3
  • Pillow==7.2.0
pip install -r requirements.txt

Usage

Prepare Datasets

  1. Prepare files in the correct format, as exemplified in the data folder.
  2. Modify the train_dataset and val_dataset args in the config.json file, including txt_file, img_root, img_w, and img_h.
  3. Modify utils/keys.txt if needed, according to the vocabulary of your dataset (see the sketch after this list).
  4. Modify STRING_MAX_LEN in utils/label_util.py if needed, according to the maximum text length in your dataset.
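
For step 3, here is a minimal sketch for regenerating utils/keys.txt from your training labels. It assumes a tab-separated "image<TAB>label" annotation format and a keys.txt that is simply the sorted set of characters; compare with the files shipped in the data and utils folders before relying on it.

# Hypothetical helper: collect the character vocabulary of a dataset.
# Assumption: annotations are "image_path<TAB>label", one pair per line.
chars = set()
with open('path/to/train.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip('\n').split('\t')
        if len(parts) == 2:
            chars.update(parts[1])

# Assumption: keys.txt holds the vocabulary as one string of characters.
with open('utils/keys.txt', 'w', encoding='utf-8') as f:
    f.write(''.join(sorted(chars)))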

Distributed training with config files

Modify the configurations in configs/config.json and dist_train.sh, then run:

bash dist_train.sh

The application will be launched via launch.py on a 4-GPU node with one process per GPU (recommended).

This is equivalent to

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c configs/config.json -d 0,1,2,3 --local_world_size 4

Equivalently, you can specify the indices of available GPUs via CUDA_VISIBLE_DEVICES instead of the -d argument:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c configs/config.json --local_world_size 4

Similarly, it can be launched with a single process that spans all 4 GPUs (if the node has 4 available GPUs), although this is not recommended:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -c configs/config.json --local_world_size 1

Using Multiple Nodes

You can enable multi-node, multi-GPU training by setting the nnodes and node_rank arguments on the command line of every node. For example, to run on 2 nodes with 4 GPUs each:

On node 1 (IP: 192.168.0.10), run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c configs/config.json --local_world_size 4  

On node 2 (IP: 192.168.0.15), run:

CUDA_VISIBLE_DEVICES=2,4,6,7 python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=4 \
--master_addr=192.168.0.10 --master_port=5555 \
train.py -c configs/config.json --local_world_size 4  

Debug mode on one GPU/CPU training with config files

This mode lets you debug the code without the distributed setup. -dist must be set to false to turn off distributed mode, and -d specifies which GPU to use.

python train.py -c configs/config.json -d 1 -dist false

Resuming from checkpoints

You can resume from a previously saved checkpoint by:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 0,1,2,3 --local_world_size 4 --resume path/to/checkpoint

Finetune from checkpoints

You can finetune from a previously saved checkpoint by:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=4 \
--master_addr=127.0.0.1 --master_port=5555 \
train.py -d 0,1,2,3 --local_world_size 4 --resume path/to/checkpoint --finetune true

Testing from checkpoints

You can predict from a previously saved checkpoint by:

python test.py --checkpoint path/to/checkpoint --img_folder path/to/img_folder \
               --width 160 --height 48 \
               --output_folder path/to/output_folder \
               --gpu 0 --batch_size 64

Note: width and height must be the same as the settings used during training.

Evaluation

Evaluate sequence accuracy and edit-distance accuracy:

python utils/calculate_metrics.py --predict-path predict_result.json --label-path label.txt

Note: label.txt is a multi-line file; every line contains a JSON object of the form {"ImageFile":"ImageFileName.jpg", "Label":"TextLabelStr"}.
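
For illustration, the two metrics can be sketched as follows, assuming predict_result.json maps image filenames to predicted strings (the actual format written by test.py may differ) and using the Distance package pinned in the requirements; edit-distance accuracy is computed here with one common definition, 1 - ED / max_len.

import json
import distance  # the 'Distance' package from requirements.txt

# Ground truth: one JSON object per line, as described in the note above.
labels = {}
with open('label.txt', encoding='utf-8') as f:
    for line in f:
        item = json.loads(line)
        labels[item['ImageFile']] = item['Label']

# Assumption: predictions are a {filename: predicted_string} mapping.
with open('predict_result.json', encoding='utf-8') as f:
    preds = json.load(f)

n = seq_correct = ed_acc = 0
for name, gt in labels.items():
    pred = preds.get(name, '')
    n += 1
    seq_correct += int(pred == gt)
    ed = distance.levenshtein(pred, gt)
    ed_acc += 1 - ed / max(len(gt), len(pred), 1)

print('sequence accuracy:', seq_correct / n)
print('edit-distance accuracy:', ed_acc / n)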

Customization

Checkpoints

You can specify the name of the training session in config.json:

"name": "MASTER_Default",
"run_id": "example"

The checkpoints will be saved in save_dir/name/run_id_timestamp/checkpoint_epoch_n, with timestamp in mmdd_HHMMSS format.

A copy of the config.json file will be saved in the same folder.

Note: checkpoints contain:

{
  'arch': arch,
  'epoch': epoch,
  'model_state_dict': self.model.state_dict(),
  'optimizer': self.optimizer.state_dict(),
  'monitor_best': self.monitor_best,
  'config': self.config
}
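
As a minimal sketch, such a checkpoint can be inspected and restored along these lines, assuming model and optimizer have already been constructed to match checkpoint['config']:

import torch

# Load the checkpoint dictionary onto the CPU for inspection.
checkpoint = torch.load('path/to/checkpoint', map_location='cpu')
print(checkpoint['arch'], checkpoint['epoch'], checkpoint['monitor_best'])

# Assumption: `model` and `optimizer` were built to match checkpoint['config'].
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])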

Tensorboard Visualization

This project supports Tensorboard visualization by using either torch.utils.tensorboard or TensorboardX.

  1. Install

    If you are using PyTorch 1.1 or higher, install tensorboard with pip install 'tensorboard>=1.14.0'.

    Otherwise, install TensorboardX by following its installation guide.

  2. Run training

    Make sure that the tensorboard option in the config file is turned on.

     "tensorboard" : true
    
  3. Open Tensorboard server

    Run tensorboard --logdir saved/log/ at the project root; the server will start at http://localhost:6006.

By default, the loss values will be logged. If you need more visualizations, use add_scalar('tag', data), add_image('tag', image), etc. in the trainer._train_epoch method. The add_something() methods in this project are thin wrappers around the corresponding methods of tensorboardX.SummaryWriter and torch.utils.tensorboard.SummaryWriter.

Note: You don't have to specify the current step, since the WriterTensorboard class defined in logger/visualization.py tracks it for you.
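
For reference, here is a minimal sketch of the underlying raw API, which (unlike the project's wrapper) takes an explicit global step:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('saved/log/example')  # hypothetical log directory
for step in range(100):
    dummy_loss = 1.0 / (step + 1)  # placeholder value for illustration
    writer.add_scalar('train/loss', dummy_loss, step)
writer.close()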

TODO

  • Memory-Cache based Inference

Citations

If you find MASTER useful, please cite our paper:

@article{Lu2021MASTER,
  title={{MASTER}: Multi-Aspect Non-local Network for Scene Text Recognition},
  author={Ning Lu and Wenwen Yu and Xianbiao Qi and Yihao Chen and Ping Gong and Rong Xiao and Xiang Bai},
  journal={Pattern Recognition},
  year={2021}
}

License

This project is licensed under the MIT License. See LICENSE for more details.
