
dengdan / Seglink

License: GPL-3.0
An implementation of the SegLink algorithm from the paper Detecting Oriented Text in Natural Images by Linking Segments

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Seglink

pytorch.ctpn
pytorch, ctpn, text detection, ocr
Stars: ✭ 123 (-74.32%)
Mutual labels:  ocr, text-detection
Dbnet.pytorch
A pytorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization
Stars: ✭ 435 (-9.19%)
Mutual labels:  text-detection, ocr
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Stars: ✭ 1,409 (+194.15%)
Mutual labels:  ocr, text-detection
Psenet.pytorch
A pytorch re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network
Stars: ✭ 416 (-13.15%)
Mutual labels:  text-detection, ocr
Megreader
A research project for text detection and recognition using PyTorch 1.2.
Stars: ✭ 332 (-30.69%)
Mutual labels:  text-detection, ocr
Ocr.pytorch
A pure pytorch implemented ocr project including text detection and recognition
Stars: ✭ 196 (-59.08%)
Mutual labels:  text-detection, ocr
craft-text-detector
Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector
Stars: ✭ 151 (-68.48%)
Mutual labels:  ocr, text-detection
Craft Pytorch
Official implementation of Character Region Awareness for Text Detection (CRAFT)
Stars: ✭ 2,220 (+363.47%)
Mutual labels:  text-detection, ocr
Chineseaddress ocr
Photographing Chinese-Address OCR implemented using CTPN+CTC+Address Correction. (Chinese-address text recognition from photographed documents.)
Stars: ✭ 309 (-35.49%)
Mutual labels:  text-detection, ocr
Text Detection Ctpn
text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network
Stars: ✭ 3,242 (+576.83%)
Mutual labels:  text-detection, ocr
Awesome Deep Text Detection Recognition
A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.
Stars: ✭ 2,282 (+376.41%)
Mutual labels:  text-detection, ocr
React Native Tesseract Ocr
Tesseract OCR wrapper for React Native
Stars: ✭ 384 (-19.83%)
Mutual labels:  text-detection, ocr
Text Detection
Text detection with mainly MSER and SWT
Stars: ✭ 167 (-65.14%)
Mutual labels:  text-detection, ocr
East
A tensorflow implementation of EAST text detector
Stars: ✭ 2,804 (+485.39%)
Mutual labels:  text-detection, ocr
Adelaidet
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
Stars: ✭ 2,565 (+435.49%)
Mutual labels:  text-detection, ocr
vietnamese-ocr-toolbox
A toolbox for Vietnamese Optical Character Recognition.
Stars: ✭ 26 (-94.57%)
Mutual labels:  ocr, text-detection
Tedeval
TedEval: A Fair Evaluation Metric for Scene Text Detectors
Stars: ✭ 143 (-70.15%)
Mutual labels:  text-detection, ocr
East icpr
Forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE
Stars: ✭ 154 (-67.85%)
Mutual labels:  text-detection, ocr
PSENet-Tensorflow
TensorFlow implementation of the PSENet text detector (Shape Robust Text Detection with Progressive Scale Expansion Network)
Stars: ✭ 51 (-89.35%)
Mutual labels:  ocr, text-detection
Awesome Ocr Resources
A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).
Stars: ✭ 335 (-30.06%)
Mutual labels:  text-detection, ocr

Tip: a more recent scene text detection algorithm, PixelLink, has been implemented here: https://github.com/ZJULearning/pixel_link

Contents:

  1. Introduction
  2. Installation & requirements
  3. Datasets
  4. Problems
  5. Models
  6. Test Your own images
  7. Training and evaluation
  8. Some Comments

Introduction

This is a re-implementation of the SegLink text detection algorithm described in the paper Detecting Oriented Text in Natural Images by Linking Segments by Baoguang Shi, Xiang Bai, and Serge Belongie.

Installation & requirements

  1. tensorflow-gpu 1.1.0

  2. cv2 (OpenCV). I'm using 2.4.9.1, but other versions below 3 should work too. If they don't, try switching to the same version as mine.

  3. Download the project pylib and add its src folder to your PYTHONPATH.

If any other requirement is unmet, just install it by following the error message.
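
For example, a minimal setup might look like this (assuming the pylib in question is https://github.com/dengdan/pylib and that you clone it to ~/pylib; adjust both to your setup):

git clone https://github.com/dengdan/pylib ~/pylib
export PYTHONPATH=$PYTHONPATH:$HOME/pylib/src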

Datasets

  1. SynthText

  2. ICDAR2015

Convert them into the tfrecords format using the scripts in the datasets directory if you want to train your own model.
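
After conversion, a quick way to sanity-check a generated file is to count its records (a minimal TF 1.x sketch; the file name is hypothetical):

import tensorflow as tf

# Count the records in a converted file to verify the conversion produced data.
count = sum(1 for _ in tf.python_io.tf_record_iterator('icdar2015_train.tfrecord'))
print('%d records found' % count)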

Problems

The convergence speed of my SegLink is quite slow compared with that described in the paper. For example, the authors report that a good result can be obtained by training on SynthText for fewer than 100k iterations and on IC15-train for fewer than 10k iterations. With my implementation, however, I have to train on SynthText for about 200k iterations, plus more than another 100k iterations on IC15-train, to get a competitive result.

Several factors may contribute to the slow convergence of my model:

  1. Batch size. I don't have four 12G Titans for training, as described in the paper. Instead, I trained my model on two 8G GeForce GTX 1080s or two Titans.
  2. Learning rate. The paper uses 10^-3 and then 10^-4, but I adopted a fixed learning rate of 10^-4 (see the sketch after this list).
  3. Different initialization model. I used the pretrained VGG model from SSD-caffe on COCO, because I thought it would be better than VGG trained on ImageNet. However, that view does not seem to hold.
  4. There may be other differences as well; I am not sure.

Models

Two models, trained on SynthText and IC15-train, can be downloaded.

  1. seglink-384. Trained with an image size of 384x384, the same as in the paper. Its Hmean is comparable to the result reported in the paper. (The hust_orientedText entry is the paper's result.)

  2. seglink-512. Trained with an image size of 512x512; about one point better than 384x384.

They have been trained:

  • on SynthText for about 200k iterations, and on IC15-train for 100k~200k iterations;

  • with a learning rate of 10^-4;

  • on two GPUs;

  • 384: GTX 1080, batch_size = 24; 512: Titan, batch_size = 20.

Both models perform best at seg_conf_threshold=0.8 and link_conf_threshold=0.5; this is another difference from the paper, which uses 0.9 and 0.7, respectively.

Test Your own images

Use the script test_seglink.py; a shortcut is provided in scripts/test.sh.

Go to the seglink root directory and execute:


./scripts/test.sh GPU_ID CKPT_PATH DATASET_DIR

For example:


./scripts/test.sh 0 ~/models/seglink/model.ckpt-217867  ~/dataset/ICDAR2015/Challenge4/ch4_training_images

I have only tested my models on IC15-test, but any other images can be used: just put your images into a directory and pass that directory as DATASET_DIR.

A bunch of txt files and a zip file are created after testing. If you are testing on IC15-test, you can upload the zip file directly to the ICDAR evaluation server.

The txt files are placed in a subdirectory of the checkpoint directory. They contain the detected bounding boxes and can be visualized using the script visualize_detection_result.py.

The command looks like:


python visualize_detection_result.py \
    --image=<where your images are put> \
    --det=<the directory of the txt files output by test_seglink.py> \
    --output=<the output directory for detection results drawn on images>

For example:


python visualize_detection_result.py \
    --image=~/dataset/ICDAR2015/Challenge4/ch4_training_images/ \
    --det=~/models/seglink/seglink_icdar2015_without_ignored/eval/icdar2015_train/model.ckpt-72885/seg_link_conf_th_0.900000_0.700000/txt \
    --output=~/temp/no-use/seglink_result_512_train
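
If you want to inspect results without the helper script, the drawing step can be approximated with a few lines of OpenCV (a minimal sketch, assuming the txt files hold one detection per line as eight comma-separated coordinates x1,y1,...,x4,y4; the file names are hypothetical):

import cv2
import numpy as np

def draw_detections(image_path, det_txt_path, output_path):
    # Draw each quadrilateral detection from an ICDAR-style txt file.
    image = cv2.imread(image_path)
    with open(det_txt_path) as f:
        for line in f:
            coords = [int(float(v)) for v in line.strip().split(',')[:8]]
            quad = np.array(coords, dtype=np.int32).reshape(4, 2)
            cv2.polylines(image, [quad], True, (0, 255, 0), 2)
    cv2.imwrite(output_path, image)

draw_detections('img_1.jpg', 'res_img_1.txt', 'img_1_result.jpg')  # hypothetical names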

Training and evaluation

Training requires data preprocessing, i.e., converting the datasets into tfrecords; the conversion scripts are in the datasets directory. The scripts train_seglink.py and eval_seglink.py handle training and evaluation, respectively. In particular, I have implemented an offline evaluation function that calculates Recall/Precision/Hmean the same way as the ICDAR test server, and it can be used for cross-validation and grid search. The resulting scores may differ slightly from the test server's, but not by much. Sorry for the incomplete documentation here. Read and modify these scripts if you want to train your own model.
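
For instance, a grid search over the two confidence thresholds could look like this (a minimal sketch; evaluate_checkpoint is a hypothetical wrapper around the offline evaluation, not an actual function of this repo):

import itertools

def evaluate_checkpoint(seg_conf_threshold, link_conf_threshold):
    # Hypothetical wrapper: run the offline evaluation from eval_seglink.py
    # with the given thresholds and return the resulting Hmean.
    raise NotImplementedError

best = None
for seg_th, link_th in itertools.product([0.7, 0.8, 0.9], [0.3, 0.5, 0.7]):
    hmean = evaluate_checkpoint(seg_th, link_th)
    if best is None or hmean > best[0]:
        best = (hmean, seg_th, link_th)
print('best Hmean %.4f at seg_conf_threshold=%.1f, link_conf_threshold=%.1f' % best)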

Some Comments

Thanks should be given to the authors of the SegLink paper: Baoguang Shi, Xiang Bai, and Serge Belongie.

EAST is another text detection paper, accepted by CVPR 2017, and its reported result is better than that of SegLink. But when both use the same VGG16 backbone, their performance is quite similar.

If you run into any problems, contact me through GitHub issues.

Some Notes On Implementation Detail

How the groundtruth is calculated (in Chinese): http://fromwiz.com/share/s/34GeEW1RFx7x2iIM0z1ZXVvc2yLl5t2fTkEg2ZVhJR2n50xg
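
For readers who don't read Chinese: the core of the groundtruth computation is the paper's rule for matching a default box to a text line, which requires the box center to lie inside the text region and the box scale to be close to the text height. A sketch of the size criterion as I understand it from the paper (my paraphrase, not code from this repo):

def size_matches(a_l, h):
    # Size criterion from the SegLink paper: a default box with scale a_l
    # matches a text line of height h when max(a_l / h, h / a_l) <= 1.5.
    return max(a_l / h, h / a_l) <= 1.5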
