All Projects → beacandler → R2CNN

beacandler / R2CNN

Licence: other
caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Programming Languages

C++
36643 projects - #6 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to R2CNN

Caffe Oneclick
Use caffe to train your own data in just one click
Stars: ✭ 187 (+133.75%)
Mutual labels:  ocr, caffe
How to write cuda extensions in pytorch
How to write cuda kernels or c functions in pytorch, especially for former caffe users.
Stars: ✭ 51 (-36.25%)
Mutual labels:  caffe
nsfw api
Python REST API to detect images with adult content
Stars: ✭ 71 (-11.25%)
Mutual labels:  caffe
facial-landmarks
Facial landmarks detection with OpenCV, Dlib, DNN
Stars: ✭ 25 (-68.75%)
Mutual labels:  caffe
FileBasedMiniDMS
This php script sorts your documents (by using hardlinks) into subfolders based on the hashtags it finds in your documents filenames.
Stars: ✭ 35 (-56.25%)
Mutual labels:  ocr
IdCardRecognition
Android id card recognition based on OCR. 安卓基于OCR的身份证识别。
Stars: ✭ 35 (-56.25%)
Mutual labels:  ocr
textocry
Textocry - Copy text from Images (chrome extension)
Stars: ✭ 29 (-63.75%)
Mutual labels:  ocr
MLKit
🌝 MLKit是一个强大易用的工具包。通过ML Kit您可以很轻松的实现文字识别、条码识别、图像标记、人脸检测、对象检测等功能。
Stars: ✭ 294 (+267.5%)
Mutual labels:  ocr
ddrl
Deep Developmental Reinforcement Learning
Stars: ✭ 27 (-66.25%)
Mutual labels:  caffe
video-to-text-ocr-demo
视频硬字幕提取
Stars: ✭ 105 (+31.25%)
Mutual labels:  ocr
LaraOCR
Laravel Optical Character Reader(OCR) package using ocr engines like Tesseract
Stars: ✭ 88 (+10%)
Mutual labels:  ocr
jp-ocr-prunned-cnn
Attempting feature map prunning on a CNN trained for Japanese OCR
Stars: ✭ 15 (-81.25%)
Mutual labels:  ocr
TesseractStudio.Net
A free Windows graphical interface to the Tesseract 4.0 OCR engine.
Stars: ✭ 38 (-52.5%)
Mutual labels:  ocr
DLInfBench
CNN model inference benchmarks for some popular deep learning frameworks
Stars: ✭ 51 (-36.25%)
Mutual labels:  caffe
labelReader
Programmatically find and read labels using Machine Learning
Stars: ✭ 44 (-45%)
Mutual labels:  ocr
EmotionChallenge
Source code for 1st winner of face micro-emotion competition, FG 2017.
Stars: ✭ 37 (-53.75%)
Mutual labels:  caffe
caffe
Caffe: a Fast framework for deep learning. Custom version with built-in sparse inputs, segmentation, object detection, class weights, and custom layers
Stars: ✭ 36 (-55%)
Mutual labels:  caffe
tesseract-ocr
Node.js wrapper for Tesseract OCR CLI.
Stars: ✭ 29 (-63.75%)
Mutual labels:  ocr
gazou
Japanese OCR for Linux & Windows
Stars: ✭ 32 (-60%)
Mutual labels:  ocr
google-cloud-vision-php
A simple php wrapper for the google cloud vision API
Stars: ✭ 16 (-80%)
Mutual labels:  ocr

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Abstract

This is a caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.

This project is modified from py-R-FCN, and inclined nms and generate rotated box component is imported from EAST project. Thanks for the author's(@zxytim @argman) help. Please cite this paper if you find this useful.

Contents

  1. Abstract
  2. Structor
  3. Installation
  4. Demo
  5. Test
  6. Train
  7. Experiments
  8. Furthermore

Structor

Code structor

.
├── docker-compose.yml
├── docker // docker deps file
├── Dockerfile // docker build file
├── model // model directory
│   ├── caffemodel // trained caffe model
│   ├── icdar15_gt // ICDAR2015 groundtruth
│   ├── prototxt // caffe prototxt file
│   └── imagenet_models // pretrained on imagenet
├── nvidia-docker-compose.yml
├── logs
│   ├── submit // original submit file
│   ├── submit_zip // zip submit file
│   ├── snapshots
│   └── train
│       ├── VGG16.txt.*
│       └── snapshots
├── README.md
├── requirements.txt // python package
├── src
│   ├── cfgs // train config yml
│   ├── data // cache file
│   ├── lib
│   ├── _init_path.py
│   ├── demo.py
│   ├── eval_icdar15.py // eval 2015 icdar dataset F-meaure
│   ├── test_net.py
│   └── train_net.py
├── demo.sh
├── train.sh
├── images // test images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
└── test.sh // test script

Data structor

It should have this basic structure

ICDARdevkit_Root
.
├── ICDAR2013
├── merge_train.txt  // images list contains ICDAR2013+ICDAR2015 train dataset, then raw data augmentation the same as the paper
├── ICDAR2015
│   ├── augmentation // contains all augmented images
│   └── ImageSets/Main/test.txt // ICDAR2015 test images list

Installation

Install caffe

It is highly recommended to use docker to build environment. More about how to configure docker, see Running with Docker If you are familiar with docker, please run

    1. nvidia-docker-compose run --rm --service-ports rrcnn bash
    2. bash ./demo.sh

If you don't familiar with docker, please follow py-R-FCN to install caffe.

Build

    cd src/lib && make
    

Download Model

  1. please download VGG16 pre-trained model on Imagenet, place it to model/imagenet_models/VGG16.v2.caffemodel.
  2. please download VGG16 trained model by this project, place it model/caffemodel/TextBoxes-v2_iter_12w.caffemodel.

Demo

It is recommended to use UNIX socket to support GUI for docker, plesase open another terminal and type:

    xhost + # may be you need it when open a new terminal
    # docker-compose.yml: mount host  volume : /tmp/.X11-unix to docker volume: /tmp/.X11-unix  
    # pass DISPLAY variable to docker container so host X server can display image in docker
    docker exec -it -e DISPLAY=$DISPLAY ${CURRENT_CONTAINER_ID} bash
    bash ./demo.sh

Test

Single Test

    bash ./test.sh

Multi-scale Test

    # please uncomment two lines in src/cfgs/faster_rcnn_end2end.yml
    SCALES: [720, 1200]
    MULTI_SCALES_NOC: True
    # modify src/lib/datasets/icdar.py to find ICDAR2015 test data, please refer to commit @bbac1cf
    # then run
    bash ./test.sh

Train

Train data

  • Mine: ICDAR2013+ICDAR2015 train dataset, and raw data augmentation, at last got 15977 images.
  • Paper: ICDAR2015 + 2000 focused scene text images they collected.

Train commands

  1. Go to ./src/lib/datasets/icdar.py, modify images path to let train.py find merge_train.txt images list.
  2. Remove cache in src/data/*.pkl or you can load cached roidb data of this project, and place it to src/data/
    # Train for RRCNN4-TextBoxes-v2-OHEM
    bash ./train.sh

note: If you use USE_FLIPPED=True&USE_FLIPPED_QUAD=True, you will get almost 31200 roidb.

Experiments

Mine VS Paper

Approaches Anchor Scales Pooled sizes Inclined NMS Test scales(short side) F-measure(Mine VS paper)
R2CNN-2 (4, 8, 16) (7, 7) Y (720) 71.12% VS 68.49%
R2CNN-3 (4, 8, 16) (7, 7) Y (720) 73.10% VS 74.29%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720) 74.14% VS 74.36%
R2CNN-4 (4, 8, 16, 32) (7, 7) Y (720, 1200) 79.05% VS 81.80%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720) 74.34% VS 75.34%
R2CNN-5 (4, 8, 16, 32) (7, 7) (11, 3) (3, 11) Y (720, 1200) 78.70% VS 82.54%

Appendixes

Approaches Anchor Scales aspect ration Pooled sizes Inclined NMS Test scales(short side) F-measure
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720) 74.36%
R2CNN-4 (4, 8, 16, 32) (0.5, 1, 2) (7, 7) Y (720, 1200) VS 81.80%
R2CNN-4-TextBoxes-OHEM (4, 8, 16, 32) (0.5, 1, 2, 3, 5, 7, 10) (7, 7) Y (720) 76.53%

Furthermore

You can try Resnet-50, Resnet-101 and so on.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].