alexfjw / jp-ocr-prunned-cnn

Licence: GPL-3.0 license

Attempting feature map pruning on a CNN trained for Japanese OCR

Programming Languages

python

Jupyter Notebook

Projects that are alternatives of or similar to jp-ocr-prunned-cnn

kuzushiji-recognition

Kuzushiji Recognition Kaggle 2019. Build a DL model to transcribe ancient Kuzushiji into contemporary Japanese characters. Opening the door to a thousand years of Japanese culture.

Stars: ✭ 16 (+6.67%)

Mutual labels: ocr, japanese

Kaku

画 - Japanese OCR Dictionary

Stars: ✭ 160 (+966.67%)

Mutual labels: ocr, japanese

gazou

Japanese OCR for Linux & Windows

Stars: ✭ 32 (+113.33%)

Mutual labels: ocr, japanese

YuzuMarker

🍋 [WIP] Manga Translation Tool

Stars: ✭ 76 (+406.67%)

Mutual labels: ocr, japanese

blinkid-ui-android

Customizable UI library that includes camera management, scanning screen, and document selection module.

Stars: ✭ 33 (+120%)

Mutual labels: ocr

Hibi

[No Active Development] An Android app for learning Japanese by keeping a journal.

Stars: ✭ 37 (+146.67%)

Mutual labels: japanese

Document-Scanner-and-OCR

A simple document scanner with OCR implemented using Python and OpenCV

Stars: ✭ 31 (+106.67%)

Mutual labels: ocr

LoL-TFT-Champion-Masking

League Of Legends - Teamfight Tactics Champion Masking

Stars: ✭ 23 (+53.33%)

Mutual labels: ocr

textlint-ja

textlint Japanese community / rule ideas

Stars: ✭ 41 (+173.33%)

Mutual labels: japanese

textocry

Textocry - Copy text from Images (chrome extension)

Stars: ✭ 29 (+93.33%)

Mutual labels: ocr

paperless-ng

A supercharged version of paperless: scan, index and archive all your physical documents

Stars: ✭ 4,840 (+32166.67%)

Mutual labels: ocr

PAN-Card-OCR

Retrieve meaningful information from PAN Card image using tesseract-ocr 😎

Stars: ✭ 115 (+666.67%)

Mutual labels: ocr

nashi

Some bits of javascript to transcribe scanned pages using PageXML

Stars: ✭ 13 (-13.33%)

Mutual labels: ocr

sakubun

A tool that helps you improve your Japanese vocabulary and kanji skills with practice that's customized to your needs.

Stars: ✭ 20 (+33.33%)

Mutual labels: japanese

Regularization-Pruning

[ICLR'21] PyTorch code for our paper "Neural Pruning via Growing Regularization"

Stars: ✭ 44 (+193.33%)

Mutual labels: pruning

OCR-Test

An experiment about OCR in Android

Stars: ✭ 47 (+213.33%)

Mutual labels: ocr

DocumentLab

OCR using tesseract, ImageMagick, EmguCV, an advanced query language and a fluent query interface for C#

Stars: ✭ 64 (+326.67%)

Mutual labels: ocr

vietnamese-ocr-toolbox

A toolbox for Vietnamese Optical Character Recognition.

Stars: ✭ 26 (+73.33%)

Mutual labels: ocr

insightocr

MXNet OCR implementation. Including text recognition and detection.

Stars: ✭ 100 (+566.67%)

Mutual labels: ocr

kanji

Haskell suite for determining what 級 (level) of the 漢字検定 (national Kanji exam) a given Kanji belongs to.

Stars: ✭ 19 (+26.67%)

Mutual labels: japanese

Project Overview

This is a Udacity capstone project that tests the feasibility of CNNs for Japanese OCR on mobile devices.
Feature map pruning, following the approach in [5], is used to reduce the size of the model and increase inference speed.
Refer to capestone_report.pdf for a complete description of the project.
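
To make the idea of feature map pruning more concrete, here is a minimal sketch of the ranking step as described in [5]: each feature map is scored by the absolute mean of activation times gradient (a first-order Taylor criterion), scores are L2-normalized within the layer, and the lowest-scoring maps become pruning candidates. This is not taken from this repository's code; the function names and the toy numbers are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def taylor_scores(activations, gradients):
    """Score every feature map of one layer.

    activations, gradients: one flat list of floats per feature map,
    e.g. values collected over a mini-batch (hypothetical layout).
    """
    raw = []
    for act, grad in zip(activations, gradients):
        # First-order Taylor criterion: |mean(activation * gradient)|
        mean_prod = sum(a * g for a, g in zip(act, grad)) / len(act)
        raw.append(abs(mean_prod))
    # Layer-wise L2 normalization so scores are comparable across layers
    norm = math.sqrt(sum(s * s for s in raw))
    return [s / norm for s in raw] if norm > 0 else raw


def lowest_ranked(scores, k):
    """Indices of the k feature maps with the smallest scores."""
    return sorted(range(len(scores)), key=scores.__getitem__)[:k]


# Toy data: three feature maps, two values each (made up for illustration)
acts = [[1.0, 2.0], [0.1, 0.1], [3.0, 1.0]]
grads = [[0.5, 0.5], [0.1, -0.1], [1.0, 1.0]]
scores = taylor_scores(acts, grads)
print(lowest_ranked(scores, 1))
```

In a full pipeline, ranking, removing the lowest-ranked maps, and fine-tuning would typically alternate until the size or accuracy target is reached.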

Benchmark Summary

Note: the size estimate is the raw size estimate for models in PyTorch, inclusive of layer-specific details like output count. Sizes are about halved if only the filters in the convolutional and FC layers are considered.

Running this project

Python version: 3.6.3

Required packages:

PyPI's bitstring
numpy
tqdm
pytorch with GPU support (was built from source; 0.3 may work, untested)
torchvision
sklearn

Optional:
(code related to exporting to mobile has been commented out)

onnx
caffe2
onnx-caffe2

Note that the PyTorch code uses CUDA for training models.
Training on CPU has not been tested, but the code should support it.
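
Before training, it can help to confirm that the packages listed above are importable. The sketch below uses only the standard library; the import names (for example `torch` for pytorch and `onnx_caffe2` for onnx-caffe2) are assumptions about how these packages are exposed, and `missing_modules` is a hypothetical helper, not part of this project.

```python
import importlib.util

# Import names are assumptions; adjust them if your installation differs.
REQUIRED = ["bitstring", "numpy", "tqdm", "torch", "torchvision", "sklearn"]
OPTIONAL = ["onnx", "caffe2", "onnx_caffe2"]


def missing_modules(names):
    """Return the names that cannot be located as top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

This only checks that a module can be found; it does not verify versions or GPU support.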

Setup instructions

Obtain the ETL2 and ETL9G datasets from the following site. You will need to create an account to get the files.
https://etlcdb.db.aist.go.jp/?page_id=1721

Place the downloaded files in the raw_data directory. The result should look like this:
. (raw_data)
├── ETL2
│   ├── ETL2INFO
│   ├── ETL2_1
│   ├── ETL2_2
│   ├── ...
│   └── ETL2_5
├── ETL9G
│   ├── ETL9G_01
│   ├── ETL9G_02
│   ├── ...
│   └── ETL9G_50
├── README.md
└── co59-utf8.txt
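
The layout above can be checked with a short script before training. This is a sketch under one assumption: that the elided entries in the tree are numbered contiguously (ETL2_1 to ETL2_5 and ETL9G_01 to ETL9G_50). The helper names are hypothetical and not part of the project.

```python
from pathlib import Path


def expected_files():
    """Relative paths expected under raw_data (contiguous numbering assumed)."""
    files = ["ETL2/ETL2INFO", "co59-utf8.txt"]
    files += [f"ETL2/ETL2_{i}" for i in range(1, 6)]
    files += [f"ETL9G/ETL9G_{i:02d}" for i in range(1, 51)]
    return files


def missing_files(raw_data_dir):
    """Return the expected paths that are not regular files under raw_data_dir."""
    root = Path(raw_data_dir)
    return [rel for rel in expected_files() if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("raw_data")
    print("all files present" if not missing else f"missing: {missing}")
```

Run it from the repository root so that the relative path `raw_data` resolves correctly.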

Running

To train the models, run the following command from the repository root:

python -m src.main --train --model MODEL_NAME --dataset DATASET_NAME

MODEL_NAMES:
vgg11_bn
chinese_net

DATASET_NAMES:
etl2
etl2_9g
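
To train several configurations in a row, the command above can be generated for each model and dataset name. The sketch below assumes that every model can be paired with every dataset, which the text above does not confirm; `train_command`, `all_train_commands`, and `run_all` are hypothetical helpers.

```python
import itertools
import subprocess
import sys

MODELS = ["vgg11_bn", "chinese_net"]
DATASETS = ["etl2", "etl2_9g"]


def train_command(model, dataset):
    """Build the training command as an argument list."""
    return [sys.executable, "-m", "src.main", "--train",
            "--model", model, "--dataset", dataset]


def all_train_commands():
    """One command per (model, dataset) pair; pairing validity is an assumption."""
    return [train_command(m, d) for m, d in itertools.product(MODELS, DATASETS)]


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in all_train_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands; pass `dry_run=False` from the repository root to launch the training runs one after another.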

References

This work was made possible by the following papers:

[1] Zhang, X., Bengio, Y., & Liu, C. (2016, June 18). Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark. Retrieved December 10, 2017, from https://arxiv.org/abs/1606.05763
[2] Xiao, X., Jin, L., Yang, Y., Yang, W., Sun, J., & Chang, T. (2017, February 26). Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition. Retrieved December 10, 2017, from https://arxiv.org/abs/1702.07975
[3] Wojna, Z., Gorban, A., Lee, D., Murphy, K., Yu, Q., Li, Y., & Ibarz, J. (2017, August 20). Attention-based Extraction of Structured Information from Street View Imagery. Retrieved December 10, 2017, from https://arxiv.org/abs/1704.03549
[4] Tsai, C. Recognizing Handwritten Japanese Characters Using Deep Convolutional Neural Networks. Retrieved December 10, 2017, from https://cs231n.stanford.edu/reports/2016/pdfs/262_Report.pdf
[5] Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2017, June 08). Pruning Convolutional Neural Networks for Resource Efficient Inference. Retrieved December 10, 2017, from https://arxiv.org/abs/1611.06440
