
emedvedev / attention-ocr

License: MIT
A TensorFlow model for text recognition (CNN + seq2seq with visual attention), available as a Python package and compatible with Google Cloud ML Engine.


Projects that are alternatives to or similar to Attention OCR

Php-Google-Vision-Api
Google Vision Api for PHP (https://cloud.google.com/vision/)
Stars: ✭ 61 (-92.77%)
Mutual labels:  ocr, google-cloud, image-recognition
Image classifier
CNN image classifier implemented in Keras Notebook 🖼️.
Stars: ✭ 139 (-83.53%)
Mutual labels:  cnn, ml, image-recognition
Vedastr
A scene text recognition toolbox based on PyTorch
Stars: ✭ 290 (-65.64%)
Mutual labels:  ocr-recognition, ocr
Ios 10 Sampler
Code examples for new APIs of iOS 10.
Stars: ✭ 3,341 (+295.85%)
Mutual labels:  cnn, image-recognition
Time Series Prediction
A collection of time series prediction methods: rnn, seq2seq, cnn, wavenet, transformer, unet, n-beats, gan, kalman-filter
Stars: ✭ 351 (-58.41%)
Mutual labels:  cnn, seq2seq
OCR-Reader
An Android app to extract text from camera preview directly.
Stars: ✭ 43 (-94.91%)
Mutual labels:  ocr, ocr-recognition
VehicleInfoOCR
Use your camera to read number plates and obtain vehicle details. A simple, ad-free, and faster alternative to existing Play Store apps.
Stars: ✭ 35 (-95.85%)
Mutual labels:  ocr, ocr-recognition
Cnn lstm ctc tensorflow
CNN+LSTM+CTC based OCR implemented using tensorflow.
Stars: ✭ 343 (-59.36%)
Mutual labels:  cnn, ocr
IdCardRecognition
Android ID card recognition based on OCR.
Stars: ✭ 35 (-95.85%)
Mutual labels:  ocr, ocr-recognition
Ocr densenet
First place in the 1st Xi'an Jiaotong University Artificial Intelligence Practice Competition (2018 AI Practice Competition: image text recognition); uses only DenseNet to recognize the text in images.
Stars: ✭ 425 (-49.64%)
Mutual labels:  ocr-recognition, ocr
Nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Stars: ✭ 392 (-53.55%)
Mutual labels:  cnn, seq2seq
Easyocr
Java OCR component (based on the Tesseract OCR engine) that automatically performs image cleanup and recognizes the content of CAPTCHA verification images.
Stars: ✭ 466 (-44.79%)
Mutual labels:  ocr-recognition, ocr
Android-Text-Scanner
Read text and numbers with android camera OCR
Stars: ✭ 27 (-96.8%)
Mutual labels:  ocr, ocr-recognition
python-ocr-example
The code for the blogpost A Python Approach to Character Recognition
Stars: ✭ 54 (-93.6%)
Mutual labels:  ocr, ocr-recognition
ocr
Simple app to extract text from pictures using Tesseract
Stars: ✭ 98 (-88.39%)
Mutual labels:  ocr, image-recognition
lookup
🔍 Pure Go implementation of fast image search and simple OCR, focused on reading info from screenshots
Stars: ✭ 35 (-95.85%)
Mutual labels:  ocr, image-recognition
Basicocr
BasicOCR is a project dedicated to research on algorithms for natural scene text recognition, initiated and maintained by the TongPai AI team of the Great Wall Digital Big Data Application Technology Research Institute.
Stars: ✭ 336 (-60.19%)
Mutual labels:  cnn, ocr
Trwebocr
An easy-to-use open-source offline Chinese OCR with recognition accuracy comparable to major vendors. Provides a simple web page and web API, convenient for everyday use or for other programs to call.
Stars: ✭ 618 (-26.78%)
Mutual labels:  ocr-recognition, ocr
nimtesseract
A Tesseract OCR wrapper for Nim
Stars: ✭ 23 (-97.27%)
Mutual labels:  ocr, ocr-recognition
LoL-TFT-Champion-Masking
League Of Legends - Teamfight Tactics Champion Masking
Stars: ✭ 23 (-97.27%)
Mutual labels:  ocr, ocr-recognition

Attention-based OCR

Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the trained model with weights as a SavedModel or a frozen graph.

Acknowledgements

This project is based on a model by Qi Guo and Yuntian Deng. You can find the original model in the da03/Attention-OCR repository.

The model

Authors: Qi Guo and Yuntian Deng.

The model first runs a sliding CNN on the image (images are resized to height 32 while preserving aspect ratio). Then an LSTM is stacked on top of the CNN. Finally, an attention model is used as a decoder for producing the final outputs.

OCR example

Installation

pip install aocr

Note: TensorFlow and NumPy will be installed as dependencies. Additional dependencies are PIL/Pillow, distance, and six.

Note #2: this project works with TensorFlow 1.x. An upgrade to TensorFlow 2 is planned, but if you want to help, please feel free to create a PR.
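
Since the package requires TensorFlow 1.x, it can help to pin the TensorFlow version before installing in environments where pip would otherwise pull in 2.x (a sketch; any 1.x release the project supports will do):

pip install "tensorflow<2.0"
pip install aocr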

Usage

Create a dataset

To build a TFRecords dataset, you need a collection of images and an annotation file with their respective labels.

aocr dataset ./datasets/annotations-training.txt ./datasets/training.tfrecords
aocr dataset ./datasets/annotations-testing.txt ./datasets/testing.tfrecords

Annotations are simple text files containing the image paths (either absolute or relative to your working directory) and their corresponding labels:

datasets/images/hello.jpg hello
datasets/images/world.jpg world
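
If your image filenames double as labels, as in the example above, the annotation file can be generated with a short shell loop. The directory layout and naming scheme below are assumptions for illustration:

# Assumes each image is named <label>.jpg
for f in datasets/images/*.jpg; do
  echo "$f $(basename "$f" .jpg)"
done > ./datasets/annotations-training.txt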

Train

aocr train ./datasets/training.tfrecords

A new model will be created, and training will start. Note that it takes quite a long time to reach convergence, since the CNN and the attention model are trained simultaneously.

The --steps-per-checkpoint parameter determines how often the model checkpoints will be saved (the default output dir is checkpoints/).

Important: there are many available training options. See the CLI help or the parameters section of this README. An example combining a few of them is shown below.
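
As a sketch, a training run with a custom batch size, checkpoint frequency, and input dimensions might look like this (the values are illustrative, not recommendations):

aocr train ./datasets/training.tfrecords \
    --batch-size=32 \
    --steps-per-checkpoint=500 \
    --max-width=300 \
    --max-height=32 \
    --num-epoch=100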

Test and visualize

aocr test ./datasets/testing.tfrecords

Additionally, you can visualize the attention results during testing (saved to out/ by default):

aocr test --visualize ./datasets/testing.tfrecords

Example output images in results/correct (prediction/ground truth for each character): Image 0 (j/j), Image 1 (u/u), Image 2 (n/n), Image 3 (g/g), Image 4 (l/l), Image 5 (e/e).

Export

After the model is trained and a checkpoint is available, it can be exported as either a frozen graph or a SavedModel.

# SavedModel (default):
aocr export ./exported-model

# Frozen graph:
aocr export --format=frozengraph ./exported-model

Either command loads weights from the latest checkpoint and exports the model into the ./exported-model directory.

Note: during training, it is possible to pass parameters describing the dimensions of the input images (--max-width, --max-height, etc.). If you used them during training, make sure to also pass them to the export command, as shown below; otherwise, the exported model will not work properly when serving (next section).
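
For example, assuming the model was trained with the illustrative dimension flags from the training sketch above:

aocr export --max-width=300 --max-height=32 ./exported-model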

Serving

An exported SavedModel can be served as an HTTP REST API using TensorFlow Serving. You can start the server by running the following command:

tensorflow_model_server --port=9000 --rest_api_port=9001 --model_name=aocr --model_base_path="$(pwd)/exported-model"

Note: tensorflow_model_server expects --model_base_path to be an absolute path (hence $(pwd) above) and to contain a numeric version sub-directory holding the files exported in the previous step, so you need to move the contents of exported-model into exported-model/1.
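
A minimal sketch, assuming the standard SavedModel layout (a saved_model.pb file plus a variables/ directory):

mkdir ./exported-model/1
mv ./exported-model/saved_model.pb ./exported-model/variables ./exported-model/1/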

Now you can send a prediction request to the running server, for example:

curl -X POST \
  http://localhost:9001/v1/models/aocr:predict \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
  "signature_name": "serving_default",
  "inputs": {
     	"input": { "b64": "<your image encoded as base64>" }
  }
}'

The REST API requires binary inputs to be encoded as Base64 and wrapped in an object containing a b64 key. See "Encoding binary values" in the TensorFlow Serving documentation.
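
As a sketch, the request body can be built from an image file in the shell (the image path is an example; -w0 is the GNU coreutils flag for unwrapped base64 output, while the macOS base64 does not wrap by default):

B64=$(base64 -w0 ./datasets/images/hello.jpg)
curl -s -X POST http://localhost:9001/v1/models/aocr:predict \
  -H 'content-type: application/json' \
  -d "{\"signature_name\": \"serving_default\", \"inputs\": {\"input\": {\"b64\": \"$B64\"}}}"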

Google Cloud ML Engine

To train the model in the Google Cloud Machine Learning Engine, upload the training dataset into a Google Cloud Storage bucket and start a training job with the gcloud tool.

  1. Set the environment variables:
# Prefix for the job name.
export JOB_PREFIX="aocr"

# Region to launch the training job in.
# Should be the same as the storage bucket region.
export REGION="us-central1"

# Your storage bucket.
export GS_BUCKET="gs://aocr-bucket"

# Path to store your training dataset in the bucket.
export DATASET_UPLOAD_PATH="training.tfrecords"
  2. Upload the training dataset:
gsutil cp ./datasets/training.tfrecords $GS_BUCKET/$DATASET_UPLOAD_PATH
  3. Launch the ML Engine job:
export NOW=$(date +"%Y%m%d_%H%M%S")
export JOB_NAME="$JOB_PREFIX$NOW"
export JOB_DIR="$GS_BUCKET/$JOB_NAME"

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir=$JOB_DIR \
    --module-name=aocr \
    --package-path=aocr \
    --region=$REGION \
    --scale-tier=BASIC_GPU \
    --runtime-version=1.2 \
    -- \
    train $GS_BUCKET/$DATASET_UPLOAD_PATH \
    --steps-per-checkpoint=500 \
    --batch-size=512 \
    --num-epoch=20
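
Once submitted, the job can be monitored from the same shell; stream-logs tails the job's logs until it finishes:

gcloud ml-engine jobs describe $JOB_NAME
gcloud ml-engine jobs stream-logs $JOB_NAME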

Parameters

Global

  • log-path: Path for the log file.

Testing

  • visualize: Output the attention maps on the original image.

Exporting

  • format: Format for the export (either savedmodel or frozengraph).

Training

  • steps-per-checkpoint: How often, in steps, to checkpoint (print perplexity and save the model).
  • num-epoch: Number of full passes over the dataset.
  • batch-size: Batch size.
  • initial-learning-rate: Initial learning rate. Note that we use AdaDelta, so the initial value does not matter much.
  • target-embedding-size: Embedding dimension for each target.
  • attn-num-hidden: Number of hidden units in attention decoder cell.
  • attn-num-layers: Number of layers in the attention decoder cell (the encoder's number of hidden units will be attn-num-hidden * attn-num-layers).
  • no-resume: Create new weights even if there are checkpoints present.
  • max-gradient-norm: Clip gradients to this norm.
  • no-gradient-clipping: Do not perform gradient clipping.
  • gpu-id: GPU to use.
  • use-gru: Use GRU cells instead of LSTM.
  • max-width: Maximum width for the input images. WARNING: images wider than the maximum will be discarded.
  • max-height: Maximum height for the input images.
  • max-prediction: Maximum length of the predicted word/phrase.

References

Convert a formula to its LaTeX source

What You Get Is What You See: A Visual Markup Decompiler

Torch attention OCR
