
mittagessen / Kraken

Licence: apache-2.0
OCR engine for all the languages

Programming Languages

python

Projects that are alternatives of or similar to Kraken

Tensorflow seq2seq chatbot
Stars: ✭ 81 (-73.36%)
Mutual labels:  neural-networks, lstm
Deep Learning With Python
Example projects I completed to understand Deep Learning techniques with Tensorflow. Please note that I no longer maintain this repository.
Stars: ✭ 134 (-55.92%)
Mutual labels:  neural-networks, lstm
Sarcasmdetection
Sarcasm detection on tweets using neural network
Stars: ✭ 99 (-67.43%)
Mutual labels:  neural-networks, lstm
Handwriting Generation
Implementation of handwriting generation using recurrent neural networks in tensorflow. Based on Alex Graves' paper (https://arxiv.org/abs/1308.0850).
Stars: ✭ 361 (+18.75%)
Mutual labels:  neural-networks, lstm
Deepjazz
Deep learning driven jazz generation using Keras & Theano!
Stars: ✭ 2,766 (+809.87%)
Mutual labels:  neural-networks, lstm
Easy Deep Learning With Keras
Keras tutorial for beginners (using TF backend)
Stars: ✭ 367 (+20.72%)
Mutual labels:  neural-networks, lstm
Robin
RObust document image BINarization
Stars: ✭ 131 (-56.91%)
Mutual labels:  neural-networks, ocr
Tesseract
This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.
Stars: ✭ 43,199 (+14110.2%)
Mutual labels:  lstm, ocr
Ntm One Shot Tf
One Shot Learning using Memory-Augmented Neural Networks (MANN) based on Neural Turing Machine architecture in Tensorflow
Stars: ✭ 238 (-21.71%)
Mutual labels:  neural-networks, lstm
Lstm anomaly thesis
Anomaly detection for temporal data using LSTMs
Stars: ✭ 178 (-41.45%)
Mutual labels:  neural-networks, lstm
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.
Stars: ✭ 220 (-27.63%)
Mutual labels:  lstm, ocr
Carrot
🥕 Evolutionary Neural Networks in JavaScript
Stars: ✭ 261 (-14.14%)
Mutual labels:  neural-networks, lstm
Icdar 2019 Sroie
ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction
Stars: ✭ 202 (-33.55%)
Mutual labels:  lstm, ocr
Lstm Context Embeddings
Augmenting word embeddings with their surrounding context using bidirectional RNN
Stars: ✭ 57 (-81.25%)
Mutual labels:  neural-networks, lstm
Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Stars: ✭ 13,379 (+4300.99%)
Mutual labels:  lstm, ocr
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+3551.32%)
Mutual labels:  neural-networks, lstm
Lstm Ctc Ocr
using rnn (lstm or gru) and ctc to convert line image into text, based on torch7 and warp-ctc
Stars: ✭ 70 (-76.97%)
Mutual labels:  lstm, ocr
Ai Reading Materials
Some of the ML and DL related reading materials, research papers that I've read
Stars: ✭ 79 (-74.01%)
Mutual labels:  lstm, ocr
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+481.25%)
Mutual labels:  neural-networks, lstm
Tess4Android
A new fork based on tess-two and Tesseract 4.0.0
Stars: ✭ 31 (-89.8%)
Mutual labels:  ocr, lstm

Description

.. image:: https://travis-ci.org/mittagessen/kraken.svg?branch=master
    :target: https://travis-ci.org/mittagessen/kraken

kraken is a turn-key OCR system optimized for historical and non-Latin script material.

kraken's main features are:

  • Fully trainable layout analysis and character recognition
  • `Right-to-Left <https://en.wikipedia.org/wiki/Right-to-left>`_, `BiDi <https://en.wikipedia.org/wiki/Bi-directional_text>`_, and Top-to-Bottom script support
  • `ALTO <https://www.loc.gov/standards/alto/>`_, PageXML, abbyXML, and hOCR output
  • Word bounding boxes and character cuts
  • Multi-script recognition support
  • `Public repository <https://zenodo.org/communities/ocr_models>`_ of model files
  • Lightweight model files
  • Variable recognition network architectures

Installation

When using a recent version of pip, all dependencies are installed from binary wheel packages, so installing build-essential or your distribution's equivalent is often unnecessary. kraken runs only on Linux and Mac OS X; Windows is not supported.
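
A preflight check along the following lines can catch an unsupported platform before installation is attempted. This is only a minimal sketch: the helper ``is_supported_platform`` is hypothetical and not part of kraken, and it assumes that ``platform.system()`` reports ``Linux`` or ``Darwin`` on the supported systems.

.. code-block:: python

    import platform

    # Platforms listed as supported above; "Darwin" is the name that
    # platform.system() reports on Mac OS X.
    SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


    def is_supported_platform(system=None):
        """Return True if kraken is expected to run on the given system.

        Hypothetical helper, not part of kraken. When no name is given,
        the current system is checked.
        """
        if system is None:
            system = platform.system()
        return system in SUPPORTED_SYSTEMS

Calling ``is_supported_platform()`` without arguments checks the machine the script runs on; passing a name such as ``"Windows"`` makes the check easy to exercise elsewhere.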

Install the latest development version through `conda <https://anaconda.org>`_:

::

    $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
    $ conda env create -f environment.yml

or:

::

    $ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment_cuda.yml
    $ conda env create -f environment_cuda.yml

for CUDA acceleration with the appropriate hardware.
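
The choice between the two environment files can be scripted. The sketch below is illustrative only: the helper names are hypothetical, and treating a visible ``nvidia-smi`` executable as a sign of usable CUDA hardware is an assumption, not something kraken itself checks.

.. code-block:: python

    import shutil

    # Base URL taken from the wget commands above.
    BASE_URL = "https://raw.githubusercontent.com/mittagessen/kraken/master/"


    def environment_file(use_cuda):
        """Return the name of the conda environment file to download."""
        return "environment_cuda.yml" if use_cuda else "environment.yml"


    def environment_url(use_cuda):
        """Return the full download URL for the chosen environment file."""
        return BASE_URL + environment_file(use_cuda)


    def cuda_probably_available():
        """Heuristic (an assumption): nvidia-smi on PATH suggests CUDA hardware."""
        return shutil.which("nvidia-smi") is not None

``environment_url(cuda_probably_available())`` then yields the URL to pass to ``wget``, and ``environment_file(...)`` the file name to pass to ``conda env create -f``.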

It is also possible to install the latest stable release from PyPI:

::

    $ pip install kraken

Finally, you'll have to scrounge up a model to do the actual recognition of characters. To download the default model for printed English text and place it in the kraken directory for the current user:

::

    $ kraken get 10.5281/zenodo.2577813

A list of libre models available in the central repository can be retrieved by running:

::

    $ kraken list

Quickstart

To recognize text in an image using the default parameters, including the prerequisite steps of binarization and page segmentation:

::

    $ kraken -i image.tif image.txt binarize segment ocr

To binarize a single image using the nlbin algorithm:

::

    $ kraken -i image.tif bw.png binarize

To segment an image (binarized or not) with the new baseline segmenter:

::

    $ kraken -i image.tif lines.json segment -bl

To segment and OCR an image using the default model(s):

::

    $ kraken -i image.tif image.txt segment -bl ocr

All subcommands and options are documented. Use the ``--help`` option to get more information.
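
The quickstart commands all share the shape ``kraken -i INPUT OUTPUT STEP [STEP ...]``, which makes them easy to generate for a whole directory of images. The following is a rough sketch rather than an official interface: ``kraken_command`` and ``ocr_directory`` are hypothetical helpers, and they shell out to the ``kraken`` executable instead of using kraken's Python API.

.. code-block:: python

    import shutil
    import subprocess
    from pathlib import Path


    def kraken_command(image, output, steps):
        """Build the argument list for one kraken invocation.

        Mirrors the quickstart form: kraken -i INPUT OUTPUT STEP [STEP ...].
        Each step is a list such as ["segment", "-bl"] or ["ocr"].
        """
        command = ["kraken", "-i", str(image), str(output)]
        for step in steps:
            command.extend(step)
        return command


    def ocr_directory(directory, pattern="*.tif", dry_run=True):
        """Hypothetical batch helper: segment and OCR every matching image.

        With dry_run=True the commands are only returned, not executed.
        """
        steps = [["segment", "-bl"], ["ocr"]]
        commands = []
        for image in sorted(Path(directory).glob(pattern)):
            # Write the recognized text next to the image, as image.txt.
            output = image.with_suffix(".txt")
            commands.append(kraken_command(image, output, steps))
        if not dry_run:
            if shutil.which("kraken") is None:
                raise RuntimeError("kraken executable not found on PATH")
            for command in commands:
                subprocess.run(command, check=True)
        return commands

Running ``ocr_directory("scans")`` first with the default ``dry_run=True`` shows the commands that would be executed; passing ``dry_run=False`` runs them one image at a time and stops at the first failure.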

Documentation

Have a look at the `docs <http://kraken.re>`_.

Funding

kraken is developed at the `École Pratique des Hautes Études <http://ephe.fr>`_, `Université PSL <http://www.psl.eu>`_.
