ERPNext OCR
Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. Its aim is to fix and clean up the original source code and add some new features.
Check out more on ERPNext Discuss.
Changes
See CHANGELOG
Roadmap
See Taiga.io
Install
Pre-requisites: tesseract-python and imagemagick
Install tesseract-ocr, plus imagemagick and ghostscript (to work with PDF files), using this command on Debian:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
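To confirm that the prerequisites are reachable on the PATH before installing the app, a small check along these lines may help. It is only a sketch: the executable names tesseract, convert and gs are assumptions about how the Debian packages expose their binaries (ImageMagick 7 builds ship magick instead of convert), and missing_tools is a hypothetical helper, not part of erpnext_ocr.

```python
import shutil

# Assumed executable names: "tesseract" (tesseract-ocr),
# "convert" (ImageMagick 6) and "gs" (ghostscript).
REQUIRED_TOOLS = ("tesseract", "convert", "gs")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling missing_tools() with no arguments returns an empty list when all three executables are found, and the missing names otherwise.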
Install Frappe application
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
When installing the Frappe app, the following Python requirements are installed:
- Python binding for tesseract, tesserocr
- image processing library in Python, pillow
- HTTP library in Python, requests
- Python binding for imagemagick, wand
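As a rough illustration of how these libraries can fit together, the sketch below renders PDF pages with wand, loads them with pillow and extracts text with tesserocr. It is not the code used by erpnext_ocr: ocr_file and is_pdf are hypothetical names, and the 300 DPI resolution and the "eng" language are assumed defaults.

```python
import io


def is_pdf(path):
    """Return True when the file name has a .pdf extension (case-insensitive)."""
    return path.lower().endswith(".pdf")


def ocr_file(path, lang="eng", resolution=300):
    """Extract text from an image or PDF file (hypothetical helper)."""
    # Imported lazily so the module still loads where the OCR stack is absent.
    import tesserocr
    from PIL import Image

    if not is_pdf(path):
        with Image.open(path) as image:
            return tesserocr.image_to_text(image, lang=lang)

    from wand.image import Image as WandImage

    pages = []
    # ImageMagick relies on ghostscript to rasterize the PDF pages.
    with WandImage(filename=path, resolution=resolution) as document:
        for page in document.sequence:
            with WandImage(image=page) as page_image:
                png_bytes = page_image.make_blob("png")
            with Image.open(io.BytesIO(png_bytes)) as image:
                pages.append(tesserocr.image_to_text(image, lang=lang))
    return "\n".join(pages)
```

For example, ocr_file("/opt/sample.pdf") would return the recognized text of every page joined by newlines, provided the prerequisites above are installed.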
Usage
File Being Read:
Sample Screenshot:
Tesseract trained data
In order to use OCR with different languages, you need to install the appropriate trained data files. Check tesseract Wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
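Tesseract looks up each language as a <code>.traineddata file in its tessdata directory (for example eng.traineddata). The sketch below lists which language codes are present and which requested ones are missing. The helper names are hypothetical, and the directory location is an assumption: it varies between distributions and tesseract versions, so it is passed in explicitly.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def missing_languages(wanted, tessdata_dir):
    """Return the codes in `wanted` that have no trained data file installed."""
    available = set(installed_languages(tessdata_dir))
    return [lang for lang in wanted if lang not in available]
```

For instance, missing_languages(["eng", "fra"], tessdata_dir) returns ["fra"] when only the English data file is installed in that directory.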
Development
If you wish to develop or just test this application locally, you can run docker-compose up -d at the root of this repository.
You can then access your ERPNext OCR dev environment at http://localhost:8080.
Known issues
- wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
  - This can happen when the imagemagick security configuration prevents it from reading PDF files.
- wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
  - This might happen if you're missing a dependency needed to convert PDF files, most of the time ghostscript.
- OSError: encoder error -2 when writing image file
  - This might happen when trying to open a TIFF image, but the real error is "hidden" and only displayed in the console.
  - If the original error in the console is "Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.", the TIFF image compression is usually not valid or not recognized.
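For the PolicyError case, the restriction usually lives in the imagemagick policy.xml file as a coder entry for PDF with rights="none". The sketch below only flags such entries and does not reproduce ImageMagick's full policy evaluation. blocked_coders is a hypothetical helper, and the file path mentioned afterwards is an assumption about Debian installs.

```python
import xml.etree.ElementTree as ET


def blocked_coders(policy_xml, wanted=("PDF",)):
    """Return the coder names in `wanted` that some policy entry denies read access to."""
    root = ET.fromstring(policy_xml)
    blocked = []
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder":
            continue
        rights = policy.get("rights", "").lower()
        if "read" in rights:
            continue
        # A pattern is either a single name ("PDF") or a brace list ("{PS,PDF,XPS}").
        pattern = policy.get("pattern", "")
        names = {name.strip().upper() for name in pattern.strip("{}").split(",")}
        for coder in wanted:
            if coder in names and coder not in blocked:
                blocked.append(coder)
    return blocked
```

On Debian the file is often found at /etc/ImageMagick-6/policy.xml (an assumed location). Passing its bytes, read with pathlib.Path.read_bytes(), to blocked_coders returns ["PDF"] when PDF reading is denied. In that case, changing that entry's rights to "read|write" is a commonly reported fix, to be weighed against the security reasons for the restriction.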
Run tests
bench run-tests --app erpnext_ocr
Authors
Monogramm
- Website: https://www.monogramm.io
- Github: @Monogramm
John Vincent Fiel
- Github: @jvfiel
Contributing
Contributions, issues and feature requests are welcome!
Feel free to check the issues page.
Check the contributing guide.
Show your support
Give a star if this project helped you!
License
Copyright © 2019 Monogramm.
This project is MIT licensed.