keensoft / alfresco-simple-ocr

Licence: other
Simple OCR action for Alfresco

Alfresco Simple OCR Action

This addon provides an action to extract OCR text from images or plain PDFs in Alfresco.

License: The plugin is licensed under the LGPL v3.0.

State: The current addon release is 2.3.1.

Compatibility: The current version was developed using Alfresco 5.2 and Alfresco SDK 3.0.2, although it should also run on Alfresco 5.1, 5.0 and 4.2 (because it is built with Alfresco SDK 3.0).

Browser compatibility: 100% supported

Supported OCR software:

Languages: The Share action interface is currently provided in English, and the behaviour interface in English, Spanish, Brazilian Portuguese, German and Italian. The catalog of languages supported for OCR depends directly on the selected OCR software (Tesseract OCR or Windows.Media.OCR).

No original Alfresco resources have been overwritten

BeeCon 2016

This addon was presented at BeeCon 2016. You can find additional details at Integrating a simple OCR in Alfresco.

Downloading the ready-to-deploy-plugin

The binary distribution is made of two jar files to be deployed in Alfresco as modules:

You can install them by putting the JAR files in the corresponding module folders:

  • Copy repo JAR to /opt/alfresco/modules/platform (create the directory if it does not exist)
  • Copy share JAR to /opt/alfresco/modules/share

Re-start Alfresco after copying the files.
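
The copy steps above can also be scripted. The following is only a minimal Java sketch, not part of the addon: the class name DeployOcrModules is hypothetical, the JAR paths are passed as arguments because the file names are not stated here, and /opt/alfresco/modules is simply the install root used in the list above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DeployOcrModules {

    // Copies one JAR into a module folder, creating the folder if it does not exist.
    static Path deploy(Path jar, Path moduleDir) throws IOException {
        Files.createDirectories(moduleDir);
        Path target = moduleDir.resolve(jar.getFileName());
        return Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: DeployOcrModules <repo-jar> <share-jar>");
            return;
        }
        // Assumed install root, taken from the paths listed above; adjust to your server.
        Path modules = Paths.get("/opt/alfresco/modules");
        System.out.println("Copied " + deploy(Paths.get(args[0]), modules.resolve("platform")));
        System.out.println("Copied " + deploy(Paths.get(args[1]), modules.resolve("share")));
    }
}
```

Alfresco still has to be restarted afterwards; the sketch only places the files.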

Building the artifacts

If you are new to Alfresco and the Alfresco Maven SDK, you should start by reading Jeff Potts' tutorial on the subject.

You can build the artifacts from source code using Maven:

$ mvn clean package

Installation

OCR software for Linux depends on programs such as gs (Ghostscript) or ImageMagick, which are also dependencies of Alfresco. To avoid conflicts, it is recommended to install Alfresco from scratch and let the OS handle the installation of these packages.

You can find detailed instructions to perform Alfresco installation from scratch at Alfresco Documentation.

If you are using Linux and Alfresco was installed with the default installer wizard, pay attention to the execution environment of programs launched from inside the JVM: you may need to adjust program versions and path precedence.

You can find more options to solve this problem at the FAQ page.

Configuration

After installation, the following properties must be included in alfresco-global.properties:

  • If you are using pdfsandwich
ocr.command=/usr/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o

ocr.extra.commands=-verbose -lang spa+eng+fra
ocr.server.os=linux

  • If you are using OCRmyPDF
ocr.command=/usr/local/bin/ocrmypdf
ocr.output.verbose=true
ocr.output.file.prefix.command=

ocr.extra.commands=--verbose 1 --force-ocr -l spa+eng+fra
ocr.server.os=linux

  • If you are using Windows.OCR
ocr.url=http://localhost:60064/api/OCR/
ocr.output.verbose=true

ocr.extra.commands=Spanish
ocr.server.os=windows
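
To make the role of each property more concrete, here is a minimal Java sketch that reads such settings and checks them. It is an illustration only, not the addon's code. The class OcrCommandSketch and its methods are hypothetical. The required-key rule (ocr.url for windows, ocr.command otherwise) is inferred from the three examples above. The argument order used in buildCommand (command, input file, optional output prefix, output file, extra options) is an assumption about how the values might be combined.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class OcrCommandSketch {

    // Parses key=value text in the same format as alfresco-global.properties.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Returns the required keys that are missing for the configured server OS.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        String os = props.getProperty("ocr.server.os");
        if (os == null) {
            missing.add("ocr.server.os");
            return missing;
        }
        // Inferred from the examples: Windows.OCR uses a URL, the Linux tools use a command.
        String required = os.trim().equals("windows") ? "ocr.url" : "ocr.command";
        String value = props.getProperty(required);
        if (value == null || value.trim().isEmpty()) {
            missing.add(required);
        }
        return missing;
    }

    // ASSUMPTION: order is command, input, optional output prefix, output, extra options.
    static List<String> buildCommand(Properties props, String input, String output) {
        List<String> cmd = new ArrayList<>();
        cmd.add(props.getProperty("ocr.command").trim());
        cmd.add(input);
        String prefix = props.getProperty("ocr.output.file.prefix.command", "").trim();
        if (!prefix.isEmpty()) {
            cmd.add(prefix);
        }
        cmd.add(output);
        String extra = props.getProperty("ocr.extra.commands", "").trim();
        if (!extra.isEmpty()) {
            cmd.addAll(Arrays.asList(extra.split("\\s+")));
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException {
        Properties p = parse(String.join("\n",
                "ocr.command=/usr/local/bin/ocrmypdf",
                "ocr.output.verbose=true",
                "ocr.output.file.prefix.command=",
                "ocr.extra.commands=--verbose 1 --force-ocr -l spa+eng+fra",
                "ocr.server.os=linux"));
        System.out.println("Missing keys: " + missingKeys(p));
        System.out.println(String.join(" ", buildCommand(p, "in.pdf", "out.pdf")));
    }
}
```

Under the assumed ordering, an empty ocr.output.file.prefix.command simply means the output file is passed as a positional argument, while a value such as -o is placed right before it.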

Usage of rule

  • Add a rule to a folder and select the Extract OCR action
  • Every image dropped into this folder will be sent to the OCR software in order to produce a searchable PDF file.
  • To perform this operation asynchronously, use the checkbox provided by Alfresco when configuring the rule.
  • To let Alfresco keep operating when an OCR error occurs, check the rule option Continue on error

Usage of action

  • Press the OCR action in the document browser or on the document details page
  • The action is executed asynchronously, so the result will be available after some time

Known issues

  • When using WebDAV to upload documents, only asynchronous rule execution is allowed