All Projects → ahmetozlu → Signature_extractor

ahmetozlu / Signature_extractor

Licence: mit
A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Signature extractor

Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
Stars: ✭ 13,379 (+6426.34%)
Mutual labels:  image-processing, ocr, optical-character-recognition
Pan card ocr project
To extract details from Indian National Identification Cards such as PAN (completed) & Aadhar, Passport, Driving License (WIP) in a structured format
Stars: ✭ 39 (-80.98%)
Mutual labels:  image-processing, ocr, optical-character-recognition
Ssocr
Seven Segment Optical Character Recognition
Stars: ✭ 133 (-35.12%)
Mutual labels:  image-processing, ocr, optical-character-recognition
Receipt Scanner
Receipt scanner extracts information from your PDF or image receipts - built in NodeJS
Stars: ✭ 190 (-7.32%)
Mutual labels:  ocr, optical-character-recognition
Typefont
The first open-source library that detects the font of a text in a image.
Stars: ✭ 1,575 (+668.29%)
Mutual labels:  image-processing, ocr
Open Solution Salt Identification
Open solution to the TGS Salt Identification Challenge
Stars: ✭ 124 (-39.51%)
Mutual labels:  image-segmentation, image-processing
Scene Text Recognition
Scene text detection and recognition based on Extremal Region(ER)
Stars: ✭ 146 (-28.78%)
Mutual labels:  image-processing, ocr
Tesseract4android
Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.
Stars: ✭ 148 (-27.8%)
Mutual labels:  ocr, optical-character-recognition
Ailearnnotes
Artificial Intelligence Learning Notes.
Stars: ✭ 195 (-4.88%)
Mutual labels:  image-segmentation, image-processing
Piccante
The hottest High Dynamic Range (HDR) Library
Stars: ✭ 195 (-4.88%)
Mutual labels:  image-segmentation, image-processing
Ocr Table
Extract tables from scanned image PDFs using Optical Character Recognition.
Stars: ✭ 165 (-19.51%)
Mutual labels:  ocr, optical-character-recognition
Dataturks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
Stars: ✭ 200 (-2.44%)
Mutual labels:  image-segmentation, image-processing
Tesserocr
A Python wrapper for the tesseract-ocr API
Stars: ✭ 1,567 (+664.39%)
Mutual labels:  ocr, optical-character-recognition
Automatic Leaf Infection Identifier
Automatic detection of plant diseases
Stars: ✭ 97 (-52.68%)
Mutual labels:  image-segmentation, image-processing
Penteract Ocr
⭐️ The native node.js bindings to the Tesseract OCR project.
Stars: ✭ 86 (-58.05%)
Mutual labels:  ocr, optical-character-recognition
Keras Icnet
Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images
Stars: ✭ 85 (-58.54%)
Mutual labels:  image-segmentation, image-processing
Android Ocr
Experimental optical character recognition app
Stars: ✭ 2,177 (+961.95%)
Mutual labels:  ocr, optical-character-recognition
Multiclass Semantic Segmentation Camvid
Tensorflow 2 implementation of complete pipeline for multiclass image semantic segmentation using UNet, SegNet and FCN32 architectures on Cambridge-driving Labeled Video Database (CamVid) dataset.
Stars: ✭ 67 (-67.32%)
Mutual labels:  image-segmentation, image-processing
Scanbot Sdk Example Android
Document scanning SDK example apps for the Scanbot SDK for Android.
Stars: ✭ 67 (-67.32%)
Mutual labels:  image-processing, ocr
Pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Stars: ✭ 1,969 (+860.49%)
Mutual labels:  image-processing, ocr

"Signature Extraction" based connected component analysis

A design and implementation of a super lightweight algorithm for "overlapped handwritten signature extraction from scanned documents" using OpenCV and scikit-image on python. Please contact if you need professional signature detection & recognition & segmentation & counting project with the super high accuracy!


  • Input = The scanned document
  • Output = The signatures exist on the input

TODOs:

  • "Outliar Removal" module will be improved to boost the signature extraction algorithm.
  • CNN based "Signature Recognition" module will be developed.
  • "Signature Spoofing Detection" algorithm will be developed.
  • "Signature Detector (bounding box) & Counter" module will be developed.
  • "Accuracy of detection on SigSA: On-line Handwritten Signature Database" will be calculated and shared.

Demo of a Real-life Application of Signature Extraction Algorithm

You can find a sample project that is developed on top of "signature extractor" algorithm to extract the signatures on the digital photo of the document. Here are the functionalities of this sample project:

Sample Test Results of Signature Extraction Algorithm

- Sample result#1:

Explanation: For this case, the signature extraction algorithm can extract the 3 different handwritten signatures successfully. Just a very small portion of the signature, which is located at top-left, is lost because this part is not connected with the whole signature line so the algorithm interprets it is not a part of the signature.

- Sample result#2:

Explanation: For this case, signature extraction algorithm can extract 2 handwriteetn signatures from the whole textual data but it can not remove the lines, that are located at bottom-center, because the signature has big connected pixels so the algorithm sees them as signatures.

- Sample result#3:

Explanation: Some parts of the signatures are lost because they are not connected with the big connected components so the algorithm sees that they are not a part of signatures. They can cathc by setting the threshold value to a bigger value.

Theory

Main pipe-line

Theoritical Background

As already mentioned that the algorithm can extract the signatures from scanned documents based on "connected component analysis" so what is connected component algorithm then?: In image processing, a connected components algorithm finds regions of connected pixels which have the same value!

You can find more detailed information about the connected component analysis in here.

Thus, the connected components can be found and labelled by a cool functionality that is provided by scikit-image library! But why do we need it? Please just check the scanned documents, you can see that the biggest connect components are belongs the handwritten signatures! If we can get the biggest connected components, we can get the signatures from whole documents! However, we can also get the undesired lines or different shapes that have big connected components, right? So we also need a threshold value to get rid of them...

Calculating the threshold value to get rid of the outliars:

I've calculated the threshold value to detect the outliars (any lines, shapes and texts are not a part of the signatures) via performing many experiments. I've got an equation ,are calculated based experiement results, which are works pretty good for most of the scanned documents are a4 sized.

Detect and remove the small size outliars:

Here the code parts that start on signature_extractor.py - line#60:

# experimental-based ratio calculation, modify it for your cases
# a4_small_size_outliar_constant is used as a threshold value to remove connected outliar connected pixels
# are smaller than a4_small_size_outliar_constant for A4 size scanned documents
a4_small_size_outliar_constant = ((average/constant_parameter_1)*constant_parameter_2)+constant_parameter_3
print("a4_small_size_outliar_constant: " + str(a4_small_size_outliar_constant))

I determined the equation (x stands for scanned document size such as A4 or A0):

  • ax_small_size_outliar_constant = ((average/constant_parameter_1) * constant_parameter_2) + constant_parameter_3

based full of my experiments. You can modify it for your cases and also the scanned document size such as for A0 and so on... Just configure the constants:

  • constant_parameter_1
  • constant_parameter_2
  • constant_parameter_3

perform many experiments with the different parameter values till get the highest accuracy!

Detect and remove the big size outliars:

Here the code parts that start on signature_extractor.py - line#66:

# experimental-based ratio calculation, modify it for your cases
# a4_big_size_outliar_constant is used as a threshold value to remove outliar connected pixels
# are bigger than a4_big_size_outliar_constant for A4 size scanned documents
a4_big_size_outliar_constant = a4_small_size_outliar_constant*constant_parameter_4
print("a4_big_size_outliar_constant: " + str(a4_big_size_outliar_constant))

I determined the equation (x stands for scanned document size such as A4 or A0):

  • ax_big_size_outliar_constant = ax_small_size_outliar_constant*constant_parameter_4

based full of my experiments. You can modify it for your cases and also the scanned document size such as for A0 and so on... Just configure the constant:

  • constant_parameter_4

perform many experiments with the different parameter values till get the highest accuracy!

Installation

1.) Python and pip

Python is automatically installed on Ubuntu. Take a moment to confirm (by issuing a python -V command) that one of the following Python versions is already installed on your system:

  • Python 3.3+

The pip or pip3 package manager is usually installed on Ubuntu. Take a moment to confirm (by issuing a pip -V or pip3 -V command) that pip or pip3 is installed. We strongly recommend version 8.1 or higher of pip or pip3. If Version 8.1 or later is not installed, issue the following command, which will either install or upgrade to the latest pip version:

$ sudo apt-get install python3-pip python3-dev # for Python 3.n

2.) scikit-image

On all other systems, install it via shell/command prompt:

pip install scikit-image

If you are running Anaconda or miniconda, use:

conda install -c conda-forge scikit-image

See details in here.


  • After completing these 2 installation steps that are given at above, you can test the project by this command:

    python3 signature_extractor.py
    

Citation

If you use this code for your publications, please cite it as:

@ONLINE{hse,
    author = "Ahmet Özlü",
    title  = "Overlapped handwritten signature extraction from scanned documents",
    year   = "2018",
    url    = "https://github.com/ahmetozlu/signature_extractor"
}

Author

Ahmet Özlü

License

This system is available under the MIT license. See the LICENSE file for more info.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].