All Projects → yosefmiller → shape-context-ocr

yosefmiller / shape-context-ocr

Licence: MIT license
The Shape Context is a shape descriptor that captures the relative positions of other points on the shape contours, and is used to recognize characters.

Programming Languages

matlab
3953 projects
PHP
23972 projects - #3 most used programming language
C++
36643 projects - #6 most used programming language
c
50402 projects - #5 most used programming language
javascript
184084 projects - #8 most used programming language
objective c
16641 projects - #2 most used programming language

Projects that are alternatives of or similar to shape-context-ocr

memento
Organize your meme image cluster in a better format using OCR from the meme to sort them using tesseract along with editing memes by segmenting them using OpenCV within a directory
Stars: ✭ 70 (+250%)
Mutual labels:  ocr
scanbot-sdk-example-ios
No description or website provided.
Stars: ✭ 17 (-15%)
Mutual labels:  ocr
ddddocr
带带弟弟 通用验证码识别OCR pypi版
Stars: ✭ 4,093 (+20365%)
Mutual labels:  ocr
saram
Get OCR in txt form from an image or pdf extension supporting multiple files from directory using pytesseract with auto rotation for wrong orientation. PYPI:
Stars: ✭ 51 (+155%)
Mutual labels:  ocr
video-subtitle-extractor
视频硬字幕提取,生成srt文件。无需申请第三方API,本地实现文本识别。基于深度学习的视频字幕提取框架,包含字幕区域检测、字幕内容提取。A GUI tool for extracting hard-coded subtitle (hardsub) from videos and generating srt files.
Stars: ✭ 1,763 (+8715%)
Mutual labels:  ocr
Snipping-Ocr
A simple Snipping tool for Windows with OCR capabilities
Stars: ✭ 82 (+310%)
Mutual labels:  ocr
baidu-chain-dog
百度莱茨狗爬虫。
Stars: ✭ 52 (+160%)
Mutual labels:  ocr
Handwritten-Text-Recognition
IAM dataset
Stars: ✭ 25 (+25%)
Mutual labels:  ocr
ocr2text
Convert a PDF via OCR to a TXT file in UTF-8 encoding
Stars: ✭ 90 (+350%)
Mutual labels:  ocr
erpnext ocr
🐍 ⚗️ Optical Character Recognition using tesseract within Frappe.
Stars: ✭ 58 (+190%)
Mutual labels:  ocr
Multi-Type-TD-TSR
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:
Stars: ✭ 174 (+770%)
Mutual labels:  ocr
webgrep
Grep Web pages with extra features like JS deobfuscation and OCR
Stars: ✭ 86 (+330%)
Mutual labels:  ocr
crnn.mxnet
crnn in mxnet.can train with chinese characters
Stars: ✭ 47 (+135%)
Mutual labels:  ocr
jochre
Java Optical CHaracter Recognition
Stars: ✭ 18 (-10%)
Mutual labels:  ocr
lego-mindstorms-51515-jetson-nano
Combines the LEGO Mindstorms 51515 with the NVIDIA Jetson Nano
Stars: ✭ 31 (+55%)
Mutual labels:  ocr
SynthText Chinese
Modify from https://github.com/JarveeLee/SynthText_Chinese_version.git with python3 and cv3.
Stars: ✭ 35 (+75%)
Mutual labels:  ocr
form-segmentation
Let's explore how we can extract text from forms
Stars: ✭ 42 (+110%)
Mutual labels:  ocr
receipt-parser-server
Receipt parser server written in python.
Stars: ✭ 64 (+220%)
Mutual labels:  ocr
blinkid-in-browser
BlinkID In-browser SDK for WebAssembly-enabled browsers.
Stars: ✭ 40 (+100%)
Mutual labels:  ocr
CLPR.pytorch
End to End Chinese License Plate Recognition
Stars: ✭ 75 (+275%)
Mutual labels:  ocr

Shape Context - Optical Character Recognition (OCR)

The Shape Context is a shape descriptor that captures the relative positions of other points on the shape contours. This gives a globally discriminative characterization of the shape and not just a localized descriptor. These are then used to measure similarities between shapes and to recognize a character based on each edge’s polar-log distance to the other edges.


How to Use

Add the Matlab OCR folder to Matlab's path. Modify the default settings in Data/OcrDefaults.m. To run the program with a GUI, execute OcrProgram in the Matlab command window. To run the program with default settings, execute ocr in the Matlab command window. To train the program with new data, execute TemplateTraining in the Matlab command window. Remember to specify a name for the training data file by editing the OcrDefaults settings.


Objective

To explore the current approaches in OCR algorithms by performing a literature survey and then writing my own code using a selected technique. I ultimately chose to implement a Shape Context method to identify characters based on their relative shapes.


Introduction

Suppose you have handwritten notes that you would like to digitize. Imagine an algorithm designed to compare each handwritten character pixel-by-pixel to an identical printed character. Such an algorithm would perform poorly since the shapes are mismatched. To the human observer, however, it is fairly easy to read. Why?

The human brain identifies characters based on its relative shape. The mind accounts for distortions, scaling, and other stylistic differences to associate two character sets. For an Optical Character Recognition (OCR) algorithm to accurately read such visual data, it must be able to do the same.*

This is the purpose of the highly discriminative Shape Context descriptor; it describes the coarse distribution of the rest of the shape with respect to the given point on the shape. The relationship is determined by the logarithmic distance and polar angle between two given contours. These values are classified into an appropriate bin within a 12x5 matrix, forming what is called a histogram. An optimal assignment matching algorithm locates correspondences by maximizing Shape Context similarities. Given the set of correspondences, an aligning transformation is estimated that maps one shape onto the other. The dissimilarity between the two shapes is a sum of the matching errors and magnitude of the aligning transform. Lastly, a nearest-neighbor technique is used to select the prototype with the least dissimilarity.


Results

I trained my algorithm with only 400 handwritten samples, compared to the traditional 60,000 found in research papers. Even with the limited dataset, my final algorithm performed reasonably well. My program could be improved by using the complex Thin Plate Spline transformation in place of the Affine transformation to optimally align similar shapes.


Methodology

Step 1: Canny Edge Detection

  1. Apply Gaussian filter to smooth the image in order to remove the noise
  2. Find the intensity gradients of the image
  3. Apply non-maximum suppression to thin the resulting edges
  4. Apply double threshold to classify strong and weak edges
  5. Finalize edges by hysteresis: discard any weak edges which are not connected to a strong edge
  • Shapes are represented by a random set of 50 points along the shape’s contours.

Step 2: Compute Shape Context

  • For each of the 50 points, describe its position in relation to the 49 other points. Points are classified based on its angle and logarithmic distance to the given point. The resulting 12x5 matrix is called a histogram.
  • The resulting list of 50 histograms accurately describes the given shape.

Step 3: Match Edges to Template

  • For each template, create a matrix which finds the cost between all histograms. The cost is computed using the Chi Squared Test Statistic.

Step 4: Match Edges to Template (cont.)

  • Optimally select one-to-one corresponding histograms which minimizes the net cost of matching. This essentially aligns the two shapes. This program uses the efficient Munkres/Hungarian Method to solve this linear assignments problem.

Step 5: Run Affine Transformations

  • Given the set of correspondences, transform the input image to roughly align the shapes. This way, correct matches are closely aligned while others won’t converge.
  • Repeat steps 2-4 with transformed shape.

Step 6: Select Template with lowest cost

  • Get the matching distance for each template by adding a weighted sum of the shape distance (outputted by the Hungarian Method) and the transformation energy (how much transformation was required to align the shapes).
  • Select prototype (character) using a nearest neighbor classifier (k-NN).
  • Sort templates by its matching distance.
  • For the first k (typically 3) templates, select the mode (most common) prototype.

Thin Plate Spline Transformation

The name thin plate spline refers to a physical analogy involving the bending of a thin sheet of metal. Just as the metal has rigidity, the TPS fit resists bending also, implying a penalty involving the smoothness of the fitted surface.

In the physical setting, the deflection is in the z direction, orthogonal to the plane. In order to apply this idea to the problem of coordinate transformation, one interprets the lifting of the plate as a displacement of the x or y coordinates within the plane.


Conclusion

My algorithm is capable of reading printed and handwritten text, assuming that the characters are spatially distinct from one another. It can also be trained with 3D objects using multiple views and to find visually-similar trademarks.

The biggest drawback to this method at the present moment is processing speed, specifically of the optimal assignments problem solver (Hungarian Method) which runs in O(N3) time (where N is the number of edge points). Converting the MATLAB code to C++ may improve the speed, but would still take significant time per character.

Accuracy can be significantly improved if Regularized Thin Plate Spline (TPS) transformations were used. TPS smoothly warps a set of points by selecting a tight match while minimizing the bending energy (a measure of how much transformation is needed to align the points). Affine transformations used in my algorithm are limited as it only allows for tilting of a rigid plane.

The construction of the algorithm gave me insight into the true complexity behind computer vision.


Bibliography

  1. Belongie, S., J. Malik, and J. Puzicha. "Shape Matching and Object Recognition Using Shape Contexts." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.4 (2002): 509-22. Print.
  2. Zhang, Min. "Shape Context Matching Theory." Firefly's Space. 21 Dec. 2012. Web.
  3. "Shape Context." Wikipedia. Wikimedia Foundation. Web.
  4. Wenying, Mo, and Zuchun, Ding. "A Digital Character Recognition Algorithm Based on the Template Weighted Match Degree." Print.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].