Organize a collection of meme images within a directory by sorting them on the text extracted with Tesseract OCR, and edit memes by segmenting them with OpenCV

Stars: ✭ 70 (+66.67%)

Mutual labels: ocr

deep-license-plate-recognition

Automatic License Plate Recognition (ALPR) or Automatic Number Plate Recognition (ANPR) software that works with any camera.

Stars: ✭ 309 (+635.71%)

Mutual labels: ocr

MouseTooltipTranslator

Chrome extension: when the mouse hovers over text, it shows a translated tooltip using Google Translate

Stars: ✭ 93 (+121.43%)

Mutual labels: ocr

ocrd cis

OCR-D python tools

Stars: ✭ 28 (-33.33%)

Mutual labels: ocr

ocr2text

Convert a PDF via OCR to a TXT file in UTF-8 encoding

Stars: ✭ 90 (+114.29%)

Mutual labels: ocr

paperbase

Open source document organizer with automatic OCR and full text search

Stars: ✭ 21 (-50%)

Mutual labels: ocr

htr-united

Ground Truth Resources for the HTR of patrimonial documents

Stars: ✭ 23 (-45.24%)

Mutual labels: handwritten-text-recognition

pytorch.ctpn

pytorch, ctpn, text detection, ocr

Stars: ✭ 123 (+192.86%)

Mutual labels: ocr

jochre

Java Optical CHaracter Recognition

Stars: ✭ 18 (-57.14%)

Mutual labels: ocr

ocr-machine-learning

OCR Machine Learning in python

Stars: ✭ 42 (+0%)

Mutual labels: ocr

webgrep

Grep Web pages with extra features like JS deobfuscation and OCR

Stars: ✭ 86 (+104.76%)

Mutual labels: ocr

ReadToMe

No description or website provided.

Stars: ✭ 51 (+21.43%)

Mutual labels: ocr

baidu-chain-dog

Crawler for Baidu's Laici Dog (莱茨狗).

Stars: ✭ 52 (+23.81%)

Mutual labels: ocr

Multi-Type-TD-TSR

Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition

Stars: ✭ 174 (+314.29%)

Mutual labels: ocr

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Stars: ✭ 1,409 (+3254.76%)

Mutual labels: ocr

Korean-OCR-Model-Design-based-on-Keras-CNN

Korean OCR Model Design(한글 OCR 모델 설계)

Stars: ✭ 34 (-19.05%)

Mutual labels: ocr


Form Segmentation

Let's explore how we can extract text from any form or scanned page.

Objectives

The goal is to find an algorithm that extracts as much information as possible from a given page (JPG format), so that it can be fed to another system (business logic, a neural network, a classifier, etc.). The overall process may not be perfect, but it should recover enough information to identify the type of document and the identities involved.

Parse any form or scanned page and extract any text data (printed and handwritten), with no prior knowledge of the document's layout or structure
Automatic extraction process (no human interaction, so it can scale out)
Reasonably fast (or able to speed up the task with more machines or CPUs)
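
The scale-out objective can be sketched as a fan-out of pages over worker processes. This is only a minimal illustration using Python's standard library. The names `process_pages` and `describe_page` are hypothetical and not part of this project, and `describe_page` is a stand-in for whatever per-page extraction is eventually used.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def process_pages(
    paths: Iterable[str],
    worker: Callable[[str], str],
    executor_cls=ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> List[str]:
    """Apply `worker` to every page path in parallel, preserving input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(worker, paths))


def describe_page(path: str) -> str:
    """Hypothetical placeholder for the real per-page extraction step."""
    return f"processed {path}"


# Example call (processes by default; pass ThreadPoolExecutor for I/O-bound work):
# results = process_pages(["page1.jpg", "page2.jpg"], describe_page)
```

Because each page is handled independently, the same function could in principle be pointed at a larger pool or a job queue spanning several machines.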

Challenges

There are many challenges to overcome, but the main problem is identifying which parts of the form contain text.
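
One simple way to approach the "where is the text" problem is a horizontal projection profile: count the dark pixels in each row of a binarized page and group consecutive non-empty rows into bands. The sketch below is an illustrative technique, not necessarily the method this project uses. It assumes the page is already binarized into a 2D list where 1 means ink and 0 means background, and `find_text_rows` is a hypothetical helper name.

```python
from typing import List, Tuple


def find_text_rows(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, whose rows contain ink.

    `binary` is a 2D list where 1 marks a dark (ink) pixel and 0 marks background.
    A row counts as text when it holds at least `min_ink` ink pixels.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y  # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))  # the current band ends
            start = None
    if start is not None:
        bands.append((start, len(binary)))  # band runs to the bottom edge
    return bands


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
print(find_text_rows(page))  # [(1, 3), (5, 6)]
```

The same idea applied column-wise inside each band would split a line into words or fields. Real forms with ruled lines and boxes generally need those shapes removed first, or the profile will treat them as ink.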

Some other challenges:

Black border removal
ICR (Intelligent Character Recognition): recognize hand-drawn characters and convert them into text
Scanned pages (detect edges and apply a perspective transform to obtain a top-down view of the document)
Noise removal (blur, Otsu thresholding, adaptive thresholding with OpenCV)
Shape detection and extraction
OCR (not a real issue, since Tesseract 4 works well for printed text)
Handwriting recognition
Minimizing errors
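
To make the Otsu step in the list above concrete, here is a dependency-free sketch of Otsu's method on a flat list of 8-bit grayscale values. In practice OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag computes the same kind of threshold. The pure-Python version and the helper names `otsu_threshold` and `binarize` are illustrative assumptions, not code from this project.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form the dark class; the rest form the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break  # light class empty; no further split is possible
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """Map each pixel to 1 (ink) if it is at or below the threshold, else 0."""
    return [1 if p <= t else 0 for p in pixels]


gray = [10, 12, 11, 200, 210, 205]
t = otsu_threshold(gray)
print(binarize(gray, t))  # [1, 1, 1, 0, 0, 0]
```

A single global threshold like this tends to struggle on unevenly lit scans, which is presumably why adaptive thresholding is listed alongside it.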


doxakis / form-segmentation
