
MrZilinXiao / Hyper-Table-OCR

Licence: other
A carefully-designed OCR pipeline for universal bordered table recognition and reconstruction.

Programming Languages: C++, Python, JavaScript, HTML, CSS

This pipeline covers image preprocessing, table detection (optional), text OCR, table cell extraction, and table reconstruction.

Are you seeking ideas for your own work? Visit my blog post on Hyper-Table-OCR to see more!

Update on 2021-08-20: Happy to see that Baidu has released PP-Structure, which is more robust because it predicts table structure with deep learning rather than the simple matching used in this project.

Demo

gif demo

Demo Video (In English): YouTube

Hyper Table Recognition: A carefully-designed Table OCR pipeline

Demo Video (In Chinese): Bilibili

Features

  • Flexible modular architecture: by deriving from the predefined abstract classes, any module of this pipeline can easily be swapped for your preferred one. See the "Want to contribute?" section below.
  • A simple yet highly legible web interface.
  • A table reconstruction strategy based purely on the coordinates of each cell, which includes identifying merged cells and building the table structure.
  • More to explore...
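
To illustrate what coordinate-based reconstruction can look like, here is a minimal sketch. It is not the project's actual algorithm. It assumes each cell is an axis-aligned box (x1, y1, x2, y2), and the names cluster_edges, nearest_index, reconstruct and the tol parameter are hypothetical. Cell edges are clustered into grid lines, and a cell that covers more than one grid interval is treated as merged.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: one axis-aligned box per cell, (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def cluster_edges(values: Sequence[float], tol: float) -> List[float]:
    """Merge coordinates that lie within `tol` of their neighbour into one grid line."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    # Represent each grid line by the mean of its member coordinates.
    return [sum(g) / len(g) for g in groups]


def nearest_index(lines: Sequence[float], value: float) -> int:
    """Return the index of the grid line closest to `value`."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - value))


def reconstruct(cells: Sequence[Box], tol: float = 5.0) -> List[Dict[str, int]]:
    """Assign each cell a row, column, rowspan and colspan on the inferred grid."""
    xs = cluster_edges([c[0] for c in cells] + [c[2] for c in cells], tol)
    ys = cluster_edges([c[1] for c in cells] + [c[3] for c in cells], tol)
    table = []
    for x1, y1, x2, y2 in cells:
        row, col = nearest_index(ys, y1), nearest_index(xs, x1)
        table.append({
            "row": row,
            "col": col,
            # A span greater than 1 marks a merged cell.
            "rowspan": nearest_index(ys, y2) - row,
            "colspan": nearest_index(xs, x2) - col,
        })
    return sorted(table, key=lambda c: (c["row"], c["col"]))
```

For example, reconstruct([(0, 0, 200, 30), (0, 31, 100, 60), (101, 30, 200, 61)]) places the first box in row 0 with a colspan of 2, above two single cells in row 1. The tolerance absorbs the small pixel offsets that detected cell borders usually have.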

Getting Started

Clone this repo

git clone https://github.com/MrZilinXiao/Hyper-Table-Recognition
cd Hyper-Table-Recognition

Download weights

Download from here: GoogleDrive

MD5: (004fabb8f6112d6d43457c681b435631 models.zip)
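
If you want to check the download before unzipping, a small sketch like the following can compare the archive against the MD5 above. The function names md5_of_file and verify are hypothetical, and the file is assumed to be saved as models.zip in the current directory.

```python
import hashlib

# Checksum published above for models.zip.
EXPECTED_MD5 = "004fabb8f6112d6d43457c681b435631"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected digest."""
    return md5_of_file(path) == expected.lower()
```

Calling verify("models.zip") should return True for an intact download. Reading in chunks keeps memory use low for a large archive.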

Unzip it and make sure the directory layout matches:

# ~/Hyper-Table-Recognition$ tree -L 1
.
├── models
├── app.py
├── config.yml
├── ...

Install Dependencies

This project is developed and tested on:

  • Ubuntu 18.04
  • RTX 3070 with Driver 455.45.01 & CUDA 11.1 & cuDNN 8.0.4
  • Python 3.8.3
  • PyTorch 1.7.0+cu110
  • Tensorflow 2.5.0
  • PaddlePaddle 2.0.0-rc1
  • mmdetection 2.7.0
  • onnxruntime-gpu 1.6.0

An NVIDIA GPU is required for reasonable inference times. GPUs with less than 6 GB of VRAM may hit out-of-memory errors when loading multiple models; if that happens, comment out some of the models in web/__init__.py.

No version-specific framework features are used in this project, so it should also work with lower versions of these frameworks. However, as of 19 December 2020, users of RTX 3000 series devices may not be able to get prebuilt binaries of Tensorflow, onnxruntime-gpu, mmdetection, or PaddlePaddle via pip or conda.

Some building tutorials for Ubuntu are as follows:

Confirm that all deep learning frameworks are installed via:

python -c "import tensorflow as tf; print(tf.__version__); import torch; print(torch.__version__); import paddle; print(paddle.__version__); import onnxruntime as rt; print(rt.__version__); import mmdet; print(mmdet.__version__)"

Then install other necessary libraries via:

pip install -r requirements.txt

Enjoy!

python app.py

Visit http://127.0.0.1:5000 to see the main page!

Performance

Inference time depends heavily on the following factors:

  • Complexity of table structure
  • Number of OCR blocks
  • Resolution of selected image

A typical inference time is shown in the Demo Video.

Want to contribute?

Contribute a new cell extractor

In boardered/extractor.py, we define a TraditionalExtractor based on traditional computer vision techniques and a UNetExtractor based on a UNet pixel-level semantic segmentation model. Feel free to derive from the following abstract class:

from abc import ABC
from typing import List

import numpy as np


class CellExtractor(ABC):
    """
    A unified interface for boardered extractor.
    OpenCV & UNet Extractor can derive from this interface.
    """

    def __init__(self):
        pass

    def get_cells(self, ori_img, table_coords) -> List[np.ndarray]:
        """
        :param ori_img: original image
        :param table_coords: List[np.ndarray], xyxy coord of each table
        :return: List[np.ndarray], [[xyxyxyxy(cell1), xyxyxyxy(cell2)](table1), ...]
        """
        pass

Contribute a new OCR Module

To add an OCR module, build a custom OCR handler that derives from OCRHandler, which is defined in ocr/__init__.py.

import abc


class OCRHandler(metaclass=abc.ABCMeta):
    """
    Handler for OCR Support
    An abstract class, any OCR implementations may derive from it
    """

    def __init__(self, *kw, **kwargs):
        pass

    def get_result(self, ori_img):
        """
        Interface for OCR inference
        :param ori_img: np.ndarray
        :return: dict, in following format:
        {'sentences': [['麦格尔特杯表格OCR测试表格2', [[85.0, 10.0], [573.0, 30.0], [572.0, 54.0], [84.0, 33.0]], 0.9],...]}
        """
        pass
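
As a rough illustration of deriving from this interface, the sketch below defines a stand-in handler that ignores the image and returns fixed text, plus a small checker for the result format. FixedTextOCRHandler and is_valid_result are hypothetical names, the base class is copied here only so the example runs on its own, and reading the third element of each entry as a confidence score is an assumption based on the sample value 0.9.

```python
import abc
from typing import Any, Dict, List


class OCRHandler(metaclass=abc.ABCMeta):
    """Copy of the interface above, repeated so this sketch is self-contained."""

    def __init__(self, *kw, **kwargs):
        pass

    def get_result(self, ori_img):
        pass


class FixedTextOCRHandler(OCRHandler):
    """Hypothetical handler: ignores the image and returns one canned sentence."""

    def __init__(self, text: str = "hello", *kw, **kwargs):
        super().__init__(*kw, **kwargs)
        self.text = text

    def get_result(self, ori_img) -> Dict[str, List[Any]]:
        # Four corner points of a made-up text box, in the documented layout.
        box = [[0.0, 0.0], [10.0, 0.0], [10.0, 5.0], [0.0, 5.0]]
        # Third element assumed to be a confidence score.
        return {"sentences": [[self.text, box, 1.0]]}


def is_valid_result(result: Any) -> bool:
    """Check the documented shape: {'sentences': [[text, 4 points, score], ...]}."""
    if not isinstance(result, dict) or "sentences" not in result:
        return False
    for entry in result["sentences"]:
        if len(entry) != 3:
            return False
        text, box, score = entry
        if not isinstance(text, str) or len(box) != 4:
            return False
        if any(len(point) != 2 for point in box):
            return False
        if not isinstance(score, (int, float)):
            return False
    return True
```

A real handler would run an OCR model inside get_result. A check like is_valid_result can help catch format mismatches before the result reaches the rest of the pipeline.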

Contribute to the processing pipeline

The processing pipeline is implemented in WebHandler.pipeline() in web/__init__.py.

Future Plans

  • Speed up inference via async-processing on dual GPUs.

Congratulations! This project won a GRAND PRIZE (2 out of 72 participants) in the aforementioned competition!

Acknowledgement

  • PaddleOCR: Multilingual, awesome, leading, and practical OCR tools supported by Baidu.
  • ChineseOCR_lite: Super light OCR inference tool kit.
  • CascadeTabNet: An automatic table recognition method for interpretation of tabular data in document images.
  • pytorch-hed: An unofficial implementation of Holistically-Nested Edge Detection using PyTorch.
  • table-detect: Excellent work providing us with the U-Net code and pretrained weight.