ThomasLech / CROHME_extractor

Licence: other
CROHME dataset extractor for the offline text-recognition task.

Language: Python

Abstract

The CROHME datasets were originally designed for the online handwriting recognition task.
Besides the encoded pen traces, the inkml files also record when the traces were drawn. For offline recognition we therefore need to extract a new feature map, namely matrices of pixel intensities.
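For illustration, here is a minimal sketch of that conversion using only the Python standard library. It assumes the InkML namespace http://www.w3.org/2003/InkML and traces written as comma-separated "x y [t ...]" points; it keeps only x and y (dropping the time channel), scales all strokes of one symbol into a box_size x box_size grid (the role of extract.py's -b option) and draws 1-pixel lines, using extract.py's convention of black (0) for strokes and white (1) for background. The names parse_traces and rasterize are hypothetical; this is not the project's own implementation, and it ignores stroke thickness (-t).

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers for illustration; not taken from extract.py.
INKML_NS = "{http://www.w3.org/2003/InkML}"


def parse_traces(inkml_text):
    """Return {trace_id: [(x, y), ...]}, ignoring channels after x and y (e.g. time)."""
    root = ET.fromstring(inkml_text)
    traces = {}
    for trace in root.iter(INKML_NS + "trace"):
        points = []
        for coord in (trace.text or "").strip().split(","):
            values = coord.split()
            if not values:
                continue  # skip empty entries, e.g. from an empty trace
            points.append((float(values[0]), float(values[1])))
        traces[trace.get("id")] = points
    return traces


def rasterize(strokes, box_size=50):
    """Scale non-empty strokes into a box_size x box_size grid: 0 = stroke, 1 = background."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    # One scale factor for both axes preserves the symbol's aspect ratio.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (box_size - 1) / extent
    image = [[1] * box_size for _ in range(box_size)]
    for stroke in strokes:
        pts = [(round((x - min_x) * scale), round((y - min_y) * scale)) for x, y in stroke]
        # Pair consecutive points; a single-point stroke is paired with itself.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:] or pts):
            # Sample at least one point per pixel along the segment so lines have no gaps.
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                col = round(x0 + (x1 - x0) * i / steps)
                row = round(y0 + (y1 - y0) * i / steps)
                image[row][col] = 0
    return image


if __name__ == "__main__":
    SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
      <trace id="0">0 0 10, 10 10 20</trace>
      <trace id="1">0 10 30, 10 0 40</trace>
    </ink>"""
    bitmap = rasterize(list(parse_traces(SAMPLE).values()), box_size=5)
    for row in bitmap:
        print("".join("#" if v == 0 else "." for v in row))
    # Two crossing diagonals print an X:
    # #...#
    # .#.#.
    # ..#..
    # .#.#.
    # #...#
```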

The following scripts will get you started with the offline math symbol recognition task.

Setup

All code is compatible with Python 3.5.*.

  1. Extract the contents of CROHME_full_v2.zip (found inside the data directory) before running any of the scripts below.

  2. Install the required dependencies with pip (the Python package manager) using the following shell command:

pip install -U -r requirements.txt

Scripts info

  1. extract.py

    • Extracts trace groups from inkml files.
    • Converts the extracted trace groups into images. The images are square-shaped bitmaps containing only black (value 0) and white (value 1) pixels; black pixels denote the patterns (regions of interest, ROI).
    • Labels the images according to the inkml files.
    • Flattens the images into one-dimensional vectors.
    • Converts the labels to one-hot format.
    • Dumps the training and testing sets separately into the outputs folder.

    Command line arguments: -b [BOX_SIZE] -d [DATASET_VERSION] -c [CATEGORY] -t [THICKNESS]

    Example usage: python extract.py -b 50 -d 2011 2012 2013 -c digits lowercase_letters operators -t 5

    Caution: the script does not work properly for images larger than 200x200 (for reasons not yet known).

  2. balance.py balances the overall distribution of classes.

    Command line arguments: -b [BOX_SIZE] -ub [UPPER_BOUND] (optional)

    Example usage: python balance.py -b 50 -ub 6000

  3. visualize.py plots a single figure depicting a random batch of extracted data.

    Command line arguments: -b [BOX_SIZE] -n [N_SAMPLES] -c [COLUMNS]

    Example usage: python visualize.py -b 50 -n 40 -c 8

    Sample plot: [image: crohme_extractor_plot]

  4. extract_hog.py extracts HoG features using skimage.feature.hog.
    The script accepts one command line argument, hog_cell_size, which corresponds to the pixels_per_cell parameter of skimage.feature.hog.
    Example usage: python extract_hog.py 5 <-- pixels_per_cell=(5, 5)
    The script loads the data previously dumped by extract.py and dumps its outputs (train, test) separately.

  5. extract_phog.py extracts PHoG features.
    For PHoG features, HoG feature maps computed with different cell sizes are concatenated into a single feature vector.
    The script therefore takes an arbitrary number of hog_cell_size values (the corresponding HoG features must first be extracted with extract_hog.py).
    Example usage: python extract_phog.py 5 10 20 <-- loads HoGs with 5x5, 10x10 and 20x20 cell sizes, respectively.

  6. The histograms folder contains histograms showing the distribution of labels for different label categories. These diagrams help you better understand the extracted data.
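The flattening and one-hot steps of extract.py could look roughly like the plain-Python sketch below. The helper names flatten and one_hot, and the choice of sorted label order for class indices, are assumptions for illustration, not the script's actual code (which may use NumPy arrays and a different class order).

```python
# Hypothetical helpers sketching extract.py's last two steps; not the project's code.


def flatten(image):
    """Concatenate the rows of a 2-D bitmap into one vector (row-major order)."""
    return [pixel for row in image for pixel in row]


def one_hot(labels, classes=None):
    """Map each label to a one-hot vector.

    If classes is not given, classes are indexed in sorted order of the labels seen.
    Pass the same classes list for train and test so their columns line up.
    """
    if classes is None:
        classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    vectors = []
    for label in labels:
        vector = [0] * len(classes)
        vector[index[label]] = 1
        vectors.append(vector)
    return classes, vectors


if __name__ == "__main__":
    print(flatten([[1, 0], [0, 1]]))  # [1, 0, 0, 1]
    classes, vectors = one_hot(["x", "2", "+", "x"])
    print(classes)  # ['+', '2', 'x']
    print(vectors)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Building the class list once, from the union of train and test labels, keeps the one-hot columns consistent between the two dumped sets.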

Distribution of classes

[image: all_labels_distribution] Labels were combined from the train and test sets.
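A sketch of how such a combined label distribution could be counted, plus one possible balancing rule in the spirit of balance.py's -ub option: randomly keeping at most upper_bound samples per class. The capping rule, the fixed seed and the function names are assumptions for illustration; balance.py may balance classes differently.

```python
import random
from collections import Counter

# Hypothetical helpers; balance.py's real balancing rule may differ.


def label_distribution(train_labels, test_labels):
    """Count how often each label occurs across the train and test sets combined."""
    return Counter(train_labels) + Counter(test_labels)


def cap_per_class(samples, labels, upper_bound, seed=0):
    """Randomly keep at most upper_bound samples of each class; smaller classes are kept whole."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    kept_samples, kept_labels = [], []
    for label, group in by_class.items():
        if len(group) > upper_bound:
            group = rng.sample(group, upper_bound)
        kept_samples.extend(group)
        kept_labels.extend([label] * len(group))
    return kept_samples, kept_labels


if __name__ == "__main__":
    dist = label_distribution(["x", "x", "2"], ["x", "+"])
    print(sorted(dist.items()))  # [('+', 1), ('2', 1), ('x', 3)]
    _, labels = cap_per_class(list(range(5)), ["x", "x", "x", "2", "+"], upper_bound=2)
    print(sorted(Counter(labels).items()))  # [('+', 1), ('2', 1), ('x', 2)]
```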
