nikolamilosevic86 / TabInOut

Licence: other
Framework for information extraction from tables

Programming Languages

python

TabInOut (Table Information Out) - Framework for information extraction from tables

TabInOut is a framework for information extraction from tables and a GUI tool for generating information extraction rules for tables in the literature. The tool depends on TableDisentangler and forms the second step in the extraction pipeline. First, tables are processed, disentangled, and annotated using the TableDisentangler tool. TabInOut then uses the database created by TableAnnotator, along with all the functional and structural annotations produced by TableDisentangler, to extract information from the tables. It also creates an additional table in the MySQL database, where it stores the extracted information.

The framework consists of:

  • Methodology and recipe for information extraction from tables
  • Language for describing the syntax of cell content and assigning values to parts of the cell content
  • A GUI wizard that makes it easy to describe an information extraction task
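
To illustrate what "describing the syntax of cell content and assigning values to its parts" can mean in practice, here is a minimal sketch in plain Python. It is not TabInOut's rule language or API: the pattern, the `parse_cell` helper, and the `mean`/`sd` part names are all hypothetical, chosen only to show how a cell such as "45.2 ± 3.1" could be split into named values.

```python
import re

# Hypothetical pattern (not TabInOut's actual rule syntax): it describes cells
# that hold a number followed by a second number introduced by "±", "+/-",
# or an opening parenthesis, e.g. "45.2 ± 3.1" or "45.2 (3.1)".
MEAN_SD = re.compile(
    r"^\s*(?P<mean>\d+(?:\.\d+)?)\s*(?:±|\+/-|\()\s*(?P<sd>\d+(?:\.\d+)?)\s*\)?\s*$"
)


def parse_cell(text):
    """Return the named parts of a cell as floats, or None if the cell does not match."""
    match = MEAN_SD.match(text)
    if match is None:
        return None
    # Each named group in the pattern becomes one extracted value.
    return {name: float(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    for cell in ["45.2 ± 3.1", "45.2 (3.1)", "n/a"]:
        print(repr(cell), "->", parse_cell(cell))
```

The named groups play the role of the "value assignment" step: the pattern says what the cell looks like, and the group names say which part of the cell each value comes from.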

For more information, see the project's GitHub Wiki.

We are currently working on a paper that will present the TabInOut methodology. It is based on a case study and a hybrid approach already presented at the BIOSTEC and BelBi conferences. The relevant papers we have published are listed below.

The project is part of my PhD project, funded by EPSRC and AstraZeneca.

The main application (the Wizard) is located in the Wizard folder. You can run it by starting the TkGUIFirstScreen.py file. Alternatively, you can start the TabInOut wizard by running TableInOutStarter.sh from the main directory.
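
As a rough sketch of the first option, the wizard could be started from a small Python launcher. The `build_launch_command` and `launch_wizard` helpers are hypothetical and not part of the repository. The sketch also assumes that the script should run with the Wizard folder as the working directory and that the current Python interpreter is the one the project expects. Neither assumption is confirmed here.

```python
import subprocess
import sys
from pathlib import Path


def build_launch_command(repo_root):
    """Return (command, working_directory) for starting the wizard.

    Assumes the repository layout described above: a "Wizard" folder
    containing TkGUIFirstScreen.py.
    """
    wizard_dir = Path(repo_root) / "Wizard"
    # Assumption: the interpreter running this launcher is suitable for the wizard.
    return [sys.executable, "TkGUIFirstScreen.py"], wizard_dir


def launch_wizard(repo_root="."):
    """Start the wizard and wait until its window is closed."""
    command, wizard_dir = build_launch_command(repo_root)
    if not (wizard_dir / "TkGUIFirstScreen.py").is_file():
        raise FileNotFoundError("TkGUIFirstScreen.py not found in %s" % wizard_dir)
    return subprocess.run(command, cwd=wizard_dir, check=True)
```

Calling `launch_wizard("path/to/TabInOut")` from a script or interactive session would then open the GUI, provided the dependencies (including the database set up by TableDisentangler) are in place.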

Relevant publications:

User guide

For more information about how to use and run TabInOut, please check our User Guide
