nikolamilosevic86 / TabInOut

Licence: other

Framework for information extraction from tables

TabInOut (Table Information Out) - Framework for information extraction from tables

TabInOut is a framework for information extraction from tables and a GUI tool for generating information extraction rules from the tables in literature. The tool depends on TableDisentangler and forms the second step in the extraction pipeline. First, tables are processed, disentangled and annotated using the TableDisentangler tool. TabInOut then uses the database created by TableAnnotator, together with all the functional and structural annotations performed by TableDisentangler, to extract information from the tables. It also creates an additional table in the MySQL database, where it stores the extracted information.
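The second pipeline step described above (read annotated cells from a database, select the relevant ones, store the results in an additional table) can be illustrated with a minimal sketch. This is not TabInOut's implementation: an in-memory SQLite database stands in for MySQL so that the example is self-contained, and the table names (cell, extracted), the column names and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for the annotated-table database.

    SQLite replaces MySQL here, and the schema is hypothetical; it is
    not the schema produced by TableDisentangler.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cell (id INTEGER PRIMARY KEY, table_id INTEGER, "
        "role TEXT, header TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO cell (table_id, role, header, content) VALUES (?, ?, ?, ?)",
        [
            (1, "header", "", "Characteristic"),
            (1, "data", "Number of patients", "120"),
            (1, "data", "Weight (kg)", "81.4"),
            (2, "data", "Number of patients", "45"),
        ],
    )
    return conn


def extract_to_new_table(conn, header_keyword):
    """Copy data cells whose header contains the keyword into a new table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted "
        "(cell_id INTEGER, table_id INTEGER, label TEXT, value TEXT)"
    )
    # Use the functional annotation (role) and the header text to pick cells.
    rows = conn.execute(
        "SELECT id, table_id, header, content FROM cell "
        "WHERE role = 'data' AND lower(header) LIKE ?",
        ("%" + header_keyword.lower() + "%",),
    ).fetchall()
    conn.executemany("INSERT INTO extracted VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    demo = build_demo_db()
    print(extract_to_new_table(demo, "number of patients"))
```

Keeping the extracted values in a separate table, as the description above states, leaves the annotated source data untouched, so extraction can be re-run with different rules.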

The framework consists of:

Methodology and recipe for information extraction from tables
Language for describing the syntax of the cell content and assigning values to the parts of the cell content
A GUI wizard that makes it easy to describe an information extraction task
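To give a feel for what describing the syntax of cell content and assigning values to its parts can mean, here is a minimal sketch that uses Python regular expressions with named groups. It does not use TabInOut's own rule language; the two patterns, the part names (mean, sd, min, max) and the sample cell contents are assumptions made for illustration only.

```python
import re

# Hypothetical rules: each pattern describes one cell layout, and each
# named group assigns a name to one part of the cell content.
NUMBER = r"\d+(?:\.\d+)?"
MEAN_SD_RULE = re.compile(
    r"^\s*(?P<mean>" + NUMBER + r")\s*(?:±|\+/-)\s*(?P<sd>" + NUMBER + r")\s*$"
)
MEAN_RANGE_RULE = re.compile(
    r"^\s*(?P<mean>" + NUMBER + r")\s*\(\s*(?P<min>" + NUMBER + r")"
    r"\s*[-–]\s*(?P<max>" + NUMBER + r")\s*\)\s*$"
)


def parse_cell(content, rules=(MEAN_SD_RULE, MEAN_RANGE_RULE)):
    """Return {part name: float} for the first matching rule, or None."""
    for rule in rules:
        match = rule.match(content)
        if match:
            return {name: float(text) for name, text in match.groupdict().items()}
    return None


if __name__ == "__main__":
    print(parse_cell("25.3 ± 4.1"))
    print(parse_cell("81.4 (60-102)"))
    print(parse_cell("n/a"))
```

Rules are tried in order and the first match wins, so more specific patterns should be listed before more general ones.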

For more information, see the project's GitHub Wiki.

We are currently working on a paper that will present the methodology of TabInOut; however, it is based on a case study and a hybrid approach already presented at the BIOSTEC and BelBi conferences. The relevant papers we have published are listed below.

The project is part of my PhD project funded by EPSRC and AstraZeneca.

The main application (Wizard) is located in the Wizard folder. You can run it by starting the TkGUIFirstScreen.py file. Alternatively, you can start the TabInOut wizard by running TableInOutStarter.sh from the main directory.
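If you prefer to start the wizard from another Python script, the following sketch shows one way to do it. Only the folder name Wizard and the file name TkGUIFirstScreen.py come from the description above; the helper names are hypothetical, and the sketch assumes that the interpreter running it is also suitable for running the wizard, which may not hold for your installation.

```python
import subprocess
import sys
from pathlib import Path


def wizard_command(repo_root):
    """Build the command that starts the wizard's first screen."""
    script = Path(repo_root) / "Wizard" / "TkGUIFirstScreen.py"
    # Assumption: the current interpreter can run the wizard.
    return [sys.executable, str(script)]


def launch_wizard(repo_root):
    """Start the wizard, failing early if the script is not where expected."""
    command = wizard_command(repo_root)
    if not Path(command[1]).is_file():
        raise FileNotFoundError(command[1])
    # Blocks until the GUI is closed and returns the process exit code.
    return subprocess.call(command)
```

For example, launch_wizard(".") starts the wizard when called from the root of a TabInOut checkout.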

Relevant publications:

Milosevic, N., Gregson, C., Hernandez, R., & Nenadic, G. (2019). A framework for information extraction from tables in biomedical literature. International Journal on Document Analysis and Recognition. https://doi.org/10.1007/s10032-019-00317-0
Milosevic, N., Gregson, C., Hernandez, R., & Nenadic, G. (2016, June). Disentangling the Structure of Tables in Scientific Literature. In Natural Language Processing and Information Systems: 21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Salford, UK, June 22-24, 2016, Proceedings (Vol. 9612, p. 162). Springer.
Milosevic, N., Gregson, C., Hernandez, R., & Nenadic, G. (2016). Extracting patient data from tables in clinical literature: Case study on extraction of BMI, weight and number of patients. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies, ISBN 978-989-758-170-0, pages 223-228. DOI: 10.5220/0005660102230228
Milosevic, N., Gregson, C., Hernandez, R., & Nenadic, G. Hybrid methodology for information extraction from tables in the biomedical literature. In Proceedings of the Belgrade Bioinformatics Conference (BelBi2016).
Milosevic, N. (2016). Marvin: Semantic annotation using multiple knowledge sources. arXiv preprint arXiv:1602.00515.

User guide

For more information about how to use and run TabInOut, please see our User Guide.
