nikolamilosevic86 / TableDisentangler

Licence: GPL-3.0
Functional and structural analysis of tables in research papers (Table disentangling)

Programming Languages

java
python
shell

Projects that are alternatives of or similar to TableDisentangler

neji
Flexible and powerful platform for biomedical information extraction from text
Stars: ✭ 37 (+76.19%)
Mutual labels:  text-mining, information-extraction
deduce
Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (+90.48%)
Mutual labels:  text-mining, information-extraction
Chemdataextractor
Automatically extract chemical information from scientific documents
Stars: ✭ 152 (+623.81%)
Mutual labels:  text-mining, information-extraction
odinson
Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.
Stars: ✭ 59 (+180.95%)
Mutual labels:  text-mining, information-extraction
TabInOut
Framework for information extraction from tables
Stars: ✭ 37 (+76.19%)
Mutual labels:  text-mining, information-extraction
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+476.19%)
Mutual labels:  text-mining, information-extraction
palladian
Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
Stars: ✭ 32 (+52.38%)
Mutual labels:  text-mining, information-extraction
Text-Classification-LSTMs-PyTorch
The aim of this repository is to show a baseline model for text classification by implementing an LSTM-based model in PyTorch. To provide a better understanding of the model, a Tweets dataset provided by Kaggle is used.
Stars: ✭ 45 (+114.29%)
Mutual labels:  text-mining
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+3285.71%)
Mutual labels:  text-mining
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (+185.71%)
Mutual labels:  text-mining
ViewBuilder
Fully document based declarative way for building UI with a custom and more performant layout.
Stars: ✭ 14 (-33.33%)
Mutual labels:  xml-parsing
ReQuest
Indirect Supervision for Relation Extraction Using Question-Answer Pairs (WSDM'18)
Stars: ✭ 26 (+23.81%)
Mutual labels:  information-extraction
corpusexplorer2.0
Corpus linguistics has never been this easy...
Stars: ✭ 16 (-23.81%)
Mutual labels:  text-mining
text-mined-synthesis public
Codes for text-mined solid-state reactions dataset
Stars: ✭ 46 (+119.05%)
Mutual labels:  text-mining
nmap-formatter
A tool that converts NMAP results to html, csv, json, markdown, or graphviz (dot). Simply put, it's an nmap converter.
Stars: ✭ 129 (+514.29%)
Mutual labels:  xml-parsing
agreementmaker
AgreementMaker Ontology Matching System
Stars: ✭ 33 (+57.14%)
Mutual labels:  schema-matching
naacl2018-fever
Fact Extraction and VERification baseline published in NAACL2018
Stars: ✭ 109 (+419.05%)
Mutual labels:  information-extraction
woolly
The Text Mining Elixir
Stars: ✭ 48 (+128.57%)
Mutual labels:  text-mining
BioMedical-NLP-corpus
Biomedical NLP Corpus or Datasets.
Stars: ✭ 44 (+109.52%)
Mutual labels:  text-mining
valentine
A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching methods.
Stars: ✭ 43 (+104.76%)
Mutual labels:  schema-matching

TableDisentangler - A tool for automatic disentangling of functional areas in tables and their annotation

TableDisentangler is a tool, written in Java, for annotating tables. It uses an annotation schema we proposed that captures the function of each cell and the relationships between cells. It extracts annotations from tables in PMC clinical documents in XML format (XML can be generated from PDF).

The tool does this in a couple of steps. First, tables are decomposed into a matrix of cell objects, each containing the cell's data and information about its navigational path (headers, stubs, subheaders).
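
The decomposition step can be illustrated with a minimal sketch. It is not taken from the TableDisentangler source: the names TableMatrixSketch, Cell and decompose are hypothetical, and it assumes a simple rectangular table in which the first rows are column headers and the first column is the stub. Subheaders and spanning cells, which the real tool has to deal with, are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the TableDisentangler API. */
final class TableMatrixSketch {

    /** One table cell together with the navigational path that leads to it. */
    static final class Cell {
        final String content;
        final List<String> headers; // column headers above the cell, outermost first
        final String stub;          // row label taken from the first column

        Cell(String content, List<String> headers, String stub) {
            this.content = content;
            this.headers = headers;
            this.stub = stub;
        }
    }

    /**
     * Decomposes a non-empty rectangular grid into a matrix of Cell objects.
     * Assumption: the first headerRows rows are headers, column 0 is the stub.
     */
    static Cell[][] decompose(String[][] grid, int headerRows) {
        int rows = grid.length;
        int cols = grid[0].length;
        Cell[][] matrix = new Cell[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                List<String> headers = new ArrayList<>();
                String stub = "";
                if (r >= headerRows) {
                    // Body rows: collect every header above this column.
                    for (int h = 0; h < headerRows; h++) {
                        headers.add(grid[h][c]);
                    }
                    // Data cells (not the stub column itself) get the row label.
                    if (c > 0) {
                        stub = grid[r][0];
                    }
                }
                matrix[r][c] = new Cell(grid[r][c], headers, stub);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        String[][] grid = {
            {"Group", "Age (years)", "BMI"},
            {"Control", "54.2", "26.1"},
            {"Treated", "55.0", "27.3"},
        };
        Cell cell = decompose(grid, 1)[2][1];
        // prints "Treated / [Age (years)] = 55.0"
        System.out.println(cell.stub + " / " + cell.headers + " = " + cell.content);
    }
}
```

Storing the path on each cell means a single value such as 55.0 can later be read on its own, without going back to the table layout.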

This project was developed at the University of Manchester as part of my PhD.

Requirements

The tool requires Java, OpenNLP, the Weka toolkit, a MySQL database, and installed copies of MetaMap and WordNet.

Other project dependencies

Some dataset manipulation (splitting data into training, testing and cross-validation sets, downloading data, extracting tables, etc.) is done by Python scripts in the TableMiningHelpers git project.

The database output of this system may be used as the input database for the MedCurator project.

You also need to check out the Marvin project and include a reference to it in your project.

License

The tool is licensed under the GNU GPL v3. The licence agreement can be read here: http://www.gnu.org/copyleft/gpl.html

Referencing
