
arne-cl / discoursegraphs

Licence: BSD-3-Clause license
linguistic converter / merging tool for multi-level annotated corpora. graph-based (using Python and NetworkX).

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to discoursegraphs

Oas Kit
Convert Swagger 2.0 definitions to OpenAPI 3.0 and resolve/validate/lint
Stars: ✭ 516 (+997.87%)
Mutual labels:  converter, conversion
csv2html
Convert CSV files to HTML tables
Stars: ✭ 64 (+36.17%)
Mutual labels:  converter, conversion
Unitsnet
Makes life working with units of measurement just a little bit better.
Stars: ✭ 641 (+1263.83%)
Mutual labels:  converter, conversion
sublime-atomizr
Convert Sublime Text completions into Atom (or Visual Studio Code) snippets, and vice versa.
Stars: ✭ 12 (-74.47%)
Mutual labels:  converter, conversion
bank2ynab
Easily convert and import your bank's statements into YNAB. This project consolidates other conversion efforts into one universal tool.
Stars: ✭ 197 (+319.15%)
Mutual labels:  converter, conversion
Length.js
📏 JavaScript library for length units conversion.
Stars: ✭ 292 (+521.28%)
Mutual labels:  converter, conversion
Hrconvert2
A self-hosted, drag-and-drop, & nosql file conversion server that supports 62x file formats.
Stars: ✭ 132 (+180.85%)
Mutual labels:  converter, conversion
xsampa
X-SAMPA to IPA converter
Stars: ✭ 20 (-57.45%)
Mutual labels:  converter, conversion
fp-units
An FP-oriented library to easily convert CSS units.
Stars: ✭ 18 (-61.7%)
Mutual labels:  converter, conversion
cpc
Text calculator with support for units and conversion
Stars: ✭ 89 (+89.36%)
Mutual labels:  converter, conversion
quill-markdown-toolbar
A Quill.js module for converting markdown text to rich text format
Stars: ✭ 13 (-72.34%)
Mutual labels:  converter, conversion
vectorexpress-api
Vector Express is a free service and API for converting, analyzing and processing vector files.
Stars: ✭ 66 (+40.43%)
Mutual labels:  converter, conversion
dftools
Tools for Star Wars: Dark Forces assets.
Stars: ✭ 18 (-61.7%)
Mutual labels:  converter, conversion
Remarshal
Convert between CBOR, JSON, MessagePack, TOML, and YAML
Stars: ✭ 421 (+795.74%)
Mutual labels:  converter, conversion
bitsnpicas
Bits'N'Picas - Bitmap & Emoji Font Creation & Conversion Tools
Stars: ✭ 171 (+263.83%)
Mutual labels:  converter, conversion
Ec2 Spot Converter
A tool to convert AWS EC2 instances back and forth between On-Demand and Spot billing models.
Stars: ✭ 108 (+129.79%)
Mutual labels:  converter, conversion
qTsConverter
A simple tool to convert qt translation file (ts) to other format (xlsx / csv) and vice versa
Stars: ✭ 26 (-44.68%)
Mutual labels:  converter, conversion
Kepubify
Fast, standalone EPUB to KEPUB converter CLI app / library (and a few other utilities).
Stars: ✭ 225 (+378.72%)
Mutual labels:  converter, conversion
caffe weight converter
Caffe-to-Keras weight converter. Can also export weights as Numpy arrays for further processing.
Stars: ✭ 68 (+44.68%)
Mutual labels:  converter, conversion
BlocksConverter
A PocketMine-MP plugin allows you to convert Minecraft PC maps to MCPE/Bedrock maps or vice-versa.
Stars: ✭ 47 (+0%)
Mutual labels:  converter, conversion

DiscourseGraphs


This library enables you to process linguistic corpora with multiple levels of annotations by:

  1. converting the different annotation formats into separate graphs,
  2. merging these graphs into a single multidigraph (based on the common tokenization of the annotation layers),
  3. exporting your (merged) graphs into several output formats, and
  4. visualizing linguistic graphs directly in an IPython notebook.
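
The merging step can be illustrated with plain NetworkX. This is only a minimal sketch of the idea, not the library's own merging code: the helper `build_layer`, the token IDs and the edge labels are made up for illustration, and it assumes that both layers already use identical token IDs.

```python
import networkx as nx

def build_layer(layer_name, tokens, edges):
    """Build one annotation layer as a MultiDiGraph over shared token IDs (hypothetical helper)."""
    graph = nx.MultiDiGraph(name=layer_name)
    for token_id, word in tokens:
        graph.add_node(token_id, token=word)
    for source, target, label in edges:
        # Nodes that are not tokens (e.g. "S") are created implicitly by add_edge.
        graph.add_edge(source, target, layer=layer_name, label=label)
    return graph

# Both layers annotate the same two tokens.
tokens = [("t1", "Peter"), ("t2", "sleeps")]

syntax = build_layer("syntax", tokens, [("S", "t1", "SB"), ("S", "t2", "HD")])
coref = build_layer("coref", tokens, [("markable_1", "t1", "spans")])

# compose() takes the union of nodes and edges; nodes with the same ID
# (here: the tokens) end up as a single node in the merged graph.
merged = nx.compose(syntax, coref)
```

After merging, token `t1` is reachable from both the syntax node `S` and the coreference markable, which is what makes queries across annotation layers possible.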

Import formats

So far, the following formats can be imported and merged:

  • TigerXML (a format for representing tree-like syntax graphs with secondary edges)
  • NeGra Export Format (a format used i.a. for the TüBa-D/Z Treebank)
  • Penn Treebank format (an s-expressions/lisp/brackets format for representing syntax trees)
  • a number of formats for Rhetorical Structure Theory:
    • RS3 (a format used by RSTTool to annotate documents with Rhetorical Structure Theory)
    • the .dis "LISP" format used by the RST-DT corpus
    • URML (a format for underspecified rhetorical structure trees)
  • MMAX2 (a format / GUI tool for annotating spans and connections between them, e.g. coreferences)
  • CoNLL 2009 and CoNLL 2010 formats (used for annotating i.a. dependency parses and coreference links)
  • ConanoXML (a format for annotating connectives, used by Conano)
  • Decour (an XML format used by a corpus of DEceptive statements in Italian COURts)
  • EXMARaLDA, a format for annotating spans in spoken or written language
  • an ad-hoc plain text format for annotating expletives (which you're probably not interested in)
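
To give an impression of what a brackets format like the Penn Treebank one looks like, here is a small stand-alone sketch that turns one bracketed tree into nested lists. It is written for Python 3, it is not the importer used by discoursegraphs, and the function name `parse_brackets` is hypothetical. It assumes well-formed input in which every bracket has a label (real treebank files often wrap each sentence in an extra unlabeled bracket, which this sketch does not handle).

```python
import re

def parse_brackets(text):
    """Parse one bracketed tree into nested lists of the form [label, child, ...]."""
    # Tokens are opening brackets, closing brackets, or runs of other non-space characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    position = 0

    def parse_node():
        nonlocal position
        position += 1                  # skip "("
        node = [tokens[position]]      # the node label, e.g. "NP"
        position += 1
        while tokens[position] != ")":
            if tokens[position] == "(":
                node.append(parse_node())       # nested constituent
            else:
                node.append(tokens[position])   # terminal (the word itself)
                position += 1
        position += 1                  # skip ")"
        return node

    return parse_node()

tree = parse_brackets("(S (NP (NNP Peter)) (VP (VBZ sleeps)))")
# tree == ["S", ["NP", ["NNP", "Peter"]], ["VP", ["VBZ", "sleeps"]]]
```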

Export formats

discoursegraphs can export graphs into the following formats / for the following tools:

  • dot format, which is used by the open source graph visualization software graphviz
  • geoff format, used by the neo4j graph database
  • GEXF and GraphML (common interchange formats for graphs used by various tools such as Gephi and Cytoscape)
  • PAULA XML 1.1, an exchange format for linguistic data (exporter is still buggy)
  • EXMARaLDA, a tool for annotating spans in spoken or written language
  • CoNLL 2009 (so far, only tokens, sentence boundaries and coreferences are exported)
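
For readers who have not seen the dot format, the following sketch writes a few labeled edges as a dot digraph using only the standard library. It is an illustration of the target format, not the exporter shipped with discoursegraphs; the function `to_dot` and the example edges are hypothetical, and node names and labels are assumed not to contain double quotes.

```python
def to_dot(edges, graph_name="doc"):
    """Serialize (source, target, label) triples as a graphviz dot digraph."""
    lines = ["digraph %s {" % graph_name]
    for source, target, label in edges:
        # Quote node names so that IDs with special characters stay valid.
        lines.append('    "%s" -> "%s" [label="%s"];' % (source, target, label))
    lines.append("}")
    return "\n".join(lines)

dot_source = to_dot([("S", "t1", "SB"), ("S", "t2", "HD")])
print(dot_source)
```

The resulting text can be saved to a file and rendered with the graphviz `dot` command.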

Installation

This should work on both Linux and Mac OS X using Python 2.7 and either pip or easy_install.

Install from PyPI

pip install discoursegraphs # prepend 'sudo' if needed

or, if you're oldschool:

easy_install discoursegraphs # prepend 'sudo' if needed

Install from source

sudo apt-get install python-dev libxml2-dev libxslt-dev pkg-config graphviz-dev libgraphviz-dev -y
sudo easy_install -U setuptools
git clone https://github.com/arne-cl/discoursegraphs.git
cd discoursegraphs
sudo python setup.py install

Usage

The command line interface of DiscourseGraphs allows you to merge syntax, rhetorical structure, connectives and expletives annotation files into one graph and to store this graph in one of several output formats (e.g. the geoff format used by the neo4j graph database or the dot format used by the graphviz plotting tool).

discoursegraphs -t syntax/maz-13915.xml -r rst/maz-13915.rs3 -c connectors/maz-13915.xml -a anaphora/tosik/das/maz-13915.txt -o dot
dot -Tpdf doc.dot > discoursegraph.pdf # generates a PDF from the dot file

If you're interested in working with just one of those layers, you'll have to call the code directly:

import discoursegraphs as dg
tiger_docgraph = dg.read_tiger('syntax/doc.xml')
rst_docgraph = dg.read_rs3('rst/doc.rs3')
expletives_docgraph = dg.read_anaphoricity('expletives/doc.txt')

All the document graphs generated in this example are derived from the networkx.MultiDiGraph class, so you should be able to use all of its methods.
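
As a rough illustration of what that means in practice, the sketch below applies a few standard NetworkX calls to a small hand-built MultiDiGraph. The graph is only a stand-in with made-up node names and edge labels; a real document graph returned by one of the read functions above will have its own node IDs and attributes.

```python
import networkx as nx

# Stand-in for a document graph (node names and labels are made up).
docgraph = nx.MultiDiGraph()
docgraph.add_edge("S", "NP", label="SB")
docgraph.add_edge("S", "VP", label="HD")
docgraph.add_edge("NP", "t1", label="NK")
docgraph.add_edge("VP", "t2", label="HD")

# Direct children of a node.
children = sorted(docgraph.successors("S"))

# Nodes without outgoing edges (in a syntax tree, these would be the tokens).
leaves = sorted(n for n in docgraph.nodes() if docgraph.out_degree(n) == 0)

# Everything dominated by a node, directly or indirectly.
reachable = nx.descendants(docgraph, "S")

# Edge attributes of all edges leaving a node.
labels = [attrs["label"] for _, _, attrs in docgraph.out_edges("S", data=True)]
```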

Documentation

Source code documentation is available online, but you can always get an up-to-date local copy using Sphinx.

You can generate an HTML or PDF version by running these commands in the docs directory:

make latexpdf

to produce a PDF (docs/_build/latex/discoursegraphs.pdf) and

make html

to produce a set of HTML files (docs/_build/html/index.html).

Requirements

If you'd like to visualize your graphs, you will also need:

License and Citation

This software is released under a 3-Clause BSD license. If you use discoursegraphs in your academic work, please cite the following paper:

Neumann, A. 2015. discoursegraphs: A graph-based merging tool and converter for multilayer annotated corpora. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), pp. 309-312.

@inproceedings{neumann2015discoursegraphs,
  title={discoursegraphs: A graph-based merging tool and converter for multilayer annotated corpora},
  author={Neumann, Arne},
  booktitle={Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)},
  pages={309-312},
  year={2015}
}

Author

Arne Neumann

People who downloaded this also like

  • SaltNPepper: a converter framework for various linguistic data formats
  • educe: a library for handling discourse-annotated corpora (SDRT, RST and PDTB)
  • treetools: a library for converting treebanks and grammar extraction (supports i.a. TigerXML and Negra/Tüba-Export formats)
  • TCFnetworks: library for creating graphs from annotated text corpora (based on TCF).