
rknaebel / discopy

License: MIT
End-to-end shallow discourse parser

Programming Languages

python: 139,335 projects (#7 most used programming language)
Dockerfile: 14,818 projects

Projects that are alternatives of or similar to discopy

DiscourseSenser
Sense Disambiguation of Connectives for PDTB-Style Discourse Parsing
Stars: ✭ 13 (-18.75%)
Mutual labels:  discourse-analysis, discourse-parsing, pdtb
Discovery
Mining Discourse Markers for Unsupervised Sentence Representation Learning
Stars: ✭ 48 (+200%)
Mutual labels:  discourse-analysis, discourse-parsing, pdtb
Attention mechanism-event-extraction
Attention mechanism in CNNs to extract events of interest
Stars: ✭ 17 (+6.25%)
Mutual labels:  discourse-analysis

Shallow Discourse Parser

This project aims to provide an implementation of the standard Lin et al. architecture as well as recent advances in neural architectures. It is built around a parser pipeline that stacks individual parser components, each adding further discourse information. The current focus is on explicit relations, which are handled first in most pipelines; the remaining sentence pairs without an explicit sense relation are then processed by the non-explicit component. The implementation follows the CoNLL 2016 shared task guidelines: it accepts the PDTB2 CoNLL format as input for training and evaluation and mainly produces a line-based JSON document format.
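To give a rough idea of the output, the line-based format carries one JSON object per line. A trimmed sketch of a single relation in the CoNLL shared task style (field names follow the CoNLL 2016 relations.json schema; all concrete values here are made up, and the object is wrapped over several lines only for readability):

{"DocID": "wsj_0001", "Type": "Explicit", "Sense": ["Comparison.Contrast"],
 "Connective": {"TokenList": [18], "RawText": "but"},
 "Arg1": {"TokenList": [12, 13, 14, 15, 16, 17], "RawText": "the market rallied early on"},
 "Arg2": {"TokenList": [19, 20, 21, 22], "RawText": "gains quickly evaporated"}}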

The parser was presented at the CODI 2021 workshop. For more information, check out the paper discopy: A Neural System for Shallow Discourse Parsing.

Setup

You can install discopy easily using pip:

pip install git+https://github.com/rknaebel/discopy

Alternatively, you can clone the repository and install discopy from the local checkout:

git clone https://github.com/rknaebel/discopy path/to/discopy
pip install -e path/to/discopy
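After installation, the discopy command-line tools should be available on your PATH. As a quick sanity check, the entry points should print a usage message when invoked with the conventional --help flag (the flag is an assumption based on common CLI conventions, not taken from the project docs):

discopy-train --help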

Usage

Discopy currently supports several modes and distinguishes between standard feature-based models and neural (transformer-based) models. The example commands below are meant to be executed from within the repository folder.

Evaluation

discopy-eval path/to/conll-gold path/to/prediction

Standard Architecture

Training

discopy-train lin path/to/model path/to/conll

The training data format is JSON: the CoNLL folder contains the subfolders en.{train,dev,test}, each holding the files relations.json and parses.json.
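For orientation, the expected layout of the training folder (path names are placeholders) would look like this:

path/to/conll
├── en.train
│   ├── parses.json
│   └── relations.json
├── en.dev
│   ├── parses.json
│   └── relations.json
└── en.test
    ├── parses.json
    └── relations.json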

Prediction

discopy-predict lin path/to/conll/en.part path/to/model/lin
discopy-parse lin path/to/model/lin -i path/to/some/documents.json
discopy-tokenize -i path/to/textfile | discopy-add-parses -c | discopy-parse lin models/lin
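The three commands cover different input types: discopy-predict runs on CoNLL-formatted data, discopy-parse on already-parsed JSON documents, and the tokenizer pipeline on raw text. Assuming the final stage writes its line-based JSON to stdout (suggested by the pipe usage above; the output file name is a placeholder), the result of the raw-text pipeline can be saved with a redirect:

discopy-tokenize -i path/to/textfile | discopy-add-parses -c | discopy-parse lin models/lin > relations.json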

Neural Architecture

Neural components are a bit more complex and often require (or allow for) more hyper-parameters, both in the design of the component and throughout the training process. The training CLI exposes only a single component-parameter choice; for individual adaptations, one has to write one's own training script. The BERT-MODEL parameter corresponds to the model names of the Hugging Face Transformers library.

Training

discopy-nn-train [BERT-MODEL] [MODEL-PATH] [CONLL-PATH]

The training data format follows the one described above.
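As a concrete example, a training run with the bert-base-cased model (also used in the prediction examples below; the model output path is a placeholder) could look like this:

discopy-nn-train bert-base-cased models/pipeline-bert path/to/conll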

Prediction

discopy-nn-predict [BERT-MODEL] [MODEL-PATH] [CONLL-PATH]
discopy-nn-parse [BERT-MODEL] [MODEL-PATH] -i [JSON-INPUT]
cat path/to/textfile | discopy-nn-parse [BERT-MODEL] [MODEL-PATH]
discopy-tokenize --tokenize-only -i path/to/textfile | discopy-nn-parse bert-base-cased models/pipeline-bert-2