elikip / Bist Parser

License: Apache-2.0
Graph-based and Transition-based dependency parsers based on BiLSTMs

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Bist Parser

Tensorflow qrnn
QRNN implementation for TensorFlow
Stars: ✭ 241 (-6.23%)
Mutual labels:  natural-language-processing
Low Resource Languages
Resources for conservation, development, and documentation of low resource (human) languages.
Stars: ✭ 247 (-3.89%)
Mutual labels:  natural-language-processing
Deep Learning Interview Book
A deep learning interview handbook (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and other areas)
Stars: ✭ 3,677 (+1330.74%)
Mutual labels:  natural-language-processing
Jack
Jack the Reader
Stars: ✭ 242 (-5.84%)
Mutual labels:  natural-language-processing
Soulvercore
A powerful Swift framework for evaluating mathematical expressions
Stars: ✭ 245 (-4.67%)
Mutual labels:  natural-language-processing
Awesome Tensorlayer
A curated list of dedicated resources and applications
Stars: ✭ 248 (-3.5%)
Mutual labels:  natural-language-processing
Cmrc2018
A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
Stars: ✭ 238 (-7.39%)
Mutual labels:  natural-language-processing
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-0.78%)
Mutual labels:  natural-language-processing
Awesome Grounding
awesome grounding: A curated list of research papers in visual grounding
Stars: ✭ 247 (-3.89%)
Mutual labels:  natural-language-processing
Prose
📖 A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
Stars: ✭ 2,840 (+1005.06%)
Mutual labels:  natural-language-processing
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+1239.69%)
Mutual labels:  natural-language-processing
Insight
Repository for Project Insight: NLP as a Service
Stars: ✭ 246 (-4.28%)
Mutual labels:  natural-language-processing
Datacamp Python Data Science Track
All the slides, accompanying code and exercises all stored in this repo. 🎈
Stars: ✭ 250 (-2.72%)
Mutual labels:  natural-language-processing
Summarization Papers
Summarization Papers
Stars: ✭ 238 (-7.39%)
Mutual labels:  natural-language-processing
Lectures
Oxford Deep NLP 2017 course
Stars: ✭ 15,162 (+5799.61%)
Mutual labels:  natural-language-processing
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+1148.64%)
Mutual labels:  natural-language-processing
Good Papers
I try my best to keep updated cutting-edge knowledge in Machine Learning/Deep Learning and Natural Language Processing. These are my notes on some good papers
Stars: ✭ 248 (-3.5%)
Mutual labels:  natural-language-processing
Ai Job Notes
A job-hunting guide for AI algorithm roles (covering preparation strategies, coding-interview practice guides, referrals, a list of AI companies, and other material)
Stars: ✭ 3,191 (+1141.63%)
Mutual labels:  natural-language-processing
Articutapi
API of Articut, a Chinese word-segmentation service with semantic part-of-speech tagging. Word segmentation (also called tokenization) is the foundation of Chinese text processing. Articut uses no machine learning and no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and recall above 96% on SIGHAN 2005.
Stars: ✭ 252 (-1.95%)
Mutual labels:  natural-language-processing
Book Nlp
Natural language processing pipeline for book-length documents
Stars: ✭ 249 (-3.11%)
Mutual labels:  natural-language-processing

BIST Parsers

Graph & Transition based dependency parsers using BiLSTM feature extractors.

The techniques behind the parser are described in the paper Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Further materials can be found here.

Required software

The parser runs under Python and uses the DyNet neural-network library; the commands below pass DyNet's --dynet-seed and --dynet-mem options through to it.

Train a parsing model

The software requires training.conll and development.conll files formatted according to the CoNLL data format. For the faster graph-based parser (1,200 words/sec), change directory to bmstparser; for the more accurate transition-based parser (800 words/sec), change directory to barchybrid. The benchmark was performed on a MacBook Pro with an i7 processor.

On the standard Penn Treebank dataset (Stanford Dependencies), the graph-based parser achieves an accuracy of 93.8 UAS and the transition-based parser 94.7 UAS. The transition-based parser does not require part-of-speech tagging: setting all the tags to NN will still produce the expected accuracy. The model and param files achieving those scores are available for download (Graph-based model, Transition-based model). The trained models include improvements beyond those described in the paper, to be published soon.
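For reference, each line of a CoNLL-formatted file describes one token with ten tab-separated fields (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and sentences are separated by blank lines. A minimal illustrative fragment (the sentence and tags below are made up, not taken from any treebank):

    1   The      the      DT    DT    _   2   det     _   _
    2   parser   parser   NN    NN    _   3   nsubj   _   _
    3   works    work     VBZ   VBZ   _   0   root    _   _
    4   .        .        .     .     _   3   punct   _   _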

To train a parsing model with either parsing architecture, type the following at the command prompt:

python src/parser.py --dynet-seed 123456789 [--dynet-mem XXXX] --outdir [results directory] --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 [--extrn extrn.vectors] --bibi-lstm
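As a concrete example, a graph-based training run from the bmstparser directory might look like the line below; the memory size and output directory are placeholder values, not prescribed by the parser:

python src/parser.py --dynet-seed 123456789 --dynet-mem 2048 --outdir results --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 --bibi-lstm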

We use the same external embeddings used in Transition-Based Dependency Parsing with Stack Long Short-Term Memory, which can be downloaded from the authors' GitHub repository and directly here.

If you are training a transition-based parser, then for optimal results you should add the following to the command prompt: --k 3 --usehead --userl. These switches set the stack to 3 elements, use the BiLSTM vector of the head of each tree on the stack as a feature vector, and add the BiLSTM vectors of the rightmost and leftmost children to the feature vectors.
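Putting these together, a transition-based training run from the barchybrid directory might be invoked as follows (paths and the output directory are again placeholders):

python src/parser.py --dynet-seed 123456789 --outdir results --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 --bibi-lstm --k 3 --usehead --userl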

Note 1: You can run it without POS embeddings by setting the POS embedding dimension to zero (--pembedding 0).

Note 2: The reported test result is the one matching the highest development score.

Note 3: After each iteration, the parser calculates accuracies excluding punctuation symbols by running the eval.pl script from the CoNLL-X Shared Task, and stores the results in the directory specified by --outdir.

Note 4: The external-embeddings parameter is optional and is best left unused when training or predicting with a graph-based model.

Parse data with your parsing model

The command for parsing a test.conll file formatted according to the CoNLL data format with a previously trained model is:

python src/parser.py --predict --outdir [results directory] --test test.conll [--extrn extrn.vectors] --model [trained model file] --params [param file generated during training]

The parser will store the resulting CoNLL file in the output directory (--outdir).

Note 1: If you are using the arc-hybrid trained model we provided, please use the --extrn flag and specify the location of the external embeddings file.

Note 2: If you are using the first-order trained model we provided, please do not use the --extrn flag.
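For instance, parsing with the provided arc-hybrid (transition-based) model might look like the line below; the model and param file names are placeholders for whatever your download or training run produced:

python src/parser.py --predict --outdir results --test test.conll --extrn extrn.vectors --model barchybrid.model --params params.pickle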

Citation

If you make use of this software for research purposes, we would appreciate a citation of the following:

@article{DBLP:journals/tacl/KiperwasserG16,
    author    = {Eliyahu Kiperwasser and Yoav Goldberg},
    title     = {Simple and Accurate Dependency Parsing Using Bidirectional {LSTM} Feature Representations},
    journal   = {{TACL}},
    volume    = {4},
    pages     = {313--327},
    year      = {2016},
    url       = {https://transacl.org/ojs/index.php/tacl/article/view/885},
    timestamp = {Tue, 09 Aug 2016 14:51:09 +0200},
    biburl    = {http://dblp.uni-trier.de/rec/bib/journals/tacl/KiperwasserG16},
    bibsource = {dblp computer science bibliography, http://dblp.org}
}

Forks

BIST-PyTorch: A PyTorch implementation of the BIST parsers (graph-based parser only).

BIST-COVINGTON: A neural implementation of Covington's algorithm for non-projective dependency parsing. It extends the original BIST transition-based greedy parser with a dynamic oracle for non-projective parsing to mitigate error propagation.

Uppsala Parser: A transition-based parser for Universal Dependencies with BiLSTM word and character representations.

License

This software is released under the terms of the Apache License, Version 2.0.

Contact

For questions and usage issues, please contact [email protected]

Credits

Eliyahu Kiperwasser

Yoav Goldberg
