elikip / Bist Parser

License: Apache-2.0
Graph-based and Transition-based dependency parsers based on BiLSTMs

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Bist Parser

Tensorflow qrnn
QRNN implementation for TensorFlow
Stars: ✭ 241 (-6.23%)
Mutual labels:  natural-language-processing
Low Resource Languages
Resources for conservation, development, and documentation of low resource (human) languages.
Stars: ✭ 247 (-3.89%)
Mutual labels:  natural-language-processing
Deep Learning Interview Book
A deep learning interview handbook (covering mathematics, machine learning, deep learning, computer vision, natural language processing, SLAM, and other areas)
Stars: ✭ 3,677 (+1330.74%)
Mutual labels:  natural-language-processing
Jack
Jack the Reader
Stars: ✭ 242 (-5.84%)
Mutual labels:  natural-language-processing
Soulvercore
A powerful Swift framework for evaluating mathematical expressions
Stars: ✭ 245 (-4.67%)
Mutual labels:  natural-language-processing
Awesome Tensorlayer
A curated list of dedicated resources and applications
Stars: ✭ 248 (-3.5%)
Mutual labels:  natural-language-processing
Cmrc2018
A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
Stars: ✭ 238 (-7.39%)
Mutual labels:  natural-language-processing
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-0.78%)
Mutual labels:  natural-language-processing
Awesome Grounding
awesome grounding: A curated list of research papers in visual grounding
Stars: ✭ 247 (-3.89%)
Mutual labels:  natural-language-processing
Prose
📖 A Golang library for text processing, including tokenization, part-of-speech tagging, and named-entity extraction.
Stars: ✭ 2,840 (+1005.06%)
Mutual labels:  natural-language-processing
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+1239.69%)
Mutual labels:  natural-language-processing
Insight
Repository for Project Insight: NLP as a Service
Stars: ✭ 246 (-4.28%)
Mutual labels:  natural-language-processing
Datacamp Python Data Science Track
All the slides, accompanying code and exercises all stored in this repo. 🎈
Stars: ✭ 250 (-2.72%)
Mutual labels:  natural-language-processing
Summarization Papers
Summarization Papers
Stars: ✭ 238 (-7.39%)
Mutual labels:  natural-language-processing
Lectures
Oxford Deep NLP 2017 course
Stars: ✭ 15,162 (+5799.61%)
Mutual labels:  natural-language-processing
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+1148.64%)
Mutual labels:  natural-language-processing
Good Papers
I try my best to keep updated cutting-edge knowledge in Machine Learning/Deep Learning and Natural Language Processing. These are my notes on some good papers
Stars: ✭ 248 (-3.5%)
Mutual labels:  natural-language-processing
Ai Job Notes
A job-hunting guide for AI algorithm roles (covering preparation strategies, coding-interview practice guides, referrals, a list of AI companies, and other material)
Stars: ✭ 3,191 (+1141.63%)
Mutual labels:  natural-language-processing
Articutapi
API of Articut, a Chinese word-segmentation service with semantic part-of-speech tagging. Word segmentation (also called tokenization) is the foundation of Chinese text processing. Articut uses no machine learning and no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and recall above 96% on SIGHAN 2005.
Stars: ✭ 252 (-1.95%)
Mutual labels:  natural-language-processing
Book Nlp
Natural language processing pipeline for book-length documents
Stars: ✭ 249 (-3.11%)
Mutual labels:  natural-language-processing

BIST Parsers

Graph & Transition based dependency parsers using BiLSTM feature extractors.

The techniques behind the parser are described in the paper Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. Further materials can be found here.

Required software

The parser runs under Python and uses the DyNet neural-network library; the commands below pass DyNet's --dynet-seed and --dynet-mem options through to it.

Train a parsing model

The software requires training.conll and development.conll files formatted according to the CoNLL data format. For the faster graph-based parser (1,200 words/sec), change directory to bmstparser; for the more accurate transition-based parser (800 words/sec), change directory to barchybrid. The benchmark was performed on a MacBook Pro with an i7 processor.

On the standard Penn Treebank dataset (Stanford Dependencies), the graph-based parser achieves an accuracy of 93.8 UAS and the transition-based parser 94.7 UAS. The transition-based parser does not require part-of-speech tagging: setting all the tags to NN will still produce the expected accuracy. The model and param files achieving those scores are available for download (Graph-based model, Transition-based model). The trained models include improvements beyond those described in the paper, to be published soon.
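For reference, each line of a CoNLL-formatted file describes one token with ten tab-separated fields (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and sentences are separated by blank lines. A minimal illustrative fragment (the sentence and tags below are made up, not taken from any treebank):

    1   The      the      DT    DT    _   2   det     _   _
    2   parser   parser   NN    NN    _   3   nsubj   _   _
    3   works    work     VBZ   VBZ   _   0   root    _   _
    4   .        .        .     .     _   3   punct   _   _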

To train a parsing model with either parsing architecture, type the following at the command prompt:

python src/parser.py --dynet-seed 123456789 [--dynet-mem XXXX] --outdir [results directory] --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 [--extrn extrn.vectors] --bibi-lstm
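As a concrete example, a graph-based training run from the bmstparser directory might look like the line below; the memory size and output directory are placeholder values, not prescribed by the parser:

python src/parser.py --dynet-seed 123456789 --dynet-mem 2048 --outdir results --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 --bibi-lstm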

We use the same external embeddings used in Transition-Based Dependency Parsing with Stack Long Short-Term Memory, which can be downloaded from the authors' GitHub repository and directly here.

If you are training a transition-based parser, then for optimal results you should add the following to the command prompt: --k 3 --usehead --userl. These switches set the stack to 3 elements, use the BiLSTM vector of the head of each tree on the stack as a feature vector, and add the BiLSTM vectors of the rightmost and leftmost children to the feature vectors.
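Putting these together, a transition-based training run from the barchybrid directory might be invoked as follows (paths and the output directory are again placeholders):

python src/parser.py --dynet-seed 123456789 --outdir results --train training.conll --dev development.conll --epochs 30 --lstmdims 125 --lstmlayers 2 --bibi-lstm --k 3 --usehead --userl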

Note 1: You can run it without POS embeddings by setting the POS embedding dimension to zero (--pembedding 0).

Note 2: The reported test result is the one matching the highest development score.

Note 3: After each iteration, the parser calculates accuracies excluding punctuation symbols by running the eval.pl script from the CoNLL-X Shared Task, and stores the results in the directory specified by --outdir.

Note 4: The external-embeddings parameter is optional and is best left unused when training or predicting with a graph-based model.

Parse data with your parsing model

The command for parsing a test.conll file formatted according to the CoNLL data format with a previously trained model is:

python src/parser.py --predict --outdir [results directory] --test test.conll [--extrn extrn.vectors] --model [trained model file] --params [param file generated during training]

The parser will store the resulting CoNLL file in the output directory (--outdir).

Note 1: If you are using the arc-hybrid trained model we provided, please use the --extrn flag and specify the location of the external embeddings file.

Note 2: If you are using the first-order trained model we provided, please do not use the --extrn flag.
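For instance, parsing with the provided arc-hybrid (transition-based) model might look like the line below; the model and param file names are placeholders for whatever your download or training run produced:

python src/parser.py --predict --outdir results --test test.conll --extrn extrn.vectors --model barchybrid.model --params params.pickle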

Citation

If you make use of this software for research purposes, we would appreciate a citation of the following:

@article{DBLP:journals/tacl/KiperwasserG16,
    author    = {Eliyahu Kiperwasser and Yoav Goldberg},
    title     = {Simple and Accurate Dependency Parsing Using Bidirectional {LSTM} Feature Representations},
    journal   = {{TACL}},
    volume    = {4},
    pages     = {313--327},
    year      = {2016},
    url       = {https://transacl.org/ojs/index.php/tacl/article/view/885},
    timestamp = {Tue, 09 Aug 2016 14:51:09 +0200},
    biburl    = {http://dblp.uni-trier.de/rec/bib/journals/tacl/KiperwasserG16},
    bibsource = {dblp computer science bibliography, http://dblp.org}
}

Forks

BIST-PyTorch: A PyTorch implementation of the BIST parsers (graph-based parser only).

BIST-COVINGTON: A neural implementation of Covington's algorithm for non-projective dependency parsing. It extends the original BIST transition-based greedy parser with a dynamic oracle for non-projective parsing to mitigate error propagation.

Uppsala Parser: A transition-based parser for Universal Dependencies with BiLSTM word and character representations.

License

This software is released under the terms of the Apache License, Version 2.0.

Contact

For questions and usage issues, please contact [email protected]

Credits

Eliyahu Kiperwasser

Yoav Goldberg
