habeanf / yap

License: Apache-2.0

Yet Another (natural language) Parser

yap - Yet Another Parser

This repository is no longer maintained.

For the latest and greatest see https://github.com/onlplab/yap

yap is yet another parser, written in Go. It was implemented to test the hypothesis of my MSc thesis on Joint Morpho-Syntactic Processing of MRLs in a Transition Based Framework at IDC Herzliya with my advisor, Reut Tsarfaty. A paper on the morphological analysis and disambiguation aspect for Modern Hebrew and Universal Dependencies was accepted to COLING 2016.

yap is currently provided with a model for Modern Hebrew, trained on a heavily updated version of the SPMRL 2014 Hebrew treebank. We hope to publish the updated treebank soon as well.

yap contains an implementation of the framework and parser of zpar from Z&N 2011 (Transition-based Dependency Parsing with Rich Non-local Features by Zhang and Nivre, 2011) with flags for precise output parity (i.e. bug replication), trained on the morphologically disambiguated Modern Hebrew treebank.

yap is under active development and documentation.

DO NOT USE FOR PRODUCTION

Requirements

Go
bzip2
4-16 CPU cores
~4.5GB RAM for Morphological Disambiguation
~2GB RAM for Dependency Parsing

Compilation

Download and install Go.
Set up a Go environment:
- Create a directory (usually one per workspace/project): mkdir yapproj; cd yapproj
- Set the $GOPATH environment variable to your workspace: export GOPATH=path/to/yapproj
- In the workspace directory, create 3 subdirectories: mkdir src pkg bin
- Change into the src directory: cd src
Clone the repository into the src folder of the workspace, then:

cd yap
go get .
go build .
./yap

Bunzip the Hebrew MD model: bunzip2 data/hebmd.b32.bz2
Bunzip the Hebrew Dependency Parsing model: bunzip2 data/dep.b64.bz2
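
If bunzip2 is not available on the machine, the same decompression step can be sketched with Go's standard compress/bzip2 package. This is only an illustration and not part of yap: the helper names outputName and decompress are made up here, and unlike bunzip2 this sketch leaves the compressed file in place.

```go
package main

import (
	"compress/bzip2"
	"fmt"
	"io"
	"os"
	"strings"
)

// outputName derives the decompressed file name by dropping a trailing
// ".bz2", which is what bunzip2 does by default. (Hypothetical helper.)
func outputName(path string) (string, error) {
	name := strings.TrimSuffix(path, ".bz2")
	if name == path || name == "" {
		return "", fmt.Errorf("cannot derive an output name from %q", path)
	}
	return name, nil
}

// decompress writes the bzip2-decoded contents of path next to it.
// The compressed file is not removed.
func decompress(path string) error {
	out, err := outputName(path)
	if err != nil {
		return err
	}
	src, err := os.Open(path)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := os.Create(out)
	if err != nil {
		return err
	}
	if _, err := io.Copy(dst, bzip2.NewReader(src)); err != nil {
		dst.Close()
		return err
	}
	return dst.Close()
}

func main() {
	// Each argument is a .bz2 file, e.g. data/hebmd.b32.bz2 data/dep.b64.bz2
	for _, path := range os.Args[1:] {
		if err := decompress(path); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Run it from the yap directory with the two model files as arguments.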

You may want to use a Go workspace manager or a shell script to set $GOPATH to <.../yapproj>.

Processing Modern Hebrew

Currently, yap supports only pipeline morphological analysis, disambiguation, and dependency parsing of pre-tokenized Hebrew text. For Hebrew morphological analysis, the input must have one token per line, with an empty line separating sentences.

The lattice format as output by the analyzer can be used as-is for disambiguation.

For example:

עשרות
אנשים
מגיעים
מתאילנד
...

כך
אמר
ח"כ
...
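
A minimal Go sketch of producing this layout from text that is already tokenized. The function name formatRaw is made up for illustration and is not part of yap. It assumes an empty line after every sentence, including the last one. The sample tokens are taken from the truncated example above, so they are not complete sentences.

```go
package main

import (
	"fmt"
	"strings"
)

// formatRaw renders pre-tokenized sentences with one token per line
// and an empty line after each sentence. (Hypothetical helper.)
func formatRaw(sentences [][]string) string {
	var b strings.Builder
	for _, tokens := range sentences {
		for _, tok := range tokens {
			b.WriteString(tok)
			b.WriteByte('\n')
		}
		b.WriteByte('\n') // empty line marks the sentence boundary
	}
	return b.String()
}

func main() {
	sentences := [][]string{
		{"עשרות", "אנשים", "מגיעים", "מתאילנד"},
		{"כך", "אמר"},
	}
	// Go strings are UTF-8, so the output is UTF-8 encoded.
	fmt.Print(formatRaw(sentences))
}
```

Redirecting the program's output to a file gives something that can be passed to the analyzer as the raw input.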

Note: The input must be in UTF-8 encoding. yap will process ISO-8859-* encodings incorrectly.
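
One way to catch a wrongly encoded file before running yap is to check that its bytes are well-formed UTF-8. The sketch below uses Go's standard unicode/utf8 package. The function name isValidUTF8 is an illustrative choice, not something yap provides. The check is a heuristic: pure ASCII passes, and it cannot prove that the text is Hebrew, but Hebrew text in an ISO-8859-8 encoding will normally fail it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isValidUTF8 reports whether data is well-formed UTF-8.
// (Hypothetical helper wrapping the standard library check.)
func isValidUTF8(data []byte) bool {
	return utf8.Valid(data)
}

func main() {
	// The same two-letter word, once as UTF-8 and once as ISO-8859-8 bytes.
	asUTF8 := []byte("כך")
	asISO88598 := []byte{0xEB, 0xEA}

	fmt.Println("UTF-8 input valid:", isValidUTF8(asUTF8))
	fmt.Println("ISO-8859-8 input valid:", isValidUTF8(asISO88598))
}
```

In practice the bytes would come from reading the input file rather than from literals.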

Commands for morphological analysis and disambiguation:

./yap hebma -raw input.raw -out lattices.conll -stream
./yap md -in lattices.conll -om output.conll -stream

The output of the morphological disambiguator can be used as input for the dependency parser. Command for dependency parsing:

./yap dep -inl output.conll -oc dep_output.conll
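
The three commands can be chained so that each stage reads the file the previous one wrote. The Go sketch below builds the argument lists from the commands shown above and, when asked, runs them in order with os/exec. The names pipelineArgs and runPipeline and the -run switch are made up for this example. Only the yap subcommands and flags come from the commands above. By default it prints the commands without running anything.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// pipelineArgs returns the arguments for the three stages, wiring the
// output file of each stage into the input of the next.
func pipelineArgs(raw, lattices, md, dep string) [][]string {
	return [][]string{
		{"hebma", "-raw", raw, "-out", lattices, "-stream"},
		{"md", "-in", lattices, "-om", md, "-stream"},
		{"dep", "-inl", md, "-oc", dep},
	}
}

// runPipeline runs the stages in order and stops at the first failure.
func runPipeline(yapPath string, stages [][]string) error {
	for _, args := range stages {
		cmd := exec.Command(yapPath, args...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("yap %s failed: %w", args[0], err)
		}
	}
	return nil
}

func main() {
	stages := pipelineArgs("input.raw", "lattices.conll", "output.conll", "dep_output.conll")

	// "-run" is a switch of this sketch only; without it, do a dry run.
	if len(os.Args) > 1 && os.Args[1] == "-run" {
		if err := runPipeline("./yap", stages); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return
	}
	for _, args := range stages {
		fmt.Println("./yap", strings.Join(args, " "))
	}
}
```

Stopping at the first failed stage avoids feeding a missing or partial file to the next one.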

Citation

If you make use of this software for research, we would appreciate the following citation:

@InProceedings{moretsarfatycoling2016,
  author = {Amir More and Reut Tsarfaty},
  title = {Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies},
  booktitle = {Proceedings of COLING 2016},
  year = {2016},
  month = {december},
  location = {Osaka}
}

HEBLEX, a Morphological Analyzer for Modern Hebrew in yap, relies on a slightly modified version of the BGU Lexicon. Please acknowledge and cite the work on the BGU Lexicon with this citation:

@inproceedings{adler06,
    Author = {Adler, Meni and Elhadad, Michael},
    Booktitle = {ACL},
    Crossref = {conf/acl/2006},
    Editor = {Calzolari, Nicoletta and Cardie, Claire and Isabelle, Pierre},
    Ee = {http://aclweb.org/anthology/P06-1084},
    Interhash = {6e302df82f4d7776cc487d5b8623d3db},
    Intrahash = {c7ac3ecfe40d039cd6c9ec855cb432db},
    Keywords = {dblp},
    Publisher = {The Association for Computer Linguistics},
    Timestamp = {2013-08-13T15:11:00.000+0200},
    Title = {An Unsupervised Morpheme-Based HMM for {H}ebrew Morphological
        Disambiguation},
    Url = {http://dblp.uni-trier.de/db/conf/acl/acl2006.html#AdlerE06},
    Year = 2006,
    Bdsk-Url-1 = {http://dblp.uni-trier.de/db/conf/acl/acl2006.html#AdlerE06}}

License

This software is released under the terms of the Apache License, Version 2.0.

The Apache license does not apply to the BGU Lexicon. Please contact Reut Tsarfaty regarding licensing of the lexicon.

Contact

You may contact me at mygithubuser at gmail, or Reut Tsarfaty at reutts at openu dot ac dot il.
