
bnosac / Rdrpostagger

R package for Ripple Down Rules-based Part-Of-Speech Tagging (RDRPOS) in more than 45 languages.

Programming languages: Java, R

Projects that are alternatives of or similar to Rdrpostagger

Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (+209.68%)
Mutual labels:  pos, natural-language-processing, pos-tagging
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (+416.13%)
Mutual labels:  natural-language-processing, pos-tagging, r-package
Malaya
Natural Language Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/
Stars: ✭ 239 (+670.97%)
Mutual labels:  natural-language-processing, pos-tagging
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (+258.06%)
Mutual labels:  pos, natural-language-processing
Pymystem3
A Python wrapper of the Yandex Mystem 3.1 morphological analyzer (http://api.yandex.ru/mystem). The original tool is shipped as a binary and this library makes it easy to integrate it in Python projects. Let us know in the issues if you would like to be involved into the developments or maintenance of this project. If you have any fix or suggestion, please make a pull request. We are very open to accepting any contributions.
Stars: ✭ 224 (+622.58%)
Mutual labels:  pos, tagging
Vntk
Vietnamese NLP Toolkit for Node
Stars: ✭ 170 (+448.39%)
Mutual labels:  natural-language-processing, pos-tagging
Cleannlp
R package providing annotators and a normalized data model for natural language processing
Stars: ✭ 174 (+461.29%)
Mutual labels:  natural-language-processing, r-package
Monpa
MONPA (罔拍) is a multi-task model providing Traditional Chinese word segmentation, part-of-speech tagging and named entity recognition.
Stars: ✭ 203 (+554.84%)
Mutual labels:  pos, pos-tagging
Articutapi
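API of Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (斷詞, also known as 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.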
Stars: ✭ 252 (+712.9%)
Mutual labels:  natural-language-processing, pos-tagging
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+1041.94%)
Mutual labels:  natural-language-processing, pos-tagging
Nlpnet
A neural network architecture for NLP tasks, using cython for fast performance. Currently, it can perform POS tagging, SRL and dependency parsing.
Stars: ✭ 379 (+1122.58%)
Mutual labels:  natural-language-processing, pos-tagging
Hanlp
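Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing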
Stars: ✭ 24,626 (+79338.71%)
Mutual labels:  natural-language-processing, pos-tagging
Googlelanguager
R client for the Google Translation API, Google Cloud Natural Language API and Google Cloud Speech API
Stars: ✭ 145 (+367.74%)
Mutual labels:  natural-language-processing, r-package
Deeptoxic
top 1% solution to toxic comment classification challenge on Kaggle.
Stars: ✭ 180 (+480.65%)
Mutual labels:  pos, natural-language-processing
rippletagger
RippleTagger identifies part-of-speech tags (Nouns, Verbs, and so on...). You give it a sentence, it gives you a list of tags back.
Stars: ✭ 12 (-61.29%)
Mutual labels:  multi-language, pos-tagging
Deta parser
Fast Chinese word segmentation and analysis
Stars: ✭ 476 (+1435.48%)
Mutual labels:  pos, multi-language
Jcseg
Jcseg is a light weight NLP framework developed with Java. Provide CJK and English segmentation based on MMSEG algorithm, With also keywords extraction, key sentence extraction, summary extraction implemented based on TEXTRANK algorithm. Jcseg had a build-in http server and search modules for the latest lucene,solr,elasticsearch
Stars: ✭ 754 (+2332.26%)
Mutual labels:  natural-language-processing, pos-tagging
Ieeer
Search IEEE publications in R
Stars: ✭ 12 (-61.29%)
Mutual labels:  r-package
Bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
Stars: ✭ 909 (+2832.26%)
Mutual labels:  natural-language-processing
Node Api.ai
[DEPRECATED] Ultimate Node.JS SDK for api.ai
Stars: ✭ 12 (-61.29%)
Mutual labels:  natural-language-processing

RDRPOSTagger

R package to perform Parts of Speech tagging and morphological tagging based on the Ripple Down Rules-based Part-Of-Speech Tagger (RDRPOS) available at https://github.com/datquocnguyen/RDRPOSTagger. RDRPOSTagger supports pre-trained POS tagging models for 45 languages.

The R package allows you to perform three types of tagging:

  • UniversalPOS: annotation with the Universal POS tagset, a reduced, coarse-grained tagset that is used consistently across languages. This type of tagging is available for the following languages: Ancient_Greek, Ancient_Greek-PROIEL, Arabic, Basque, Belarusian, Bulgarian, Catalan, Chinese, Coptic, Croatian, Czech, Czech-CAC, Czech-CLTT, Danish, Dutch, Dutch-LassySmall, English, English-LinES, English-ParTUT, Estonian, Finnish, Finnish-FTB, French, French-ParTUT, French-Sequoia, Galician, Galician-TreeGal, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Italian-ParTUT, Japanese, Korean, Latin, Latin-ITTB, Latin-PROIEL, Latvian, Lithuanian, Norwegian-Bokmaal, Norwegian-Nynorsk, Old_Church_Slavonic, Persian, Polish, Portuguese, Portuguese-BR, Romanian, Russian, Russian-SynTagRus, Slovak, Slovenian, Slovenian-SST, Spanish, Spanish-AnCora, Swedish, Swedish-LinES, Tamil, Turkish, Urdu, Vietnamese.
  • POS: Parts of Speech annotation based on an extended, language/treebank-specific POS tagset. This type of tagging is available for the following languages: English, French, German, Hindi, Italian, Thai, Vietnamese.
  • MORPH: very detailed morphological annotation. This type of tagging is available for the following languages: Bulgarian, Czech, Dutch, French, German, Portuguese, Spanish, Swedish.

This is based on corpora collected and made available at http://universaldependencies.org.
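If you want to check in code which annotation types a language supports before calling rdr_model(), a static lookup built from the POS and MORPH lists above is one option. The sketch below is hypothetical: languages_by_annotation and has_model() are not part of RDRPOSTagger, and rdr_available_models() remains the authoritative source at runtime.

```r
# Hypothetical helper, not part of RDRPOSTagger: a static copy of the
# POS and MORPH language lists given above.
languages_by_annotation <- list(
  POS   = c("English", "French", "German", "Hindi", "Italian", "Thai", "Vietnamese"),
  MORPH = c("Bulgarian", "Czech", "Dutch", "French", "German", "Portuguese",
            "Spanish", "Swedish")
)

# TRUE when `language` appears in the list for `annotation`;
# match.arg() raises an error for an unknown annotation type.
has_model <- function(language, annotation) {
  annotation <- match.arg(annotation, names(languages_by_annotation))
  language %in% languages_by_annotation[[annotation]]
}

# Languages offering both the extended POS tagset and MORPH annotation
both <- intersect(languages_by_annotation$POS, languages_by_annotation$MORPH)
print(both)
print(has_model("Dutch", "MORPH"))
print(has_model("Dutch", "POS"))
```

A hard-coded list goes stale when models are added, so treat it as documentation of the table above rather than a replacement for querying the package.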

Examples of Parts of Speech tagging

The following shows how to use the package:

library(RDRPOSTagger)

# List the languages available for each annotation type
models <- rdr_available_models()
models$MORPH$language
models$POS$language
models$UniversalPOS$language

# English text tagged with the extended, treebank-specific POS tagset
x <- c("Oleg Borisovich Kulik is a Ukrainian-born Russian performance artist")
tagger <- rdr_model(language = "English", annotation = "POS")
rdr_pos(tagger, x = x)

# Dutch text with detailed morphological annotation;
# whitespace-only, empty and NA inputs are allowed
x <- c("Dus godvermehoeren met pus in alle puisten, zei die schele van Van Bukburg.", 
       "Er was toen dat liedje van tietenkonttieten kont tieten kontkontkont",
       "  ", "", NA)
tagger <- rdr_model(language = "Dutch", annotation = "MORPH")
rdr_pos(tagger, x = x)

# The same Dutch text with Universal POS tags
tagger <- rdr_model(language = "Dutch", annotation = "UniversalPOS")
rdr_pos(tagger, x = x)

The output of the UniversalPOS tagging of the Dutch text (the last call above) looks as follows:

 doc_id token_id            token   pos
     d1        1              Dus   ADV
     d1        2   godvermehoeren  VERB
     d1        3              met   ADP
     d1        4              pus  NOUN
     d1        5               in   ADP
     d1        6             alle  PRON
     d1        7          puisten  NOUN
     d1        8                , PUNCT
     d1        9              zei  VERB
     d1       10              die  PRON
     d1       11           schele   ADJ
     d1       12              van   ADP
     d1       13              Van PROPN
     d1       14          Bukburg PROPN
     d1       15                . PUNCT
     d2        1               Er   ADV
     d2        2              was   AUX
     d2        3             toen SCONJ
     d2        4              dat SCONJ
     d2        5           liedje  NOUN
     d2        6              van   ADP
     d2        7 tietenkonttieten  VERB
     d2        8             kont PROPN
     d2        9           tieten  VERB
     d2       10     kontkontkont PROPN
     d3        0             <NA>  <NA>
     d4        0             <NA>  <NA>
     d5        0             <NA>  <NA>
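The printed output suggests that rdr_pos() returns a data.frame-like object with one row per token (columns doc_id, token_id, token, pos), which makes downstream analysis straightforward in base R. The sketch below assumes that shape and hard-codes document d1 and the empty document d3 from the output above, so it runs without Java or the package. It drops the placeholder rows produced for empty inputs, tabulates tags per document and keeps nouns and proper nouns.

```r
# Rows copied from the example output above (document d1 and empty document d3);
# in practice `tagged` would be the result of rdr_pos(tagger, x = x).
tagged <- data.frame(
  doc_id   = c(rep("d1", 15), "d3"),
  token_id = c(1:15, 0L),
  token    = c("Dus", "godvermehoeren", "met", "pus", "in", "alle", "puisten",
               ",", "zei", "die", "schele", "van", "Van", "Bukburg", ".", NA),
  pos      = c("ADV", "VERB", "ADP", "NOUN", "ADP", "PRON", "NOUN", "PUNCT",
               "VERB", "PRON", "ADJ", "ADP", "PROPN", "PROPN", "PUNCT", NA),
  stringsAsFactors = FALSE
)

# Empty, whitespace-only and NA inputs come back as one row with
# token_id 0 and NA token/pos; drop those before analysis.
tokens <- subset(tagged, !is.na(pos))

# Frequency of each Universal POS tag per document
tag_counts <- table(tokens$doc_id, tokens$pos)
print(tag_counts)

# Keep only nouns and proper nouns, e.g. as input for keyword extraction
nouns <- tokens$token[tokens$pos %in% c("NOUN", "PROPN")]
print(nouns)
```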

More information about the model and the tagging can be found at https://github.com/datquocnguyen/RDRPOSTagger

The general architecture and experimental results of RDRPOSTagger are described in the papers referenced in that repository.

Installation

Install the dependencies and the package as follows:

install.packages("rJava")  # rJava needs a working Java installation
install.packages("data.table")
install.packages("RDRPOSTagger", repos = "http://www.datatailor.be/rcube", type = "source")

Or install it from GitHub with devtools:

devtools::install_github("bnosac/RDRPOSTagger", build_vignettes = TRUE)

More details can be found in the package documentation and the package vignette:

vignette("rdrpostagger-overview", package = "RDRPOSTagger")

License

The package is licensed under the GPL-3 license as described at http://www.gnu.org/licenses/gpl-3.0.html.

Support in text mining

Need support in text mining? Contact BNOSAC: http://www.bnosac.be
