Alternatives and detailed information of rakutenma-python

ikegami-yukino / rakutenma-python

Licence: Apache-2.0 License

Rakuten MA (Python version)

Programming Languages

python

Projects that are alternatives of or similar to rakutenma-python

Jumanpp

Juman++ (a Morphological Analyzer Toolkit)

Stars: ✭ 254 (+1593.33%)

Mutual labels: word-segmentation, pos-tagging, part-of-speech-tagger

Vncorenlp

A Vietnamese natural language processing toolkit (NAACL 2018)

Stars: ✭ 354 (+2260%)

Mutual labels: word-segmentation, pos-tagging

Jptdp

Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)

Stars: ✭ 146 (+873.33%)

Mutual labels: pos-tagging, part-of-speech-tagger

Lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

Stars: ✭ 2,792 (+18513.33%)

Mutual labels: word-segmentation, part-of-speech-tagger

Qutuf

Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.

Stars: ✭ 84 (+460%)

Mutual labels: pos-tagging, part-of-speech-tagger

Rdrpostagger

A fast and accurate POS and morphological tagging toolkit (EACL 2014)

Stars: ✭ 126 (+740%)

Mutual labels: pos-tagging, part-of-speech-tagger

Pytorch Pos Tagging

A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.

Stars: ✭ 96 (+540%)

Mutual labels: pos-tagging, part-of-speech-tagger

SynThai

Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning

Stars: ✭ 41 (+173.33%)

Mutual labels: word-segmentation, pos-tagging

Cws

Source code for an ACL2016 paper of Chinese word segmentation

Stars: ✭ 81 (+440%)

Mutual labels: chinese, word-segmentation

Kagome

Self-contained Japanese Morphological Analyzer written in pure Go

Stars: ✭ 554 (+3593.33%)

Mutual labels: japanese-language, pos-tagging

Articutapi

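API of Articut Chinese word segmentation (with semantic part-of-speech tagging). Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and no data model, only the grammar rules of modern vernacular Chinese, and reaches an F1-measure above 94% and a recall above 96% on SIGHAN 2005.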

Stars: ✭ 252 (+1580%)

Mutual labels: pos-tagging, part-of-speech-tagger

datalinguist

Stanford CoreNLP in idiomatic Clojure.

Stars: ✭ 93 (+520%)

Mutual labels: pos-tagging, part-of-speech-tagger

Nagisa

A Japanese tokenizer based on recurrent neural networks

Stars: ✭ 260 (+1633.33%)

Mutual labels: word-segmentation, pos-tagging

Monpa

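MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition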

Stars: ✭ 203 (+1253.33%)

Mutual labels: word-segmentation, pos-tagging

pytorch Joint-Word-Segmentation-and-POS-Tagging

Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging

Stars: ✭ 37 (+146.67%)

Mutual labels: word-segmentation, pos-tagging

Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Stars: ✭ 151 (+906.67%)

Mutual labels: word-segmentation, pos-tagging

chinese-nlp-ner

A BLSTM-CRF solution for Chinese named entity recognition

Stars: ✭ 14 (-6.67%)

Mutual labels: chinese

udar

UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.

Stars: ✭ 15 (+0%)

Mutual labels: pos-tagging

cn-holiday

A library for Chinese holidays

Stars: ✭ 22 (+46.67%)

Mutual labels: chinese

SymSpellCppPy

Fast SymSpell written in C++ and exposed to Python via pybind11

Stars: ✭ 28 (+86.67%)

Mutual labels: word-segmentation

Rakuten MA Python

Rakuten MA Python (morphological analyzer) is a Python version of Rakuten MA (word segmentor + PoS Tagger) for Chinese and Japanese.

For details about Rakuten MA, see https://github.com/rakuten-nlp/rakutenma

See also http://qiita.com/yukinoi/items/925bc238185aa2fad8a7 (in Japanese)

Contributions are welcome!

Installation

pip install rakutenma

Example

from rakutenma import RakutenMA

# Initialize a RakutenMA instance with an empty model
# the default ja feature set is set already
rma = RakutenMA()

# Let's analyze a sample sentence (from http://tatoeba.org/jpn/sentences/show/103809)
# With a disastrous result, since the model is empty!
print(rma.tokenize("彼は新しい仕事できっと成功するだろう。"))

# Feed the model with ten sample sentences from tatoeba.com
# "tatoeba.json" is available at https://github.com/rakuten-nlp/rakutenma
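import json
# Open the file explicitly as UTF-8 and close it when done
with open("tatoeba.json", encoding="utf-8") as f:
    tatoeba = json.load(f)
for sentence in tatoeba:
    rma.train_one(sentence)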

# Now what does the result look like?
print(rma.tokenize("彼は新しい仕事できっと成功するだろう。"))

# Initialize a RakutenMA instance with a pre-trained model
rma = RakutenMA(phi=1024, c=0.007812)  # Specify hyperparameters for SCW (for demonstration purposes)
rma.load("model_ja.json")

# Set the feature hash function (15bit)
rma.hash_func = rma.create_hash_func(15)

# Tokenize one sample sentence
print(rma.tokenize("うらにわにはにわにわとりがいる"))

# Re-train the model feeding the right answer (pairs of [token, PoS tag])
res = rma.train_one(
       [["うらにわ","N-nc"],
        ["に","P-k"],
        ["は","P-rj"],
        ["にわ","N-n"],
        ["にわとり","N-nc"],
        ["が","P-k"],
        ["いる","V-c"]])
# The result of train_one contains:
#   sys: the system output (using the current model)
#   ans: answer fed by the user
#   update: whether the model was updated
print(res)

# Now what does the result look like?
print(rma.tokenize("うらにわにはにわにわとりがいる"))
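
The example above sets a 15-bit feature hash function with create_hash_func(15). As a rough illustration of what "15-bit" means, the sketch below maps an arbitrary feature to one of 2**15 = 32768 buckets. It is not the library's implementation: the name make_hash_func and the use of zlib.crc32 are assumptions chosen only to show the idea of the hashing trick.

```python
import zlib


def make_hash_func(bits):
    """Return a function mapping a feature to an int in [0, 2**bits).

    Hypothetical helper for illustration; rakutenma's own
    create_hash_func may use a different hash internally.
    """
    mask = (1 << bits) - 1  # keep only the lowest `bits` bits

    def hash_func(feature):
        # Join the feature parts into one key and encode it as bytes
        key = "\t".join(str(part) for part in feature).encode("utf-8")
        return zlib.crc32(key) & mask

    return hash_func


hash15 = make_hash_func(15)
bucket = hash15(["w0", "彼"])
print(0 <= bucket < 2 ** 15)  # True: every feature lands in one of 32768 buckets
```

A smaller bit width gives a smaller model at the cost of more collisions between unrelated features, so a model should be used with the same bit width it was trained with.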

NOTE

Added API

Compared to the original Rakuten MA, the following methods are added:

RakutenMA::load(model_path) - Load model from JSON file
RakutenMA::save(model_path) - Save model to path

misc

By default, the following values are set:

rma.featset = CTYPE_JA_PATTERNS # RakutenMA.default_featset_ja
rma.hash_func = rma.create_hash_func(15)
rma.tag_scheme = "SBIEO" # if using Chinese, set "IOB2"
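
To make the tag_scheme setting more concrete, here is a minimal sketch of how [token, PoS tag] pairs can be turned into per-character labels under the two schemes named above. The function tokens_to_char_labels is hypothetical and written only for illustration; it assumes the usual reading of the prefixes (S = single-character token, B = beginning, I = inside, E = end) and is not taken from the library's source.

```python
def tokens_to_char_labels(tokens, scheme="SBIEO"):
    """Convert [surface, pos] pairs into (character, label) pairs.

    Hypothetical illustration of the SBIEO and IOB2 tag schemes.
    The "O" (outside) label is never produced here because every
    character belongs to some token.
    """
    labels = []
    for surface, pos in tokens:
        last = len(surface) - 1
        for i, ch in enumerate(surface):
            if scheme == "SBIEO":
                if last == 0:
                    prefix = "S"  # single-character token
                elif i == 0:
                    prefix = "B"  # first character of the token
                elif i == last:
                    prefix = "E"  # last character of the token
                else:
                    prefix = "I"  # character inside the token
            elif scheme == "IOB2":
                prefix = "B" if i == 0 else "I"
            else:
                raise ValueError("unknown tag scheme: %r" % scheme)
            labels.append((ch, prefix + "-" + pos))
    return labels


for ch, label in tokens_to_char_labels([["にわとり", "N-nc"], ["が", "P-k"]]):
    print(ch, label)
```

Under SBIEO the four characters of にわとり receive B, I, I, E prefixes and the single-character が receives S; under IOB2 only B and I are used.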

LICENSE

Apache License version 2.0
