EmilStenstrom / rippletagger

Licence: other

RippleTagger works, but it is no longer maintained. For the latest updates, see the original project: https://github.com/datquocnguyen/RDRPOSTagger

RippleTagger

RippleTagger identifies part-of-speech tags (nouns, verbs, and so on). You give it a sentence, and it gives you back a list of (word, tag) pairs. Tagging is the first step in many language processing tasks.

Example usage

>>> from rippletagger.tagger import Tagger
>>> tagger = Tagger(language="en")
>>> print(tagger.tag(u"The quick brown fox jumps over the lazy dog ."))

[
    (u'The', u'DET'),
    (u'quick', u'ADJ'),
    (u'brown', u'ADJ'),
    (u'fox', u'NOUN'),
    (u'jumps', u'VERB'),
    (u'over', u'ADP'),
    (u'the', u'DET'),
    (u'lazy', u'ADJ'),
    (u'dog', u'NOUN'),
    (u'.', u'PUNCT'),
]

You can read about what the different tags mean at the Universal Dependencies project.
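
As a small sketch of how the returned list can be consumed, the snippet below groups the words from the example above by tag. The `tagged` list is copied by hand from the example output rather than produced by calling the tagger, and both the helper name `words_by_tag` and the `TAG_NAMES` glossary (which covers only the six tags that appear in the example) are illustrative assumptions, not part of RippleTagger.

```python
from collections import defaultdict

# Output of the example above, copied by hand (u'' prefixes dropped).
tagged = [
    ("The", "DET"),
    ("quick", "ADJ"),
    ("brown", "ADJ"),
    ("fox", "NOUN"),
    ("jumps", "VERB"),
    ("over", "ADP"),
    ("the", "DET"),
    ("lazy", "ADJ"),
    ("dog", "NOUN"),
    (".", "PUNCT"),
]

# Short glossary for the tags seen in this example only (assumed helper data).
TAG_NAMES = {
    "DET": "determiner",
    "ADJ": "adjective",
    "NOUN": "noun",
    "VERB": "verb",
    "ADP": "adposition",
    "PUNCT": "punctuation",
}


def words_by_tag(pairs):
    """Group words by their tag, keeping the original word order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)


groups = words_by_tag(tagged)
for tag, words in groups.items():
    print("{} ({}): {}".format(tag, TAG_NAMES.get(tag, "unknown"), ", ".join(words)))
```

Because the result is an ordinary list of pairs, no RippleTagger-specific API is needed for this kind of post-processing.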

Why should you use RippleTagger?

  • It supports 40 (!) languages out of the box.
  • It's fast.
  • Its accuracy is above 90% for most supported treebanks (see the table under Supported languages).
  • It has no dependencies and is pure Python.

Installation

pip install rippletagger

Develop locally and run the tests

git clone git@github.com:EmilStenstrom/rippletagger.git
cd rippletagger
python setup.py test

Supported languages

You can pass the 2-letter code, the 3-letter code, or the language name as the parameter to Tagger. Some languages have more than one treebank available. In that case you can choose which treebank to use by appending "-2", "-3", and so on to the language code.

2-letter code  3-letter code  Name                 Treebank              Accuracy (%)
--             grc            ancient_greek        Ancient_Greek         91.56865075
--             grc            ancient_greek        Ancient_Greek-PROIEL  95.71938169
ar             ara            arabic               Arabic                94.414521
eu             eus            basque               Basque                92.42635595
bg             bul            bulgarian            Bulgarian             96.1294013
ca             cat            catalan              Catalan               96.51742106
zh             zho            chinese              Chinese               89.45221445
hr             hrv            croatian             Croatian              93.86666667
cs             ces            czech                Czech                 97.67695433
cs-2           ces-2          czech-2              Czech-CAC             97.82568807
cs-3           ces-3          czech-3              Czech-CLTT            97.00802724
da             dan            danish               Danish                93.47382733
nl             nld            dutch                Dutch                 88.75577614
nl-2           nld-2          dutch-2              Dutch-LassySmall      94.36650592
en             eng            english              English               92.70401658
en-2           eng-2          english-2            English-LinES         94.39924537
et             est            estonian             Estonian              93.83607943
fi             fin            finnish              Finnish               92.2428884
fi-2           fin-2          finnish-2            Finnish-FTB           90.9631537
fr             fra            french               French                95.22884882
gl             glg            galician             Galician              96.3053856
de             deu            german               German                90.39729092
--             got            gothic               Gothic                93.85420706
el             gre/ell        greek                Greek                 96.85956246
he             heb            hebrew               Hebrew                93.5171585
hi             hin            hindi                Hindi                 95.02399097
hu             hun            hungarian            Hungarian             88.68949233
id             ind            indonesian           Indonesian            90.74702886
ga             gle            irish                Irish                 90.60455378
it             ita            italian              Italian               96.48434167
kk             kaz            kazakh               Kazakh                79.22077922
la             lat            latin                Latin                 90.39735099
la-2           lat-2          latin-2              Latin-ITTB            98.24373855
la-3           lat-3          latin-3              Latin-PROIEL          95.78693144
lv             lav            latvian              Latvian               86.34880803
no             nor            norwegian            Norwegian             94.60278351
cu             chu            old_church_slavonic  Old_Church_Slavonic   94.62492617
fa             fas            persian              Persian               95.99826281
pl             pol            polish               Polish                94.0848991
pt             por            portuguese           Portuguese            95.08144363
pt-2           por-2          portuguese-2         Portuguese-BR         95.08798152
ro             ron            romanian             Romanian              94.51972789
ru             rus            russian              Russian-SynTagRus     97.65354521
sl             slv            slovenian            Slovenian             94.02687904
sl-2           slv-2          slovenian-2          Slovenian-SST         91.15554049
es             spa            spanish              Spanish               95.12795276
es-2           spa-2          spanish-2            Spanish-AnCora        96.78868917
sv             swe            swedish              Swedish               94.39564215
sv-2           swe-2          swedish-2            Swedish-LinES         94.47010209
ta             tam            tamil                Tamil                 82.08886853
tr             tur            turkish              Turkish               91.92623412
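
The naming scheme in the table (a 2-letter code, 3-letter code, or name, optionally followed by "-2", "-3" for an alternative treebank) can be sketched roughly as follows. This is not RippleTagger's actual implementation: the `TREEBANKS` excerpt covers only the Czech and Swedish rows, and `resolve_treebank` is a hypothetical helper written purely for illustration.

```python
# Excerpt of the table above: (2-letter, 3-letter, name) -> treebanks in order.
# The first treebank is the default; "-2" selects the second, and so on.
TREEBANKS = {
    ("cs", "ces", "czech"): ["Czech", "Czech-CAC", "Czech-CLTT"],
    ("sv", "swe", "swedish"): ["Swedish", "Swedish-LinES"],
}


def resolve_treebank(language):
    """Map a code such as "cs", "ces-2" or "czech-3" to a treebank name."""
    language = language.lower()
    base, sep, suffix = language.rpartition("-")
    if sep and suffix.isdigit():
        index = int(suffix) - 1  # "-2" means the second treebank
    else:
        base, index = language, 0  # no numeric suffix: use the default
    for aliases, treebanks in TREEBANKS.items():
        if base in aliases:
            if 0 <= index < len(treebanks):
                return treebanks[index]
            raise ValueError("no treebank number {} for {!r}".format(index + 1, base))
    raise ValueError("unknown language: {!r}".format(language))


print(resolve_treebank("cs"))         # Czech
print(resolve_treebank("ces-2"))      # Czech-CAC
print(resolve_treebank("swedish-2"))  # Swedish-LinES
```

With the real package, the same strings would simply be passed as the language parameter, as in Tagger(language="cs-2").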

Technical details

RippleTagger is a slimmed down version of RDRPOSTagger.

The general architecture and experimental results of RDRPOSTagger are described in the EACL and AICom papers by the RDRPOSTagger authors; see the original project for the full references.

Citations

Please cite either the EACL or the AICom paper whenever RDRPOSTagger is used to produce published results or incorporated into other software.
