IlyaGusev / Rnnmorph
Licence: apache-2.0
Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems.
Stars: ✭ 111
Programming language: Python
Projects that are alternatives to or similar to Rnnmorph
libmorph
libmorph rus/ukr: fast and accurate morphological analyzer/analysis for Russian and Ukrainian
Stars: ✭ 16 (-85.59%)
Mutual labels: russian, morphological-analysis
Pymystem3
A Python wrapper of the Yandex Mystem 3.1 morphological analyzer (http://api.yandex.ru/mystem). The original tool is shipped as a binary, and this library makes it easy to integrate into Python projects. Let us know in the issues if you would like to be involved in the development or maintenance of this project. If you have any fix or suggestion, please make a pull request. We are very open to accepting contributions.
Stars: ✭ 224 (+101.8%)
Mutual labels: russian, morphological-analysis
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-86.49%)
Mutual labels: russian, morphological-analysis
Qutuf
Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.
Stars: ✭ 84 (-24.32%)
Mutual labels: morphological-analysis
Guide2011 3
A guide to the LGA2011-3 socket and the Intel X99 platform in general
Stars: ✭ 37 (-66.67%)
Mutual labels: russian
Mldm
Lecture course "Machine Learning and Data Mining" at the Faculty of Computational Mathematics and Cybernetics (VMK), Lomonosov Moscow State University
Stars: ✭ 35 (-68.47%)
Mutual labels: russian
Yoptascript
A scripting programming language for gopniks and real lads
Stars: ✭ 1,315 (+1084.68%)
Mutual labels: russian
Russian news corpus
Russian mass media stemmed texts corpus: a corpus of lemmatized (morphologically normalized) texts from Russian mass media
Stars: ✭ 76 (-31.53%)
Mutual labels: russian
Owasp Masvs
The Mobile Application Security Verification Standard (MASVS) is a standard for mobile app security.
Stars: ✭ 1,030 (+827.93%)
Mutual labels: russian
React Ru Interview Questions
A collection of the most popular questions asked at Russian-language React.js developer interviews, with answers. Topics range from JavaScript and web-technology basics to an in-depth understanding of how React.js works.
Stars: ✭ 86 (-22.52%)
Mutual labels: russian
Daily Hero
The bot that sends daily closed issues digest to our team
Stars: ✭ 38 (-65.77%)
Mutual labels: russian
Retina Features
Project for segmentation of blood vessels, microaneurysms and hard exudates in fundus images.
Stars: ✭ 95 (-14.41%)
Mutual labels: morphological-analysis
Rustycrate.ru
A Russian-language website about the Rust programming language
Stars: ✭ 72 (-35.14%)
Mutual labels: russian
Russian Roulette
🍀 You want to push your luck? ... Go ahead and try your best with this CLI russian roulette! 💥
Stars: ✭ 92 (-17.12%)
Mutual labels: russian
Redux React I18n
An i18n solution for React/Redux and React Native projects
Stars: ✭ 64 (-42.34%)
Mutual labels: russian
rnnmorph
Morphological analyzer (POS tagger) for Russian and English languages based on neural networks and dictionary-lookup systems (pymorphy2, nltk).
Russian language, MorphoRuEval-2017 test dataset, accuracy
Domain | Full tag | PoS tag | Full tag + lemma | Sentence (full tag) | Sentence (full tag + lemma) |
---|---|---|---|---|---|
Lenta (news) | 96.31% | 98.01% | 92.96% | 77.93% | 52.79% |
VK (social) | 95.20% | 98.04% | 92.06% | 74.30% | 60.56% |
JZ (lit.) | 95.87% | 98.71% | 90.45% | 73.10% | 43.15% |
All | 95.81% | 98.26% | N/A | 74.92% | N/A |
English language, UD EWT test, accuracy
Dataset | Full tag | PoS tag | Full tag + lemma | Sentence (full tag) | Sentence (full tag + lemma) |
---|---|---|---|---|---|
UD EWT test | 91.57% | 94.10% | 87.02% | 63.17% | 50.99% |
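Reading the columns above as token-level accuracies (share of tokens whose PoS, full tag, or full tag plus lemma is correct) and sentence-level accuracies (share of sentences in which every token is correct), the metrics can be sketched as below. This is a minimal illustration under assumptions, not the official MorphoRuEval-2017 or UD evaluation script: the `accuracies` function and the `(PoS, features, lemma)` token layout are hypothetical, gold and predicted data are assumed to be aligned token by token, and the official scorers may differ in which features and tokens they count.

```python
from typing import Dict, List, Tuple

# Hypothetical token layout for this sketch: (PoS, feature string, lemma).
Token = Tuple[str, str, str]
Sentence = List[Token]


def accuracies(gold: List[Sentence], pred: List[Sentence]) -> Dict[str, float]:
    """Token- and sentence-level accuracies for aligned gold/predicted corpora."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    tokens = pos_ok = tag_ok = tag_lemma_ok = 0
    sent_tag_ok = sent_tag_lemma_ok = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences must be aligned")
        all_tags = all_tags_lemmas = True
        for (g_pos, g_feats, g_lemma), (p_pos, p_feats, p_lemma) in zip(g_sent, p_sent):
            tokens += 1
            pos_match = g_pos == p_pos
            # Full tag: PoS and every morphological feature must match.
            tag_match = pos_match and g_feats == p_feats
            # Full tag + lemma: additionally, the normal form must match.
            tag_lemma_match = tag_match and g_lemma == p_lemma
            pos_ok += pos_match
            tag_ok += tag_match
            tag_lemma_ok += tag_lemma_match
            all_tags = all_tags and tag_match
            all_tags_lemmas = all_tags_lemmas and tag_lemma_match
        # A sentence counts as correct only if every one of its tokens is correct.
        sent_tag_ok += all_tags
        sent_tag_lemma_ok += all_tags_lemmas
    if tokens == 0:
        raise ValueError("no tokens to evaluate")
    return {
        "pos": pos_ok / tokens,
        "full_tag": tag_ok / tokens,
        "full_tag_lemma": tag_lemma_ok / tokens,
        "sentence_full_tag": sent_tag_ok / len(gold),
        "sentence_full_tag_lemma": sent_tag_lemma_ok / len(gold),
    }
```

This makes it visible why the sentence-level columns are much lower than the token-level ones: a single wrong token anywhere in a sentence makes the whole sentence count as wrong.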
Speed and memory consumption
Speed: 200 to 600 words per second on CPU.
Memory consumption: about 500-600 MB for single-sentence predictions.
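To put the quoted throughput in perspective, the sketch below estimates wall-clock time for a corpus. It assumes, as a simplification, that throughput stays within the quoted 200 to 600 words per second and scales linearly with corpus size; the 1,000,000-word corpus and the `processing_time_minutes` helper are hypothetical.

```python
def processing_time_minutes(num_words: int, words_per_second: float) -> float:
    """Estimated wall-clock minutes to tag num_words at a fixed throughput."""
    if words_per_second <= 0:
        raise ValueError("throughput must be positive")
    return num_words / words_per_second / 60


# Bounds for a hypothetical 1,000,000-word corpus at the quoted CPU throughput.
fast = processing_time_minutes(1_000_000, 600)
slow = processing_time_minutes(1_000_000, 200)
print(f"{fast:.1f} to {slow:.1f} minutes")  # 27.8 to 83.3 minutes
```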
Install
```bash
sudo pip3 install rnnmorph
```
Usage
```python
from rnnmorph.predictor import RNNMorphPredictor

predictor = RNNMorphPredictor(language="ru")
forms = predictor.predict(["мама", "мыла", "раму"])

print(forms[0].pos)
# NOUN
print(forms[0].tag)
# Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)
# мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
```
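The `tag` value in the example is a string of `Feature=Value` pairs joined by `|`. A small helper can turn it into a dictionary for downstream filtering. This is a sketch: `parse_tag` is a hypothetical helper, not part of rnnmorph, and it assumes an empty feature set is written as `_`, as in Universal Dependencies CoNLL-U files.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 'Feature=Value|Feature=Value' string into a dict."""
    # Assumption: "_" (or an empty string) means "no features", as in CoNLL-U.
    if not tag or tag == "_":
        return {}
    features = {}
    for pair in tag.split("|"):
        name, _, value = pair.partition("=")
        features[name] = value
    return features


feats = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(feats["Case"])  # Nom
```

With the predictor from the usage example, `parse_tag(forms[0].tag)` would give the features of the first word as a dictionary.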
Training
Acknowledgements
- Anastasyev D. G., Gusev I. O., Indenbom E. M., 2018, Improving Part-of-speech Tagging Via Multi-task Learning and Character-level Word Representations
- Anastasyev D. G., Andrianov A. I., Indenbom E. M., 2017, Part-of-speech Tagging with Rich Language Description (presentation)
- Dialogue-2017 morphological analysis track
- Track materials
- Morphine by kmike, CRF classifier for MorphoRuEval-2017 by kmike
- Universal Dependencies
- Tobias Horsmann and Torsten Zesch, 2017, Do LSTMs really work so well for PoS tagging? – A replication study
- Barbara Plank, Anders Søgaard, Yoav Goldberg, 2016, Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss