IlyaGusev / Rnnmorph

Licence: apache-2.0
Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems.

rnnmorph

Morphological analyzer (POS tagger) for Russian and English languages based on neural networks and dictionary-lookup systems (pymorphy2, nltk).

Russian language, MorphoRuEval-2017 test dataset, accuracy

| Domain | Full tag | PoS tag | Full tag + lemma | Sentence (full tag) | Sentence (full tag + lemma) |
|---|---|---|---|---|---|
| Lenta (news) | 96.31% | 98.01% | 92.96% | 77.93% | 52.79% |
| VK (social) | 95.20% | 98.04% | 92.06% | 74.30% | 60.56% |
| JZ (lit.) | 95.87% | 98.71% | 90.45% | 73.10% | 43.15% |
| All | 95.81% | 98.26% | N/A | 74.92% | N/A |

English language, UD EWT test, accuracy

| Dataset | Full tag | PoS tag | Full tag + lemma | Sentence (full tag) | Sentence (full tag + lemma) |
|---|---|---|---|---|---|
| UD EWT test | 91.57% | 94.10% | 87.02% | 63.17% | 50.99% |
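
The tables report both per-word and per-sentence accuracy. The sketch below illustrates how the two kinds of figures differ, assuming a sentence counts as correct only when every word in it is tagged correctly. The function name `tagging_accuracy` and the toy data are hypothetical; this is not the official MorphoRuEval-2017 or UD evaluation script.

```python
from typing import Sequence, Tuple


def tagging_accuracy(
    gold: Sequence[Sequence[str]],
    pred: Sequence[Sequence[str]],
) -> Tuple[float, float]:
    """Return (per-word accuracy, per-sentence accuracy) for tagged sentences."""
    total_words = correct_words = correct_sentences = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have equal length")
        hits = sum(g == p for g, p in zip(gold_sent, pred_sent))
        total_words += len(gold_sent)
        correct_words += hits
        # A sentence is correct only if all of its words are correct.
        correct_sentences += hits == len(gold_sent)
    return correct_words / total_words, correct_sentences / len(gold)


# Toy data (hypothetical): one wrong tag in the first sentence.
gold = [["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]]
pred = [["NOUN", "VERB", "ADJ"], ["PRON", "VERB"]]
word_acc, sent_acc = tagging_accuracy(gold, pred)
print(f"{word_acc:.2%} {sent_acc:.2%}")  # 80.00% 50.00%
```

A single wrong tag lowers per-word accuracy only slightly but makes the whole sentence count as wrong, which is why the sentence-level columns are much lower than the word-level ones.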

Speed and memory consumption

Speed: from 200 to 600 words per second using CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
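
As a rough guide to what the quoted throughput means in practice, the following sketch estimates processing time for a corpus. The corpus size is a hypothetical example, and the calculation assumes the 200 to 600 words per second range holds throughout the run.

```python
def estimated_seconds(n_words: int, words_per_second: float) -> float:
    """Estimate wall-clock seconds needed to tag n_words at a given throughput."""
    return n_words / words_per_second


corpus_size = 1_000_000  # hypothetical corpus size, in words
slow = estimated_seconds(corpus_size, 200)  # lower bound of the quoted speed
fast = estimated_seconds(corpus_size, 600)  # upper bound of the quoted speed
print(f"{fast / 60:.0f} to {slow / 60:.0f} minutes")  # 28 to 83 minutes
```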

Install

sudo pip3 install rnnmorph

Usage

from rnnmorph.predictor import RNNMorphPredictor

# Load the Russian model
predictor = RNNMorphPredictor(language="ru")
# predict() takes one tokenized sentence and returns one analysis per word
forms = predictor.predict(["мама", "мыла", "раму"])
print(forms[0].pos)          # NOUN
print(forms[0].tag)          # Case=Nom|Gender=Fem|Number=Sing
print(forms[0].normal_form)  # мама
print(forms[0].vector)
# [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
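
The `tag` attribute shown above is a single string of `Feature=Value` pairs joined by `|`. A minimal sketch of splitting it into a dictionary follows. The helper `parse_tag` is hypothetical and not part of rnnmorph, and treating `_` as "no features" is an assumption borrowed from the Universal Dependencies convention.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a tag string such as 'Case=Nom|Gender=Fem' into a feature dict."""
    # Assumption: an empty string or "_" means the word has no features.
    if not tag or tag == "_":
        return {}
    return dict(pair.split("=", 1) for pair in tag.split("|"))


# The tag string is copied from the usage example output.
features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])  # Nom
```

With the features in a dictionary, filtering becomes a plain lookup, for example keeping only words where `features.get("Case") == "Nom"`.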

Training

Simple model training: see the Colab notebook linked from the project repository.

Acknowledgements
