natasha / naeval

Licence: MIT license
Comparing quality and performance of NLP systems for Russian language

Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate the components of the Natasha project: Razdel, Navec, Slovnet.

Install

Naeval supports Python 3.7+

$ pip install naeval

Documentation

Materials are in Russian:

Models

Model                       Tags                         Description
DeepPavlov NER              ner                          BiLSTM-CRF NER trained on Collection5. Original repo, docs, paper
DeepPavlov BERT NER         ner                          Current SOTA for Russian language. Docs, video
DeepPavlov Slavic BERT NER  ner                          DeepPavlov solution for BSNLP-2019. Paper
DeepPavlov Morph            morph                        Docs
DeepPavlov BERT Morph       morph                        Docs
DeepPavlov BERT Syntax      syntax                       BERT + biaffine head. Docs
Slovnet NER                 ner
Slovnet BERT NER            ner
Slovnet Morph               morph
Slovnet BERT Morph          morph
Slovnet Syntax              syntax
Slovnet BERT Syntax         syntax
PullEnti                    ner morph                    First place on factRuEval-2016, highly sophisticated rule-based system
Stanza                      ner morph syntax             Tool by Stanford NLP released in 2020. Paper
SpaCy                       token sent ner morph syntax  Uses Russian models trained by @buriy
Texterra                    morph syntax ner token sent  Multifunctional NLP solution by ISP RAS
Tomita                      ner                          GLR parser by Yandex; only the implementation for person names is publicly available
MITIE                       ner                          Engine developed at MIT + third-party model for Russian language
RuPosTagger                 morph                        CRF tagger, part of Solarix project
RNNMorph                    morph                        First place solution on morphoRuEval-2017. Post on Habr
Maru                        morph
UDPipe                      morph syntax                 Model trained on SynTagRus
NLTK                        token sent                   Multifunctional library, provides model for Russian text segmentation. Docs
MyStem                      token morph                  Wrapper for Yandex morphological analyzers
Moses                       token sent                   Wrapper for Perl Moses utils
SegTok                      token sent
RuTokenizer                 token
Razdel                      token sent
Spacy Russian Tokenizer     token sent                   Spacy segmentation pipeline for Russian texts by @aatimofeev
RuSentTokenizer             sent                         DeepPavlov sentence segmentation

Tokenization

See Razdel evaluation section for more info.

dataset                             corpora        syntag         gicrya         rnc
metric                              errors  time   errors  time   errors  time   errors  time
re.findall(\w+|\d+|\p+)             24      0.5    16      0.5    19      0.4    60      0.4
spacy                               26      6.2    13      5.8    14      4.1    32      3.9
nltk.word_tokenize                  60      3.4    256     3.3    75      2.7    199     2.9
mystem                              23      5.0    15      4.7    19      3.7    14      3.9
mosestokenizer                      11      2.1    8       1.9    15      1.6    16      1.7
segtok.word_tokenize                16      2.3    8       2.3    14      1.8    9       1.8
aatimofeev/spacy_russian_tokenizer  17      48.7   4       51.1   5       39.5   20      52.2
koziev/rutokenizer                  15      1.1    8       1.0    23      0.8    68      0.9
razdel.tokenize                     9       2.9    9       2.8    3       2.0    16      2.2
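
The `re.findall(\w+|\d+|\p+)` row is a plain regex baseline. Python's built-in `re` module has no `\p` class, so the sketch below assumes `\p+` means a run of punctuation and writes it as `[^\w\s]+`. The helper name `baseline_tokenize` is hypothetical and is not part of Naeval.

```python
import re

# Assumption: \p in the table row stands for punctuation. The standard
# `re` module does not support \p, so it is spelled [^\w\s] here.
TOKEN = re.compile(r'\w+|\d+|[^\w\s]+')


def baseline_tokenize(text):
    """Split text into word, digit and punctuation tokens."""
    return TOKEN.findall(text)


tokens = baseline_tokenize('Привет, мир!')
print(tokens)
```

A pattern like this has no notion of abbreviations, decimal numbers or hyphenated words, which is one plausible source of errors for a baseline of this kind.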

Sentence segmentation

dataset                    corpora        syntag         gicrya         rnc
metric                     errors  time   errors  time   errors  time   errors  time
re.split([.?!…])           114     0.9    53      0.6    63      0.7    130     1.0
segtok.split_single        106     17.8   36      13.4   1001    1.1    912     2.8
mosestokenizer             238     8.9    182     5.7    80      6.4    287     7.4
nltk.sent_tokenize         92      10.1   36      5.3    44      5.6    183     8.9
deeppavlov/rusenttokenize  57      10.9   10      7.9    56      6.8    119     7.0
razdel.sentenize           52      6.1    7       3.9    72      4.5    59      7.5
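
The `re.split([.?!…])` row is again a regex baseline. A minimal sketch of such a splitter follows; it assumes the split happens on whitespace after a terminal mark so that the punctuation stays with its sentence, which may differ from what was actually benchmarked. The name `baseline_sentenize` is hypothetical.

```python
import re

# Assumption: split on whitespace that follows ., ?, ! or …, keeping
# the terminal mark attached to the preceding sentence.
BOUNDARY = re.compile(r'(?<=[.?!…])\s+')


def baseline_sentenize(text):
    """Split text into sentences at terminal punctuation."""
    return [chunk for chunk in BOUNDARY.split(text.strip()) if chunk]


sentences = baseline_sentenize('Он пришёл, т. е. вернулся. Все рады!')
print(sentences)
```

The abbreviation "т. е." is cut into separate pieces here, which illustrates the kind of mistake a purely punctuation-based splitter can make.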

Pretrained embeddings

See Navec evaluation section for more info.

model                                      type      init, s  get, µs  disk, mb  ram, mb  vocab
hudlit_12B_500K_300d_100q                  navec     1.1      21.6     50.6      95.3     500K
news_1B_250K_300d_100q                     navec     0.8      20.7     25.4      47.7     250K
ruscorpora_upos_cbow_300_20_2019           w2v       3.3      1.4      220.6     236.1    189K
ruwikiruscorpora_upos_skipgram_300_2_2019  w2v       5.0      1.5      290.0     309.4    248K
tayga_upos_skipgram_300_2_2019             w2v       5.2      1.4      290.7     310.9    249K
tayga_none_fasttextcbow_300_10_2019        fasttext  8.0      13.4     2741.9    2746.9   192K
araneum_none_fasttextcbow_300_5_2018       fasttext  16.4     10.6     2752.1    2754.7   195K

model                                      type      simlex  hj     rt     ae     ae2    lrwc
hudlit_12B_500K_300d_100q                  navec     0.310   0.707  0.842  0.931  0.923  0.604
news_1B_250K_300d_100q                     navec     0.230   0.590  0.784  0.866  0.861  0.589
ruscorpora_upos_cbow_300_20_2019           w2v       0.359   0.685  0.852  0.758  0.896  0.602
ruwikiruscorpora_upos_skipgram_300_2_2019  w2v       0.321   0.723  0.817  0.801  0.860  0.629
tayga_upos_skipgram_300_2_2019             w2v       0.429   0.749  0.871  0.771  0.899  0.639
tayga_none_fasttextcbow_300_10_2019        fasttext  0.369   0.639  0.793  0.682  0.813  0.536
araneum_none_fasttextcbow_300_5_2018       fasttext  0.349   0.671  0.801  0.706  0.793  0.579

Morphology taggers

See Slovnet evaluation section for more info.

model            news     wiki     fiction  social   poetry
slovnet          0.961    0.815    0.905    0.807    0.664
slovnet_bert     0.982    0.884    0.990    0.890    0.856
deeppavlov       0.940    0.841    0.944    0.870    0.857
deeppavlov_bert  0.951    0.868    0.964    0.892    0.865
udpipe           0.918    0.811    0.957    0.870    0.776
spacy            0.964    0.849    0.942    0.857    0.784
stanza           0.934    0.831    0.940    0.873    0.825
rnnmorph         0.896    0.812    0.890    0.860    0.838
maru             0.894    0.808    0.887    0.861    0.840
rupostagger      0.673    0.645    0.661    0.641    0.636

model            init, s  disk, mb  ram, mb  speed, it/s
slovnet          1.0      27        115      532.0
slovnet_bert     5.0      475       8087     285.0 (gpu)
deeppavlov       4.0      32        10240    90.0 (gpu)
deeppavlov_bert  20.0     1393      8704     85.0 (gpu)
udpipe           6.9      45        242      56.2
spacy            8.0      140       579      50.0
stanza           2.0      591       393      92.0
rnnmorph         8.7      10        289      16.6
maru             15.8     44        370      36.4
rupostagger      4.8      3         118      48.0
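
The `init, s` and `speed, it/s` columns are timing measurements. This README does not describe how they were taken, so the following is only a sketch under the assumption that init is the wall-clock time to construct a model and speed is the number of items processed per second. `DummyTagger` and `measure` are hypothetical names used for illustration.

```python
import time


class DummyTagger:
    """Hypothetical stand-in for a real model; tags every token as X."""

    def __call__(self, tokens):
        return ['X' for _ in tokens]


def measure(factory, items):
    """Return (init_seconds, items_per_second) for a model factory."""
    start = time.perf_counter()
    model = factory()
    init = time.perf_counter() - start

    start = time.perf_counter()
    for item in items:
        model(item)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs.
    speed = len(items) / elapsed if elapsed > 0 else float('inf')
    return init, speed


items = [['мама', 'мыла', 'раму']] * 1000
init, speed = measure(DummyTagger, items)
print(f'init, s: {init:.3f}  speed, it/s: {speed:.1f}')
```

Replacing `DummyTagger` with a callable that loads a real model would give numbers comparable in spirit, though not necessarily in method, to the table.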

Syntax parser

dataset          news          wiki          fiction       social        poetry
metric           uas    las    uas    las    uas    las    uas    las    uas    las
slovnet          0.907  0.880  0.775  0.718  0.806  0.776  0.726  0.656  0.542  0.469
slovnet_bert     0.965  0.936  0.891  0.828  0.958  0.940  0.846  0.782  0.776  0.706
deeppavlov_bert  0.962  0.910  0.882  0.786  0.963  0.929  0.844  0.761  0.784  0.691
udpipe           0.873  0.823  0.622  0.531  0.910  0.876  0.700  0.624  0.625  0.534
spacy            0.943  0.916  0.851  0.783  0.901  0.874  0.804  0.737  0.704  0.616
stanza           0.940  0.886  0.815  0.716  0.936  0.895  0.802  0.714  0.713  0.613

model            init, s  disk, mb  ram, mb  speed, it/s
slovnet          1.0      27        125      450.0
slovnet_bert     5.0      504       3427     200.0 (gpu)
deeppavlov_bert  34.0     1427      8704     75.0 (gpu)
udpipe           6.9      45        242      56.2
spacy            9.0      140       579      41.0
stanza           3.0      591       890      12.0
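
UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct; LAS (labeled attachment score) additionally requires the dependency relation to match. A minimal sketch of both is given below. The function name `attachment_scores` is hypothetical, and whether Naeval excludes punctuation or normalizes relation labels is not stated here, so this sketch counts every token.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) for one sentence.

    gold, pred: lists of (head_index, relation) pairs, one per token.
    """
    if len(gold) != len(pred):
        raise ValueError('gold and pred must have the same length')
    total = len(gold)
    if not total:
        return 0.0, 0.0
    # UAS: head matches; LAS: head and relation both match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / total, las_hits / total


gold = [(2, 'nsubj'), (0, 'root'), (2, 'obj')]
pred = [(2, 'nsubj'), (0, 'root'), (2, 'iobj')]
uas, las = attachment_scores(gold, pred)
print(f'uas: {uas:.3f}  las: {las:.3f}')
```

In this example all heads are right but one relation is wrong, so UAS is 1.0 while LAS is 2/3. LAS can never exceed UAS, which matches the pattern in the table.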

NER

See Slovnet evaluation section for more info.

dataset            factru               gareev        ne5                  bsnlp
f1                 PER    LOC    ORG    PER    ORG    PER    LOC    ORG    PER    LOC    ORG
slovnet            0.959  0.915  0.825  0.977  0.899  0.984  0.973  0.951  0.944  0.834  0.718
slovnet_bert       0.973  0.928  0.831  0.991  0.911  0.996  0.989  0.976  0.960  0.838  0.733
deeppavlov         0.910  0.886  0.742  0.944  0.798  0.942  0.919  0.881  0.866  0.767  0.624
deeppavlov_bert    0.971  0.928  0.825  0.980  0.916  0.997  0.990  0.976  0.954  0.840  0.741
deeppavlov_slavic  0.956  0.884  0.714  0.976  0.776  0.984  0.817  0.761  0.965  0.925  0.831
pullenti           0.905  0.814  0.686  0.939  0.639  0.952  0.862  0.683  0.900  0.769  0.566
spacy              0.901  0.886  0.765  0.970  0.883  0.967  0.928  0.918  0.919  0.823  0.693
stanza             0.943  0.865  0.687  0.953  0.827  0.923  0.753  0.734  0.938  0.838  0.724
texterra           0.900  0.800  0.597  0.888  0.561  0.901  0.777  0.594  0.858  0.783  0.548
tomita             0.929                0.921         0.945                0.881
mitie              0.888  0.861  0.532  0.849  0.452  0.753  0.642  0.432  0.736  0.801  0.524

model              init, s  disk, mb  ram, mb  speed, it/s
slovnet            1.0      27        205      25.3
slovnet_bert       5.0      473       9500     40.0 (gpu)
deeppavlov         5.9      1024      3072     24.3 (gpu)
deeppavlov_bert    34.5     2048      6144     13.1 (gpu)
deeppavlov_slavic  35.0     2048      4096     8.0 (gpu)
pullenti           2.9      16        253      6.0
spacy              8.0      140       625      8.0
stanza             3.0      591       11264    3.0 (gpu)
texterra           47.6     193       3379     4.0
tomita             2.0      64        63       29.8
mitie              28.3     327       261      32.8
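
The NER scores are F1 values reported per entity type (PER, LOC, ORG). As a rough illustration, the sketch below computes per-type F1 from sets of spans, assuming exact match on start, stop and type. The matching rule Naeval actually uses is not stated in this README, and `f1_by_type` is a hypothetical helper.

```python
def f1_by_type(gold, pred):
    """Per-type F1 for sets of (start, stop, type) spans.

    Assumption: a predicted span counts only on exact match.
    """
    scores = {}
    types = {span[2] for span in gold | pred}
    for type_ in sorted(types):
        g = {span for span in gold if span[2] == type_}
        p = {span for span in pred if span[2] == type_}
        correct = len(g & p)
        precision = correct / len(p) if p else 0.0
        recall = correct / len(g) if g else 0.0
        total = precision + recall
        scores[type_] = 2 * precision * recall / total if total else 0.0
    return scores


gold = {(0, 4, 'PER'), (10, 16, 'LOC'), (20, 27, 'ORG')}
pred = {(0, 4, 'PER'), (10, 16, 'ORG'), (20, 27, 'ORG')}
scores = f1_by_type(gold, pred)
print(scores)
```

Here the LOC span is mislabeled as ORG, so LOC gets an F1 of 0.0 and ORG is penalized on precision, while PER stays at 1.0.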

Support

Development

Dev env

python -m venv ~/.venvs/natasha-naeval
source ~/.venvs/natasha-naeval/bin/activate

pip install -r requirements/dev.txt
pip install -e .

python -m ipykernel install --user --name natasha-naeval

Lint

make lint