nlpub / Pymystem3

Licence: other
A Python wrapper of the Yandex Mystem 3.1 morphological analyzer (http://api.yandex.ru/mystem). The original tool is shipped as a binary, and this library makes it easy to integrate it into Python projects. Let us know in the issues if you would like to be involved in the development or maintenance of this project. If you have a fix or suggestion, please make a pull request. We are very open to accepting contributions.

Projects that are alternatives of or similar to Pymystem3

udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-93.3%)
Mutual labels:  russian, morphological-analysis
aot
Russian morphology for Java
Stars: ✭ 41 (-81.7%)
Mutual labels:  russian, morphological-analysis
Rdrpostagger
R package for Ripple Down Rules-based Part-Of-Speech Tagging (RDRPOS). On more than 45 languages.
Stars: ✭ 31 (-86.16%)
Mutual labels:  pos, tagging
libmorph
libmorph rus/ukr - fast & accurate morphological analyzer/analyses for Russian and Ukrainian
Stars: ✭ 16 (-92.86%)
Mutual labels:  russian, morphological-analysis
Rnnmorph
Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems.
Stars: ✭ 111 (-50.45%)
Mutual labels:  russian, morphological-analysis
Rust S3
Rust library for interfacing with AWS S3 and other API compatible services
Stars: ✭ 177 (-20.98%)
Mutual labels:  yandex
Monpa
MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition
Stars: ✭ 203 (-9.37%)
Mutual labels:  pos
Interpy Ru
Intermediate Python book Russian translation
Stars: ✭ 175 (-21.87%)
Mutual labels:  russian
Russian Words
List of Russian words
Stars: ✭ 168 (-25%)
Mutual labels:  russian
Py Bitcoin
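A simple Python implementation of Bitcoin that mainly introduces Bitcoin's underlying technology, such as Base58 encoding, elliptic curve cryptography, MerkleTree, P2P networking, RPC communication, UTXO, the virtual machine, DHT, DAG, and persistent storage of on-chain data.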
Stars: ✭ 217 (-3.12%)
Mutual labels:  pos
Momepy
Urban Morphology Measuring Toolkit
Stars: ✭ 210 (-6.25%)
Mutual labels:  morphological-analysis
Multi Tacotron Voice Cloning
Phoneme multilingual(Russian-English) voice cloning based on
Stars: ✭ 192 (-14.29%)
Mutual labels:  russian
Deeptoxic
top 1% solution to toxic comment classification challenge on Kaggle.
Stars: ✭ 180 (-19.64%)
Mutual labels:  pos
Supertag
A tag-based filesystem
Stars: ✭ 207 (-7.59%)
Mutual labels:  tagging
Shopyo
🎁 Your Open web framework, designed with big in mind. Flask with Django advantages. Build your management systems, ERP products & mobile backend (coming soon). Small business needs apps included by default. First timers friendly. Email: [email protected] | password: pass
Stars: ✭ 172 (-23.21%)
Mutual labels:  pos
Yargy
Rule-based facts extraction for Russian language
Stars: ✭ 216 (-3.57%)
Mutual labels:  russian
Jiagu
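Jiagu deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering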
Stars: ✭ 2,368 (+957.14%)
Mutual labels:  pos
Rust book ru
The Rust Programming Language in Russian
Stars: ✭ 188 (-16.07%)
Mutual labels:  russian
Cachingframework.redis
Distributed caching based on StackExchange.Redis and Redis. Includes support for tagging and is cluster-compatible.
Stars: ✭ 209 (-6.7%)
Mutual labels:  tagging
Selectize.js
Selectize is the hybrid of a textbox and <select> box. It's jQuery based, and it has autocomplete and native-feeling keyboard navigation; useful for tagging, contact lists, etc.
Stars: ✭ 12,744 (+5589.29%)
Mutual labels:  tagging

==================================================================
A Python wrapper of the Yandex Mystem 3.1 morphological analyzer
==================================================================

.. image:: https://travis-ci.org/nlpub/pymystem3.png?branch=master
    :target: http://travis-ci.org/nlpub/pymystem3
    :alt: Build Status

Introduction

This module contains a wrapper for `Yandex Mystem 3.1 <https://tech.yandex.ru/mystem/>`_, an excellent morphological analyzer for the Russian language released in June 2014. A morphological analyzer can lemmatize text and derive a set of morphological attributes for each token. For more details about the algorithm, see I. Segalovich, `«A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine» <http://download.yandex.ru/company/iseg-las-vegas.pdf>`_, MLMTA-2003, Las Vegas, Nevada, USA.

Python is the language of choice for many computational linguists, including those working with the Russian language. The main motivation for this development was the absence of any Python wrapper for Mystem, one of the most popular morphological analyzers for Russian, along with `PyMorphy2 <https://github.com/kmike/pymorphy2>`_, `TreeTagger <http://corpus.leeds.ac.uk/mocky/>`_ and `AOT <http://www.aot.ru/download.php>`_.

The third version of Mystem introduces several important improvements, most importantly part-of-speech disambiguation. Our wrapper runs Mystem in the mode that performs POS disambiguation.

This wrapper is open source under the MIT license. However, please note that Yandex Mystem itself is not open source and is licensed under the conditions of the `Yandex License <http://legal.yandex.ru/mystem/>`_.

System Requirements

The wrapper works with CPython 2.6+/3.3+ and PyPy 1.9+.

The wrapper was tested on Ubuntu Linux 12.04+, Mac OS X 10.9+ and Windows 7+.

For 32-bit architectures and FreeBSD platform support, use version 0.1.10.

Installation

  1. Stable version: https://pypi.python.org/pypi/pymystem3. You can install it using pip::

    pip install pymystem3

.. * Documentation: http://pythonhosted.org/pymystem3

  2. Latest version (recommended): https://github.com/nlpub/pymystem3::

    pip install git+https://github.com/nlpub/pymystem3
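
After either install command, it can be handy to confirm which version ended up in the environment. The snippet below is a minimal sketch, not part of pymystem3: ``installed_version`` is a hypothetical helper name, and it relies on the standard-library ``importlib.metadata`` module, which assumes Python 3.8 or newer (later than the minimum versions listed above).

```python
from importlib import metadata


def installed_version(package="pymystem3"):
    """Return the installed version of a package, or None if it is absent.

    Hypothetical helper for illustration; not provided by pymystem3.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed
print(installed_version())
```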

A Quick Example

Lemmatization

::

    >>> from pymystem3 import Mystem
    >>> text = "Красивая мама красиво мыла раму"
    >>> m = Mystem()
    >>> lemmas = m.lemmatize(text)
    >>> print(''.join(lemmas))
    красивый мама красиво мыть рама
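
The example joins the lemmas with an empty string because the list returned by ``lemmatize`` also carries the separators between words. When only the lemmas are needed, the whitespace tokens can be filtered out. The following is a small sketch: ``clean_lemmas`` is a hypothetical helper, and ``sample`` is a hand-written list assumed to have the same shape as the ``lemmatize`` result for the sentence above, so it runs without Mystem installed.

```python
def clean_lemmas(lemmas):
    """Drop whitespace-only tokens (spaces, trailing newline) from a lemma list."""
    return [lemma for lemma in lemmas if lemma.strip()]


# Hand-written sample, assumed to mirror what lemmatize() returns for the sentence above
sample = ["красивый", " ", "мама", " ", "красиво", " ", "мыть", " ", "рама", "\n"]

print(clean_lemmas(sample))
```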

Getting grammatical information and lemmas.

::

    import json
    from pymystem3 import Mystem

    text = "Красивая мама красиво мыла раму"
    m = Mystem()

    # lemmatize() returns a list of lemmas and separators; join them into a string
    lemmas = m.lemmatize(text)
    print("lemmas:", ''.join(lemmas))

    # analyze() returns one dict per token; keep Cyrillic readable in the JSON dump
    print("full info:", json.dumps(m.analyze(text), ensure_ascii=False))

    lemmas: красивый мама красиво мыть рама

    full info: [{"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]}, {"text": " "}, {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]}, {"text": " "}, {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]}, {"text": " "}, {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]}, {"text": " "}, {"text": "раму", "analysis": [{"lex": "рама", "gr": "S,жен,неод=вин,ед"}]}, {"text": "\n"}]
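
The ``gr`` strings in this output pack the part of speech and the grammatical tags into one field. The sketch below shows one way to unpack them, assuming that the tags before ``=`` describe the lexeme (with the part of speech first) and the tags after it describe the word form. ``parse_gr`` and ``summarize`` are hypothetical helpers, not part of pymystem3, and ``sample`` is an excerpt of the output above, so the code runs without Mystem installed. Ambiguous forms, which Mystem can report as alternatives in parentheses, are deliberately not handled.

```python
def parse_gr(gr):
    """Split a Mystem ``gr`` string into POS, lexeme-level and form-level tags.

    Simplification: alternative form groups such as ``(a|b)`` are not handled.
    """
    lexeme_part, _, form_part = gr.partition("=")
    lexeme_tags = lexeme_part.split(",")
    return {
        "pos": lexeme_tags[0],
        "lexeme": lexeme_tags[1:],
        "form": [tag for tag in form_part.split(",") if tag],
    }


def summarize(analysis):
    """Return (text, lemma, pos) for every token that has at least one analysis."""
    rows = []
    for item in analysis:
        variants = item.get("analysis")
        if not variants:  # separators carry no analysis entry
            continue
        first = variants[0]
        rows.append((item["text"], first["lex"], parse_gr(first["gr"])["pos"]))
    return rows


# Excerpt of the analyze() output shown above
sample = [
    {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]},
    {"text": " "},
    {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]},
    {"text": " "},
    {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]},
]

for row in summarize(sample):
    print(row)
```

Taking only the first analysis variant is a design shortcut that fits the disambiguated output shown here; code that needs every reading should iterate over the whole ``analysis`` list instead.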

Issues

Please report any bugs or requests using the GitHub issue tracker (https://github.com/nlpub/pymystem3/issues). We have only a very limited amount of resources to maintain this project, so please propose a pull request directly if you see an obvious way of fixing the issue. We are very open to accepting bug fixes, and your help is greatly appreciated.

Authors

The full list of contributors is available on GitHub. You can also contact the original contributors of the project via email:

  • Denis Sukhonin (d.sukhonin): development
  • Alexander Panchenko (panchenko.alexander): conception

@ gmail

If you are interested in further development or in becoming a maintainer of this project, please drop us an email: your help is greatly appreciated.
