alexeyev / mystem-scala

Licence: MIT license
Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Projects that are alternatives of or similar to mystem-scala

ArabicProcessingCog
A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.
Stars: ✭ 19 (-9.52%)
Mutual labels:  tokenizer, computational-linguistics
mystem
CGo bindings to Yandex.Mystem
Stars: ✭ 28 (+33.33%)
Mutual labels:  russian-specific, mystem
libmorph
libmorph rus/ukr - fast & accurate morphological analyzer/analyses for Russian and Ukrainian
Stars: ✭ 16 (-23.81%)
Mutual labels:  lemmatizer, russian-morphology
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (+52.38%)
Mutual labels:  tokenizer, lemmatizer
RussianNounsJS
Declension of Russian nouns by case. Usually only the nominative form, animacy, and gender are required.
Stars: ✭ 29 (+38.1%)
Mutual labels:  russian-specific, russian-morphology
GrammarEngine
Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)
Stars: ✭ 68 (+223.81%)
Mutual labels:  lemmatizer, russian-morphology
jargon
Tokenizers and lemmatizers for Go
Stars: ✭ 98 (+366.67%)
Mutual labels:  tokenizer, lemmatizer
kaldi helpers
🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
Stars: ✭ 13 (-38.1%)
Mutual labels:  computational-linguistics
lemma
A Morphological Parser (Analyser) / Lemmatizer written in Elixir.
Stars: ✭ 45 (+114.29%)
Mutual labels:  lemmatizer
YandexAlgorithms
Lecture notes, Code with comments.
Stars: ✭ 30 (+42.86%)
Mutual labels:  yandex
Yandex.Music.Api
Client Yandex.Music.Api for Yandex.Music
Stars: ✭ 53 (+152.38%)
Mutual labels:  yandex
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-4.76%)
Mutual labels:  computational-linguistics
yandex-dialogs-php-sdk
PHP library that simplifies working with Yandex Dialogs
Stars: ✭ 23 (+9.52%)
Mutual labels:  yandex
appmetrica-logsapi-loader
A tool for automatic data loading from AppMetrica LogsAPI into (local) ClickHouse
Stars: ✭ 18 (-14.29%)
Mutual labels:  yandex
artefactory-connectors-kit
ACK is an E(T)L tool specialized in API data ingestion. It is accessible through a Command-Line Interface. The application allows you to easily extract, stream and load data (with minimum transformations), from the API source to the destination of your choice.
Stars: ✭ 34 (+61.9%)
Mutual labels:  yandex
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+142.86%)
Mutual labels:  tokenizer
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (+142.86%)
Mutual labels:  computational-linguistics
tokenizer
Tokenize CSS according to the CSS Syntax
Stars: ✭ 52 (+147.62%)
Mutual labels:  tokenizer
wink-lemmatizer
English lemmatizer
Stars: ✭ 53 (+152.38%)
Mutual labels:  lemmatizer
yandex-translate-api
A simple REST client library for Yandex.Translate
Stars: ✭ 29 (+38.1%)
Mutual labels:  yandex

A Scala wrapper for morphological analyzer Yandex.MyStem

Introduction

Details about the algorithm can be found in I. Segalovich «A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine», MLMTA-2003, Las Vegas, Nevada, USA.

The wrapper's code is under the MIT license, but please remember that Yandex.MyStem itself is not open source and is licensed under the terms of the Yandex License.

System Requirements

The wrapper should work on at least Ubuntu Linux 12.04+ and Windows 7+; users report that it also works on OS X.

Install

Maven

Maven central

<dependency>
  <groupId>ru.stachek66.nlp</groupId>
  <artifactId>mystem-scala</artifactId>
  <version>0.1.6</version>
</dependency>

Issues

Only mystem 3.0 and 3.1 are currently supported. Please open an issue for compatibility problems and other requests.

Examples

Probably the most important thing to remember when working with mystem-scala is that you should have just one MyStem instance per mystem/mystem.exe file in your application.

Scala

import java.io.File

import ru.stachek66.nlp.mystem.holding.{Factory, MyStem, Request}

// Holds the single MyStem instance for the whole application.
object MystemSingletonScala {

  val mystemAnalyzer: MyStem =
    new Factory("-igd --eng-gr --format json --weight")
      .newMyStem(
        "3.0",
        // Path to a local mystem binary.
        Option(new File("/home/coolguy/coolproject/3dparty/mystem")))
      .get // parameterless in Scala; throws if the analyzer could not be created
}

object AppExampleScala extends App {

  // Print each token next to the lemma reported by mystem.
  MystemSingletonScala
    .mystemAnalyzer
    .analyze(Request("Есть большие пассажиры мандариновой травы"))
    .info
    .foreach(info => println(info.initial + " -> " + info.lex))
}

Java

import ru.stachek66.nlp.mystem.holding.Factory;
import ru.stachek66.nlp.mystem.holding.MyStem;
import ru.stachek66.nlp.mystem.holding.MyStemApplicationException;
import ru.stachek66.nlp.mystem.holding.Request;
import ru.stachek66.nlp.mystem.model.Info;
import scala.Option;
import scala.collection.JavaConversions;

import java.io.File;

public class MyStemJavaExample {

    private final static MyStem mystemAnalyzer =
            new Factory("-igd --eng-gr --format json --weight")
                    .newMyStem("3.0", Option.<File>empty()).get();

    public static void main(final String[] args) throws MyStemApplicationException {

        final Iterable<Info> result =
                JavaConversions.asJavaIterable(
                        mystemAnalyzer
                                .analyze(Request.apply("И вырвал грешный мой язык"))
                                .info()
                                .toIterable());

        for (final Info info : result) {
            System.out.println(info.initial() + " -> " + info.lex() + " | " + info.rawResponse());
        }
    }
}
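
Both examples above keep the analyzer in a single static field, which covers the common case of one binary. If an application works with several mystem binaries, the one-instance-per-binary rule could be enforced with a small cache keyed by the normalized binary path. The sketch below is only an illustration: `AnalyzerRegistry`, its type parameter and the creator function are hypothetical names that are not part of mystem-scala, and the class deliberately has no dependency on the library.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (NOT part of mystem-scala): caches one analyzer
 * per binary path so the same executable is never wrapped twice.
 */
final class AnalyzerRegistry<A> {

    // Normalized absolute path of the binary -> the analyzer created for it.
    private final Map<String, A> instances = new ConcurrentHashMap<>();

    // Caller-supplied function that knows how to build an analyzer for a binary.
    private final Function<File, A> creator;

    AnalyzerRegistry(final Function<File, A> creator) {
        this.creator = creator;
    }

    /** Returns the cached analyzer for this binary, creating it on first use. */
    A forBinary(final File binary) {
        // Normalize so that "bin/mystem" and "bin/../bin/mystem" share one key.
        final String key = binary.getAbsoluteFile().toPath().normalize().toString();
        // computeIfAbsent runs the creator at most once per key, even under concurrency.
        return instances.computeIfAbsent(key, k -> creator.apply(binary));
    }
}
```

With mystem-scala, the creator would presumably wrap the factory call from the examples above, for instance `file -> new Factory("-igd --eng-gr --format json --weight").newMyStem("3.0", Option.apply(file)).get()`, and the registry itself would be held in a single static field.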

How to Cite

If you use our work, references to this repository are highly appreciated.

@misc{alekseev2018mystemscala, 
    author = {Anton Alekseev}, 
    title = {mystem-scala}, 
    year = {2018}, 
    publisher = {GitHub}, 
    journal = {GitHub repository}, 
    howpublished = {\url{https://github.com/alexeyev/mystem-scala/}}, 
    commit = {the latest commit of the codebase you have used}
}

If you do cite it, please do not forget to cite the original algorithm's author's paper as well.

Contacts

Anton Alekseev [email protected]

Thanks for reviews, reports and contributions

  • Vladislav Dolbilov, @darl
  • Mikhail Malchevsky
  • @anton-shirikov
  • Filipp Malkovsky
  • @dizzy7
