tokenmill / Snowball

Snowball version of the Porter stemmer for the Lithuanian language.

snowball

The old version of the Snowball implementation of the Porter stemmer for the Lithuanian language is in the file lithuanian.sbl.

The new version is in the file conservative.sbl.

The difference between the new and old versions is that the new one is less aggressive. This means that there should be fewer words that are overstemmed.

The new stemmer was created with search applications in mind. Nouns are therefore treated as more important than adjectives, verbs, etc. This means that some suffixes, such as -ut- in 'kalakutas', are left untouched during stemming. On the other hand, this leaves some adjectives understemmed, e.g. 'sveikutis -> sveikut'. There will always be trade-offs.

NOTE:

The current stemmer version uses the length of the string to prevent overstemming. A stemmer created with the snowball* program extends the org.tartarus.snowball.SnowballProgram class and gets the length of the current string using Java's current.length() call.

Lucene 4.10.1, however, implements SnowballProgram in such a way that the attribute current is private, so current.length() does not compile against Lucene. The workaround is to substitute current.length() with getCurrent().length() on line 589.

  • The snowball program was downloaded from here.
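
The workaround described in the note can be illustrated with a minimal sketch. It is not the generated stemmer: SnowballProgramLike is a hypothetical stand-in for a SnowballProgram whose current attribute is private, and the suffix "as" and the minimum length of 6 are assumptions chosen only for illustration, not rules taken from conservative.sbl. It shows a length guard that reads the length through getCurrent().length() instead of current.length().

```java
// Hypothetical stand-in for a SnowballProgram whose `current` field is private,
// so subclasses can only read the text through getCurrent().
abstract class SnowballProgramLike {
    private String current = "";

    public void setCurrent(String value) { current = value; }

    public String getCurrent() { return current; }

    // Simplified replacement for the slice operations of the real runtime.
    protected void replaceCurrent(String value) { current = value; }

    public abstract boolean stem();
}

// Illustrative stemmer: strips one suffix only when the word is long enough.
class LengthGuardedStemmer extends SnowballProgramLike {
    private static final int MIN_LENGTH = 6;    // assumed threshold
    private static final String SUFFIX = "as";  // assumed suffix

    @Override
    public boolean stem() {
        // current.length() would not compile here because `current` is private.
        int length = getCurrent().length();
        if (length < MIN_LENGTH || !getCurrent().endsWith(SUFFIX)) {
            return false; // too short or no matching suffix: leave the word alone
        }
        replaceCurrent(getCurrent().substring(0, length - SUFFIX.length()));
        return true;
    }
}

class LengthGuardDemo {
    static String stem(String word) {
        LengthGuardedStemmer stemmer = new LengthGuardedStemmer();
        stemmer.setCurrent(word);
        stemmer.stem();
        return stemmer.getCurrent();
    }

    public static void main(String[] args) {
        System.out.println(stem("kalakutas")); // prints "kalakut"
        System.out.println(stem("namas"));     // prints "namas" (shorter than the threshold)
    }
}
```

The guard leaves short words unchanged, which is the general idea behind using string length to limit overstemming; the actual thresholds and suffix rules live in the .sbl files.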

License

Copyright © 2019 TokenMill UAB.

Distributed under the Apache License, Version 2.0.
