
aurelian / Ruby Stemmer

Licence: mit
Expose libstemmer_c to Ruby

Programming Languages

c
ruby

Projects that are alternatives of or similar to Ruby Stemmer

Arabic Light Stemmer
Arabic light stemmer. Light stemming for Arabic words removes prefixes and suffixes and normalizes words
Stars: ✭ 14 (-94.49%)
Mutual labels:  stemmer
Cadmium
Natural Language Processing (NLP) library for Crystal
Stars: ✭ 172 (-32.28%)
Mutual labels:  stemmer
lorca
Natural Language Processing for Spanish in Node.js. Stemmer, sentiment analysis, readability, tf-idf with batteries, concordance and more!
Stars: ✭ 95 (-62.6%)
Mutual labels:  stemmer
Kelime kok ayirici
Deep learning based (seq2seq) web application for finding word stems in Turkish - Turkish Stemmer (tr_stemmer)
Stars: ✭ 76 (-70.08%)
Mutual labels:  stemmer
Arabicstemmer
Assem's Arabic Light Stemmer is a snowball-based stemming algorithm for Arabic aimed mainly to improve search.
Stars: ✭ 102 (-59.84%)
Mutual labels:  stemmer
lara-hungarian-nlp
NLP class for rapid ChatBot development in Hungarian language
Stars: ✭ 27 (-89.37%)
Mutual labels:  stemmer
Akarata
Indonesian stemmer - JavaScript library for extracting the base word from affixed words in Indonesian.
Stars: ✭ 26 (-89.76%)
Mutual labels:  stemmer
CISTEM
Stemmer for German
Stars: ✭ 33 (-87.01%)
Mutual labels:  stemmer
Stemmer
An English (Porter2) stemming implementation in Elixir.
Stars: ✭ 134 (-47.24%)
Mutual labels:  stemmer
perstem
Persian stemmer and morphological analyzer
Stars: ✭ 18 (-92.91%)
Mutual labels:  stemmer
Qutuf
Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.
Stars: ✭ 84 (-66.93%)
Mutual labels:  stemmer
Stemmer
Fast Porter stemmer implementation
Stars: ✭ 86 (-66.14%)
Mutual labels:  stemmer
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
Stars: ✭ 101 (-60.24%)
Mutual labels:  stemmer
Nlp Js Tools French
POS Tagger, lemmatizer and stemmer for french language in javascript
Stars: ✭ 32 (-87.4%)
Mutual labels:  stemmer
PersianStemmer-Python
PersianStemmer-Python
Stars: ✭ 43 (-83.07%)
Mutual labels:  stemmer
Ptstem
Stemming Algorithms for the Portuguese Language
Stars: ✭ 13 (-94.88%)
Mutual labels:  stemmer
sastrawijs
Indonesian language stemmer. Javascript port of PHP Sastrawi project.
Stars: ✭ 30 (-88.19%)
Mutual labels:  stemmer
gwizo
Simple Go implementation of the Porter Stemmer algorithm with powerful features.
Stars: ✭ 26 (-89.76%)
Mutual labels:  stemmer
lancaster-stemmer
Lancaster stemming algorithm
Stars: ✭ 22 (-91.34%)
Mutual labels:  stemmer
stemmify
Ruby module that converts a word to its approximate root form with the Porter stemmer. For example, observing and observation reduce to observ.
Stars: ✭ 54 (-78.74%)
Mutual labels:  stemmer

= Ruby-Stemmer

Ruby-Stemmer exposes the Snowball API to Ruby.

{Travis CI Status}[https://api.travis-ci.org/aurelian/ruby-stemmer.png]

This package includes the libstemmer_c library, which is released under the BSD licence and freely available {here}[https://snowballstem.org/download.html].

Support for Latin is also included. It was generated with the Snowball compiler from the {Schinke contribution}[https://snowballstem.org/otherapps/schinke/].

For more details about libstemmer_c please visit the {SnowBall website}[https://snowballstem.org/].

== Usage

  require 'rubygems'
  require 'lingua/stemmer'

  stemmer = Lingua::Stemmer.new(:language => "ro")
  stemmer.stem("netăgăduit") #=> netăgădu

=== Alternative

  require 'rubygems'
  require 'lingua/stemmer'

  Lingua.stemmer( %w(incontestabil neîndoielnic), :language => "ro" ) #=> ["incontest", "neîndoieln"]
  Lingua.stemmer("installation") #=> "instal"
  Lingua.stemmer("installation", :language => "fr", :encoding => "ISO_8859_1") do | word |
    puts "~> #{word}" #=> "instal"
  end # => #<Lingua::Stemmer:0x102501e48>
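To stem every word of a longer text, the calls above can be wrapped in a small helper. The sketch below is only an illustration: +stem_text+ is a hypothetical name that is not part of ruby-stemmer, and the language code "en" is assumed to select the English stemmer. The helper accepts any object that responds to +stem+, so it also runs without the gem installed.

```ruby
# Hypothetical helper (not part of ruby-stemmer): stems every word of a text
# with any object that responds to #stem, such as a Lingua::Stemmer instance.
def stem_text(text, stemmer)
  # \p{L}+ matches runs of letters in any script, so "netăgăduit" stays whole.
  text.scan(/\p{L}+/).map { |word| stemmer.stem(word.downcase) }
end

begin
  require 'lingua/stemmer'
  # Assumption: "en" selects the English Snowball stemmer.
  stemmer = Lingua::Stemmer.new(:language => "en")
  p stem_text("Installation of installed installers", stemmer)
rescue LoadError
  warn "ruby-stemmer is not installed; run `gem install ruby-stemmer` first"
end
```

Keeping the tokenizing step outside the stemmer makes it easy to swap in a different word-splitting rule for languages where a plain letter run is not a good word boundary.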

=== Gemfile

  gem 'ruby-stemmer', '>=2.0.0', :require => 'lingua/stemmer'

=== More details

  • Complete API in {RDoc format}[http://rdoc.info/github/aurelian/ruby-stemmer/master/frames]
  • More usage on the {test file}[https://github.com/aurelian/ruby-stemmer/blob/master/test/lingua/test_stemmer.rb]

== Install

  gem install ruby-stemmer

==== Windows

There's also a precompiled Windows gem (fat binary):

  gem install ruby-stemmer --platform=x86-mingw32

As far as I know, the above should work with {rubyinstaller}[http://rubyinstaller.org/]. If it fails, try:

  gem install ruby-stemmer --platform=x86-mswin32

{It's known}[https://cl.ly/BX9o] to work under Windows XP.

=== Development version

  $ git clone https://github.com/aurelian/ruby-stemmer.git
  $ cd ruby-stemmer
  $ rake -T #<== see what we've got
  $ rake compile #<== builds the extension do'h
  $ rake test

==== Cross Compiling

Install {rake-compiler-dock}[https://github.com/rake-compiler/rake-compiler-dock] and follow the setup.

Then, inside the docker image:

  $ AR=i686-w64-mingw32-ar CC=i686-w64-mingw32-gcc LD=i686-w64-mingw32-ld rake cross native gem

Or, build the lib first then compile:

  $ cd libstemmer_c
  $ AR=i686-w64-mingw32-ar CC=i686-w64-mingw32-gcc LD=i686-w64-mingw32-ld make
  $ cd ../
  $ rake cross native gem

== NOT A BUG

Stemming is an algorithm for finding the stem of a word, not its root. A stem is what remains after the affixes are stripped, so it does not have to be a dictionary word. For further reference on stem vs. root, please check the Wikipedia articles on the topic.
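The difference can be seen with a toy suffix stripper. The sketch below is for illustration only: it is not the Snowball algorithm and not part of this gem, and the suffix list and the +toy_stem+ name are made up for the example.

```ruby
# Toy suffix stripper for illustration only; this is NOT the Snowball algorithm.
# The suffix list is an arbitrary assumption, ordered longest first.
TOY_SUFFIXES = %w[ations ation ing ers er ed s].freeze

def toy_stem(word)
  # Pick the first suffix that matches and leaves at least three letters.
  suffix = TOY_SUFFIXES.find do |s|
    word.end_with?(s) && word.length - s.length >= 3
  end
  suffix ? word[0...-suffix.length] : word
end

%w[observing observation observers].each do |w|
  puts "#{w} -> #{toy_stem(w)}"
end
```

All three words reduce to "observ", which is not a dictionary word. That is expected of a stem, and the stems returned by this gem behave the same way.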

== TODO

  • {Open issues}[https://github.com/aurelian/ruby-stemmer/issues]

== Note on Patches/Pull Requests

  • Fork the project from {github}[https://github.com/aurelian/ruby-stemmer]

  • Make your feature addition or {bug fix}[https://github.com/aurelian/ruby-stemmer/issues]

  • Add tests for it. This is important so I don't break it in a future version unintentionally.

  • Commit, and do not change the Rakefile, version, or history.

    If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.

  • Send me a pull request. Bonus points for topic branches.

== Alternative Stemmers for Ruby

  • {stemmer4r}[https://rubygems.org/gems/stemmer4r] (ext)
  • {fast-stemmer}[https://rubygems.org/gems/fast-stemmer] (ext)
  • {uea-stemmer}[https://rubygems.org/gems/uea-stemmer] (ext)
  • {stemmer}[https://rubygems.org/gems/stemmer] (pure ruby)
  • add yours

== Copyright

Copyright (c) 2008-2020 {Aurelian Oancea}[http://locknet.ro]. See MIT-LICENSE for details.

== Contributors

  • {Aurelian Oancea}[https://github.com/aurelian]
  • {Yury Korolev}[https://github.com/yury] - various bug fixes
  • {Aaron Patterson}[https://github.com/tenderlove] - rake compiler (windows support), code cleanup
  • {Damián Silvani}[https://github.com/munshkr] - Ruby 1.9 encoding
