(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (+61.29%)

Mutual labels: morphological-analysis

Chevrotain

Parser Building Toolkit for JavaScript

Stars: ✭ 1,795 (+5690.32%)

Mutual labels: tokenizer

yap

Yet Another (natural language) Parser

Stars: ✭ 40 (+29.03%)

Mutual labels: morphological-analysis

greeb

Greeb is a simple Unicode-aware regexp-based tokenizer.

Stars: ✭ 16 (-48.39%)

Mutual labels: tokenizer

Bitextor

Bitextor generates translation memories from multilingual websites.

Stars: ✭ 168 (+441.94%)

Mutual labels: tokenizer

Udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Stars: ✭ 160 (+416.13%)

Mutual labels: tokenizer

Neural-Morphological-Disambiguation-for-Turkish-DEPRECATED

Neural morphological disambiguation for Turkish. Implemented in DyNet

Stars: ✭ 11 (-64.52%)

Mutual labels: morphological-analysis

Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Stars: ✭ 132 (+325.81%)

Mutual labels: tokenizer

Tokenizer

A tokenizer for Icelandic text

Stars: ✭ 27 (-12.9%)

Mutual labels: tokenizer

Fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

Stars: ✭ 125 (+303.23%)

Mutual labels: tokenizer

lexertk

C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html

Stars: ✭ 26 (-16.13%)

Mutual labels: tokenizer

Js Tokens

Tiny JavaScript tokenizer.

Stars: ✭ 166 (+435.48%)

Mutual labels: tokenizer

Query Translator

Query Translator is a search query translator with AST representation

Stars: ✭ 165 (+432.26%)

Mutual labels: tokenizer

sinling

A collection of NLP tools for Sinhalese (සිංහල).

Stars: ✭ 38 (+22.58%)

Mutual labels: tokenizer


Suika

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby.

Installation

Add this line to your application's Gemfile:

gem 'suika'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install suika

Usage

require 'suika'

tagger = Suika::Tagger.new
tagger.parse('すもももももももものうち').each { |token| puts token }

# すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ
# も      助詞,係助詞,*,*,*,*,も,モ,モ
# もも    名詞,一般,*,*,*,*,もも,モモ,モモ
# も      助詞,係助詞,*,*,*,*,も,モ,モ
# もも    名詞,一般,*,*,*,*,もも,モモ,モモ
# の      助詞,連体化,*,*,*,*,の,ノ,ノ
# うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
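Each token in the output above carries the nine comma-separated IPADIC feature columns: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading and pronunciation. The sketch below is plain Ruby with no Suika dependency. It splits such lines into named fields and picks out the nouns. It assumes each token string is the surface form, a tab, then the features, which is how MeCab formats this output. The gap in the listing above may be a rendered tab, so check what your Suika version actually returns. `IpadicToken` and `parse_token_line` are hypothetical names used only for illustration.

```ruby
# Names for the nine comma-separated IPADIC feature columns, in order.
IPADIC_FIELDS = %i[
  pos pos_detail1 pos_detail2 pos_detail3
  conjugation_type conjugation_form base_form reading pronunciation
].freeze

# Hypothetical record type (not part of Suika): surface form plus features.
IpadicToken = Struct.new(:surface, *IPADIC_FIELDS)

# Turns one "surface<TAB>features" line into an IpadicToken.
# Assumption: a tab separates the surface form from the features.
# If a line has fewer than nine feature columns, the missing fields are nil.
def parse_token_line(line)
  surface, features = line.chomp.split("\t", 2)
  IpadicToken.new(surface, *(features || '').split(',', IPADIC_FIELDS.size))
end

# The lines from the Usage output above, written with a tab after the surface.
lines = [
  "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ",
  "も\t助詞,係助詞,*,*,*,*,も,モ,モ",
  "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ",
  "も\t助詞,係助詞,*,*,*,*,も,モ,モ",
  "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ",
  "の\t助詞,連体化,*,*,*,*,の,ノ,ノ",
  "うち\t名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ"
]

tokens = lines.map { |line| parse_token_line(line) }
# Keep only tokens whose part of speech is 名詞 (noun).
nouns = tokens.select { |token| token.pos == '名詞' }.map(&:surface)
puts nouns.join(' ') # prints "すもも もも もも うち"
```

With a real tagger, and under the same tab assumption, the same helper would apply as `tagger.parse(text).map { |line| parse_token_line(line) }`.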

Because Suika::Tagger loads the binary dictionary when it is initialized, create one instance and reuse it instead of constructing a new tagger for every sentence.

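tagger = Suika::Tagger.new # the dictionary is loaded once, here

sentences = %w[すもももももももものうち 今日は良い天気です] # example input: any collection of strings

sentences.each do |sentence|
  result = tagger.parse(sentence) # reuses the already-loaded dictionary

  # ...
end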
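When the tagger is needed in several places, one option is to build it lazily on first use and hand out the same instance afterwards. The sketch below is a generic pattern, not part of Suika. `LazySingleton` is a hypothetical name. The mutex only ensures the factory runs once; the source does not say whether a Suika::Tagger may be shared across threads, so that is not implied here.

```ruby
# Hypothetical helper (not part of Suika): builds an expensive object on
# first use and returns the same instance on every later call.
class LazySingleton
  def initialize(&factory)
    raise ArgumentError, 'a factory block is required' unless factory

    @factory = factory
    @built = false
    @instance = nil
    @mutex = Mutex.new
  end

  # Calls the factory exactly once, even if it returns nil or false.
  # The mutex guards construction only, not use of the returned object.
  def get
    @mutex.synchronize do
      unless @built
        @instance = @factory.call
        @built = true
      end
      @instance
    end
  end
end

# Usage with Suika (the block runs on the first #get, not here):
#   TAGGER = LazySingleton.new { Suika::Tagger.new }
#   sentences.each { |sentence| result = TAGGER.get.parse(sentence) }
```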

Test

Suika parsed every sentence in the Livedoor news corpus without errors, using the following script:

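require 'suika'

tagger = Suika::Tagger.new

# Assumes the Livedoor news corpus (ldcc-20140209) has been extracted into the current directory.
Dir.glob('ldcc-20140209/text/*/*.txt').each do |filename|
  File.foreach(filename) do |sentence|
    sentence.strip!
    # Skip blank lines; print the tokens of every other line.
    puts tagger.parse(sentence) unless sentence.empty?
  end
end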
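The script above stops at the first exception. To run the same check and get a report of any failures, one variant records them and keeps going. `find_parse_failures` is a hypothetical helper, not part of Suika. It accepts the parser as any callable, so with Suika you would pass `tagger.method(:parse)`.

```ruby
# Hypothetical helper (not part of Suika): feeds every non-empty, stripped
# line of each file to `parse` and records [path, line number, error class]
# for the lines that raised, instead of aborting on the first exception.
def find_parse_failures(paths, parse)
  failures = []
  paths.each do |path|
    File.foreach(path).with_index(1) do |line, lineno|
      sentence = line.strip
      next if sentence.empty?

      begin
        parse.call(sentence)
      rescue StandardError => e
        failures << [path, lineno, e.class.name]
      end
    end
  end
  failures
end

# Usage with Suika and the same corpus glob as the script above:
#   tagger = Suika::Tagger.new
#   paths = Dir.glob('ldcc-20140209/text/*/*.txt')
#   failures = find_parse_failures(paths, tagger.method(:parse))
#   puts failures.empty? ? 'no errors' : failures.map { |f| f.join(':') }
```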

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/suika. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the BSD-3-Clause License. In addition, the gem includes binary data generated from mecab-ipadic. The details of the license can be found in LICENSE.txt and NOTICE.txt.

Respect

Taku Kudo is the author of MeCab, the most famous morphological analyzer in Japan and one of the great pieces of software in natural language processing. Suika was created with reference to Dr. Kudo's book on morphological analysis.
Tomoko Uchida is the author of Janome, a Japanese morphological analysis engine written in pure Python. Suika is heavily influenced by Janome's idea of bundling the built-in dictionary and language model. Janome, a morphological analyzer written in a scripting language, gave me the courage to develop Suika.


yoshoku / suika
