ilmulti: Tooling to play around with multilingual machine translation for Indian languages.
liblex: C library for lexical analysis.
wink-tokenizer: Multilingual tokenizer that automatically tags each token with its type.
berserker: Berserker (BERt chineSE woRd toKenizER).
jargon: Tokenizers and lemmatizers for Go.
farasapy: A Python implementation of the Farasa toolkit.
rustfst: Rust re-implementation of OpenFST, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
psr2r-sniffer: A PSR-2-R code sniffer and code-style auto-correction tool, with many useful additions.
lex: An implementation of the lex tool in Ruby.
hunspell: High-performance stemmer, tokenizer, and spell checker for R.
tokenizer: A simple tokenizer in Ruby for NLP tasks.
lindera: A morphological analysis library.
SwiLex: A universal lexer library in Swift.
gd-tokenizer: A small Godot project with a tokenizer written in GDScript.
python-mecab: MeCab bindings for Python 3.5+, built without SWIG or pybind. (No longer maintained.)
xontrib-output-search: Get identifiers, paths, URLs, and words from the previous command's output and use them for the next command in the xonsh shell.
snapdragon-lexer: Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
suika: Suika 🍉 is a Japanese morphological analyzer written in pure Ruby.
Text-Classification-LSTMs-PyTorch: A baseline text classification model implemented as an LSTM in PyTorch. To make the model easier to understand, it is demonstrated on a Tweets dataset provided by Kaggle.
lexertk: C++ Lexer Toolkit Library (LexerTk). https://www.partow.net/programming/lexertk/index.html
grasp: Essential NLP & ML in short, fast, pure Python code.
sinling: A collection of NLP tools for Sinhalese (සිංහල).
greeb: Greeb is a simple Unicode-aware regexp-based tokenizer.
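
Several entries above describe regexp-based tokenizers that tag each token with a type. As a rough illustration of that general idea, here is a minimal Python sketch. It is not taken from Greeb, wink-tokenizer, or any other project in this list; the function name `tokenize` and the three token classes (`word`, `number`, `punct`) are assumptions chosen for the example.

```python
import re

# Hypothetical token classes for illustration only; real tokenizers
# usually distinguish many more (URLs, emoji, hashtags, ...).
# Python 3 str patterns are Unicode-aware by default, so the letter
# class below matches non-ASCII letters such as Cyrillic.
TOKEN_RE = re.compile(
    r"(?P<word>[^\W\d_]+)"    # runs of letters (word chars minus digits and "_")
    r"|(?P<number>\d+)"       # runs of decimal digits
    r"|(?P<punct>[^\w\s]|_)"  # any single non-word, non-space char, or "_"
)


def tokenize(text):
    """Return a list of (token_type, token_text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for kind, token in tokenize("Привет, мир 42!"):
        print(kind, token)
```

One known limitation of this sketch: combining marks are not treated as letters by `\w`, so scripts that rely on them (Sinhala, for example) would be split incorrectly. Handling such scripts needs grapheme-aware segmentation, which is beyond a single regular expression like this one.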