
Top 89 tokenizer open source projects

liblex
C library for Lexical Analysis
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
jargon
Tokenizers and lemmatizers for Go
elasticsearch-plugins
Some native scoring script plugins for elasticsearch
neural tokenizer
Tokenize English sentences using neural networks.
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
psr2r-sniffer
A PSR-2-R code sniffer and code-style auto-correction tool, including many useful additions
lex
Lex is an implementation of the lex tool in Ruby.
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
tokenizer
A simple tokenizer in Ruby for NLP tasks.
gd-tokenizer
A small godot project with a tokenizer written in GDScript.
python-mecab
MeCab bindings for Python 3.5+, built without SWIG or pybind. (No longer maintained.)
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command output and use them for the next command in xonsh shell.
snapdragon-lexer
Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
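The lookahead/lookbehind idea described for snapdragon-lexer can be sketched as a small token-stream wrapper; the names below (`TokenStream`, `peek`, `next`) are illustrative assumptions, not snapdragon-lexer's actual JavaScript API.

```python
# Minimal sketch of a token stream with lookahead (assumed API, for
# illustration only -- not snapdragon-lexer's real interface).
class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self, offset=0):
        """Look ahead without consuming a token."""
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def next(self):
        """Consume and return the current token."""
        tok = self.peek()
        self.pos += 1
        return tok

stream = TokenStream(["a", "+", "b"])
stream.peek()   # → "a" (not consumed)
stream.next()   # → "a" (consumed)
stream.peek()   # → "+"
```

A parser built on such a stream can branch on `peek()` before committing to `next()`, which is what makes lookahead useful.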
chinese-tokenizer
Tokenizes Chinese texts into words.
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Text-Classification-LSTMs-PyTorch
This repository demonstrates a baseline LSTM-based model for text classification, implemented in PyTorch. To aid understanding of the model, it uses a Tweets dataset provided by Kaggle.
Tokenizer
A tokenizer for Icelandic text
lexertk
C++ Lexer Toolkit Library (LexerTk) https://www.partow.net/programming/lexertk/index.html
Roy VnTokenizer
Vietnamese tokenizer (Maximum Matching and CRF)
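Maximum matching, one of the two techniques named in the Roy VnTokenizer entry, greedily takes the longest dictionary word at each position. A minimal sketch over space-separated syllables, with a toy vocabulary that is purely an illustrative assumption:

```python
# Greedy longest-first (maximum matching) word segmentation sketch.
# The vocabulary and example are toy assumptions for illustration.
def max_match(syllables, vocab, max_words=3):
    """Segment a syllable list by always taking the longest vocab match."""
    tokens, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking toward one syllable.
        for j in range(min(len(syllables), i + max_words), i, -1):
            cand = " ".join(syllables[i:j])
            if cand in vocab or j == i + 1:  # fall back to a single syllable
                tokens.append(cand)
                i = j
                break
    return tokens

vocab = {"thành phố", "hà nội"}
max_match("thành phố hà nội".split(), vocab)
# → ["thành phố", "hà nội"]
```

Greedy matching is fast but can mis-segment ambiguous spans, which is why such tokenizers often pair it with a statistical model like a CRF.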
greeb
Greeb is a simple Unicode-aware regexp-based tokenizer.
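A Unicode-aware regexp-based tokenizer of the kind the greeb entry describes can be sketched in a few lines; the token classes and pattern here are assumptions for illustration, not greeb's actual rules.

```python
import re

# Sketch of a Unicode-aware regexp tokenizer: runs of word characters
# become one token, any other non-space character stands alone.
# (Illustrative pattern, not greeb's real implementation.)
TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)

def tokenize(text):
    return TOKEN_RE.findall(text)

tokenize("Привет, world!")
# → ["Привет", ",", "world", "!"]
```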